A closer look at Folding@home on the GPU

Points and power consumption explored
— 12:33 AM on October 16, 2006

MUCH HAS BEEN MADE of ATI's recently announced stream computing initiative, which aims to exploit graphics hardware for more general purpose computing tasks. On paper, stream computing has incredible potential. ATI claims the 48 pixel shaders in its top-of-the-line Radeon offer roughly 375 gigaflops of processing power with 64 GB/s of memory bandwidth. But gobs of horsepower isn't the only thing that matters—you need to be able to apply that power to the road if you want to go anywhere.

Stanford's Folding@home project is already putting the Radeon's pixel processing horsepower to use with a beta GPU client that performs protein folding calculations on the graphics processor. According to Stanford, the GPU client runs between 20 and 40 times faster on newer Radeons than it does on a CPU, a claim that no doubt sends folding enthusiasts' hearts aflutter.

Such an increase in folding performance is certainly tantalizing. We decided to give the GPU client a spin to see what we could find out about it.

The GPU client's optional eye candy is a nice perk

For our little experiment, we ran Folding@home's CPU and GPU clients for several days on the same system. We used the latest beta command line clients in both instances, which worked out to version 5.04 for the CPU and 5.05 for the GPU. Our test system consisted of an Abit Fatal1ty AN8 32X motherboard with a dual-core Opteron 180 2.4GHz processor, 2GB of DDR400 Corsair XMS PRO memory, and a Radeon X1900 XTX graphics card with ATI's Catalyst 6.5 drivers (the only revision that Stanford has extensively tested with the GPU client).

With our test system configured, we pitted the CPU and GPU clients against each other in a virtual race. A single CPU client was set to run on CPU 0, leaving the GPU client with the remainder of the system's resources. According to Stanford's GPU folding FAQ, at least 25% of a system's CPU resources should be made available to the GPU client. With our test system crunching the CPU client on only one core, the second core, or 50% of the system's CPU resources, was available to the GPU client. Unfortunately, we were only able to run the system with a single Radeon X1900 XTX graphics card, since Stanford's GPU folding client doesn't currently support CrossFire configurations.

Five days after releasing our test system into the wild, we checked on each client's scores, with surprising results. Conveniently, both clients had recently completed work units when we tallied the totals.

Client            Total points   24-hour average   Work units
GPU client        2,640          377               8
CPU client        899            128               6
CPU client x 2*   1,798          256               12
*Extrapolated by doubling the single CPU client's results

Over five days, our Radeon X1900 XTX crunched eight work units for a total of 2,640 points. During the same period, our single Opteron 180 core chewed its way through six smaller work units for a score of 899, just about one third of the Radeon's point production. However, had we been running the CPU client on both of our system's cores, the point output should have been closer to 1,800, putting the Radeon ahead by less than 50%.
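The comparison above rests on two ratios: GPU points over the single-core CPU score, and GPU points over a doubled CPU score. The short sketch below reproduces that arithmetic. It assumes, as we did, that a second CPU client on the other core would have scored exactly as much as the first; the helper name `lead` is hypothetical.

```python
# Five-day point totals from the table above.
GPU_POINTS = 2640
CPU_POINTS_ONE_CORE = 899

# Assumption: a second CPU client would have matched the first one exactly,
# so the dual-core estimate is just the single-core score doubled.
cpu_points_two_cores = 2 * CPU_POINTS_ONE_CORE


def lead(gpu: float, cpu: float) -> float:
    """Hypothetical helper: how far ahead the GPU is, as a fraction of the CPU score."""
    return gpu / cpu - 1.0


print(f"Dual-core CPU estimate: {cpu_points_two_cores} points")
print(f"GPU vs. one core: {GPU_POINTS / CPU_POINTS_ONE_CORE:.2f}x")
print(f"GPU lead over two cores: {lead(GPU_POINTS, cpu_points_two_cores):.1%}")
```

This prints a dual-core estimate of 1,798 points, a GPU advantage of about 2.94x over one core, and a lead of roughly 46.8% over two cores.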

Either way, that's a far cry from a 20- to 40-fold increase. That's not entirely surprising, though. Stanford's own GPU folding FAQ explains that the points awarded for GPU work units aren't exactly comparable to those awarded through the CPU client:

We will continue to award points in the same method as we've always used in Folding@Home. To award points for a WU, the WU is run on a benchmark machine. The points are currently awarded as 110 points/day as timed on the benchmark machine. We will continue with this method of calibrating points by adding an ATI X1900XT GPU to the new benchmark machine (otherwise, without a GPU, we could not benchmark GPU WU's on the benchmark machine!). Since Core_10 GPU WU's cannot be processed on the CPU alone, we must assign a new set of points for GPU WUs, and we are setting that to 440 points/day to reflect the added resources that GPU donors give to FAH. In cases where we need to use CPU time in addition to the GPU (as in the current GPU port), we will give extra points to compensate donors for the additional resources used. Right now, GPU WU's are set to 660 points/day. As we go through the beta process, we will examine the issue of points for WUs, as we understand the significance of this in compensating donor contributions.
So point totals don't necessarily reflect the relative folding power of the Radeon X1900 XTX. CPU and GPU clients draw from different pools of work units, and points are based on the performance of a benchmark system rather than how many calculations have actually been completed. The GPU client may be doing between 20 and 40 times more work, but the points Stanford awards don't reflect that. It will take more than a farm full of Radeons to dominate the Folding@home leaderboard.
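Stanford's description boils down to a simple rule: a work unit is worth its runtime on the benchmark machine multiplied by a fixed points-per-day rate. The sketch below models exactly that, assuming credit scales linearly with benchmark time. The rates come from the FAQ; the function name `work_unit_points` and the half-day benchmark time in the example are hypothetical, chosen only for illustration.

```python
# Points-per-day rates quoted in Stanford's GPU folding FAQ.
CPU_RATE = 110           # CPU work units, timed on the benchmark machine
GPU_ONLY_RATE = 440      # GPU work units, GPU resources only
GPU_PLUS_CPU_RATE = 660  # current GPU port, which also consumes CPU time


def work_unit_points(benchmark_days: float, rate_per_day: float) -> float:
    """Hypothetical helper: points for one work unit, assuming credit is
    linear in how long the benchmark machine takes to finish it."""
    if benchmark_days <= 0:
        raise ValueError("benchmark time must be positive")
    return benchmark_days * rate_per_day


# A hypothetical work unit that takes half a day on the benchmark machine.
print(work_unit_points(0.5, CPU_RATE))           # 55.0
print(work_unit_points(0.5, GPU_PLUS_CPU_RATE))  # 330.0
```

Under this model, the GPU rate is six times the CPU rate (660 versus 110 points/day), which is why a GPU doing 20 to 40 times more calculation doesn't earn 20 to 40 times the points.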

Not content to stop our investigation at point totals, we fired up our watt meter to see just how much juice our test system consumed with several folding configurations. The system's power consumption was measured at idle with and without Cool'n'Quiet clock throttling, and then with various Folding@home client combinations.

System power consumption
Configuration                 Power draw (W, at the wall)
Idle with Cool'n'Quiet        98.6
Idle without Cool'n'Quiet     113.0
CPU client x1                 160.2
CPU client x2                 185.6
GPU client                    195.6
CPU client + GPU client       228.0

Clearly, the GPU client is much more power-hungry running on a Radeon X1900 XTX than the CPU client is with an Opteron 180. Even when compared with two cores running at full steam, the GPU client still pulls an extra 10W at the wall socket. Still, given our point totals, the GPU client appears to be the more power-efficient of the two.
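The power comparisons above are simple differences between rows of the power table. The sketch below computes them directly; the dictionary keys are hypothetical labels for the table's configurations, and rounding to one decimal place matches the meter's precision while hiding floating-point noise.

```python
# Wall-socket power readings from the power table, in watts.
POWER_W = {
    "idle_cnq": 98.6,
    "idle_no_cnq": 113.0,
    "cpu_x1": 160.2,
    "cpu_x2": 185.6,
    "gpu": 195.6,
    "cpu_plus_gpu": 228.0,
}


def extra_draw(config: str, baseline: str) -> float:
    """Additional wall power of one configuration over another, in watts."""
    return round(POWER_W[config] - POWER_W[baseline], 1)


print(extra_draw("gpu", "cpu_x1"))  # 35.4
print(extra_draw("gpu", "cpu_x2"))  # 10.0
```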

Using an extrapolated point total for two CPU clients running in parallel, which is pretty realistic given how Folding@home burdens the CPU, we'd expect to generate around 1,798 points while pulling 185.6W, which is good for close to 9.7 points per watt. The GPU client, on the other hand, generated 2,640 points while pulling 195.6W, yielding close to 13.5 points per watt.

Interestingly, with our test system running one CPU client and one GPU client, we generated a total of 3,539 points while pulling 228W, or 15.5 points per watt.
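The points-per-watt figures above divide each configuration's five-day point total by its measured wall draw. The sketch below reproduces them; the combined configuration's points are the sum of the GPU client's 2,640 and the CPU client's 899, and the dual-CPU figure is the same doubled estimate used earlier. Because every configuration covers the same five-day period, dividing by watts rather than watt-hours doesn't change the ranking.

```python
# (points, watts) for each configuration, taken from the point and power tables.
RESULTS = {
    "CPU client x2 (extrapolated)": (2 * 899, 185.6),
    "GPU client": (2640, 195.6),
    "CPU client + GPU client": (899 + 2640, 228.0),
}


def points_per_watt(points: float, watts: float) -> float:
    """Points earned over the test period divided by measured wall draw."""
    return points / watts


for name, (points, watts) in RESULTS.items():
    print(f"{name}: {points_per_watt(points, watts):.1f} points/W")
```

The three lines printed are 9.7, 13.5, and 15.5 points per watt, matching the figures in the text.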

Unfortunately, the scoring scheme for Stanford's GPU folding client doesn't reflect the apparent processing power advantage of graphics processors like the Radeon X1900 XTX. The use of a benchmark system is consistent with how points are awarded for the CPU client. Still, if a GPU really is doing vastly more folding work than a CPU, perhaps the points system should weight GPU time more heavily.
