ATI GPU2 Optimization Guide

Come join the... uh... er... fold.

Moderators: just brew it!, farmpuma

ATI GPU2 Optimization Guide

Posted on Thu Apr 09, 2009 8:00 am

As far as I can tell, the ATI GPU2 client has not received much attention among TR folders, mainly because of the lower PPD of AMD graphics cards, and partly because the client is not as mature as the Nvidia GPU2 client.

But the most recent cores now allow a set of optimizations that yield slight PPD increases (10-20%) and drop CPU usage to around 0-5% on XP, Vista, and 7, like the Nvidia clients on XP. There are three or four optimizations (if you count an experimental one) that can be used. These optimizations are system environment variables that need to be added through the Control Panel.

Procedure to add environment variables:

Go to Start, Control Panel, System, Advanced System Settings. A new window called System Properties should pop up; go to the Advanced tab and click Environment Variables... Under the System variables section, click New, then set the name (names are case-sensitive, no quotes needed, and include the underscores) and value for the variable. Click Apply and it should be set. The GPU2 client needs to be restarted for the change to apply; XP may require a system restart.

The first variable is "FLUSH_INTERVAL". The minimum requirements are core version 1.22 and Catalyst 8.12 (8.12 seems to be the oldest driver it is stable on). FLUSH_INTERVAL takes a value between 1 and 1024. For cores 1.22 and 1.23 with Catalyst 8.12-9.2, FL settings vary a lot depending on the graphics card used. For instance, something like an HD 4550 shouldn't need more than 64, an HD 4670 may need 128 or higher, while an HD 4870 could take 512. What FL does is reduce CPU usage (it changes the size of the packets sent to the CPU); CPU usage for these cores should gradually decrease to around 15-25% with higher FLs. To set the FL, start with a low value (lower than the values suggested above; around 32 is a good start) and increase it in increments. With your original low FL you should see less than 100% GPU usage and only slightly lower CPU usage; increase until GPU usage is back up to around 100% and CPU usage is down a fair bit.

If you ever get a VPU recover or work unit errors, decrease the FL to the last stable value.

On the newest core, version 1.24 with Catalyst 9.3, FL behavior has changed. Generally, if you are updating from an older Catalyst, start at around 32 again and work your way back up as described. In general, the new optimal FL should be lower than before (values over 200 usually cause problems).
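The incremental search described above amounts to a simple doubling schedule. As a sketch (a hypothetical helper, not part of any client; the start and cap values are the ones suggested in this guide):

```python
def fl_candidates(start=32, cap=512):
    """Candidate FLUSH_INTERVAL values to try in order, doubling each step.

    Start low, watch GPU/CPU usage at each value, and fall back to the
    last stable value as soon as you see a VPU recover or WU errors.
    """
    values = []
    v = start
    while v <= cap:
        values.append(v)
        v *= 2
    return values

print(fl_candidates())         # schedule for cores 1.22/1.23
print(fl_candidates(cap=200))  # core 1.24 / Catalyst 9.3: stay under ~200
```

The exact increments don't matter much; doubling just gets you near the optimum quickly before fine-tuning.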

The other two environment variables are "CAL_NO_FLUSH" and "BROOK_YIELD". These are simple: set "CAL_NO_FLUSH" to 1 and "BROOK_YIELD" to 2. They are only stable on Catalyst 9.3 with core version 1.24!

With these two extra variables and an optimal FL, you should be able to push CPU usage down to 0-5%.
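If you prefer the command line to the Control Panel GUI, the same variables can be set with `setx` from an administrator Command Prompt (a sketch only; `setx` is built into Vista/7, while XP needs the Support Tools, and the FLUSH_INTERVAL value of 128 is just an example starting point you should tune as described above):

```shell
:: Set the three variables system-wide (/M) from an elevated prompt.
:: FLUSH_INTERVAL 128 is an example; tune it per card as described above.
setx /M FLUSH_INTERVAL 128
setx /M CAL_NO_FLUSH 1
setx /M BROOK_YIELD 2
:: Restart the GPU2 client (or reboot on XP) so the new values take effect.
```

Either method writes the same system environment variables; the GUI is just less error-prone if you're unsure about elevation.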

Enjoy your lower CPU usage. If you're running on a dual-core computer, add the SMP client (it should run stably) or a second CPU console client, and enjoy your extra PPD!
HurgyMcGurgyGurg
Gerbil First Class
 
Posts: 156
Joined: Mon Apr 21, 2008 4:47 pm

Re: ATI GPU2 Optimization Guide

Posted on Thu Apr 09, 2009 9:17 am

HurgyMcGurgyGurg wrote:As far as I can tell, the ATI GPU2 client has not received much attention among TR folders, mainly because of the lower PPD of AMD graphics cards, and partly because the client is not as mature as the Nvidia GPU2 client.



Good info in the rest of the article. But the ATI client came out a couple of years before the nvidia client did.
Sony a7
Sony Zeiss 55/1.8 SSM, 24-70/4 SSM
Minolta 17-35/2.8-4 D, 100-300 APO
TheEmrys
Minister of Gerbil Affairs
Silver subscriber
 
 
Posts: 2144
Joined: Wed May 29, 2002 8:22 pm
Location: Northern Colorado

Re: ATI GPU2 Optimization Guide

Posted on Thu Apr 09, 2009 9:50 am

Maturity isn't necessarily measured by the release date.
Q9550 @ stock | 8GB DDR2 1066 5-5-5-15 | Gainward GTX 460 GS GLH | Asus P5Q SE/R | Scythe Mugen | OCZ StealthXstream 600w | Antec 300 | Zalman ZM-MFC1+ | 1TB WD Caivar Black | Audigy SE 7.1 | Logitech X-540 | HG281D 28" | Razer Diamondback| Win7x64
silent ninjah
Gerbil First Class
 
Posts: 177
Joined: Sun Oct 16, 2005 9:10 am
Location: Scotland

Re: ATI GPU2 Optimization Guide

Posted on Thu Apr 09, 2009 10:13 am

TheEmrys wrote:
HurgyMcGurgyGurg wrote:As far as I can tell, the ATI GPU2 client has not received much attention among TR folders, mainly because of the lower PPD of AMD graphics cards, and partly because the client is not as mature as the Nvidia GPU2 client.



Good info in the rest of the article. But the ATI client came out a couple of years before the nvidia client did.

silent ninjah wrote:Maturity isn't necessarily measured by the release date.

It is the sad state of affairs at Stanford really. GPU1 client was released exclusively for AMD GPUs. But then GPU2 came along and AMD basically got the shaft. :evil: They are now slowly revving it back up to "first class" support.

Once OpenCL is finalized I hope those Stanford guys can just use it and we won't have this mess anymore. Plus we can have a single code base to compare the GPU architectures (somewhat) fairly. Let the drag race begin!
The Model M is not for the faint of heart. You either like them or hate them.

Gerbils unite! Fold for UnitedGerbilNation, team 2630.
Flying Fox
Gerbil God
 
Posts: 24297
Joined: Mon May 24, 2004 2:19 am

Re: ATI GPU2 Optimization Guide

Posted on Thu Apr 09, 2009 10:50 am

TheEmrys wrote:
HurgyMcGurgyGurg wrote:As far as I can tell, the ATI GPU2 client has not received much attention among TR folders, mainly because of the lower PPD of AMD graphics cards, and partly because the client is not as mature as the Nvidia GPU2 client.



Good info in the rest of the article. But the ATI client came out a couple of years before the nvidia client did.


I think it's the fact that the GPU2 client uses CUDA on NVIDIA cards. This is why it runs so much faster.

I'm pretty sure the ATI client runs Brook+ which ATI has all but abandoned. I have not heard whether or not it will be ported to OpenCL. Seems likely though.
"Give me a scotch. I'm starving" ~ Tony Stark
PRIME1
Darth Gerbil
 
Posts: 7561
Joined: Mon Apr 22, 2002 5:07 pm
Location: , location

Re: ATI GPU2 Optimization Guide

Posted on Thu Apr 09, 2009 11:18 am

PRIME1 wrote:
I think it's the fact that the GPU2 client uses CUDA on NVIDIA cards. This is why it runs so much faster.

I'm pretty sure the ATI client runs Brook+ which ATI has all but abandoned. I have not heard whether or not it will be ported to OpenCL. Seems likely though.


This is pretty much it, hopefully they'll use OpenCL exclusively, then they'll be able to combine teams and work on a single GPU client.
X __________________________
khands
Graphmaster Gerbil
Silver subscriber
 
 
Posts: 1219
Joined: Mon Dec 01, 2008 12:32 pm
Location: Chicagoland

Re: ATI GPU2 Optimization Guide

Posted on Thu Apr 09, 2009 11:48 am

TheEmrys wrote:Good info in the rest of the article. But the ATI client came out a couple of years before the nvidia client did.


I'm referring to the GPU2 client, for which both versions were released at the same time, not the GPU1 client. I don't mean to suggest the ATI client is bad in any way, just that the Nvidia GPU2 client is favored because of its PPD advantage.

PRIME1 wrote:
I think it's the fact that the GPU2 client uses CUDA on NVIDIA cards. This is why it runs so much faster.

I'm pretty sure the ATI client runs Brook+ which ATI has all but abandoned. I have not heard whether or not it will be ported to OpenCL. Seems likely though.


Yes and no. It's true that it is faster partly because it has received far more support and development on the Nvidia side, as Nvidia wanted to show off the benefits of CUDA and made sure the client was as good as possible.

However, the biggest current issue with PPD is that Nvidia graphics cards have around 240 cores versus AMD's 800. 240 "stronger" cores are more suitable for simulating a 300-atom molecule than 800 "weaker" cores, and even this is a simplification of the deeper issues.
HurgyMcGurgyGurg
Gerbil First Class
 
Posts: 156
Joined: Mon Apr 21, 2008 4:47 pm

Re: ATI GPU2 Optimization Guide

Posted on Thu Apr 09, 2009 12:07 pm

l33t-g4m3r
Gerbil Jedi
Silver subscriber
 
 
Posts: 1972
Joined: Mon Dec 29, 2003 2:54 am

Re: ATI GPU2 Optimization Guide

Posted on Thu Apr 09, 2009 12:45 pm

l33t-g4m3r wrote:I believe this article is relevant:
http://theovalich.wordpress.com/2008/11 ... e-reveale/


Ah yes, that article. I and others have asked mhouston (AMD's driver representative on the Folding forums) about it, and while he did not confirm or contradict it, he made sure to add that most of the article was pure speculation. Others also reiterated that the supposed Q1 fix is just the author's guess, nothing official at all; we're at the end of Q1 now, and all of the current indications on the Folding forums show no sign of huge improvements just around the corner.

When asked about performance improvements, the Folding development team has always said they will first pursue the optimizations applicable to all hardware (the optimizations I just posted are some of the fruits of this labor) before going into any optimizations specific to the HD 4000 series (which is what the article is talking about).

The article is already dated, though, and its key point now seems less important than it was made out to be.

"And here lies the problem with current GPU client - ATI X1K hardware comes off with one big flaw - lack of local memory share between the shader units. ...... You now might be wondering what will happen if you don’t put that “scratch cache” in the GPU. What happens is that your CPU will be constantly polled, and this drags the performance down to the gutter."

What he is mentioning is a CPU bottleneck hindering performance. This bottleneck, while real, has been greatly reduced by recent cores (again, the optimizations explained here help with this) without having to resort to HD 4000-series-specific optimizations. Furthermore, when the article was released the HD 4870 could only get at most 2.5-3k PPD; now some work units earn 5k PPD, without any of the solutions mentioned.

The article also misses the other key component: Nvidia's hardware is more suited to current work units than AMD's.
HurgyMcGurgyGurg
Gerbil First Class
 
Posts: 156
Joined: Mon Apr 21, 2008 4:47 pm

Re: ATI GPU2 Optimization Guide

Posted on Thu Apr 09, 2009 2:51 pm

khands wrote:
This is pretty much it, hopefully they'll use OpenCL exclusively, then they'll be able to combine teams and work on a single GPU client.

I doubt it. NVIDIA will continue to use CUDA. Since they provide development help to Stanford the GPU client will continue to use CUDA.

As for the ATI/Larrabee side, it would make sense to move to OpenCL. However, I doubt we will see such a client this year.

Of course, Intel could just do what NVIDIA did and help Stanford with a Larrabee client that will run faster.

OpenCL will most likely be slower than CUDA, Brook+, or Larrabee C++ because of its general nature.

Had OpenCL been around several years ago, Stanford could have opted to just write one client without much help from either company and gone on from there. However, OpenCL has yet to see any practical application.
PRIME1
Darth Gerbil
 
Posts: 7561
Joined: Mon Apr 22, 2002 5:07 pm
Location: , location

Re: ATI GPU2 Optimization Guide

Posted on Thu Apr 09, 2009 3:44 pm

PRIME1 wrote:As for the ATI/Larrabee side, it would make sense to move to OpenCL. However, I doubt we will see such a client this year.

Of course, Intel could just do what NVIDIA did and help Stanford with a Larrabee client that will run faster.
Intel may need to give Stanford the "direct access" libraries (three code paths for Larrabee have been mentioned so far: DirectX, OpenGL, and a "native C++ x86 style" library). But yes, a rewritten OpenCL client may not be here this year. It is just too new.

PRIME1 wrote:OpenCL will most likely be slower than CUDA, Brook+, or Larrabee C++ because of its general nature.
CUDA is just an API. At the lowest level, software-wise, it is the drivers. Instead of doing OpenCL -> CUDA -> drivers -> GPU, they could have done OpenCL -> drivers -> GPU. Of course, if they are playing API favourites, that's another story (cue Microsoft).

PRIME1 wrote:Had OpenCL been around several years ago, Stanford could have opted to just write one client without much help from either company and gone on from there. However, OpenCL has yet to see any practical application.
Excluding Apple's initial conception and preliminary development phase, OpenCL as a standard was started in 2008. Are you confusing this with OpenGL? The final 1.0 spec is not out yet AFAIK, so this is going to take a while, as with other standards work. Snow Leopard is coming, so that will be the first major practical application.

From an observer's point of view, Stanford went to AMD first, then Nvidia started trashing the then-ATI, so they did a 180 and catered to Nvidia GPUs (of course, Nvidia providing nicer development support is just icing on the cake). For whatever reason, that's how GPU2 was born (they could have continued on the GPU1 codebase, but I suppose it is mostly written to call Brook+, which becomes a problem when porting to CUDA). It is no different from the old QMD (remember those?) being biased towards Intel CPUs. I can't fault them completely, because getting their clients running on the most powerful hardware in the largest numbers is their ultimate goal, not ideological portability.
Flying Fox
Gerbil God
 
Posts: 24297
Joined: Mon May 24, 2004 2:19 am

a quick question about flush interval

Posted on Sat Apr 18, 2009 10:30 am

I have two Radeon 4850's and I was wondering about the flush interval set in the variables. Do I have to double what a typical user would use because I have two cards? The reason I ask is that one of my cards seems to stop work for a long period of time until I reset it, while the other card keeps plugging away. This only started happening when I started playing around with FLUSH_INTERVAL. Thanks.
Last edited by farmpuma on Tue Apr 28, 2009 2:03 am, edited 2 times in total.
Reason: Merged this and the four following posts to consolidate similar information.
'To Start Press Any Key'. Where's the ANY key?
If something's hard to do, then it's not worth doing
You know, boys, a nuclear reactor is a lot like a woman. You just have to read the manual and press the right buttons.
mmmmmdonuts21
Gerbil Elite
 
Posts: 590
Joined: Wed Jul 16, 2008 9:09 am

Re: a quick question about flush interval

Posted on Sat Apr 18, 2009 11:14 am

Flying Fox
Gerbil God
 
Posts: 24297
Joined: Mon May 24, 2004 2:19 am

Re: a quick question about flush interval

Posted on Sat Apr 18, 2009 12:32 pm

FLUSH_INTERVAL changes the way the client communicates with the CPU; because of this, FI isn't affected by the number of GPUs, as each will still use the same FI values. Yes, FI has to change for different types of GPUs, but the number of GPUs should not affect it. Flying Fox has been nice enough to link to my guide, so it should help you zero in on FI values.

FI usually only changes the PPD of the client and CPU usage; it probably isn't causing a card to stop. Are you running other clients, such as the CPU client? Make sure the GPU clients get a higher priority (set them to low in the configuration and keep the rest on idle); it could be that one of the clients is simply having its CPU time stolen and stops. Also make sure you have the latest client version; one card could be running an older client, or there might just be no WUs for it at the time. What does the text log say? Does it suddenly take 3 hours to process 1%? Is its CPU (or GPU) usage zero when this happens, or 100% as if the client crashed? Does it just not start a new unit?

In any case, I'm no expert; make an account at the official Folding forums and visit the ATI Specific Issues forum, and you'll get the real experts there. :D

By the way Flying Fox, do you think my thread should be stickied?
HurgyMcGurgyGurg
Gerbil First Class
 
Posts: 156
Joined: Mon Apr 21, 2008 4:47 pm

Re: a quick question about flush interval

Posted on Sat Apr 18, 2009 1:06 pm

HurgyMcGurgyGurg wrote:By the way Flying Fox, do you think my thread should be stickied?

It is not going to be my decision; I am not a mod in this forum. However, as a fellow gerbil, I would think more people need to report in about the usefulness of the information before it is worth stickying. So you'd better hope we have more AMD GPU folders. ;)
Flying Fox
Gerbil God
 
Posts: 24297
Joined: Mon May 24, 2004 2:19 am

Re: a quick question about flush interval

Posted on Sat Apr 18, 2009 11:16 pm

Good deal. I'm sure there are plenty of people with HD 4870s and HD 4850s from TR system build recommendations. Maybe we should start a campaign to make sure we get a post about folding into every System Builders Anonymous thread?
HurgyMcGurgyGurg
Gerbil First Class
 
Posts: 156
Joined: Mon Apr 21, 2008 4:47 pm

Re: ATI GPU2 Optimization Guide

Posted on Wed Apr 29, 2009 9:41 pm

I expected more replies in this thread by now.

The folding forum HurgyMcGurgyGurg mentioned above is here, for those who haven't looked at it before: http://foldingforum.org/.

It does sound like the problem may be starving one client of CPU time. And I agree that a copy of the relevant portion of the FAHlog file might help discover the problem.
Ragnar Dan
Gerbil Elder
Silver subscriber
 
 
Posts: 5354
Joined: Sun Jan 20, 2002 7:00 pm

Re: ATI GPU2 Optimization Guide

Posted on Thu Apr 30, 2009 12:10 am

Ragnar Dan wrote:I expected more replies in this thread by now.


It's probably because most users who do optimizations of this sort are the ones who spend money on hardware specifically for folding. But almost everyone who spends money on GPUs for folding buys Nvidia.

Also, those who do run ATI GPU2 clients are a very small minority on TR, and an even smaller minority among those who actively watch this forum.
HurgyMcGurgyGurg
Gerbil First Class
 
Posts: 156
Joined: Mon Apr 21, 2008 4:47 pm

Re: ATI GPU2 Optimization Guide

Posted on Thu Apr 30, 2009 11:42 pm

HurgyMcGurgyGurg wrote:
Ragnar Dan wrote:I expected more replies in this thread by now.

It's probably because most users who do optimizations of this sort are the ones who spend money on hardware specifically for folding. But almost everyone who spends money on GPUs for folding buys Nvidia.

Also, those who do run ATI GPU2 clients are a very small minority on TR, and an even smaller minority among those who actively watch this forum.

I agree with you for the most part, but my expectation had been that mmmmmdonuts21 would have replied with new information and further questions before now. And hopefully some TR regulars would have seen the thread on the Hot Forum Threads list, been intrigued and read it, and perhaps become interested in folding on their video cards for Team 2630.

I'm sure there is a fairly large contingent of 4000-series ATi owners on TR, yet we're seeing low participation levels. The great thing about the GPU client is that one can run it for a couple of hours a day and produce impressive output, and then stop it from running and start it up the next time you're browsing or reading email or suchlike. And it would help the team immensely while not costing much in electricity and making no noticeable difference in the performance of most desktop applications.

I think that sort of sales pitch has helped other teams.
Ragnar Dan
Gerbil Elder
Silver subscriber
 
 
Posts: 5354
Joined: Sun Jan 20, 2002 7:00 pm

Re: ATI GPU2 Optimization Guide

Posted on Fri May 01, 2009 6:10 am

Well, after some testing I found that for my two 4850's the optimal FLUSH_INTERVAL is 256. I use only about 5% of my two CPU cores and 99% of my GPUs. I also tried running an SMP client alongside the two GPU clients, but this made GPU folding very unstable, so I abandoned the SMP client.

In addition, I have been having problems with the GPU client once it gets to 100% completion. I usually have to manually restart the client at that point, and that's why my point production has been way down lately. Once I restart it, the client works fine. Anyone have suggestions on how to fix that, or how to make the SMP client stable with my GPUs?
mmmmmdonuts21
Gerbil Elite
 
Posts: 590
Joined: Wed Jul 16, 2008 9:09 am

