Scott's article showing improvements from properly affinitising threads on Bulldozer got me thinking: could this optimisation be done automatically by a service that finds the busy threads in the busiest process on the system and affinitises each of them to a different Bulldozer module?
I've knocked up a prototype tool that does this as an experiment, but I've not got an AMD FX CPU to test with, so I'm after a volunteer or two to help test the tool. Ideally I'd like people who are quite technical - and a programming background would be a bonus - a .NET background even more so.
Here's roughly how the tool works once you start it running:
• It uses the debugging APIs in System.Diagnostics in .NET (needs to be run elevated)
• It scans the processes running on the system (ignoring session 0, which is the system session where services and critical processes run)
• For each process it calculates the CPU usage. When a process's CPU usage goes over a threshold (defined in the .config file), it inspects that process's threads and orders them by CPU usage. The busiest 4 threads are then assigned (using thread affinity) one per module, by staggering the CPU bitmask so they end up on CPUs 0, 2, 4 and 6.
• When a process no longer exceeds the CPU usage threshold it will have its thread affinity reset.
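The polling logic in the list above can be sketched roughly as follows. This is a minimal Python illustration, not the tool's actual .NET code: `ProcessSample`, `cpu_percent`, `busiest_threads` and `decide` are hypothetical names, and CPU usage is assumed to mean CPU time consumed during the polling interval as a percentage of all logical CPUs (the way Task Manager reports it). The snapshots stand in for what System.Diagnostics would report on each poll.

```python
from dataclasses import dataclass


@dataclass
class ProcessSample:
    """Hypothetical snapshot of one process taken at a single poll."""
    pid: int
    total_cpu: float              # cumulative CPU seconds for the whole process
    thread_cpu: dict[int, float]  # thread id -> cumulative CPU seconds


def cpu_percent(prev_seconds: float, curr_seconds: float,
                interval: float, logical_cpus: int) -> float:
    """CPU used during the interval as a percentage of the whole machine (assumed metric)."""
    return 100.0 * (curr_seconds - prev_seconds) / (interval * logical_cpus)


def busiest_threads(prev: ProcessSample, curr: ProcessSample, count: int = 4) -> list[int]:
    """Thread ids ordered by CPU time used since the previous poll, busiest first."""
    # Threads that did not exist at the previous poll are counted from zero.
    deltas = {tid: cpu - prev.thread_cpu.get(tid, 0.0)
              for tid, cpu in curr.thread_cpu.items()}
    return sorted(deltas, key=deltas.get, reverse=True)[:count]


def decide(prev: ProcessSample, curr: ProcessSample, interval: float,
           logical_cpus: int, threshold: float, affinitised: set[int]):
    """Return ("apply", thread_ids), ("reset", []) or ("none", [])."""
    usage = cpu_percent(prev.total_cpu, curr.total_cpu, interval, logical_cpus)
    if usage > threshold and curr.pid not in affinitised:
        # Over the threshold: pick the busiest threads to spread across modules.
        return "apply", busiest_threads(prev, curr)
    if usage <= threshold and curr.pid in affinitised:
        # Back under the threshold: the previously applied affinity should be reset.
        return "reset", []
    return "none", []


# Example: two polls of one process, 2 seconds apart, on an 8-core FX.
prev = ProcessSample(1234, 10.0, {1: 2.0, 2: 3.0, 3: 1.0, 4: 4.0, 5: 0.0})
curr = ProcessSample(1234, 18.0, {1: 4.5, 2: 3.5, 3: 3.0, 4: 5.0, 5: 1.25, 6: 0.75})
print(decide(prev, curr, interval=2.0, logical_cpus=8, threshold=25.0, affinitised=set()))
# prints ('apply', [1, 3, 5, 4])
```

Here the process used 8 CPU seconds in 2 seconds on 8 logical CPUs (50%), so it crosses a 25% threshold and threads 1, 3, 5 and 4 are picked. What "reset" restores the affinity to is not stated above; allowing all CPUs again would be one reasonable assumption.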
Those of you who are technical enough will understand that it won't help single-threaded applications, and it won't help heavily threaded applications either. It also has no way to distinguish between an FPU-heavy thread and an integer-heavy one (although I do wonder if this could be achieved using Bulldozer's lightweight profiling APIs). There may also be other, harder-to-predict problems, such as asymmetric thread execution speeds, hence this approach needs some testing.
First, install the .NET 4.0 Client Profile: http://www.microsoft.com/download/en/de ... x?id=17113
The tool can be downloaded from here: http://22.214.171.124/bulldozerHelper.zip
Then run it (it may require elevation; I don't know, as I don't use UAC).
If you terminate the tool by any method other than pressing Enter in the console window, it will not remove any thread affinity it’s applied.
There are some options in the .config file you can play with: checking interval, threshold and CPU assignment pattern. You won’t see anything in the console window until a process passes the threshold defined in the .config, unless you have the ShowDetail config option set to true. You can confirm it’s working by checking Task Manager.
If you set the thread affinity pattern in the config to ‘3’ (binary 11), it should set the affinity of each thread to either core of the same module, e.g. CPU 0 or 1, 2 or 3, 4 or 5, 6 or 7. This might let each thread be scheduled more often and may help performance a bit more too.
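The assignment pattern can be pictured as a one-module bitmask that is shifted along by one module for each of the busiest threads. The sketch below is a hypothetical Python illustration (`module_masks` and `cpus_in` are made-up names, not the tool's code), assuming logical CPUs 2n and 2n+1 are the two cores of module n, as the 0/1, 2/3, 4/5, 6/7 pairing implies. A bitmask of this kind is what Windows thread affinity takes, e.g. via `SetThreadAffinityMask` or .NET's `ProcessThread.ProcessorAffinity`.

```python
def module_masks(pattern: int, threads: int = 4, cpus_per_module: int = 2) -> list[int]:
    """Affinity mask for each of the busiest threads, busiest first.

    The pattern describes one module; thread i gets it shifted onto module i.
    Assumes CPUs 2n and 2n+1 belong to module n (hypothetical helper).
    """
    return [pattern << (i * cpus_per_module) for i in range(threads)]


def cpus_in(mask: int) -> list[int]:
    """Logical CPU numbers whose bit is set in an affinity mask."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


for pattern in (1, 3):
    masks = module_masks(pattern)
    print(pattern, [bin(m) for m in masks], [cpus_in(m) for m in masks])
```

With pattern 1 the four masks are 0b1, 0b100, 0b10000 and 0b1000000, i.e. CPUs 0, 2, 4 and 6 (the staggered default). With pattern 3 they become 0b11, 0b1100, 0b110000 and 0b11000000, so each thread may run on either core of its own module.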
What I'd like is for people to run some moderately threaded workloads to see if it helps at all - games and apps that can scale to 4 CPU cores for example.
Edit: The tool needs to be running in the background during the testing so it can poll/detect/optimise and reset the affinity etc.