Without further ado, here's what happened when we ran our test apps under the default Windows 7 scheduler, which has no awareness of Bulldozer's modules or their shared resources, and then with each of our two affinity masks.
These results couldn't be much more definitive. In every case but one, distributing the threads one per module, and thus avoiding sharing, produces roughly 10-20% higher performance than packing the threads together on two modules. (That one exception, the FDom function in picCOLOR, shows little difference among the three configurations.) At least for this handful of workloads, the benefit of avoiding resource sharing between the two cores on a module is pretty tangible. Even though the packed config allows a higher Turbo Core frequency of 4.2GHz, the one-per-module config is faster.
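For readers who want to reproduce the two configurations, here's a minimal sketch of how the corresponding affinity masks can be built. It assumes a four-module FX-8150 whose logical CPUs 2k and 2k+1 are the two integer cores of module k; that enumeration is our assumption for illustration, not something stated above, so check your own system's topology before relying on it.

```python
# Hypothetical topology: four modules, two integer cores each, with logical
# CPUs 2k and 2k+1 assumed to share module k.
MODULES = 4
CORES_PER_MODULE = 2

def cpu_id(module, core):
    """Logical CPU number for a given module and core-within-module."""
    return module * CORES_PER_MODULE + core

def mask_for(cpus):
    """Build an affinity bitmask with one bit set per logical CPU."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

# Packed: four threads squeezed onto two modules, both cores of each busy.
packed = mask_for(cpu_id(m, c) for m in range(2) for c in range(CORES_PER_MODULE))

# Spread: one thread per module, using only the first core of each module.
spread = mask_for(cpu_id(m, 0) for m in range(MODULES))

print(hex(packed), hex(spread))  # 0xf 0x55
```

Under this assumed layout, a mask like 0x55 could then be handed to Windows' `start /affinity` command or to `SetThreadAffinityMask`; with a different core enumeration, the bits would need to be rearranged accordingly.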
Our test apps, obviously, are not your typical desktop applications, and they may not be a perfect indicator of what to expect elsewhere. However, since many games and other apps are lightly threaded, with three or four threads handling the bulk of the work, we wouldn't be surprised if one-per-module thread affinities were generally a win on Bulldozer-based processors.
Naturally, some folks who have been disappointed with Bulldozer performance to date may find solace in this outcome. With proper scheduling, as may come in Windows 8, future AMD processors derived from this architecture may be able to perform more competitively. Unfortunately, Windows 8 probably won't ship during the model run of the current FX processors.
At the same time, these results take some of the air out of AMD's rhetoric about the pitfalls of Intel's Hyper-Threading scheme. The truth is that both major x86 CPU makers now offer flagship desktop CPU architectures with a measure of resource sharing between threads, and proper scheduling is needed to extract the best performance from both. (This situation mirrors what's happened in 2P servers in recent years, where applications must be NUMA-aware on current x86 systems in order to achieve optimal throughput.) A gain of up to 20% on a CPU this quick is certainly worthy of note.
Trouble is, right now, Intel has much better OS and application support for Hyper-Threading than AMD does for Bulldozer. In fact, we're a little surprised AMD hasn't attempted to piggyback on Intel's Hyper-Threading infrastructure by making Bulldozer processors present themselves to the OS as four physical cores with eight logical threads. One would think that might be a nice BIOS menu option, at least. (Hmm. Mobo makers, are you listening?)
At any rate, application developers who want to make the most of Bulldozer are free to set thread affinities in future revisions of their software at any time. If AMD can persuade some key developers to help out, the next round of desktop applications could benefit very soon.
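A developer-side policy along these lines might simply hand out modules round-robin, doubling up on a module's second core only after every module already has a thread. The sketch below assumes the same hypothetical layout as before (logical CPUs 2k and 2k+1 share module k); `assign_cpus` is an illustrative helper of our own, not part of any real API.

```python
def assign_cpus(n_threads, modules=4, cores_per_module=2):
    """Map worker threads to logical CPUs, filling empty modules first.

    Assumes logical CPUs 2k and 2k+1 belong to module k (hypothetical layout).
    Threads 0..modules-1 each get a module to themselves; a thread lands on a
    module's second core only after every module is occupied.
    """
    total = modules * cores_per_module
    if n_threads > total:
        raise ValueError("more threads than logical CPUs")
    assignment = []
    for t in range(n_threads):
        module = t % modules   # round-robin across modules
        core = t // modules    # move to the next core only after a full pass
        assignment.append(module * cores_per_module + core)
    return assignment

print(assign_cpus(4))  # [0, 2, 4, 6]
print(assign_cpus(6))  # [0, 2, 4, 6, 1, 3]
```

On Windows, each worker could then pin itself with something like `SetThreadAffinityMask(GetCurrentThread(), 1 << cpu)`. For the three- and four-thread workloads discussed above, this policy yields exactly the one-per-module arrangement that performed best in our tests.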