Update: MS releases, yanks Win7 hotfix for Bulldozer scheduling

Some weeks ago, we took a look at the issue of thread scheduling and performance on AMD’s Bulldozer-based FX processors. Since current versions of Windows were not aware of Bulldozer’s shared CPU “modules,” the OS wouldn’t schedule threads optimally on those processors. By manually controlling thread allocations, we demonstrated that Bulldozer CPUs perform best when threads are scheduled one per module, avoiding sharing if possible. In fact, we saw performance gains of roughly 5-20% in a small sampling of benchmarks. Avoiding sharing was clearly the better approach, even though scheduling to favor sharing would result in higher clock speeds from AMD’s Turbo Core feature. Our conclusion? Bulldozer performance would benefit nicely if only Windows were to assign threads to Bulldozer modules just like it does to Intel’s Hyper-Threaded cores.
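To make the two placement policies concrete, here is a minimal Python sketch. It is only an illustration, not the method used in our testing or the logic inside Windows. It assumes a four-module chip whose cores are numbered so that cores 2m and 2m+1 share module m; the function names (`spread_order`, `pack_order`, `assign`, `shared_modules`) are hypothetical, and actually pinning threads to the chosen cores would require an OS-specific affinity call that isn't shown.

```python
from collections import Counter
from typing import List

# Assumption for illustration: two cores share each module, and cores
# 2*m and 2*m + 1 belong to module m.
CORES_PER_MODULE = 2


def module_of(core: int) -> int:
    """Return the module that a given core ID belongs to."""
    return core // CORES_PER_MODULE


def spread_order(num_modules: int) -> List[int]:
    """Core IDs ordered so every module gets one thread before any gets two."""
    return [
        m * CORES_PER_MODULE + slot
        for slot in range(CORES_PER_MODULE)
        for m in range(num_modules)
    ]


def pack_order(num_modules: int) -> List[int]:
    """Core IDs ordered so each module is filled before the next is touched."""
    return list(range(num_modules * CORES_PER_MODULE))


def assign(num_threads: int, order: List[int]) -> List[int]:
    """Pick one core per thread by walking the given preference order."""
    return order[:num_threads]


def shared_modules(cores: List[int]) -> int:
    """Count modules that end up hosting more than one thread."""
    per_module = Counter(module_of(c) for c in cores)
    return sum(1 for count in per_module.values() if count > 1)


if __name__ == "__main__":
    for name, order in (("spread", spread_order(4)), ("pack", pack_order(4))):
        cores = assign(4, order)
        print(name, cores, "shared modules:", shared_modules(cores))
```

With four threads on four modules, the spread order lands on cores 0, 2, 4, and 6 and shares no modules, while the pack order uses cores 0 through 3 and doubles up on two modules. The second case is the sharing that our manual thread allocations avoided.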

Happily, we won’t have to wait for Windows 8 to see that happen. Microsoft has released a hotfix for Windows 7 and Windows Server 2008 R2 that appears to use the scheduling logic for Hyper-Threading (or SMT, simultaneous multithreading) on Bulldozer processors. From the patch description:

This article introduces an update that optimizes the performance of AMD Bulldozer CPUs that are used by Windows 7-based or Windows Server 2008 R2-based computers. Currently, the performance of AMD Bulldozer CPUs is slower than expected. This behavior occurs because the threading logic in Windows 7 and in Windows Server 2008 R2 is not optimized to use the Simultaneous Multithreading (SMT) scheduling feature. This feature was introduced in the Bulldozer family of AMD CPUs.

I don’t think one would say that shared Bulldozer modules use traditional SMT, but since they should pretty much be scheduled that way, I doubt anyone at AMD is shedding a tear over that description. If you have an FX processor or a Bulldozer-based Opteron, this hotfix should offer some decent performance benefits for free. Nice to see it out in the wild now, instead of late next year with Windows 8.

Update: Microsoft has apparently pulled the patch from its download servers for unspecified reasons. We’ll try to post another update when we find out more.

Update II – 10:55AM: We’ve spoken with an industry source familiar with this situation, and it appears the release of this hotfix was either inadvertent, premature, or both. There is indeed a Bulldozer threading patch for Windows in the works, but it should come in two parts, not just one. The patch that was briefly released is only one portion of the total solution, and it may very well reduce performance if used on its own. We’re hearing the full Windows update for Bulldozer performance optimization is scheduled for release in Q1 of 2012. For now, Bulldozer owners, the best thing to do is to sit tight and wait.

Comments closed
    • ronch
    • 8 years ago

    This sucks. AMD should have been collaborating with MS way before BD’s launch so this patch would have been fully baked by now, if not at launch. As it is, it’s late, it’s broken, and it’s just embarrassing. AMD really has been grabbing a lot of negatives lately: BD didn’t get good reviews and AMD still overprices the chip, 32nm troubles (ok that’s GF’s fault, but AMD ultimately gets hit hard by it), lots of heads got sacked, AMD couldn’t even count transistors properly, and now this. Marketing, PR, engineering, manufacturing... what a mess. I’ve been an AMD supporter all these years but lately they’re really falling hard on their faces. Ok, before anyone tells me this is MS’s fault, know that it’s up to AMD to properly instruct MS how to treat its CPUs. TR has found that AMD’s solution isn’t optimal, and ok, they’re just a bunch of journalists, but I don’t recall Intel making these embarrassing mistakes. Intel knows what it’s doing and what their products are capable of. Heck, a few years ago even AMD didn’t have a clue why Cool and Quiet was killing its benchmark numbers. I’ve experienced it too, and I wrote to AMD about it. They acted as though the CPU and CnQ were fine. Yeah, then later Anandtech brought up an article about it. Ha!

    I hope I’m wrong, but these are signs of a company that is falling apart.

    • HisDivineOrder
    • 8 years ago

    What’s worse than designing a CPU that performs poorly in Windows and not having a plan in place after said product was delayed for YEARS to get that part of Windows updated?

    Releasing half of the solution prematurely, getting the hopes and dreams of the 5 people who bought into your CPU that could get no good reviews, then yanking those hopes away with no official remarks of any kind announced, rumors and hearsay suggesting that the “real” update will come later.

    The launch of Bulldozer has been such an insane catastrophe from before Day 1, it’s hard to imagine AMD ever doing worse at a product launch again. I didn’t think they could beat their Phenom bug launch, but wow, they sure did prove me wrong. It just takes work across every division inside AMD to make a reality!

    • Xenolith
    • 8 years ago

    Bulldozer = Vista

    PileDriver = Windows 7

    Vista wasn’t a bad OS, the problem was that MS didn’t have the 3rd party partners in line when it launched. Windows 7 came out with a cleaner look with the same driver model … much better received.

    Same thing with Bulldozer. AMD’s partners didn’t coordinate with this release, so performance isn’t optimized. By the time PileDriver is released, everyone should be caught up.

      • ronch
      • 8 years ago

      Except at this point no one really knows how Trinity/PileDriver will stack up. Demos are one thing, benchmarks are another. Will PileDriver be a great improvement over Bulldozer? No one knows.

    • sschaem
    • 8 years ago

    Too little, way too late. AMD, what a joke you have become!

    AMD should have had this done months before the release, not almost 6 months later.

    But then again, who cares. Cray is not using Windows and they already have a kernel patch.

    lame AMD, lame.

      • cygnus1
      • 8 years ago

      Why is it AMD’s responsibility to patch the scheduler of a proprietary, closed source operating system? Honestly, people should be happy that MS is willing to provide a patch for Win7 and Server 2k8 R2 instead of making people buy Win8.

      However, AMD could have avoided the need for a patch if it had ID’ed the 2nd core of each module as a logical core instead of physical. That would have signaled all current OSes to schedule them as if they were SMT-enabled cores.

        • Welch
        • 8 years ago

        I don’t think AMD should have done anything to label it as a logical core…. it is in fact physical, hence they should label it as such. This may serve a purpose later down the road and with software such as VMWare that utilizes and licenses machines based on physical CPUs/cores. When you’re splitting up the hardware amongst virtual machines I believe it makes a difference.

        You don’t call a spade a spade…. you call it a freaking Shovel!

        And Sschaem, it’s not AMD’s responsibility to write software for how Microsoft’s OS handles hardware ahaha….. you seriously don’t get that? If anybody is to blame for the mishandling of the cores it’s Microsoft for its delayed response… Or perhaps a combo of the two companies. In reality they should have been working closely together during the production of Bulldozer to make sure something as simple as thread assignment was working as intended.

          • cygnus1
          • 8 years ago

          “I don't think AMD should have done anything to label it as a logical core.... it is in fact physical, hence they should label it as such, this may serve a purpose later down the road and with software such as VMWare that utilizes and licenses machines based on physical CPU's/Cores. When your splitting up the hardware amongst virtual machines I believe it makes a difference.”

          Unfortunately there are not really 8 full cores on these chips. There are 4 modules that each have most of the resources of 2 cores, but not all. For a static frequency, scheduling on Bulldozer should run just like on a Hyperthreaded/SMT CPU, i.e., unrelated threads should be on separate modules while related threads should be on one module. And to complicate that further, with Turbo Core enabled they should be crammed onto fewer modules if possible to allow for idle modules to be power gated and the active modules to ramp up in speed to improve single-threaded performance.

          From a licensing perspective, as a CPU purchaser, I would absolutely prefer them to be labeled as logical cores, as most companies don't charge for logical cores. And from a VMWare server virtualization performance perspective it would still be better for them to be logical cores, for the same reason MS is fixing their scheduler to treat them that way: there is a performance benefit to giving related threads the same module affinity and unrelated threads a different module affinity.

            • shank15217
            • 8 years ago

            Threads are packed onto the same module in the first place.. that’s why the performance is low.. what are you talking about? This patch will make unrelated threads run on different modules to improve execution resources in lightly threaded scenarios. This is an issue about turbo frequency vs module usage and it turned out that turbo wasn’t good enough to overcome the performance loss of placing a second unrelated thread on the same module.

            • cygnus1
            • 8 years ago

            I don’t know if you understood what I was trying to say. The original scheduler in Win7 and prior assigns threads mostly randomly among cores that it sees as the same and on the same NUMA node. It doesn’t recognize that related threads should run on cores 0 and 1 (module 0) or on 2 and 3 (module 1), etc. It will also place two unrelated threads on cores 0 and 1 (both on module 0), inducing a performance penalty in comparison to if the two unrelated threads were on cores in different modules. Take for instance 3 threads, related threads 1a and 1b and unrelated thread 2. The original scheduler would run them all on mostly random core choices depending on current workload. To run optimally on Bulldozer, threads 1a and 1b should be kept on the same module to increase the chance of an L2 cache hit and thread 2 should be on a second module to prevent L2 cache thrashing. My point above was mostly about this behavior. If AMD had marked the second core of each module as a logical core, existing hyperthreading-aware scheduling would have kicked in and scheduled things properly given a static CPU frequency.

            But this is all irrespective of Turbo modes. The original scheduler was essentially working against the Turbo mode because it treated all the cores the same and would schedule threads on any core at any time. In order to take advantage of Turbo mode, the scheduler needs to recognize a few scenarios and handle them differently. For instance, it needs to recognize when it has a smaller number of high-utilization threads than cores and intentionally stop scheduling any low-utilization threads on the higher-numbered modules in order to give the turbo mode a chance to kick in, power gate the higher-numbered modules and increase the frequency on the remaining modules.

            This patch is the intended behavior of the Windows 8 scheduler being back ported to Windows 7. Microsoft has clearly been working on this for a while as part of Windows 8. I’m betting it will also improve performance in certain situations on Intel hyperthreaded and/or turbo enabled CPUs as well.

        • sschaem
        • 8 years ago

        Exactly my point…

        a) MS patches its kernel for CPUs ‘all the time’. Check the KB.

        Reminder: AMD showed its Bulldozer system running back in November 2010 at their Financial Analyst Day, over a year ago.
        At the time, yes, it was running Windows 7.

        Are you telling me that no AMD executive could get MS to do any patch for Bulldozer in over a year?!
        The issue is all in AMD’s camp, in how they handled this issue, which they identified only a few months ago.

        b) AMD shouldn’t design a CPU for Windows that requires an OS patch if they can’t get MS to release the patch for it.

        Something is rotten, AMD dropped the ball. I will tell you why in my next post.

          • shank15217
          • 8 years ago

          Just stop and at least read the TR article before you comment further… AMD didn’t drop any ball, their reasons were valid, it just turned out their turbo wasn’t aggressive enough to justify packing unrelated threads on the same module. This is a server chip by design and it excels in its workload. AMD has real issues with yields, and the platform is new enough that the compilers need to catch up and other kinks need to be ironed out. However, it’s not a bad design and it has a lot of room to grow.

            • cygnus1
            • 8 years ago

            Neither AMD nor Intel can do much of anything about where an OS schedules threads other than get the scheduler patched. The turbo modes don’t push threads anywhere, they simply respond to the state of the CPU.

          • Yeats
          • 8 years ago

          “AMD shouldn't design a CPU for windows that require an OS patch if they cant get MS to release the patch for it.”

          I think you should familiarize yourself with the definition of the word “require”. Bulldozer clearly does not “require” a patch to run Windows. The patch - whenever it's finished - is designed to improve performance. Several Intel i5-2500k and 2600k users who installed the patch noticed that they, too, benefited in certain areas. Does this mean that Intel CPUs with HTT “required” the patch? Of course not.

            • travbrad
            • 8 years ago

            How would the 2500K benefit from such a patch? It doesn’t support/use hyperthreading.

            • Yeats
            • 8 years ago

            You’re right, and I apologize for the inaccurate reporting on my part. It’s i7-2600k users who’ve reported improvements.

            Here’s one user report, from Anandtech:

            “Just so you guys know I installed the hotfix right when it came out on my sandy setup and its benching higher in everything that is using 8 threads and a bigger boost with avx. I broke 120gflops using avx and 8 threads in intel burn test when before I could never get over 108 and my cinbench points increased as much as a 150-200mhz over clock would add. this patch helps all threaded apps and helped out my 2600k Its crazzy how a 1.3mb file ca add so much performance”

            “nothing bad so far,the os feels more responsive and the best way I can explain it is like going from 4gb ram to 8gb ram”

            http://forums.anandtech.com/showthread.php?t=2213100&page=5

    • indeego
    • 8 years ago

    Can’t wait for these IE pushed patches.

    • chuckula
    • 8 years ago

    They said I was crazy to make a scheduler patch for Bulldozer. But I did it anyway! Then the first patch sank into the swamp. So I made another patch! It sank into the swamp too. Then I made a third patch! It made the Bulldozer catch fire, burn a hole through the motherboard, and then sank into the swamp. But I didn’t give up! I made a fourth patch, and the fourth patch worked! And that’s what’s running on my Piledriver core today!

      • Goty
      • 8 years ago

      This legitimately made me smile.

      • ew
      • 8 years ago

      You win the Internet today good Sir/Madam!

      • StuG
      • 8 years ago

      Best thing on the internet today, this comment. ^^

    • can-a-tuna
    • 8 years ago

    This is bulls*it. If the patch only gives a 1-3% performance increase, that is not enough. AMD really pulled a “Pentium 4” with Bulldozer. Amazing they haven’t learned from history. I will still buy ATI graphics but this time I’m seriously thinking of switching to Intel after being an AMD fan for 10 years. BD is just an unacceptably crappy processor.

      • Yeats
      • 8 years ago

      We can’t accurately judge the patch yet, because it is apparently still in development and MS posted it early.

      Pentium 4 was a lot closer to Athlon than Bulldozer is to i5.

      • sweatshopking
      • 8 years ago

      man. you must really HATE that cpu, because you’re as insanely fanboi as they come!

      • NeelyCam
      • 8 years ago

      “I will still buy ATI graphics but this time I seriously think of switching to Intel after being AMD fan for 10 years.”

      Step into the light... Leave the past behind. Leave the Hate behind. Let the warm breeze of All That Is Good embrace you. You will find Peace and Happiness.

      • Draphius
      • 8 years ago

      i dont see any reason to buy an amd chip atm and ive owned plenty of amd chips in my time. switch to an i5 2500k or i7 2600k and youll be happy.

        • travbrad
        • 8 years ago

        Phenoms are still a good value in the low-end, although they aren’t going to be available much longer (they stopped producing X6 already). AMD can’t even compete with themselves apparently.

          • ermo
          • 8 years ago

          I can’t say that I regret upgrading from a 720BE to a 955BE on the same mobo in September after BD was delayed yet again.

          Should I have bought an X6? Maybe, but the 955BE was the cheapest available option at the time and when OCed to 3.8(core)/2.6(NB-CPU) it beats the similarly priced i3-21xx and beats the BDs in the games I play. And it doesn’t need a scheduler update to improve its competitiveness.

            • Yeats
            • 8 years ago

            “And it doesn't need a scheduler update to improve its competitiveness.”

            I don't really get comments like this; they sound petty and juvenile. Are you saying you wouldn't appreciate an update that improves performance? Intel CPUs that support HTT may get a boost also. Do you complain when your video card gets a new driver that improves performance/fixes bugs?

      • shank15217
      • 8 years ago

      But that’s not true.. in lightly threaded workloads scheduler awareness can increase performance by 10-15%. There is even a TR article about it..

      • ronch
      • 8 years ago

      It’s not crappy. The FX-8150 is priced higher than the 2500K, so it has to be better, right?

      // sarcasm

    • fellix
    • 8 years ago

    Well, after browsing the forums for user benchmarks it seems this patch does much more harm to the performance than any gain. It’s of no use for BD now, let’s hope for a successful second take.

      • NarwhaleAu
      • 8 years ago

      Ahh Second Take. How I miss that show. I have never forgiven Tom’s Hardware.

        • crazipper
        • 8 years ago

        If it makes you feel any better, both Ben and Rob are doing really well now.

        • Dysthymia
        • 8 years ago

        Same here! I miss The Spotlight with Tamara Krinsky as well. ] :
