AMD corrects muffed Bulldozer transistor count

In our review of AMD’s “Bulldozer” desktop processors, we relayed a bit of information from AMD that wasn’t quite right. The company initially told us, along with virtually everyone else in the press, that Bulldozer/Zambezi chips were made up of two billion transistors. That is a very big number, and honestly, it took us some time to process. I’ll even admit to spending the better part of a day talking to people about how extensively AMD used logic synthesis in Bulldozer, trying to figure out where all of those transistors came from.

Turns out the answer to our question was relatively simple: the transistor count number AMD supplied to the press was incorrect, as an article at AnandTech pointed out today. After reading that, we spoke with AMD PR rep Phil Hughes, who confirmed that the correct number for Bulldozer’s transistor count is 1.2 billion. Hughes told us the initial figure was a very early and rough estimate that was simply not accurate, and he said he’d triple-checked the new number with engineering.

| Code name | Key products | Cores | Threads | Last-level cache size | Process node (nm) | Estimated transistors (millions) | Die area (mm²) |
|---|---|---|---|---|---|---|---|
| Gulftown | Core i7-970, 990X | 6 | 12 | 12 MB | 32 | 1168 | 248 |
| Sandy Bridge-E | Core i7-39xx | 8 | 16 | 20 MB | 32 | 2270 | 435 |
| Thuban | Phenom II X6 | 6 | 6 | 6 MB | 45 | 904 | 346 |
| Orochi/Zambezi | FX | 8 | 8 | 8 MB | 32 | 1200 | 315 |

As you may know, transistor counts for chips are generally estimates, and the methods of making those estimations can vary. Also, the relationship between the physical die area and the number of transistors on a chip is complex, because different types of transistors can occupy more or less space. Still, this new transistor count for Bulldozer makes a lot more sense, especially when one considers that Intel’s Sandy Bridge-E has 2.27 billion transistors (many of them in very dense SRAM cells) and is 120 mm² larger than Bulldozer.
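
For a rough sanity check, the densities implied by the table above work out like so (a quick back-of-envelope sketch in Python; these are estimates of estimates, and real density varies block by block):

```python
# Rough transistor density from the table's estimates (millions per mm^2).
chips = {
    "Gulftown":       (1168, 248),
    "Sandy Bridge-E": (2270, 435),
    "Thuban":         (904, 346),
    "Orochi/Zambezi": (1200, 315),
}

for name, (millions, area_mm2) in chips.items():
    print(f"{name:14s} {millions / area_mm2:4.1f}M transistors/mm^2")

# Zambezi lands near 3.8M/mm^2 vs. ~5.2M/mm^2 for the SRAM-heavy Sandy
# Bridge-E; the retracted 2.0B figure would have implied ~6.3M/mm^2 on the
# same 32 nm node, which was always hard to square.
```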

Although die area is generally more important than transistor counts in the grand scheme, the more modest number of transistors in Bulldozer does improve our sense of its prospects somewhat. The chip’s performance still isn’t what it should be by any means, but its architectural efficiency now doesn’t seem quite so dire.

Naturally, with any change like this one, some folks on the Internet are bound to speculate about what really happened behind the scenes and whether AMD is, for some reason, not being entirely straightforward. The ball really got rolling this time with Dr. Schuette at LostCircuits suggesting the chip’s cache alone would occupy nearly 1.2 billion transistors. However, his first crack at the math assumed an eight-transistor SRAM cell, and those cells are almost certainly 6T, not 8T. We’ve pinged AMD to confirm. For further background, I also called up David Kanter of RealWorldTech, who wound up posting his own quick analysis of Bulldozer’s transistor count. He thinks Bulldozer may be closer to 1.38 billion transistors, based on various sources and informed speculation, but he acknowledges that he may be wrong or that variance in counting methods may account for some of the difference from AMD’s result.
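
To see why the SRAM cell assumption swings the math so much, here is a minimal sketch of the cache arithmetic (assuming 16 MB of combined L2 and L3, and counting only the data arrays; tags, ECC, and peripheral circuitry add real overhead on top):

```python
# Transistors in Bulldozer's 16 MB of combined L2 + L3, data arrays only
# (tags, ECC, and decode/sense circuitry would add more on top).
BITS = 16 * 1024 * 1024 * 8          # 16 MB of cache expressed in bits

for cell_transistors in (6, 8):      # 6T vs. 8T SRAM cells
    total = BITS * cell_transistors
    print(f"{cell_transistors}T cells: {total / 1e9:.2f} billion transistors")

# 6T: ~0.81 billion; 8T: ~1.07 billion. Assume 8T cells and the cache alone
# looks like nearly the whole corrected 1.2B budget, which is how the
# "cache can't fit" objection got started.
```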

Bottom line, in my view: the new number makes a lot more sense, and I think it’s probably about as correct as most of the others we’ve put into these tables over the years. We’ll be updating our Bulldozer and Sandy Bridge-E reviews with the corrected figure.

Comments closed
    • jweller
    • 8 years ago

    They mis-estimated the transistor count by 40%. Is a 40% margin of error pretty typical in transistor counts? Also, I’m curious how they count them in the first place. Is it some kind of quantum probability formula? 😛
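
(For what it's worth, how big the miss looks depends on which figure you take as the baseline; a quick check:)

```python
# How far off was the original figure? It depends on your baseline.
claimed, corrected = 2.0, 1.2        # billions of transistors
print(f"vs. the claimed figure:   {(claimed - corrected) / claimed:.0%}")    # 40%
print(f"vs. the corrected figure: {(claimed - corrected) / corrected:.0%}")  # 67%
```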

    • Silus
    • 8 years ago

    LOL, how can they NOT even know the number of transistors in their chips? Every day that passes, AMD keeps amazing us with ridiculous stuff like this!

      • SPOOFE
      • 8 years ago

      How many transistors are in your chips?

      • jensend
      • 8 years ago

      Um, maybe you were expecting them to count the transistors by hand just so you could be satisfied? (Of course that’d take about 38 years at one a second…)

      I’d be willing to bet that the latest CPU anybody has an exact transistor count for is the 6502, first made in 1975*. Bulldozer is ~350,000x more transistors than that. Transistor counts these days are always estimates; some estimates are better than others. Get over it.

      Recently AnandTech [url=http://www.anandtech.com/show/4818/counting-transistors-why-116b-and-995m-are-both-correct]wrote about how Sandy Bridge's estimated transistor count went from 0.995 billion to 1.16 billion[/url] between two production stages. Fabricating a chip is a complex process, and a lot of things happen between the schematic design phase and the actual silicon which change the transistor count.

      Maybe if they were getting paid hundreds of millions of dollars to make extra-accurate transistor counts they could force precise accounting of everything that happens along the way. But nobody in the real world cares about hyper-accurate estimates enough to bother slowing down the process for it.

      *Why the 6502? Well, it's the latest processor for which I could find a transistor count which gives a digit in the tens place, and I think it's quite possible somebody may know the exact count. It was very well-used, its schematics and design are public, and people have created transistor-level simulators for it based on very high resolution photographs of the silicon (impossible to do with visible light microscopy on any processor from at least the last decade due to diffraction).
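
(Both back-of-envelope figures in the comment above check out, assuming the commonly cited ~3,510-transistor count for the 6502:)

```python
# Sanity-checking the numbers in the comment above.
bulldozer = 1.2e9
seconds_per_year = 365.25 * 24 * 3600
print(f"One per second: ~{bulldozer / seconds_per_year:.0f} years")  # ~38 years

mos6502 = 3510   # commonly cited transistor count for the MOS 6502
print(f"Bulldozer / 6502: ~{bulldozer / mos6502:,.0f}x")  # ~342,000x, near the figure above
```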

        • Silus
        • 8 years ago

        Such a pathetic excuse for AMD you have…this is not about satisfying ME, but rather actually providing factual information about their OWN product. That’s called being competent and professional. If you don’t know the exact number, then don’t divulge that information at all.

        You AMD fanboys keep amazing me in how you protect/defend what cannot be defended, not to mention the rampant double standards that are beyond annoying at this point…

          • jensend
          • 8 years ago

          [quote]If you don't know the exact number, then don't divulge that information at all.[/quote]

          Nobody [i]ever[/i] knows the exact number for any modern chip, so if your advice were followed no company would give out any information at all about the number of transistors in their chips. Did you even bother reading my post beyond the first sentence? You're the one with a double standard, as you expect AMD to come up with exact transistor counts when NOBODY does that.

    • Abdulahad
    • 8 years ago

    Apart from not knowing how to make CPUs, ADVANCED MATHEMATICS DISASTERS doesn’t know how to count as well… 🙂

    Ooh that’s why they can’t compete… their logic is flawed 🙂

    • P4Power
    • 8 years ago

    The Core i7-39xx processors have 6 cores and 12 threads, not 8 and 16. I know the server version SB-E have 8 cores, but the table is sort of misleading listing the 39xx as the key product.

    • ronch
    • 8 years ago

    This whole thing got me thinking. Why did AMD release the 2B transistor count figure in the first place anyway? Was it really sloppiness? Or was it really true, but this time AMD wanted to retract that figure and give a much lower one because BD isn’t performing up to par, and it’s laughable that a 2B-transistor chip gets spanked by a 1B-transistor chip (SB)? Or maybe they initially gave the 2B figure on purpose, knowing it was wrong, because it would help create some buzz around BD? I think I read somewhere before that there seems to be a question of where ~800M of those transistors ended up. If this correction is true, I guess we have the answer now: they don’t exist.

    It may be excusable if your figure is off by just a few percentage points, but 1.2B is a mile away from 2B. That’s a huge miscalculation on their part. Something is horribly wrong inside AMD, folks. If the PR dept. and the engineering dept. can’t even get a simple figure straight between themselves, there’s a coherency problem. Everyone seems disjointed. If something as simple as this is happening, how sure are we that the engineers aren’t tripping among themselves designing their chips? Heck, they may even be getting BD’s cache coherency mechanism wrong, seeing as they have an inherent coherency problem. Maybe that’s why it sucks.

    Also, a bit off topic, but if their primary design goal with BD was saving die space and logic, why resort to logic synthesis which, according to sources around the web lately, produces circuits that end up 20% larger and 20% slower? Isn’t BD supposed to be FAST and SMALL (die size)?

    • ronch
    • 8 years ago

    This is a bit sloppy and embarrassing, isn’t it? First time I ever heard of a chip company that couldn’t even count properly. No wonder BD is under-performing. And no wonder they don’t have a clue that they have to price BD lower than competing SB parts such as the 2500K.

    • Mr Bill
    • 8 years ago

    If we are going to talk about the FX-8150 transistor count: David Kanter gets 1380 million, including the 16MB of L2 plus L3 cache. To compare apples to apples, or AMD’s to Intel’s…

    It is the 8-core, 20MB-cache Sandy Bridge-E that is spec’d at 2270 million transistors. But what was released was the 6-core, 15MB-cache Sandy Bridge-E, so that would be 1702 million transistors, of which 810 million transistors are the 15MB of cache, which leaves us 893 million transistors for the 6 cores.

    We have the FX-8150 with David Kanter’s estimated 1380 million transistors, of which 864 million are 16MB of combined L2 and L3 cache, which leaves us 516 million transistors for 4 modules. Suppose those modules were thought of as 4 dual-threaded “cores”. To get us 6 dual-threaded “cores” we need to bump that up by the appropriate factor, and we get 774 million transistors, which is close to the 893 million transistor count for the 6-core Sandy Bridge-E.

    Now let’s apply that 6/4 multiplier to David Kanter’s estimate for the FX-8150. We get 2070 million transistors! I wonder if the 4-module (8-thread) FX-8150 was supposed to be a 6-module (12-thread) part, but they had to cut it down to stay inside the power envelope?
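
(A sketch of the arithmetic in the comment above; the ~54 million transistors per MB of cache is the working assumption Mr Bill's numbers imply, roughly a 6T cell plus tag and ECC overhead:)

```python
# Reproducing the back-of-envelope comparison above.
T_PER_MB = 54                          # assumed: ~54M transistors per MB of cache

# 8-core, 20 MB Sandy Bridge-E is spec'd at 2270M transistors.
sbe_cores8 = 2270 - 20 * T_PER_MB      # ~1190M for the eight cores
sbe_cores6 = sbe_cores8 * 6 / 8        # ~893M for a six-core part
print(sbe_cores6 + 15 * T_PER_MB)      # ~1702M for the shipping 6-core, 15 MB die

# FX-8150 at Kanter's ~1380M estimate, with 16 MB of combined L2 + L3.
fx_modules4 = 1380 - 16 * T_PER_MB     # ~516M for the four modules
print(fx_modules4 * 6 / 4)             # ~774M scaled to six modules
print(1380 * 6 / 4)                    # 2070M: the whole chip scaled by 6/4
```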

      • willmore
      • 8 years ago

      That should be pretty easy to verify, someone go etch the metal off of a die and look.

      Or did you mean that early in the design it was a 6-module part and marketing got that stuck in their heads and missed when things got changed to a 4-module part?

        • Mr Bill
        • 8 years ago

        Yeah, I was thinking it might be the former, but mostly the latter. At any rate, for roughly 1.7-1.8 billion transistors, Intel got a 6-core, 12-thread product and AMD got a 4-module, 8-thread product. No wonder there is a performance disparity.

          • just brew it!
          • 8 years ago

          Using number of cores/modules/threads as your performance yardstick is nearly as bad as measuring it in MHz. You just can’t do any of those things and get meaningful results when you’re comparing disparate architectures.

            • SPOOFE
            • 8 years ago

            I didn’t see him make any comment about actual performance; he just extrapolated the separate parts of the chip to discern where approximate numbers of transistors would be located.

            • willmore
            • 8 years ago

            “performance disparity” Hmm, sounds like he mentioned it.

            • Mr Bill
            • 8 years ago

            Hopefully, I am counting right after reading these reviews. Bulldozer’s 8 integer cores outperformed SNB-E’s 6 hyperthreaded integer cores, but SNB-E’s 6 hyperthreaded FP cores wipe the floor with Bulldozer’s 4 dual-threaded FP cores. So actual numbers of cores seem to matter more than the hyper/dual-threading count. In each case, the one with more actual cores mostly beats the one with fewer in the integer and floating-point benchmarks. So, yeah, there seems to be a ‘performance disparity’ depending on workload.

    • xeridea
    • 8 years ago

    I am genuinely curious why there can’t be an exact transistor count. The manufacturing equipment has to know where everything is. There are blueprints for everything, so why can’t they just be counted?

      • sweatshopking
      • 8 years ago

      you wanna count 1.2 billion anything?

        • ronch
        • 8 years ago

        The Count, from Sesame Street, will gladly do it for you.

        • streagle27
        • 8 years ago

        Isn’t this why we have computers?

          • dpaus
          • 8 years ago

          Yes, but they’re not yet self-aware 🙂

      • just brew it!
      • 8 years ago

      It is not too far-fetched to imagine that the design tools don’t provide a feature to count every individual transistor on the die, because nobody cares.

        • willmore
        • 8 years ago

        Exactly. Even if the design tools do report a figure, by the time you sum up over all of the different modules used and how many of each there are, and then add in the clock distribution and long-signal amplification found in the ‘link’ phase, you’ve got too many layers of abstraction to make a clear number easy to find.

        Plus, the engineers just don’t care. What matters is power and area budgets; those are the resources that need to be minimized. The number of transistors is just meaningless.

        Plus, this is CMOS; even the definition of “transistor” is pretty flexible. Many people think of a transistor like the little plastic-cased things you buy at RadioShack, but that’s not what they’re like in real silicon. They’re not all three-terminal devices. Here’s a simple example, a 2-input NAND gate: [url]https://en.wikipedia.org/wiki/File:CMOS_NAND_Layout.svg[/url]

        How many transistors are there? How many are there for a 3-input? Surprise, it’s exactly the same number. The transistor just gets a little bigger. So, what’s the value of transistor counting, again?

          • sschaem
          • 8 years ago

          It’s not because it’s not that relevant that it can’t be measured…
          All the layout tools will give you that number accurate to one transistor.

          This is not a case of AMD having no idea, just guessing “Well, maybe it’s 2 billion”
          and later saying “No, I guess it’s 1.2 billion… maybe. Well, for sure actually. 2 billion was wrong, but 1.2 is right.”

          B*S… The PR team are just monkeys who heard that Interlagos had over 2 billion transistors and labeled Zambezi as having 2 billion transistors.
          But I can’t explain why no tech at AMD noticed that and informed the PR department… unless this was intentional.

          • TurtlePerson2
          • 8 years ago

          A 3 input nand gate uses more transistors. It needs 6 transistors and a 2 input needs 4 transistors. You’re confusing transistors with diffusion regions.

          I could lay out a 64 bit adder on two diffusion regions (just like the NAND gate) if you gave me enough metal layers, but I would use hundreds of transistors.
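
(For reference, a static CMOS n-input NAND needs n parallel PMOS pull-ups plus n series NMOS pull-downs, so the count scales as 2n; a trivial sketch:)

```python
# Transistor count of a static CMOS n-input NAND gate:
# n PMOS in parallel (pull-up) + n NMOS in series (pull-down) = 2n total.
def nand_transistors(n_inputs: int) -> int:
    return 2 * n_inputs

print(nand_transistors(2))  # 4, as noted above
print(nand_transistors(3))  # 6
```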

      • TurtlePerson2
      • 8 years ago

      I’m kind of curious myself. I’m pretty sure that Cadence will tell you how many transistors you use. I’m not sure if it counts multi-fingered transistors as more than one transistor though (they technically are).

    • bcronce
    • 8 years ago

    This is good news. This means when they do get around to a 2bil tran cpu, it will have a lot of cores.. 😛

      • Farting Bob
      • 8 years ago

      And it will still be slower than the latest Intel Quad core.

    • tone21705
    • 8 years ago

    What do even 1.2b transistors look like? Can the human eye even see them?

      • cegras
      • 8 years ago

      You’ll see wonderful diffraction patterns from them.

        • knate
        • 8 years ago

        Yeah, looks like a colorful hologram, but you won’t be able to pick out any individual transistors.

      • yogibbear
      • 8 years ago

      [url]http://www.google.com.au/imgres?q=32nm+wafer+silicon&um=1&hl=en&biw=1308&bih=911&tbs=isz:lt,islt:2mp&tbm=isch&tbnid=l1bqv_Wt4Jlp3M:&imgrefurl=http://www.intel.com/pressroom/kits/events/idffall_2009/photos.htm&docid=3nH4DzCEi_ScDM&imgurl=http://download.intel.com/pressroom/kits/events/idffall_2009/images/idf_2009_keynote_baker_1.jpg&w=4256&h=2832&ei=Z-vZToDbDZGhmQXZnfDuCw&zoom=1[/url]

    • DavidC1
    • 8 years ago

    But at least with 2 billion transistors, the increased power consumption makes sense. Now with 1.2 billion, it means they created a relatively modest transistor count chip that sucks power.

      • OneArmedScissor
      • 8 years ago

      Yeah, well, look what happens when you turn a 32nm CPU with only 382 million transistors up to 4.4 GHz:

      [url]https://techreport.com/articles.x/18448/16[/url]

      Voltage + clock speed > transistor count

      • chuckula
      • 8 years ago

      You have an excellent point. Before, with the 2 billion figure, AMD could at least say that they had a chip in the ballpark of Sandy Bridge-E in size with about the same power draw. Now they have a chip with substantially fewer transistors, but with the same power draw. When you remember that the L3 cache is clocked much slower at about 2.2 GHz, you really have to wonder how AMD managed to make such a lousy product from the power consumption standpoint.

        • Krogoth
        • 8 years ago

        32nm process, high voltage and high clock-speed.

        Westmere chips’ (32nm Nehalem) power consumption goes through the roof if you push them beyond 3.5GHz, along with the voltage needed to keep them “stable”.

        People keep forgetting that SB owes its modest power consumption to 28nm process and conservative voltage. Intel doesn’t need to crank up the MEGAHURTZ. Sandy Bridge-E is when Intel doesn’t hold back as much; Sandy Bridge-Es consume a lot more juice, but they have the performance to back it up if your applications can fully utilize them.

          • Peldor
          • 8 years ago

          You lost me at Sandy Bridge 28 nm process.

          • NeelyCam
          • 8 years ago

          [quote]People keep forgetting that SB owes its modest power consumption to 28nm process and conservative voltage.[/quote]

          SB uses the same 32nm process as Westmere, although the process is more mature now, and transistors are probably faster. Also, SB doesn’t have the same speed paths as Westmere, so it can clock higher at lower supply voltages.

      • sschaem
      • 8 years ago

      The horrible power efficiency comes from the fact that AMD had to shoot up the voltage to make a desktop chip viable.
      At lower frequency/voltage, it’s actually a big step forward for AMD. But who wants a 2GHz chip with low IPC?

      If AMD can use 1.2V or lower vs. the 1.4V they are stuck at now, expect a dramatic reduction in power under load.

      As we saw, at idle this chip matches Intel’s best design (until Ivy Bridge is out). It’s only under load (overvolted at 1.4V) that it literally sucks.

      But then, AMD has been trying for over 18 months to lower voltage at >3GHz without success.
      AMD announced last week that they are out of the ‘Intel race’, so it’s possible that they’ve completely given up on that front.
      A few AMD interviews also show the AMD team proud and 100% happy with the Bulldozer design and outcome.

      At this time the attitude at AMD is “It’s not broken, it works exactly as expected, so we won’t fix it.”

    • Goty
    • 8 years ago

    It’s within a factor of two. That’s a pretty darn good measurement in my field.

      • yogibbear
      • 8 years ago

      I hope you’re not a doctor…. or an engineer… or anything. Are you a politician?

        • BoBzeBuilder
        • 8 years ago

        He’s Paul Ryan.

          • NeelyCam
          • 8 years ago

          Paul who?

        • Goty
        • 8 years ago

        Actually, I’m an astrophysicist. I work in orders of magnitude, a factor of two is nothing. =P

          • SPOOFE
          • 8 years ago

          You, ah, work with lots of silicon in that astrophysics line of work, there? 🙂

    • WaltC
    • 8 years ago

    No wonder I was scratching my head over BD’s reported 2B transistors! Now I can quit scratching and go back to waiting on the first dramatic iteration of BullDozer for the consumer market. It’s a bit weird that AMD would introduce a CPU without having a fairly good handle on a number like this. It seems fairly sloppy and not exactly confidence-inspiring.

    I want to see AMD massively hand-tune this beast as I think it’s the only way to make this design break a few Intel balls along the way…which would be nice for a change…;) I’m longing for some competition these days.

    I almost get physically ill thinking about being driven back to Intel eventually, as that was a habit I thought I’d sworn off and taken the pledge on way back in 1999…;) Seems a shame to roll off the wagon after so many years of being Intel-free…:D

      • Duck
      • 8 years ago

      AMD also needs to cut all or nearly all of that L3 cache. Everything will get better… die size, transistors, cost, TDP, cache latency. 8MB L3 cache for a consumer/desktop CPU is very poor.

        • WaltC
        • 8 years ago

        You bet, and I think that’s one place the hand optimization is going to go to tune this baby for the consumer market. That’d be one of the first things I’d look at, too. Bound to really get in the way of yields at this stage. Here’s hoping round two is dramatically different.

        (I hate it when companies start to flounder, as if they can’t quite figure out the way the wind is blowing. That’s a condition reserved for politicians–not tech companies.) I want to see AMD get back to 2/3-3/4 substance with the remainder being marketing fluff!

          • flip-mode
          • 8 years ago

          Hmm… /if/ AMD does do some “hand-tuning” of BD, that effort will probably take substantial time. Maybe they’ll hand-tune little bits and pieces at a time, hopefully targeting the most important areas first. Regardless, I would be surprised if any of this would be able to be done in quick order. There would be debugging involved and new lith masks and all that stuff – stuff that’s far over my head. But, my point is that it will probably be months – like 10 to 20 months – before that stuff makes it into the production silicon.

            • spigzone
            • 8 years ago

            Trinity is Bulldozer II (Piledriver) based and is operational now, although the release date appears to be based on GloFo’s getting yields to an acceptable level.

            AMD has obviously been ‘hand tuning’ the next iteration of Bulldozer for the consumer market for a very considerable time.

            It is reasonable to expect a follow-up to Bulldozer sooner rather than later.

            • flip-mode
            • 8 years ago

            Are you talking about the same kind of hand-tuning that WaltC and I are talking about? WaltC is referring to the fact that much of BD was designed with “logic synthesis”, and this term was also referenced in the first paragraph of this news article. My understanding of that term is that it means transistor layouts performed by software. This is opposed to “hand-tuned” transistor layouts that are meticulously designed by human engineers. Supposedly, from what I have heard, hand-tuned transistor layouts are superior to synthesized layouts in a few ways. From memory, those ways include a more efficient use of transistors and circuitry that can attain higher clockspeeds.

            The downside of hand-tuned is that it takes a frickin long time – much longer than the synthesized route. I have no idea what that means in absolute terms, just that relatively speaking, layout by human is much slower than layout by automated software.

            So I’m wondering if there has been enough time for any of the circuitry in Zambezi to be hand-tuned and released in the upcoming Piledriver, unless it is just in some very small and specific areas. Zambezi was probably late to launch by internal roadmaps, so maybe they’ve been working on Piledriver internally for longer than we know, but my understanding is that it takes [i]years[/i] for humans to do the circuitry layout, not just a few months.

            And I have not heard that AMD has said anything at all about reducing the extent of synthesized circuitry in Piledriver; it sounds like WaltC is just hoping that this will eventually happen, rather than expecting that it will happen. As far as anyone knows, the changes in Piledriver don’t have anything to do with whether a human or a machine lays out the circuits.

            • WaltC
            • 8 years ago

            Yea, hand tuning will be a continuing process as much will be decided by yield efficiencies GF is able to (hopefully) reach with upcoming die shrinks and process revamps. Sometimes lots of stuff can be done at once–some stuff doesn’t have to be done because “other stuff” that got done eliminated a need for it, etc.

            Earlier post–I think under the right circumstances an L3 can be a very beneficial approach. But my feeling is that it won’t contribute too much for the consumer markets (as opposed to commercial server markets), and its absence on a few iterations of BD should dramatically improve yields in the consumer segment. It might actually take very little to rapidly alter the general power and market appeal of BD.

        • khands
        • 8 years ago

        That’s one of the things that makes me excited about Trinity.

        • cegras
        • 8 years ago

        What’s wrong with L3 cache? Genuinely curious.

          • Duck
          • 8 years ago

          It takes up a tonne of space. For a server CPU, maybe it’s useful. For a desktop CPU it makes everything worse, including performance, since a large L3 has to have higher latency than is possible with a smaller cache.

            • d0g_p00p
            • 8 years ago

            lol, u wish you knew what you were talking about

            • cegras
            • 8 years ago

            Isn’t it better to have L3? It adds a layer of memory faster than RAM.

            • Duck
            • 8 years ago

            No. Sandy Bridge-E performs on the same level per core as Sandy Bridge, for example. This is all assuming a desktop workload.

            SB-E is a big, expensive, power hungry platform. L3 isn’t worth it.

            • cegras
            • 8 years ago

            So what are the technical reasons?

            • flip-mode
            • 8 years ago

            Like Duck said: server workloads. SB-E is really a server chip. And Zambezi is really a server chip.

            • cegras
            • 8 years ago

            Riiiight… But I am looking for an architectural, microcode, low-level, etc., reason for why L3 is a bad idea.

            • Krogoth
            • 8 years ago

            L3 cache is big (2MiB+, huge transistor budget) and slow (latency/bandwidth). It is meant for large data chunks/streams that are commonplace with server/workstation-level applications.

            It never gets fully utilized in mainstream applications, so it ends up eating the transistor budget = higher power consumption for dubious returns with mainstream stuff.

            • flip-mode
            • 8 years ago

            I’m not saying L3 is a bad idea. This stuff is miles over my head, but I have heard that larger caches are slower:
            [url]http://wiki.osdev.org/CPU_Caches[/url]

            As for whether or not the L3 cache helps or hurts, well, it’s certainly faster than going to RAM, so it must be something about not only the size of the L3 but also the way that it is used. For all I know, the fact that there is an L3 cache might not be a problem at all; instead the problem could be all about AMD’s implementation of the L3 cache. Anyway, the water is already too deep for me. You’ll have to ask Google from here onward.

            • Peldor
            • 8 years ago

            An L3 certainly doesn’t make performance worse in most cases for desktop CPUs. Look at Phenom II X4 vs Athlon II X4 benchmarks. There are relatively few cases where the Athlon wins by any margin and plenty where it lags by 10%+, gaming especially.

            I agree it may well be a lot more [i]efficient and sensible[/i] for AMD to skip the L3 on Bulldozer, save on die area, power, etc., and then maybe just clock things up a bit, but an L3 is not a general performance penalty in the abstract case.

            • Duck
            • 8 years ago

            Yep, exactly. I meant a larger L3 can make performance worse rather than adding a L3 makes performance worse.

            They could cut the L3 down to say, 0.5 MB per module. Then when you compare it to the i5 2500k I bet it wouldn’t look quite so unappealing and perform about the same (again, assuming desktop workloads, gaming, etc).

            • wierdo
            • 8 years ago

            It makes no sense for an L3 cache to be small; that’s what the L2 cache is for: smaller but faster. The L3 is supposed to be big so it can catch anything that won’t fit in L2.

            • Duck
            • 8 years ago

            When you can sell your CPUs in excess of $700 then it’s easier to make a large L3 cache work out ok.

            Bulldozer will be better off without an L3 cache just to make it cheaper and less power hungry (it shouldn’t hurt performance too much).

            • wierdo
            • 8 years ago

            Oh I can see an argument for removing the L3 cache, there would be a tradeoff in performance vs cost. so a CPU designer could justify the decision based on market goals.

            I was just confused about making a CPU with a non-exclusive L3 cache that’s the same size as its L2 cache or smaller; that wouldn’t make sense. You might as well have none versus a tiny, slow L3 cache.

          • OneArmedScissor
          • 8 years ago

          Bulldozer’s L3 is locked at 2.2 GHz, about half the core clock, and incredibly slow for going head to head with Intel’s CPUs, which are running their L3 synced to the up to 3.9 GHz core clock.

          This leaves Bulldozer with a memory latency disadvantage, slowing down [i]everything[/i]. While it may have about twice as much cache, Sandy Bridge already has enough, perhaps still too much. Look at the difference between i5s and i7s - practically nothing! 1.5MB or so per core is plenty for a PC - which Bulldozer already has at the much, much faster L2 level.

          It’s like how quad-cores used to be slower for PCs than higher clocked dual-cores. More cores do you no good if they’re idling and making the busy parts of the CPU slower. While turbo boost fixed that, it’s only possible because of power gating, which can’t be done to a shared cache stage.

          That isn’t to say that AMD messed up by including an L3 cache. Servers need it. However, they run closer to 2 GHz core clock speeds with all cores active, so they’re roughly in sync. Where AMD messed up is trying to push the desktop Bulldozer’s core clock to 5 GHz territory, [b]purely for marketing purposes[/b], but leaving the L3 cache at the same speed as the server chips. This is why overclocked Bulldozer tests do so poorly. It’s just making the problem worse!

          Unlike Sandy Bridge, Bulldozer’s L3 cache is made out of slower, less leaky transistors, so it can’t run the same speed as the core clock. AMD didn’t do anything wrong here, either, as it is a necessity with so many transistors. However, the L3 does have some wiggle room, which could have been taken advantage of…had Bulldozer stuck closer to the traditional 3-4 GHz range and dropped its core voltage.

            • cegras
            • 8 years ago

            [quote]Where AMD messed up is trying to push the desktop Bulldozer’s core clock to 5 GHz territory, purely for marketing purposes, but leaving the L3 cache at the same speed as the server chips. This is why overclocked Bulldozer tests do so poorly. It’s just making the problem worse![/quote]

            Marketing? It seems like Bulldozer’s design was to have high clock speeds to compensate for lower IPC. To me, the failure of the L3 to match the rest of the chip is entirely an engineering department fault.

            • OneArmedScissor
            • 8 years ago

            Then why are the server chips around 2 GHz?

            Failure to match the rest of the chip? Then I guess every single CPU ever made, except Sandy Bridge, is a failure? I answered your question in detail and now you’re just throwing out reality and making things up.

            It doesn’t necessarily have “lower IPC.” They aren’t “making up” anything by cranking up the core clock, as it’s hobbled by the L3, regardless of the speed, hence my point about how the overclocking tests still do poorly. I’ve only seen one site test increasing the L3 clock. Nobody else bothers, and cranking the core clock alone just exacerbates the existing problem.

            These CPUs are 4+ GHz for desktops because then they can use it for advertising. It doesn’t otherwise accomplish anything. Do you want Intel’s 3 GHz quad-core, or AMD’s 4+ GHz octal-core? That’s the way they show up in the Newegg search results to people who are none the wiser.

            • chuckula
            • 8 years ago

            [quote]Then why are the server chips around 2 GHz?[/quote]

            The reason is simple: the server chips have 2 dies crammed into an MCM, and they have to meet the TDP numbers. Believe me, AMD *wishes* they could clock those server chips up through the roof, but it ain’t happening.

            The fact that over 1/3 of the transistors on this chip (using the new numbers) only run at 2.2 GHz *and* it still sucks more power than a 3960X at stock speeds raises some major red flags about AMD and GloFo’s ability to execute.

            Further, the whole myth that GloFo’s gate-first process gives AMD amazing transistor density benefits has just been shot straight to Newark. Using the 2 billion transistor figure, AMD could at least brag that they crammed almost as many transistors as are in SB-E into a smaller die. Now, the transistor density for Bulldozer is the lowest of any 32 nm process chip from either AMD or Intel.

            • just brew it!
            • 8 years ago

            [quote]The fact that over 1/3 of the transistors on this chip (using the new numbers) only run at 2.2 GHz *and* it still sucks more power than a 3960X at stock speeds raises some major red flags about AMD and GloFo’s ability to execute.[/quote]

            It’s not like there weren’t red flags before. They’ve already admitted that yields on the 32nm Fusion APUs suck.

            • cegras
            • 8 years ago

            [quote]It doesn’t necessarily have “lower IPC.” They aren’t “making up” anything by cranking up the core clock, as it’s hobbled by the L3, regardless of the speed, hence my point about how the overclocking tests still do poorly. I’ve only seen one site test increasing the L3 clock. Nobody else bothers, and cranking the core clock alone just exacerbates the existing problem.[/quote]

            You’re a quick one to bark, aren’t you. I was commenting on how the situation might have been different if the L3 in Bulldozer was ‘synced to the rest of the core’.

            • Bensam123
            • 8 years ago

            I think, more generally speaking, AMD messed up by rushing the product to market before it was mature enough to even take on its own chips it’s supposed to replace and look good. I don’t think that’s the engineers’ fault at this stage, though. This smells like a management ‘get ’r done’ push-through.

      • NeelyCam
      • 8 years ago

      Removed in an effort to try not to offend anybody.

        • NeelyCam
        • 8 years ago

        -2? Wow, I’m sorry…! I’ll take it back!!

          • sweatshopking
          • 8 years ago

          you bastard.

      • ronch
      • 8 years ago

      You can stop scratching your head now? Good for you then. I can’t stop scratching my head regarding BD’s price. At the $245 MSRP (which is practically impossible to find), it’s still a bit of a stretch considering the i5-2500K is $220, is faster, and sucks far less juice.

      • Meadows
      • 8 years ago

      Any particular reason why you capitalise Bulldozer in such a manner?

        • WaltC
        • 8 years ago

        Wish I could spout something pithy, but–nah…;) I’m just not very rigid about much of anything having to do with BD at the moment.

    • ModernPrimitive
    • 8 years ago

    1.2b aerobic transistors. The other .8b are probably there as leg warmers…..

      • sirroman
      • 8 years ago

      [url]http://www.youtube.com/watch?v=YPV93vA_sEU&list=FLA7djCNTq6xa_vtLnt0_gbg&index=4&feature=plpp_video[/url]

      Ow! That’s smaaaall…

    • Forge
    • 8 years ago

    So it’s not ridicu-wastage, but still underperforming. That 2bil number lent a lot of credibility to the former AMD engineer’s claim that AMD had overcommitted to automated circuit design instead of hand-laid. Now it’s just plausible.

      • quasi_accurate
      • 8 years ago

      You talking about that x-bit labs article? “Former AMD engineer” who is unnamed? Riiiiight.
