AMD’s high-bandwidth memory explained

For years now, AMD has taken on the responsibility of defining new types of memory to be used in graphics cards, standards that have eventually come to be used by the entire industry. Typically, being first out of the gate with a new graphics-oriented memory technology has given AMD a competitive advantage in that first generation of products. For instance, the introduction of GDDR5 allowed the Radeon HD 4870 to capture the performance crown back in the day.

Trouble is, GDDR5 is still the standard memory type for graphics processors to this day, seven years after the 4870’s introduction. Graphics memory has gotten faster over time, of course, but it hasn’t fundamentally changed for a long while.

GDDR5’s reign is about to end, however, thanks to a new type of memory known by the ridiculously generic name of high-bandwidth memory (HBM). Although we’ve been waiting quite a while for a change, HBM looks to be a fairly monumental shift as these things go, thanks to a combination of new fabrication methods and smart engineering. The first deployment of HBM is likely to be alongside Fiji, AMD’s upcoming high-end GPU, expected to be called the Radeon R9 390X.

Fiji is a fitting companion for HBM since a team of engineers at AMD has once again helped lead the charge in its development. In fact, that team has been led by one of the very same engineers responsible for past GDDR standards, Joe Macri. I recently had the chance to speak with Macri, and he explained in some detail the motivations and choices that led to the development of HBM.  Along the way, he revealed quite a bit of information about what we can likely expect from Fiji’s memory subsystem. I think it’s safe to say the new Radeon will have the highest memory bandwidth of any single-GPU graphics card on the market—and not by a little bit. But I’m getting ahead of myself. Let’s start at the beginning.

The impetus behind HBM

Macri said the HBM development effort started inside of AMD seven years ago, so not long after GDDR5 was fully baked. He and his team were concerned about the growing proportion of the total PC power budget consumed by memory, and they suspected that memory power consumption would eventually become a limiting factor in overall performance.

Beyond that, GDDR5 has some other drawbacks that were cause for concern. As anybody who has looked closely at a high-end graphics card will know, the best way to grow the bandwidth available to a GPU is to add more memory channels on the chip and more corresponding DRAMs on the card. Those extra DRAM chips chew up board real-estate and power, so there are obvious limits to how far this sort of solution will scale up. AMD’s biggest GPU today, the Hawaii chip driving the Radeon R9 290 and 290X, has a 512-bit-wide interface, and it’s at the outer limits of what we’ve seen from either of the major desktop GPU players. Going wider could be difficult within the size and power constraints of today’s PC expansion cards.

One possible solution to this dilemma is the one that chipmakers have pursued relentlessly over the past couple of decades in order to cut costs, drive down power use, shrink system footprints, and boost performance: integration. CPUs in particular have integrated everything from the floating-point units to the memory controller to south bridge I/O logic. In nearly every case, this katamari-like absorption of various system components has led to tangible benefits. Could the integration of memory into the CPU or GPU have the same benefits?

Possibly, but it’s not quite that easy.

Macri explained that the processes used to make DRAM and logic chips like GPUs are different enough to make integration of large memory and logic arrays on the same chip prohibitively expensive. With that option off of the table, the team had to come up with another way to achieve the benefits of integration. The solution they chose pulls memory in very close to the GPU while keeping it on a separate silicon die. In fact, it involves a bunch of different silicon dies stacked on top of one another in a “3D” configuration.

And it’s incredibly cool tech.

Something different: HBM’s basic layout

Any HBM solution has three essential components: a main chip (either a GPU, CPU, or SoC), one or more DRAM stacks, and an underlying piece of silicon known as an interposer. The interposer is a simple silicon die, usually manufactured using an older and larger chip fabrication process, that sits beneath both the main chip and the DRAM stacks.

Macri explained that the interposer is completely passive; it has no active transistors because it serves only as an electrical interconnect path between the primary logic chip and the DRAM stacks.

The interposer is what makes HBM’s closer integration between DRAM and the GPU possible. A traditional organic chip package sits below the interposer, as it does with most any GPU, but that package only has to transfer data for PCI Express, display outputs, and some low-frequency interfaces. All high-speed communication between the GPU and memory happens across the interposer instead. Because the interposer is a silicon chip, it’s much denser, with many more connections and traces in a given area than an off-chip package.

Although the interposer is essential, the truly intriguing innovation in the HBM setup is the stacked memory. Each HBM memory stack consists of five chips: four storage dies above a single logic die that controls them. These five chips are connected to one another via vertical connections known as through-silicon vias (TSVs). These pathways are created by punching a hole through the silicon layers of the storage chips. Macri said those storage chips are incredibly thin, on the order of 100 microns, and that one of them “flaps like paper” when held in the hand. The metal bits situated between the layers in the stack are known as “microbumps” or μbumps, and they help form the vertical columns that provide a relatively short pathway from the logic die to any of the layers of storage cells.

Each of those storage dies contains a new type of memory conceived to take advantage of HBM’s distinctive physical layout. The memory runs at a lower voltage than GDDR5 (1.3V versus 1.5V), at lower clock speeds (500MHz versus 1750MHz), and at slower per-pin transfer rates (1 Gbps versus 7 Gbps), but it makes up for those attributes with an exceptionally wide interface. In this first implementation, each DRAM die in the stack talks to the outside world by way of two 128-bit-wide channels. Each stack, then, has an aggregate interface width of 1024 bits (versus 32 bits for a GDDR5 chip). At 1 Gbps, that works out to 128 GB/s of bandwidth for each memory stack.
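
Here’s a quick back-of-the-envelope sketch, in Python, of how those per-stack numbers fall out of the figures above. It simply restates the quoted specs with illustrative variable names of my own, so treat it as arithmetic rather than anything authoritative.

    # Per-stack math for first-gen HBM, using the figures quoted above.
    dies_per_stack = 4        # storage dies sitting above the logic die
    channels_per_die = 2      # two 128-bit channels per DRAM die
    channel_width_bits = 128
    data_rate_gbps = 1.0      # 500MHz clock, double data rate -> 1 Gbps per pin

    stack_width_bits = dies_per_stack * channels_per_die * channel_width_bits
    stack_bandwidth_gbs = stack_width_bits * data_rate_gbps / 8

    print(stack_width_bits)     # 1024 bits per stack (vs. 32 for a GDDR5 chip)
    print(stack_bandwidth_gbs)  # 128.0 GB/s per stack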

Making this sort of innovation happen was a broadly collaborative effort. AMD did much of the initial heavy lifting, designing the interconnects, interposer, and the new DRAM type. Hynix partnered with AMD to produce the DRAM, and UMC manufactured the first interposers. JEDEC, the standards body charged with blessing new memory types, gave HBM its stamp of approval, which means this memory type should be widely supported by various interested firms. HBM made its way onto Nvidia’s GPU roadmap some time ago, although it’s essentially a generation behind AMD’s first implementation.

 

The benefits of HBM

Macri says this first-generation HBM solution has a number of advantages over GDDR5. A higher peak transfer rate is chief among them, but that’s followed closely by some related wins. He estimates that GDDR5 can transfer about 10.66 GB/s per watt, while HBM transfers over 35 GB/s per watt.

HBM also packs tremendously more bits into the same space. A gigabyte of HBM is just one stack 35 mm² in size. By contrast, four GDDR5 dies can occupy 672 mm² of real estate. As a result, HBM ought to allow for much smaller total solutions, whether it be more compact video cards or, eventually, smaller footprints for entire systems.
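
For a rough sense of that density gap, here’s the same comparison as simple arithmetic, treating the area figures above as approximations:

    # Approximate package area needed for 1GB of memory, per the figures above.
    hbm_stack_area_mm2 = 35     # one 1GB HBM stack
    gddr5_area_mm2 = 672        # four GDDR5 dies totaling 1GB
    print(gddr5_area_mm2 / hbm_stack_area_mm2)  # ~19x less area for the HBM stack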

Past GDDR memory types have largely been confined to graphics-related devices because of their high bandwidth and corresponding higher access latencies, but Macri expects HBM to make its way into more general applications. That’s no great surprise given that AMD hinted strongly at future server-class APUs that use HBM in its recent Analyst Day roadmap reveal. 

Although HBM was built primarily to deliver more bandwidth per watt, Macri cites a host of reasons why its access latencies are effectively lower. First, he cites “very small horizontal movement” of DRAM, since the data paths traverse the stack vertically. With more channels and banks, HBM has “much better pseudo-random access behavior,” as well. Also, HBM’s clocking subsystem is simpler and thus incurs fewer delays. All told, he says, those “small positives” can add up to large reductions in effective access latencies. Macri points to server and HPC workloads as especially nice potential fits for HBM’s strengths. Eventually, he expects HBM to move into virtually every corner of the computing market except for the ultra-mobile space (cell phones and such), where a “sister device” will likely fill the same role.

Another advantage of HBM is that it requires substantially less die space on the host GPU than GDDR5. The physical interfaces, or PHYs, on the chip are simpler, saving space. The external connections to the interposer are arranged at a much finer pitch than they would be for a conventional organic substrate, which means a more densely packed die. Macri hinted that even the data flow inside the GPU itself could be optimized to take advantage of data coming in “in a very concentrated hump.”

Of course, all of these things sound very much like the sorts of positive effects one might expect from closer integration of a critical component. In that respect, then, HBM looks poised to deliver on much of the promise of the initial concept.

A likely map to Fiji’s HBM-infused memory subsystem

We’ve already discussed the potential savings in die space that HBM might grant to AMD’s next big Radeon GPU. Surprisingly enough, I think we can map out that GPU’s entire memory subsystem based on my discussion with Macri and the information included in the JEDEC documentation for HBM.

The basic stack layout I outlined above is almost surely what Fiji uses: a four-die stack with two 128-bit channels per die. At a clock speed of 500MHz with a DDR-style arrangement that transfers data on both the rising and falling edges of the clock, that memory should have a 1 Gbps data rate. Thus, each 1024-bit link from a stack into the GPU should be capable of transferring data at a rate of 128 GB/s.

As in the examples provided by AMD, Fiji will have four stacks of DRAM attached. That will give it a grand total of 512 GB/s of memory bandwidth, which is quite a bit more than both the Radeon R9 290X (320 GB/s) and the GeForce Titan X (336 GB/s).  Based on that difference alone, I’d wager that the new Radeon will outperform today’s fastest GPU by a considerable margin. Memory bandwidth is one of a handful of key constraints that defines the performance of a GPU these days, and having that sort of an edge in bandwidth should translate into world-beating performance, provided AMD doesn’t have any show-stopping problems elsewhere.
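
A short sketch of that projection, assuming the four-stack configuration described above and the per-stack figures from earlier:

    # Projected Fiji memory bandwidth versus today's cards.
    stacks = 4
    per_stack_gbs = 128                  # from the 1024-bit, 1 Gbps stack math above
    fiji_gbs = stacks * per_stack_gbs    # 512 GB/s

    print(fiji_gbs / 320)                # ~1.6x the Radeon R9 290X
    print(fiji_gbs / 336)                # ~1.52x the GeForce Titan X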

One thing we don’t really know yet from the information Macri presented is how much power savings HBM really delivers in the context of a big GPU like Fiji, where power budgets can touch 300W. I’m not sure what portion of that budget is consumed by memory and memory-related logic.

Macri did say that GDDR5 consumes roughly one watt per 10 GB/s of bandwidth. That would work out to about 32W on a Radeon R9 290X. If HBM delivers on AMD’s claims of more than 35 GB/s per watt, then Fiji’s 512 GB/s subsystem ought to consume under 15W at peak. A rough savings of 15-17W in memory power is a fine thing, I suppose, but it’s still only about five percent of a high-end graphics card’s total power budget. Then again, the power-efficiency numbers Macri provided only include the power used by the DRAMs themselves. The power savings on the GPU from the simpler PHYs and such may be considerable.
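
Here is that estimate worked out explicitly, assuming Macri’s efficiency figures hold; remember these cover only the DRAMs themselves, not the PHYs or memory controllers.

    # Rough DRAM power estimates from the efficiency figures Macri cited.
    gddr5_gbs_per_watt = 10.0            # "roughly one watt per 10 GB/s"
    hbm_gbs_per_watt = 35.0              # "more than 35 GB/s per watt"

    r9_290x_memory_w = 320 / gddr5_gbs_per_watt   # ~32W for the 290X's GDDR5
    fiji_memory_w = 512 / hbm_gbs_per_watt        # ~14.6W for Fiji's HBM at peak

    print(r9_290x_memory_w, fiji_memory_w)
    print(r9_290x_memory_w - fiji_memory_w)       # ~17W saved versus the 290X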

This first-gen HBM implementation will impose at least one limitation of note: with four 1GB stacks, total capacity will only be 4GB. At first blush, that sounds like a limited capacity for a high-end video card. After all, the Titan X packs a ridiculous 12GB, and the prior-gen R9 290X already offers the same 4GB. Now that GPU makers are selling high-end cards on the strength of their performance at 4K resolutions, one might expect more capacity from a brand-new flagship graphics card.

When I asked Macri about this issue, he expressed confidence in AMD’s ability to work around this capacity constraint. In fact, he said that current GPUs aren’t terribly efficient with their memory capacity simply because GDDR5’s architecture required ever-larger memory capacities in order to extract more bandwidth. As a result, AMD “never bothered to put a single engineer on using frame buffer memory better,” because memory capacities kept growing. Essentially, that capacity was free, while engineers were not. Macri classified the utilization of memory capacity in current Radeon operation as “exceedingly poor” and said the “amount of data that gets touched sitting in there is embarrassing.”

Strong words, indeed.

With HBM, he said, “we threw a couple of engineers at that problem,” which will be addressed solely via the operating system and Radeon driver software. “We’re not asking anybody to change their games.”

The conversation around this issue should be interesting to watch. Much of what Macri said about poor use of the data in GPU memory echoes what Nvidia said in the wake of the revelations about the GeForce GTX 970’s funky 3.5GB/0.5GB memory split.  If Nvidia makes an issue of memory capacity at the time of the new Radeons’ launch, it will be treading into dangerous waters. Of course, the final evaluation will be up to reviewers and end-users. We’ll surely push these cards to see where they start to struggle.

 

A few other tricky issues

Any tech as radically new and different as HBM is likely to come with some potential downsides. The issues for HBM don’t look to be especially difficult, but they could complicate life for Fiji in particular as the first HBM-enabled device.

One potential problem with HBM is especially acute for large GPUs. High-end graphics chips have, in the past, pushed the boundaries of possible chip sizes right up to the edges of the reticle used in photolithography. Since HBM requires an interposer chip that’s larger than the GPU alone, it could impose a size limitation on graphics processors. When asked about this issue, Macri noted that the fabrication of larger-than-reticle interposers might be possible using multiple exposures, but he acknowledged that doing so could become cost-prohibitive.

Fiji will more than likely sit on a single-exposure-sized interposer, and it will probably pack a rich complement of GPU logic given the die size savings HBM offers. Still, with HBM, the size limits are not what they once were.

Another possible issue with HBM’s tiny physical footprint is increased power density. Packing more storage and logic into a smaller area can make cooling more difficult, since more heat must be transferred through less surface area. AMD arguably had a form of this problem with the Radeon R9 290X, whose first retail coolers couldn’t always keep up, leading to reduced performance.

Fortunately, Macri told us the power density situation was “another beautiful thing” about HBM. He explained that the DRAMs actually work as a heatsink for the GPU, effectively increasing the surface area for the heatsink to mate to the chips. That works out because, despite what you see in the “cartoon diagrams” (Macri’s words), the Z height of the HBM stack and the GPU is almost exactly the same. As a result, the same heatsink and thermal interface material can be used for both the GPU and the memory.

(Notice that Macri did not say Fiji doesn’t have a power density issue. He was talking only about the HBM solution. The fact remains that leaked images of Fiji cards appear to have liquid cooling, and one reason to go that route is to deal with a power density challenge.)

The final issue HBM may face out of the gate is one of the oldest ones in semiconductors. Until HBM solutions are manufactured and sold in really large volumes, their costs will likely be relatively high. AMD and its partners will want to achieve broad market adoption of HBM-based products in order to kick-start the virtuous cycle of large production runs and dropping costs. I’m not sure Fiji alone will sell in anything like the volumes needed to get that process started, and as far as we know, AMD’s GPU roadmap doesn’t have a top-to-bottom refresh with HBM on it any time soon. Odds are that HBM will succeed enough to drive down costs eventually, but one wonders how long it will take before it reaches prices comparable to GDDR5.

When I asked Macri this question, he avoided getting into the specifics of AMD’s roadmap, but he expressed confidence that HBM will follow a trajectory similar to past GDDR memory types.

I don’t think his confidence is entirely misplaced, even if HBM may take a little longer to reach broad adoption. The tech AMD and its partners have built is formidable for a host of reasons we’ve just examined, and what’s even more exciting is the way HBM promises to scale up over time. According to Macri, HBM2 is already on the way. It will “wiggle twice as fast” as the first-gen HBM, giving it twice the bandwidth per stack. The memory makers will also move HBM2 to the latest DRAM fabrication process, giving it four times the capacity. The stack itself will grow to eight layers, and Macri said someday it may grow as large as 16. Meanwhile, JEDEC is already talking about what follows HBM2.
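
As a rough sketch of what that scaling implies for bandwidth, again using only Macri’s characterization above as input (the four-stack projection is my own extrapolation):

    # Rough HBM2 bandwidth projection from Macri's characterization above.
    hbm1_stack_gbs = 128
    hbm2_stack_gbs = hbm1_stack_gbs * 2   # "wiggle twice as fast" -> twice the bandwidth per stack

    print(hbm2_stack_gbs)        # 256 GB/s per stack
    print(4 * hbm2_stack_gbs)    # a hypothetical four-stack card: ~1 TB/s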

Whatever happens that far down the road, we appear to be on the cusp of a minor revolution in memory tech. We should know more about what it means for graphics in about a month, when AMD will likely unveil Fiji to the world.

Comments closed
    • anotherengineer
    • 5 years ago

    “Trouble is, GDDR5 is still the standard memory type for graphics processors to this day, seven years after the 4870’s introduction.”

    Geeeez has it been that long already?!?! Wow that was out over a year before Win7 was RTM!

    • BryanC
    • 5 years ago

    When Tonga was released, we were all excited about its memory bandwidth compression, and hoped that it would compensate for its low memory bandwidth. From the TR review:
    [quote<] By far the most consequential innovation in Tonga is a new form of compression for frame buffer color data... The payoff is astounding: AMD claims 40% higher memory bandwidth efficiency... 3DMark Vantage's color fill test has long been gated primarily by memory bandwidth, rather than the GPU's raw pixel fill rate. Here's how Tonga fares in it. Whoa. ... Perhaps my concerns about Tonga's memory bandwidth were premature. We'll have to see how well this compression mojo works in real games, but it certainly has my attention. [/quote<]

    Ok, getting back to this article: Fiji is purported to be a bigger version of Tonga, but with much more memory bandwidth. Something doesn't compute here: at Tonga's release we were arguing that Tonga didn't need more memory bandwidth, and now we're arguing that adding more memory bandwidth will lead to something amazing.

    I get that Fiji likely has more SPs than Tonga, but Tonga was already a 190 W part. Assume the 256b GDDR5 interface from Tonga consumes 20 W, and that Fiji's memory consumes 0 W. Then there's a 250/170=1.47X upside for Fiji over Tonga. Or assume Fiji is water cooled at 300 W. Then a 300/170 = 1.76X upside.

    However, Titan X is about 2.2X faster than R9 285 (according to the summaries at TPU - TR didn't test R9 285 against Titan X directly). So I don't think that adding more SPs to Tonga will overcome Titan X, because of power constraints. And I don't think adding more memory bandwidth will help much, because Tonga wasn't that memory bandwidth constrained, thanks to its color compression.

    The logical conclusion: either Fiji doesn't beat Titan X, or the real advancement in Fiji is other power efficiency improvements, not HBM.

      • chuckula
      • 5 years ago

      [quote<]Something doesn't compute here: at Tonga's release we were arguing that Tonga didn't need more memory bandwidth, and now we're arguing that adding more memory bandwidth will lead to something amazing.[/quote<]

      I brought up exactly that point when Damage & David Kanter had their video chat. They briefly address it near the very end of the video when they answer questions. Basically the answer was: More is always better... but it is a little confusing.

        • BryanC
        • 5 years ago

        That’s a rather strange answer, TBH. Damage and David Kanter should both know better than that.

          • chuckula
          • 5 years ago

          Part of it is that they don’t have complete information. David K could definitely figure out what is going on if he actually had all the relevant information in front of him.

      • BryanC
      • 5 years ago

      Here’s another way of saying the same thing.
      R9 285: 3.3 TFlop/s, 176 GB/s, 190 W: 18.75 Flops/byte, 17 Flops/W
      R9 290X: 5.6 TFlop/s, 320 GB/s, 300 W: 17 Flops/byte, 18.6 Flops/W
      R9 295X2: 11.3 TFlop/s, 640 GB/s, 500 W: 17.6 Flops/byte, 22.6 Flops/W
      Fiji: ?? TFlop/s, 512 GB/s.

      If we assume Fiji (derived from Tonga) has the same balance of compute and memory bandwidth as Tonga, it should have 9.6 TFlops (9600/512=18.75). If we also assume Fiji has the same performance per watt as R9 295X2 (AMD’s most power efficient big GPU), Fiji is then 424 W (!)

      So I guess I had missed a possibility: Fiji beats Titan X, but at over 400W.

      Strange, though, that R9 285 doesn’t have a lower Flop/byte ratio than 290X – was expecting color compression to make more of a difference in practice.

        • MathMan
        • 5 years ago

        I don’t think it makes sense to assume that 285 and Fiji have the same balance: the expansion of BW due to HBM is too big for that to be possible. But I like the idea of starting with Fiji as double the amount of resources of Tonga.

        There were already some leaked compute benchmarks (if I remember correctly) that pinned Fiji at 64 CUs. Due to HBM, the CUs will almost never have to stall due to lack of bandwidth, so while their peak TFlop/s will be double, their average TFlop/s will go up and average power as well.

        So I think 380W is a great starting point. Subtract 30W or so due to HBM saving and you end up with 350W.

          • BryanC
          • 5 years ago

          A couple things:
          If your hypothesis is true, and the balance of memory bandwidth to compute shifted dramatically towards memory bandwidth with Fiji, then color compression tech it has isn’t super useful.

          Second, the 190W for R9 285 is for only 28 CUs. If you want 64 that leads to 434 W. Maybe minus 30 W for memory savings – we’re still looking at a 400W part.

          Unless Fiji includes some big power efficiency gains, which as I said earlier would be more important to AMD than HBM – because it could help AMD’s entire product line today, whereas HBM is going to be niche for a while. To be honest, I feel like AMD’s marketing campaign around HBM is a lot of prestidigitation.

            • MathMan
            • 5 years ago

            I think color compression is useful irrespective of the amount of external BW: it reduces the amount of transactions on the external bus, which will always cost more power than die internal transactions. This will be less pronounced with HBM, but it’s still there. Fiji can probably use all the power savings it can get.

            As for the 28 vs 32 CUs: I knew that, but these are very rough first order calculations anyway. Some would call it CEO math. 😉 We can already see in the 295X2 that it has more perf/W than an otherwise identical 290X. I’m assuming that there will be similar scale efficiencies for Fiji. Who knows?

            I’m still down for 350W, but I wouldn’t bet more than a virtual beer on it.

            • BryanC
            • 5 years ago

            All of this sounds reasonable to me. Looking forward to the Fiji reviews.

            • Pwnstar
            • 4 years ago

            It’s useful if AMD cuts the HBM bandwidth in half using dual links to double the RAM.

            • MathMan
            • 4 years ago

            Think real hard for a second…

            Ok!

            Doing that would double the memory but keep the BW intact.

            Google ‘GDDR5 clamshell’ which uses the same principle.

            • Pwnstar
            • 4 years ago

            It doesn’t, because each HBM would have to share an IMC. GDDR5 doesn’t, each stack gets its own IMC.

            Fiji only has four IMCs.

      • flip-mode
      • 5 years ago

      I’m tempted to be dismissive of your whole post. HBM’s purpose isn’t solely to increase bandwidth. Increasing bandwidth is just one of several things that HBM does.

      The other possibility is that bandwidth is more important than AMD said it was with Tonga. To rephrase that, AMD’s claims about frame buffer compression were overstatements. This would not be even remotely surprising – most of AMD’s claims are overstatements.

      Building on that last statement, it would not be the least bit surprising if Fiji does not beat Titan X or only beats it a portion of the time.

      What I’m saying may just be semantics, but the point is that HBM has a lot of potential regardless of whether Fiji beats Titan X, and whether Fiji beats Titan X probably depends on more than just HBM.

        • BryanC
        • 5 years ago

        [quote<]HBM's purpose isn't solely to increase bandwidth. Increasing bandwidth is just one of several things that HBM does.[/quote<]

        What are the other things HBM does? It saves power, sure, but remember that memory power is just a small fraction of GPU power - if you save 3x on 10% of your budget, you've saved 3% overall. The shader array is far more dominant. HBM saves physical space, so for space constrained applications (consoles, etc.) it might be appealing. Although power density is also very important for such applications.

        [quote<]The other possibility is that bandwidth is more important than AMD said it was with Tonga.[/quote<]

        I don't think this is true, outside of maybe a few specific scenarios. BW is probably more important at high resolution, but then we hear of a 4 GB limit, which is unfortunate: the scenarios that make use of high bandwidth often go past that limit. I don't see any evidence to suggest AMD GPUs are mostly bandwidth bound. We do have lots of evidence that AMD GPUs are power bound.

        [quote<]the point is that HBM has a lot of potential regardless of whether Fiji beats Titan X, and whether Fiji beats Titan X probably depends on more than just HBM.[/quote<]

        I agree with you on this point. Fiji's ultimate performance is probably not going to depend on HBM. From my perspective, Fiji is mostly constrained by power: how much efficiency AMD has gained over Tonga, and how extreme AMD is willing to push the cooling solution. HBM won't be the transformative technology for Fiji, IMO. I'm a fan of HBM long term, it's definitely the right thing for the industry. I'm just pointing out that Fiji's performance is mostly dependent on other things (on this we seem to agree.)

      • Chrispy_
      • 5 years ago

      Tonga is a 1080p card and it doesn’t have the fillrate to go much higher than that, so bandwidth isn’t a problem for Tonga.

      4K requires 4x as much bandwidth, so even with all the compression voodoo, cards that have the fillrate to deal with 4K also need bandwidth to match. Bandwidth is definitely the biggest problem at 4K. Titan-X does it by upping from 256-bit to 384-bit bus using 7GHz VRAM. AMD did it with Hawaii by upping the bus to 512-bits instead. HBM will effectively be a 4096-bit bus, for a 4-deep stack of 1024-bit HBM memory layers.

        • BryanC
        • 5 years ago

        [quote<]Tonga is a 1080p card and it doesn't have the fillrate to go much higher than that, so bandwidth isn't a problem for Tonga.[/quote<]

        So are you saying that Damage's excitement over Tonga's color compression was misplaced in his review?

        [quote<]Bandwidth is definitely the biggest problem at 4K. [/quote<]

        Not sure that's true. I think shader throughput is a big issue at 4K, and also memory capacity. Do you think 4G is enough?

      • Damage
      • 5 years ago

      There’s lot to answer here, and I’m not sure I can lay it all out for you. Several points, though.

      First, Tonga needed to make do with only a 256-bit interface on the R9 285, and that was a concern given everything. The color compression AMD added helped with that, making the Tonga + 256b combo about as effective as a Tahiti + 384b combo. So that was good, given the obvious memory bandwidth constraint on that card. We didn’t take a step back. Doesn’t mean Tonga wouldn’t benefit from additional bandwidth, like, say, a 384-bit bus.

      I *still* think Tonga has a native 384b bus that hasn’t been used. I think it will be eventually, most likely.

      True, Tonga wasn’t any more power-efficient than Tahiti aboard the R9 285, which is kind of terrible. Some theories on that front:

      -They were binning Tonga parts and choosing the less healthy ones that required more voltage for the R9 285. The better bins that run well at low voltage–with all shader CUs enabled–almost certainly went into the 5K iMac instead.

      -A good bin of Tonga with the full shader and memory interface count enabled could perform much better and consume no more power than the R9 285, maybe less.

      -Nothing says Fiji won’t have additional power savings beyond Tonga. I would hope it does! As you’ve noticed, it probably needs to be more efficient. But you can accomplish a lot by going “low and wide” with lots of shader ALUs and lower operating voltages. Remember, power use increases by the square of voltage. If Fiji is very wide, it could achieve a lot more performance per watt by keeping clocks and voltage in check.

      Finally, be careful in trying to figure out the overall GPU performance picture by looking at one constraint like ALU throughput or memory bandwidth. These things interact with one another very closely–ALUs must be fed with streaming data to be effective–and the needs of different games or even different frames within a game can move around from one bottleneck to another in tricky ways. I don’t think we know enough to predict doom or success for Fiji until we test it. I certainly wouldn’t predict doom based on what we know about HBM and Tonga at present!

        • BryanC
        • 4 years ago

        Let’s look at Tahiti, Tonga, and Hawaii again:
        [code<]
        Chip     Product      Tflops   GB/s    W    Flops/B   Flops/W
        Tahiti   7970 Orig      3.8     264   250    14.39     15.20
        Tahiti   7970 GHz       4.3     288   300    14.93     14.33
        Tahiti   R9 280X        4.1     288   250    14.24     16.40
        Hawaii   R9 290X        5.6     320   300    17.50     18.67
        Hawaii   R9 295X2      11.3     640   500    17.66     22.60
        Tonga    R9 285         3.3     176   190    18.75     17.37
        [/code<]

        When you say [quote<]Tonga needed to make do with only a 256-bit interface on the R9 285[/quote<], the interesting thing is that Hawaii actually made a much bigger jump in terms of Flops/B compared to Tahiti than Tonga did compared to Hawaii. Why weren't we concerned about Hawaii not having enough bandwidth?

        [quote<]Finally, be careful in trying to figure out the overall GPU performance picture by looking at one constraint like ALU throughput or memory bandwidth.[/quote<]

        Agreed on this one! This is the exact reason why I've brought these issues up: it's unlikely that adding a whole bunch of memory bandwidth on its own will yield great performance. You have to scale the whole GPU together. And if you scale the shader array much larger than Hawaii, you run into power limitations very quickly - unless AMD has dramatically improved power efficiency or gone with an extreme cooling solution.

        The main issue with AMD's GPU lineup right now doesn't seem to be memory bandwidth, given the data, as well as AMD's own discussion and implementation of color compression. However, AMD GPUs do seem to be power constrained. HBM is an interesting and important technology, but Fiji as a product won't hinge on memory bandwidth, because on 28nm the power limitations are going to be more important. The real story about Fiji as a product, IMO, is going to be about power: either dramatic power efficiency gains over Tonga, or cooling heroics with a 350-400W water cooler.

        I don't mean for any of this discussion to sound like "doom" or "success" - rather, I'm trying to focus attention on what I see as the real issues for Fiji. Can't wait to see your review - hopefully you'll address these issues directly! =)

    • swaaye
    • 5 years ago

    Maybe AMD should do a Kickstarter that hypes some new technology idea that sounds super exciting. Have their awesome Powerpoint A-team put together sleek parallelogram slides because I love that stuff. Come up with some swanky stretch goals. Seeing the comments here I bet they could get their R&D funded by fans. 😉

    • derFunkenstein
    • 5 years ago

    [quote<]With HBM, he said, "we threw a couple of engineers at that problem,"[/quote<] "...which is all we had."

    • Klimax
    • 5 years ago

    There was a question about the difference between HMC and HBM; there is a discussion about that on the RWT forums:
    [url<]http://www.realworldtech.com/forum/?threadid=150441&curpostid=150441[/url<]

      • psuedonymous
      • 5 years ago

      In practical terms:
      HBM uses stacked DRAM, on top of a silicon interposer, with the memory being accessed directly by the GPU (in this case) with control hardware on the GPU die.
      HMC uses stacked DRAM on top of a controller IC, and can connect to the GPU (or CPU) over a PCB rather than a dedicated interposer.

      Theoretical performance between the two is roughly equivalent. HBM is cheaper due to the simpler stacks, but with a need for a dedicated interposer (though that may mean savings elsewhere due to the simpler interconnect). HMC chips are more expensive due to the under-stack IC, but you can separate them from the GPU/CPU over a PCB as normal.

    • Klimax
    • 5 years ago

    Interesting. But I am pretty sure it won’t help AMD’s chips that much as apparently is hoped. They (especially with VRAM/bandwidth) are damn too inefficient. Drivers are still crap, and until they get their drivers fully in order, all these changes will have limited impact. (One just needs to compare what resources Maxwell uses against GCN to get full scale.) Without serious investment in drivers, it won’t be enough. (This might be unpopular, but facts and results are undeniable.) And so far it is unknown if those new chips will have enough resources for 4K rendering for bandwidth to matter.

    [quote<] With HBM, he said, "we threw a couple of engineers at that problem," which will be addressed solely via the operating system and Radeon driver software. "We're not asking anybody to change their games." [/quote<]

    The only reason they can do that is that DirectX 11 abstracted developers from the HW, and thus drivers can deal with changes like this. NO low level API allows or can allow it, and thus anything using a low level API will have a problem with it. That's why DX 12/Vulkan are bad ideas and never good for regular computers, like any other previous low level API. This was precisely the reason why Microsoft moved memory management to drivers when transitioning from DirectX 9 to DirectX 10.

    Funny. Everything keeps validating my statements and assertions, including those who tried to sell low level APIs as the next great thing.

      • Action.de.Parsnip
      • 5 years ago

      “But I am pretty sure it won’t help AMD’s chips that much as apparently is hoped. They (especially with VRAM/bandwidth) are damn too inefficient. [..] (One just needs to compare what resources Maxwell uses against GCN to get full scale.”

      I would take issue with this assertion. By my interpretation of Maxwell performance, let’s say in the context of R290x versus GTX980, I see one outperforming the other by dint of a higher switching speed. That is, the margin by which it seems to perform faster is very close to the margin by which its (sustained) clock speed is faster. Hawaii is something like 10% larger, but in that comes a bus of double the width and half-rate DP performance. The real world implications of this observation can be argued of course, but the throughput at a given clockspeed of GCN in a given area looks very solid.*

      Power consumption however is on another planet entirely.

      In terms of VRAM/bandwidth, usage of the former appears fine from the utilization tests I have seen; usage of the latter … *perhaps* it looks bad in the GCN 1.0 of the 280/290 cards? For better or worse they’re still on sale and yet to be rebadged and put on sale again, which it goes without saying is not good at all, but on a technical level the launch vehicle for HBM is apparently an upgraded version of the much more modern Tonga.

      *Comparing GPUs by performance per clock is often frowned upon but now everyone is on a SIMD architecture the comparison is arguably sound.

    • Ninjitsu
    • 5 years ago

    SCOTT! I read this comment on AT:
    [quote<] Honestly it is looking more and more like Fiji is indeed Tonga XT x2 with HBM. Remember all those rumors last year when Tonga launched that it would be the launch vehicle for HBM? I guess it does support HBM, it just wasn't ready yet. Would also make sense as we have yet to see a fully-enabled Tonga ASIC; even though the Apple M295X has the full complement of 2048 SP, it doesn't have all memory controllers. [/quote<]

    You've been predicting a fully enabled Tonga chip forever now, maybe this is the case?

      • ImSpartacus
      • 5 years ago

      I hope that’s not where we see full tonga. It would make me cry to see fiji turn out to be a dual gpu card.

      • chuckula
      • 5 years ago

      If you watch the video with David Kanter discussing the R9-390X, he basically says that Fiji appears to be an evolved form of the Tonga GPU. That doesn’t mean that it’s literally two Tonga dies glued together, just that Fiji is an extension of the basic architecture in Tonga.

        • ImSpartacus
        • 5 years ago

        Let’s hope to God that it’s not two duct-taped Tongas. My heart couldn’t handle it.

        • Ninjitsu
        • 5 years ago

        I just meant a full Tonga chip, like Scott’s been looking for/predicting. Wouldn’t be surprised if they [i<]also[/i<] released a dual GPU version, though.

    • Ninjitsu
    • 5 years ago

    Again, from AT:
    [quote<] Meanwhile AMD and their vendors will over the long run also benefit from volume production. The first interposers are being produced on retooled 65nm lithographic lines, however once volume production scales up, [b<]it will become economical to develop interposer-only lines[/b<] that are cheaper to operate since they don’t need the ability to offer full lithography as well. Where that cut-off will be is not quite clear at this time, though it sounds like it will happen sooner than later. [/quote<]

    Will Intel want in here? They are 40-50% foundry after all.

      • Klimax
      • 5 years ago

      I don’t think they have that many older foundries left over, and that assumes their available resources aren’t already fully occupied by KN*.

      • psuedonymous
      • 5 years ago

      Intel are backing HMC (Hybrid Memory Cube), an alternative to HBM that does not require an interposer due to the use of an under-stack IC.

    • Ninjitsu
    • 5 years ago

    I have a question: GDDR5 is based on DDR3, right? So why not have GDDR6/7 based on DDR4? DDR4 does provide for higher bandwidth over DDR3…

    I mean, yes, HBM is awesome and I’m not contesting that, but reading AT’s article:
    [quote<] The short answer in the minds of the GPU community is no. GDDR5-like memories could be pushed farther, both with existing GDDR5 and theoretical differential I/O based memories (think USB/PCIe buses, but for memory), however doing so would come at the cost of great power consumption. In fact even existing GDDR5 implementations already draw quite a bit of power; thanks to the complicated clocking mechanisms of GDDR5, a lot of memory power is spent merely on distributing and maintaining GDDR5’s high clockspeeds. Any future GDDR5-like technology would only ratchet up the problem, along with introducing new complexities such as a need to add more logic to memory chips, a somewhat painful combination as logic and dense memory are difficult to fab together. [/quote<]

    DDR4 lowers the voltage, too.

      • ImSpartacus
      • 5 years ago

      That’s an interesting point. I’m sure there are good reasons. I’d be interested in hearing what they are.

      • MathMan
      • 5 years ago

      It’s all about hard physical limits. Here’s a bit of a mishmash of issues.

      – at 10 Gbps, in a vacuum, your signal can travel only 30mm at the speed of light during the time of one symbol.
      – on PCBs, speed of light is even less. Depending on the dielectric used, it’s about half, 15mm!

      Now that’s not an issue by itself, but it illustrates that you’re talking about a pretty extreme environment.

      The problems arise when you want to transmit a pulse in that environment. The dielectric constant changes with frequency, so when you transmit an edge (which has multiple frequency components) it will spread out into something less than an edge as the distance increases. One edge will smear into the next edge. They become entangled. What you get is a closing of the signal eye: it becomes impossible to distinguish the zero of one symbol from the one of the next. In addition to that, you also get reflection, making things even harder.

      There are various things you can do about that, but they are expensive to implement. That’s fine for a low number of wire cables (think HDMI, DP), but not for a memory interface that has hundreds of them.

      DDR4 doesn’t run at a higher speed than GDDR5 (it’s much lower actually), so that doesn’t solve anything. PCIe is running against exactly the same limits as well. It’s a wall dictated by pure physics.

        • Ninjitsu
        • 5 years ago

        You’re talking about inter-symbol-interference on copper, yes? 😛

        (feel free to go EE on me)

        Thanks for the explanation though!

          • MathMan
          • 5 years ago

          Yes. But for the life of me, I can’t understand why a technical explanation has been down voted…

            • derFunkenstein
            • 5 years ago

            because it didn’t suit the monetary interest of one party or another. I’ve undone that damage.

    • Jigar
    • 5 years ago

    I don’t know but while reading this article I could hear – Doom 3’s dialogue – “Amazing things will happen here soon, you just wait”

    • USAFTW
    • 5 years ago

    I wish there were a way that AMD could charge NVIDIA for HBM, as it appears that they’ve done much of the heavy lifting, despite their limited resources. They also opened up MacDonald’s vault when GDDR5 became the status quo.
    Seriously WTF NVidia? You make GameWorks, crap that shouldn’t exist, according to Johan Andersson (https://twitter.com/repi/status/452812842132332544), yet you refused to participate in HBM development, the only memory standard viable for you to make your GPUs in the future?

      • Klimax
      • 5 years ago

      So, you want to argue that TressFX and co like Mantle shouldn’t exist too, right?

        • Sencapri
        • 5 years ago

        TressFX is open to everyone and was optimized for both Nvidia/AMD. I’m pretty sure Mantle technology has been passed on to Khronos, which turned into Vulkan and supports both Nvidia/AMD. This would have never been done by a proprietary company.

          • Klimax
          • 5 years ago

          That’s a good joke right there. It was never optimized for NVidia! NVidia had to put the optimization (like with everything else) into the driver proper. (IIRC there were a few changes on Crystal Dynamics’ side, but the major work had to be done in drivers.)

          As for Mantle, they were essentially forced to do so after they repeatedly failed to keep their own promise to open it. The only thing they ever released besides tech demos like Thief was documentation, long after their own deadlines, after they abandoned it.

            • USAFTW
            • 5 years ago

            At least Nvidia had access to source code that was put directly on AMD’s website, which is more than could be said for GameWorks.

            • Klimax
            • 5 years ago

            A bit of a rarity, but not required. In fact it’s often a bad idea to rely on it, because what matters more is the final output, since the compiler can do quite interesting things with it. And the final assembly is done by the GPU drivers to adapt it for the HW itself. And I haven’t seen anything similar to LTCG in there, so the code won’t tell you much more anyway. (CreateVertexShader)

            • Sencapri
            • 5 years ago

            “That’s a good joke right there. It was never optimized for NVidia! NVidia had to put the optimization (like with everything else) into the driver proper.”

            But that’s the thing, they could properly optimize their drivers given the fact that they had access to the source code. AMD needs the source code from Nvidia Game-Works in order to have proper optimizations in their drivers too.

            • Klimax
            • 5 years ago

            I seriously doubt they used the source code for TressFX. Far more important is the compiler used and the resulting assembler output. The reason is, there are some interesting options for the compiler that affect the output, and thus relying on the developer using only one set is generally a recipe for “fun”.
            (https://msdn.microsoft.com/en-us/library/windows/desktop/hh446872.aspx) Also, there are multiple versions of the compiler provided by Microsoft; I don’t know how big the differences are, though. (Didn’t have time to check it.)

            Let’s face it, AMD doesn’t have enough engineers to get drivers there and thus has to rely on PR to deflect blame. Same thing with Project CARS. They still have no DCL for DX 11 in drivers, and the consequences come knocking.

            Although in the case of The Witcher, the AMD driver team is somewhat not to blame, but rather whoever decided that GCN doesn’t need strong tessellation performance. (Only with Tonga did they finally fix it.)

            • A_Pickle
            • 5 years ago

            No, what’s a good joke is the idea that AMD has to not only provide the low-level source code that they, with their [i<]extremely[/i<] limited resources (which frequently make them the butt of many jokes in the industry), paid for to develop...

            ...but that you're apparently STILL unsatisfied until they OPTIMIZE THEIR COMPETITOR'S PRODUCTS FOR IT.

            THAT, my friend, is top kek. Hilarity. You could be a stand up comedian, if only people knew WTF you were talking about.

          • MathMan
          • 5 years ago

          [quote<]TressFX is open to everyone and was optimized for both Nvidia/AMD...[/quote<]

          Cough. Tomb Raider. Cough.

          AMD is a company that tries to exploit competitive advantages like anyone else. It just so happens that they have fewer resources to develop them. Closed Mantle was just another example, until they realized that it didn't catch on and they gave it away to Khronos.

        • USAFTW
        • 5 years ago

        I don’t care too much for TressFX, it was a waste of resources (both to AMD and our GPUs, whichever vendor).
        As for Mantle, there are two ways of looking at things.
        1. AMD is right at saying that it was the first low overhead API in development. If so, I’m glad Mantle used to exist. It started or accelerated the development of DX12 and Vulkan.
        2. Nvidia is right and DX12 was in development years prior to Mantle. In that case, Mantle was useless garbage that only served to waste AMD’s resources and give them an edge.
        Now it depends on you whichever side you trust, but I find AMD more trustworthy (GTX 970 ring a bell?)

      • psuedonymous
      • 5 years ago

      [quote<]yet you refused to participate in HBM development, the only memory standard viable for you to make your GPUs in the future?[/quote<]

      What gave you that idea? Nvidia are part of JEDEC, which worked on the HBM standard, and Nvidia have already shown off early Pascal samples with HBM 1 and intend to release Pascal with HBM2.

        • USAFTW
        • 5 years ago

        They sure have been awfully quiet about it. I hope you’re right though. They shouldn’t be allowed to piggyback on AMD.
        However, after the 2008 debut of GDDR5, it took Nvidia a while to implement it WELL (Kepler GTX 680). Earlier implementations were limited to around 4 Gbps.
        Also, the reason the 4870 was such a hit was that it used GDDR5 and didn’t, like Nvidia, stick to a complex 512-bit GDDR3 bus.

        • Pwnstar
        • 4 years ago

        JEDEC didn’t work on it, AMD and Hynix did. It was then submitted to JEDEC with the understanding that their patents on it would be licensed on RAND terms.

      • K-L-Waster
      • 5 years ago

      The only way they could do that would be to make HBM a proprietary technology instead of an open standard.

      Unless you want AMD to pull a Rambus, that is….

        • USAFTW
        • 5 years ago

        No, that’s not what I mean. I feel a bit sad that it’s just AMD that has to invent new Graphics memory tech and NVIDIA use it freely. Happened with GDDR5 and the article hints at it being the case with HBM.
        Obviously, a proprietary HBM would be terrible for everyone, not to mention AMD doesn’t have the market penetration to make it proprietary.

          • MathMan
          • 5 years ago

          AMD didn’t invent GDDR5. It was a joint effort by multiple partners in the memory industry. HBM is no different: AMD doesn’t make memory. They don’t make interposers. They don’t make TSVs. They don’t make substrates. They obviously were a partner in developing an open standard. That doesn’t make them inventors who can claim the rights of others.

            • USAFTW
            • 5 years ago

            Inventing something is different from commercializing it. The Wright brothers invented the idea; Boeing and Airbus commercialized it.

            • MathMan
            • 5 years ago

            Is there a point to your statement?

      • MathMan
      • 5 years ago

      What part of your brain came up with the idea that “Nvidia refused to participate in HBM development”?

      If Nvidia isn’t doing an HBM1 product and starts with HBM2, it could be because HBM1 has too many disadvantages: not enough memory, speed improvement too low, too costly, and most important: bad timing, though that’s not HBM’s fault.

      If Fiji is going to perform only a bit faster than Titan X, then that will prove that memory bandwidth isn’t as much of a constraint for the amount of compute power that can be put in 28nm.

      HBM (1 or 2) will truly shine on 16nm or smaller.

    • HERETIC
    • 5 years ago

    NICE
    After 390X- Top end mobile could really benefit from this-Well all mobile could
    benefit-But top end could absorb early costs…………………..
    If 50% power saving is accurate combined with 16nm GPU-We could finally see
    top end graphics on lappys at a sane power level……………………..
    Interesting time ahead………………………….

    • TopHatKiller
    • 5 years ago

    Thanks, Mr.Wasson for giving credit where it is due. AMD innovations are usually ignored; so a little honesty makes a nice change.
    I do appreciate how the complex issues about HBM have been presented in such an easily digestible and understandable manner. That makes a change from the terrible technical discussions I’ve read to date.
    One small disagreement: there is simply no evidence at all that Hawaii exhibited any specific worrying power/density issue…. only that AMD’s reference cooler was… crap.
    Not unusually crap, they are almost always crap, Nvidia’s too [of course.]
    Decent aftermarket coolers fixed any and all heat problems with Hawaii. Integrated-invalid water closets were not at all necessary, nor will they be on Fiji. But they look nice in marketing.
    And everyone concerned can [i<]pretend[/i<] they are expensive - they are, of course, cheap as chips. Forgot to say!: Thanks!

    • Voldenuit
    • 5 years ago

    [quote=”Techreport”<]One potential problem with HBM is especially acute for large GPUs. High-end graphics chips have, in the past, pushed the boundaries of possible chip sizes right up to the edges of the reticle used in photolithography. [/quote<]

    Wait. If the HBM only directly connects to specialized PHY circuits at the edges of the GPU die, what's to stop them from using 2 or 4 smaller interposers (each interposer has connections to the package substrate and to 1-2 corners or edges on the GPU die) instead of having to use 1 big interposer?

    EDIT: To AMD: No charge. But if you find my suggestion useful, I wouldn't be averse to finding a shiny R9 390X card in my mailbox. For, uh, evaluation. "Lifetime evaluation" of the card. :p

      • chuckula
      • 5 years ago

      [quote<]what's to stop them from using 2 or 4 smaller interposers (each interposer has connections to the package substrate and to 1-2 corners or edges on the GPU die) instead of having to use 1 big interposer?[/quote<]

      Technologically that could work. Economically that's more complex & expensive to build. Additionally, it probably wouldn't help yields.

      • TopHatKiller
      • 5 years ago

      It is not quite right to say that high-end GPUs have pushed to the boundaries of the reticle.
      No chip, as far as I’m aware, has done so. The limits are manufacturing cost and yield for the commercial market, not the limits of what is possible. The difference is of course irrelevant in reality, so just go on and ignore what I just said.

        • chuckula
        • 5 years ago

        The highest-end GPUs may not exactly hit the literal reticle limit at a foundry like TSMC, but they come AWFULLY close and are definitely the largest practical chips that a place like TSMC ever produces.

        • Voldenuit
        • 5 years ago

        It’s about the interposer hitting the reticle limit, right? Not the GPU itself.

        Which is built on some old-ass process that probably has excess capacity and probably benefits from having the extra work rather than getting mothballed.

      • psuedonymous
      • 5 years ago

      [quote<]what's to stop them from using 2 or 4 smaller interposers (each interposer has connections to the package substrate and to 1-2 corners or edges on the GPU die) instead of having to use 1 big interposer?[/quote<]

      Alignment. Trying to line up a single BGA chip over the edge of multiple PCBs would be a massive headache. Trying to line up a die over the edges of multiple interposers would be a Severe Embuggerance. Then you have the handling issues once you've bonded them (you might end up needing yet another packaging layer just to keep the assembly from snapping when lifted) and differential thermal expansion issues.

        • pasta514
        • 5 years ago

        because Intel has the patent on this idea and there is NFW that anyone else will be allowed to?
        [url<]http://patentimages.storage.googleapis.com/pdfs/US20140070380.pdf[/url<]

    • Ninjitsu
    • 5 years ago

    Pretty cool tech! I don’t know what it means for AMD, though. HBM will remain expensive because Nvidia will be jumping straight to HBM 2, and it naturally makes sense for AMD to shift at the same time as well.

    How do they increase bandwidth, though? Higher and more numerous stacks? And I guess next year onward 2GB will be minimum for GPUs, with 4GB standard at $200 and 8GB min above $300.

    [quote<] HBM made its way onto Nvidia's GPU roadmap some time ago, although it's essentially a generation behind AMD's first implementation. [/quote<]

    Well, not really, what's written on their roadmap is a generation ahead, but it'll be adopted at the same time as AMD (hopefully) adopts HBM 2.0.

    [quote<] Macri hinted that even the data flow inside the GPU itself could be optimized to take advantage of data coming in "in a very concentrated hump." [/quote<]

    I assume Pascal will do exactly this, going by Nvidia's track record. I hope, for AMD's sake, that their GPUs manage it too.

    • cygnus1
    • 5 years ago

I realize this is solid state, but was there any discussion with Macri about HBM's 3D-stacked chips being any more sensitive to heat damage than traditional 2D chips? When they talk about the individual dies being able to flap when held in the hand, it makes me worry that lots of heating- and cooling-related expansion and contraction could cause damage over time.

    • SecretMaster
    • 5 years ago

Definitely an interesting approach. Rather than making a small pipe flow really fast, it's a wide pipe that flows really slow. I'm not much of an engineer, but has this been thought of before? To me it seems like a pretty major structural revision, but again I'm not too up to speed on how all this stuff works.

      • Sam125
      • 5 years ago

Yeah, that's the correct analogy, but the net effect is that the wide pipe is so much wider that, as long as the data reservoir (the system <-> GPU) can saturate the bus, the slower-moving but wide pipe will push greater than 400% more data than the small but fast-moving pipeline.

Another added benefit to the wide/slow approach is that voltages will be lower, so theoretically HBM should be more reliable than GDDR5, as long as the heat density of the stacks doesn't make them too difficult to cool.
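For what it's worth, the per-device arithmetic lands in that neighborhood. Here's a quick sketch using commonly quoted figures (a 1024-bit first-gen HBM stack at ~1 Gbps per pin versus a 32-bit GDDR5 chip at 7 Gbps per pin); these are my assumed round numbers, not figures from the article:

[code<]
# Back-of-the-envelope comparison of the wide/slow pipe (one HBM stack) against
# the narrow/fast pipe (one GDDR5 chip). Per-pin rates are ballpark assumptions.

def gbytes_per_sec(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: interface width (bits) x per-pin rate / 8."""
    return bus_width_bits * gbps_per_pin / 8

hbm_stack  = gbytes_per_sec(1024, 1.0)  # ~128 GB/s per stack
gddr5_chip = gbytes_per_sec(32, 7.0)    # ~28 GB/s per chip

print(f"HBM stack:  {hbm_stack:.0f} GB/s")
print(f"GDDR5 chip: {gddr5_chip:.0f} GB/s")
print(f"wide/slow vs narrow/fast: {hbm_stack / gddr5_chip:.1f}x")
[/code<]

So a single wide-and-slow stack moves roughly 4-5x what a single fast-and-narrow chip does, at a fraction of the clock rate.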

    • puppetworx
    • 5 years ago

    I can’t wait to see how this performs in Damagelabs, anticipating all kinds of gainz.

    • DragonDaddyBear
    • 5 years ago

    If AMD took the memory tech in Tonga and put it in the R9 390X wouldn’t that help the “limited” 4GB memory problem? I remember the R9 285 being more efficient in bandwidth but wasn’t that because of the compression? Wouldn’t that help with the memory usage?

      • MathMan
      • 5 years ago

The compression only helps reduce bandwidth. AFAIK it doesn't reduce memory usage.

        • Meadows
        • 5 years ago

        Intuitively, one would think it does.

          • MathMan
          • 5 years ago

          Only when you don’t consider the technical consequences.

          Ask yourself the question: how do you do size compression on a blob of data that requires random read/write access? You need to be able to access your data in O(1) time. That’s impossible when the compressed data can end up anywhere in memory as if it were a zip file.

When you look at the slides that Nvidia showed during the launch of Maxwell, they clearly work on rectangles of 8×8 pixels being compressed to either 2:1 or 8:1. I think they go over each rectangle individually, compress it on chip, and then write out half or one eighth of that data for that particular rectangle. And voila: bandwidth compression without compressing the amount that needs to be allocated in memory. And random access in constant time.

Google for “hardocp color compression and memory efficiency”.
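To make that concrete, here's a minimal sketch of the general idea; the 8×8 tile and the 2:1/8:1 ratios follow the description above, but the slot layout, names, and frame-buffer mix are purely illustrative, not AMD's or Nvidia's actual format.

[code<]
# Sketch of fixed-ratio tile compression: every 8x8 tile keeps its full-size slot
# in memory, so addressing stays O(1); a small per-tile flag (not modeled here)
# records whether the slot holds 1:1, 2:1, or 8:1 data. Only bus traffic shrinks.

TILE_PIXELS = 8 * 8                          # 8x8 pixel rectangle
BYTES_PER_PIXEL = 4                          # 32-bit RGBA
SLOT_BYTES = TILE_PIXELS * BYTES_PER_PIXEL   # fixed 256-byte slot per tile

def tile_address(base: int, tile_index: int) -> int:
    """O(1) address of a tile: slots are fixed-size, so no lookup table is needed."""
    return base + tile_index * SLOT_BYTES

def bytes_transferred(ratio: int) -> int:
    """Bytes actually moved over the bus for one tile at a given compression ratio."""
    return SLOT_BYTES // ratio               # 1:1 -> 256, 2:1 -> 128, 8:1 -> 32

# Example: a 1920x1080 frame buffer where half the tiles compress 2:1,
# a quarter compress 8:1, and the rest don't compress at all.
tiles = (1920 // 8) * (1080 // 8)
footprint = tiles * SLOT_BYTES               # unchanged by compression
traffic = (tiles // 2) * bytes_transferred(2) + \
          (tiles // 4) * bytes_transferred(8) + \
          (tiles - tiles // 2 - tiles // 4) * bytes_transferred(1)

print(f"allocated: {footprint / 2**20:.1f} MiB")   # same as uncompressed
print(f"traffic:   {traffic / 2**20:.1f} MiB per full-frame pass")
[/code<]

The allocation never shrinks; only the bytes crossing the bus do, which is why this helps bandwidth but not capacity.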

            • homerdog
            • 4 years ago

            This is correct. This type of compression cannot reduce storage requirements (obviously if you think about it).

            Edit: However AMD is apparently working on something for Fiji that will reduce storage requirements (will use VRAM more efficiently). This will be different than (and likely in addition to) what is present in Tonga.

            • K-L-Waster
            • 4 years ago

In principle it sounds like a good idea, for exactly the same reason that we now use JPG and PNG instead of BMP. Why tie up VRAM with raw pixel values if there is a more efficient way to handle them that provides the same results with a smaller footprint?

            • MathMan
            • 4 years ago

            It sounds like a great idea, but that doesn’t mean it’s feasible.

            They can probably be a little bit smarter about which textures to swap in and out of the local memory, but that’s about it.

            They can’t go the compression route like JPG and PNG, because that results in a loss of random access. They can’t do texture compression on a frame buffer because that’s computationally very expensive and can result in huge quality loss (because it’s fixed ratio compression at all times.)

            Maybe they can play some tricks with loading textures partially in memory, but that seems like a corner case.

    • mikato
    • 5 years ago

    [quote<]"Fiji is a fitting companion for HBM since a team of engineers at AMD has once again helped lead the charge in its development" "Making this sort of innovation happen was a broadly collaborative effort. AMD did much of the initial the heavy lifting, designing the interconnects, interposer, and the new DRAM type. Hynix partnered with AMD to produce the DRAM, and UMC manufactured the first interposers. JEDEC, the standards body charged with blessing new memory types, gave HBM the industry's blessing, which means this memory type should be widely supported by various interested firms."[/quote<] So does AMD not get any IP ownership from developing this? Anybody know anything about the IP ownership?

      • VincentHanna
      • 5 years ago

      You can own a patent that is standard-essential, you just can’t monetize it very effectively.

      • MathMan
      • 5 years ago

      I’m sure they have plenty of patents. But none of that matters when:
      – your partner (Hynix) is only willing to work with you if they can use the technology broadly. AMD by itself is too small of a volume player to warrant exclusivity.
– your competitors have thousands of other patents that would crush you in a game of mutually-assured-destruction patent warfare.

      • _ppi
      • 5 years ago

Looks like AMD did not have the resources to complete this on their own and then start charging ridiculous royalties…

    • Sam125
    • 5 years ago

I'm going to have to agree with Mr. Kanter that the 4GB size limit of Gen1 HBM sounds rather arbitrary. It sounds like AMD is completely OK with just limiting Gen1 to 4GB simply because that corresponds neatly with using 4 HBM stacks. Then Gen2 will double that to 8GB, while Gen3 should double or quadruple that again, with Gen4 being exponentially larger than Gen1.

    I guess that’s fine since Gen1 is going to be an AMD-only show but that really highlights the difference between Intel and AMD, IMO. Intel could’ve easily thrown a few extra engineers at the problem to make it a non-issue but AMD clearly has to be much more thrifty with how they spend their engineering man-hours.

    I could rant about this from an engineer’s POV but I don’t think I will. 😉

    • ptsant
    • 5 years ago

    Great article, made things much clearer. Thanks a lot. We expect benchmarks ASAP.

    • VincentHanna
    • 5 years ago

Very interesting. Depending on how aggressively AMD pushes HBM, manufacturers who are currently touting tons of RAM may need to re-evaluate how they qualify those claims.

There are a few things that I don't necessarily understand (or that the article could have expanded on).
1) Why is AMD so down on having relatively untouched data on the GPU? I mean, I get that it's not "efficient," but what does that actually mean when you are talking about I/O latency and whatnot? Presumably, having 12GB of GDDR5 on the Titan X, filled with maps and textures and who knows what, would be better/faster than having 2GB of "really active" data on the card and having the other 10GB of stuff hanging out in GDDR3 land, right? Granted, it's better than the 32[b<]MB[/b<] of eSRAM on the Xbone, but it's still essentially the same problem. RAM > CPU > GPU > CPU > RAM is an incredibly slow and congested route. Bidirectional communication between just the GPU and the CPU via PCIe 3/4 is much better, no? (Rough peak numbers are sketched below.)
2) Building on 1), is there any desire/use for hybrid solutions so that cards can hit those 6GB+ high-water marks between now and whenever HBM2 is announced (it sounds like HBM2 will solve this problem)?
3) Does this have any real benefit for APUs, CPUs, phones, cars, toasters, etc.? Or is GDDR 2/3 good enough for low-power stuff that we probably won't see the changeover until it becomes so legacy that they start retooling their old factories for HBM3 ^_)^
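For rough context on point 1, here are standard peak figures for the links involved (my assumed round numbers, not measurements from the article):

[code<]
# Peak bandwidth of the hops data takes once it no longer fits in local VRAM.
# All figures are textbook peak numbers for illustration only.

links_gb_s = {
    "384-bit GDDR5 @ 7 Gbps (Titan X class)": 384 * 7 / 8,  # ~336 GB/s
    "Four first-gen HBM stacks":              4 * 128.0,    # ~512 GB/s
    "PCIe 3.0 x16, one direction":            15.75,
    "Dual-channel DDR3-1600 system RAM":      25.6,
}

for name, bandwidth in links_gb_s.items():
    print(f"{name}: {bandwidth:.1f} GB/s")
[/code<]

Anything that has to cross PCIe or live in system DRAM is an order of magnitude behind local VRAM, which is why spilling out of the local pool hurts so much.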

      • NoOne ButMe
      • 5 years ago

1. I suspect the reason they're so down on having that data in GPU VRAM is that putting more stacks than your bus has room for is probably really hard and/or expensive.

2. HBM2 increases the max density 8 times, IIRC, so an HBM2 card could have 32GB of VRAM, from my understanding. I've also seen 4 times the density posted, so it might "only" increase it to 16GB. I don't think anyone wants to mix and match, as that would be a nightmare.

      3. It has benefits for APUs certainly. I think for CPUs, phones, etc. it won’t have much impact in the short term. Possibly it could in the long term.

      • Action.de.Parsnip
      • 5 years ago

Having untouched data in memory probably does very little to affect performance, given how GPUs smear data across as many memory channels as possible. A 4GB limit on a 4K-targeted GPU suddenly makes all of this relevant, I presume.

    • Mat3
    • 5 years ago

    How long until Rambus comes along and says ‘we own this’?

      • NeelyCam
      • 5 years ago

They might, considering this looks very much like HMC, and [url=http://www.avsusergroups.org/joint_pdfs/2013_6Li.pdf<]Rambus is involved in HMC[/url<]. Rambus might even have patents on these technologies, and valid claims. I know it's popular here to call Rambus a patent troll, but they do research and development on memory and data links, and I think they have a right to defend the IP they spent a lot of time, money, and effort to develop.

    • HisDivineOrder
    • 5 years ago

    I’m excited about this memory technology and its implications with regards to discrete GPU’s and integrated GPU’s. Hell, even what it might mean for larger cache levels for CPU’s.

    I think next year is going to be exciting. Based on this article, I think HBM is early if it’s going to mean cards with only 4GB of VRAM. Yes, AMD promises that they went from no engineers working on VRAM usage to two engineers, but… well, this is AMD. They’re the ones that completely ignored DX11 multithreaded development because we didn’t have any benchmarks that actually tested that even though it had deep impact on games.

    That is, AMD has proven they don’t fix a problem unless people start talking about that problem a LOT. Do I trust them to be the early adopter on a card with a relatively small amount of high speed memory? Especially when the problem of a small amount of VRAM is going to be essentially solved next year by having more speed AND more capacity? How much effort are they really going to put into solving the problem when it’s a problem solved by just waiting until next year? That’s nVidia’s solution. Wait until the technology is mature and ready to be used.

    And that makes sense imo. I think AMD is pushing this out too soon if it can’t scale up to 8GB and it’s going to undermine any effort to use these cards for 4K gaming, which is ostensibly what you’d pay the big bucks for.

    As a result, I’m looking forward to HBM when it’s used in products that include 8GB or more VRAM. It’ll be awesome then. Right now, it feels like they took a prototype of what could be done and are going to try and sell it for high dollar (ie., $600-700-ish) with an absurdly low amount of VRAM.

    That essentially then paves the road for nVidia to release a 6GB or even a 12GB version of a Titan X-based x80 series card that doesn’t stutter when it hits the limits of VRAM at 4K resolutions.

    People will say, “Well, AMD knows this. They have to have a solution!” That’s the sense I get from this article, too. Except AMD always says they have solutions and the solutions are always down the road to come. I just don’t have faith in their ability to fix problems quickly at all and I expect they’ll try and ride the HBM train to next year when they’ll trumpet that everyone who wants to upgrade really should get a card with more than 4GB of VRAM.

    Wow, what a surprise twist, that.

      • _ppi
      • 5 years ago

      4GB will certainly limit them in workstation cards.

      But in games? Radeon 290X barely shows noticeable improvement with 8GB vs. 4GB even in 4K.

        • NoOne ButMe
        • 5 years ago

        Depends on the game, and framerates you’re looking at. I’m sure you can make a 290x with 8GB over twice as fast in 2160p benchmarks… I’m also sure the FPS difference will be 1 for the 290x w/ 4GB and 2 for the 290x w/ 8GB. XD

        • HisDivineOrder
        • 5 years ago

I don't think the R9 290X is fast enough to push 4K games even with a larger VRAM. I'm hoping that, given the givens, the R9 390X will be. If it is, then 4K with the AA most reviewers use, plus ultra-resolution texture packs, should be pushing VRAM well above 4GB in many of the most high-end games to come this year and especially next.

        I just don’t think selling a 4GB card for anything between $500-1k is going to be a very good prospect going forward. I think if you want a card that pushes 1600p or less, it’ll probably be great for the near future, though.

        I just don’t think people who spend $700-ish on a GPU are going to want to game at less than 2160p or at “only High” settings instead of Ultra. And I am hoping that if there is a bottleneck preventing that, it’ll be amount of memory because if it’s anything else it’ll be annoying…

          • _ppi
          • 5 years ago

          I never claimed 290x is fast enough to play at 4k, but the 8gb vs 4gb benchmarks have shown like 27 vs 25fps.

          TitanX is not fast enough to play at 4k with “Ultra” today, so what is the issue if Fiji can’t do it next year?

          • auxy
          • 5 years ago

          [quote=”HisDivineOrder”<]I don't think the R9 290X is fast enough to push 4K games even with a larger VRAM.[/quote<]But you know that it is, right? Even without larger VRAM. At 4GB, I can run the majority of games in my Steam library with little issue at 4K, even 60+ FPS, and even recent titles like Dark Souls II: Scholar of the First Sin, Firefall, and Warframe! Battlefield 4, Crysis 3, Metro 2033, and other excessively-demanding titles are a tiny percentage of the gaming industry, and do not represent the majority of games. (´・ω・`)

            • Klimax
            • 5 years ago

They'd better hope that console ports won't torpedo that… (Watch_Dogs and co.)

            • nanoflower
            • 5 years ago

            I don’t understand what you are suggesting. Next-Gen consoles aren’t preventing any developer from targeting 4k. In fact they are making it easier to support 4k gaming given the suggestion that some developers are holding back the graphic capabilities of their games to keep the PC version looking similar to the Xbox One/PS4.

Just look at the complaints about The Witcher 3, and even a statement (made anonymously but verified) that CD Projekt Red dialed back the capabilities in TW3 from what they showed previously to what they released, in order to support a single build that works on next-gen consoles and the PC. So the game could have pushed the performance edge even more if they had built a version solely for the PC.

            • Klimax
            • 5 years ago

8GB of shared RAM for both CPU and GPU in the consoles. I thought the hint about Watch_Dogs' size would suffice. There are games which simply love VRAM, regardless of GPU architecture.

        • Melvar
        • 5 years ago

        Stuttering doesn’t reduce framerate all that much.

    • sweatshopking
    • 5 years ago

    So again, AMD makes an ENTIRE new standard that the ENTIRE industry adopts, without paying them a dime, and again, AMD is losing money.
    IF these guys go down, are we anticipating Nvidia is going to step up and start funding these huge projects? Vulkan? GDDR5? HBM? x64? (i realize x64 isn’t useful to nvidia) they profit off of AMD’s work, but actually don’t contribute NEARLY as much to industry standards.

    AMD has other problems, but they sure do a lot of industry work that benefits their competition.

      • DarkMikaru
      • 5 years ago

      I was thinking the exact same thing. Again they are leading in innovation but how is it just not paying off? Don’t they have licensing over this kind of thing? As I’d love to see Intel have to pay for this new tech. Not like they couldn’t afford it instead of re-inventing the wheel.

        • Mat3
        • 5 years ago

        It really is a shame that other companies that will use it don’t step up and help fund the R&D. But I guess if you don’t make it an open standard, it won’t be widely adopted and that’s what’s needed to bring prices down.

        • Topinio
        • 5 years ago

        AMD don’t need Intel to pay for this, they need end-users to. Meanwhile, Intel and Nvidia need to discourage users from doing so, by hook or by crook…

          • jihadjoe
          • 5 years ago

          The end user always pays, whether it’s from buying AMD’s own products or license fees from other companies.

        • BobbinThreadbare
        • 5 years ago

        Intel and AMD have a cross licensing agreement.

        So Intel “pays” by AMD having access to all their x86 patents.

          • jihadjoe
          • 4 years ago

          Intel also does a lot of heavy lifting to create industry standards itself, but you could argue that it’s expected because they are an industry leader.

          Nvidia really ought to start doing so as well. It’s been decades since they were any sort of underdog. IMO, far from JHH’s idea that it would be tantamount to giving away their technology, Nvidia standardizing their stuff can only boost their status as a proper industry leader.

      • NoOne ButMe
      • 5 years ago

Yup. Another funny one: ATI did the majority of the work on GDDR3, with JEDEC helping a bit and getting it to be a standard, then Nvidia launched the first GDDR3 card :D.

I find that amusing. However, it just shows that designing the technology does not always give you an advantage with it. GDDR5, meanwhile, shows that having a hand in the design can give you an advantage with it.

      • chuckula
      • 5 years ago

      AMD wanted to push this technology for their own GPUs but it is built on a bunch of existing memory technology that was already out and about in the industry.

      Believe it or not, you need a RAM company to work on standards for RAM.

        • sweatshopking
        • 5 years ago

        OF COURSE IT IS.

        [quote<] Fiji is a fitting companion for HBM since a team of engineers at AMD has once again helped lead the charge in its development. In fact, that team has been led by one of the very same engineers responsible for past GDDR standards, Joe Macri [/quote<] [quote<] Macri said the HBM development effort started inside of AMD seven years ago, so not long after GDDR5 was fully baked. [/quote<] AMD has created a team of engineers that has come up with a new way of implementing memory, and created a new standard. of course there are others working on it, but according to my reading of this article they've done the bulk of the effort in making it work.

          • chuckula
          • 5 years ago

          [quote<]but according to my reading of this article[/quote<] LMFTFY: according to your reading of an article... that is an interview with an AMD marketing guy... yadda yadda yadda. Believe me, AMD deserves some credit for pushing this particular type of memory for their GPUs, but HBM is a long LONG way from being some unique invention that AMD ginned up out of thin air.

            • sweatshopking
            • 5 years ago

[quote<] I recently had the chance to speak with Macri [/quote<] wait, is Macri marketing or engineering? [quote<] In fact, that team has been led by one of the very same engineers responsible for past GDDR standards, Joe Macri. [/quote<] TR makes it sound like he's an engineer. Also, I never said AMD invented it out of thin air. I said they've created their new HBM standard, which may or may not be true; the article isn't clear. It also sounds like nvidia will adopt HBM2, not 1. Obviously stacked RAM isn't an AMD invention; there are multiple approaches, including HMC, for which Rambus' work was already linked by Tiffany. IF amd did the bulk of the work on the HBM JEDEC standard, which it sounds like they did, and nvidia is using the second-gen HBM JEDEC standard, which AMD did the bulk of the work on, then my point stands. I'm not sure. I don't read or understand as much of this stuff as many of the guys here. I basically just like all caps.

            • MathMan
            • 5 years ago

            If an employee is allowed to speak to the press, he’s doing marketing work. Most technical marketing people at companies like this will have an engineering background.

            • sweatshopking
            • 5 years ago

            yeah, i don’t really agree with you there.

            • DancinJack
            • 5 years ago

            you should. he’s right.

            • A_Pickle
            • 5 years ago

            I would argue that MathMan’s downvotes are unwarranted, and DancinJack’s sentiments are correct. AMD’s execs weren’t unaware that Joe Macri would be speaking with TechReport, an independent, journalistic publication that is pretty committed to honest reporting — they have on MANY occasions reported (with heavy hearts, no doubt) on AMD’s failings (and probably been too soft on them, to be honest).

            Joe Macri is talking it up, or he wouldn’t be talking at all. That’s not to say he’s LYING, he probably isn’t, and bless AMD for this. God damn they need an ace in the sleeve.

            • sweatshopking
            • 5 years ago

            Macri said AMD has been the primary driver of HBM memory in the industry.

“We do internal development with partners, we then take that development to open standards bodies and we open it up to the world,” he said. “This is how Nvidia got GDDR3 and how they got GDDR5. This is what AMD does. We truly believe building islands is not as good as building continents.”

From PCWorld's interview with Macri. Is he lying when he says Nvidia did get it from their work?

            • MathMan
            • 5 years ago

He's not lying when he says that they provide input to the standards bodies. He's very likely lying when he implies that Nvidia doesn't participate in the process and that AMD does all the work.

            The cool part is that there’s nobody to contradict him. Not that it matters: if somebody would correct him, you’d call that person a liar.

            Such is the power of preconceived beliefs.

            • A_Pickle
            • 5 years ago

            Yes and no. I’d sure like to be able to point to a single technology Nvidia pioneered and made open, in good faith, to the rest of the industry.

            • MathMan
            • 5 years ago

            That’s a different topic: Nvidia tries to monetize inventions that they can develop and productize on their own. They do this because they are a commercial enterprise that likes to make money. AMD gives it away. They don’t make money.

            But this doesn’t apply for essential building blocks like RAM technology, which I thought was being discussed here.

Neither AMD nor Nvidia has DRAM fabs, interposer technology, etc. No matter who it is, they rely on partners who do the heavy lifting. It's disingenuous of AMD to claim that they invent everything: what is there to invent on their side? I'm fine with them claiming that they spurred their partners into developing the technology.

            • sweatshopking
            • 5 years ago

            Preconceived beliefs?!?! Do you even know my opinion of amd? Apparently you don’t keep up because I’ve been dumping on them for years.

            I don’t think he’s suggesting nvidia has nothing to do with it. I think he’s suggesting amd does the bulk of the work, and that nvidia benefits from their efforts. I think that’s likely true, otherwise, i’m sure nvidia would respond.

            • MathMan
            • 5 years ago

AMD doesn't own a fab, so they didn't develop interposer technology or TSVs. They don't own a substrate company. They don't own DRAM technology. Yet that's where all the innovation is. The main memory controller is likely very similar to a conventional GDDR5 one.

            If the partners are responsible for all the technical work, how can it be AMD that did most of the work?

            • sweatshopking
            • 4 years ago

Are you suggesting that Hynix just brought them a stick of HBM and said "THERE YOU GO, ENJOY IT!"? That would be just as false as if I were to claim AMD invented it on their own. They clearly worked together on the project, hence the term "partners," which you yourself used. It sounds, however, like AMD did most of the "partnering," not Nvidia, and that's basically my point. Unless you have something suggesting otherwise, besides "AMD says it so it's not true," your point is irrelevant.

            • MathMan
            • 4 years ago

No, I'm suggesting that they sat around a table and hashed out a bunch of requirements. That Hynix came back with a spec and that they hashed out the details.

And then Hynix actually implemented the spec and gave it to AMD to test. Same thing with the interposer, etc.

            I’m not saying that AMD didn’t do anything. They did. But they make it sound like they are the second coming of Christ. They are not.

            The part where they claim ownership of GDDR5 is even more ridiculous.

            • TopHatKiller
            • 5 years ago

            Smoke is on the air; and you are breathing in. S/A posted a prototype of a cancelled AMD gpu with stacked memory [yes…what became hbm] back in about 2012. It is AMD, breathe the oxygen present in the atmosphere and not the b.s. vapours – they will harm your sense of self and reason.

            • K-L-Waster
            • 5 years ago

            Would this be the same S/A that has been predicting Nvidia will go bankrupt any day now for the past 5 some odd years?

            Charlie posting something doesn’t mean it has the slightest resemblance to reality.

            • TopHatKiller
            • 5 years ago

Yes, it's SemiAccurate. But no, they've never predicted Nvidia's bankruptcy. [Microsoft's all the time, but not Nv.] Much of their work, however, is studiously researched, and given that many people here are quoting or using sources from the likes of gutter sites and ten-a-penny repeat-o-mills, it's a bit rich to complain about using SemiAccurate.

            • MathMan
            • 5 years ago

Dream on. Sometime in May 2010 he wrote that Nvidia would be dead and gone within 5 years. Can't find the link right now, but there was some fun a week or two ago with the link about how Nvidia had 3 more weeks to go.

            • MathMan
            • 5 years ago

            [url<]http://www.semiaccurate.com/forums/showpost.php?p=48497&postcount=10[/url<] Enjoy!

            • TopHatKiller
            • 5 years ago

            I hate it when I can’t find a link.
            Sadly, in this case it’s probably because it’s the community’s imagination over-working.
Either way, the journalistic ethics of S/A are just that: actual probity in their stories.
Several years ago [no, I can't find the link*] S/A posted that the 15h family from AMD had been killed. AMD responded within the hour, saying nonsense – Steamroller etc. were still in development and future parts were incoming. S/A was… semi-right: the server designs were killed, but AMD weren't actually lying either… Kaveri and so on did come out [still coming out], but S/A were in reality right, and that came from their [u<]actual real[/u<] industry sources. S/A were ages in front of anyone else, and were attacked for talking b.s. about it at the time.
It's the difference between properly sourced and researched sites and bs-merchants. We're on a 'proper' site right now, and sites like this are being swamped-over by the 'other kind'. It's important to be able to tell the difference. {Some people here don't seem able.}
*Sorry, I didn't even bother! I'm a lazy [post deleted].

            • K-L-Waster
            • 5 years ago

            In addition to the link Mathman posted, there is this here:

            [url<]https://semiaccurate.com/2014/05/07/nvidia-going-beat-numbers-quarter/[/url<] Found by scrolling to the bottom of S/A's home page and looking at the first item in Featured Posts. Second sentence of the article: "The short versions is that we told you exactly how they would blow out this and the next few quarters before it all comes crashing down. "

            • TopHatKiller
            • 4 years ago

Nope, couldn't find any reference to NV bankruptcy. I do not subscribe to S/A [you must be kidding – that would make me bankrupt], so what's behind the paywall I don't know. I didn't say he didn't attack NV, but that he never predicted their bankruptcy.

            • K-L-Waster
            • 4 years ago

            I guess “all comes crashing down” means something different in your language….

            • TopHatKiller
            • 4 years ago

            Well yes, it does. Specifically not ‘bankrupt’ …. Sigh.[Can’t come up with a decent joke. Sorry.]

            • K-L-Waster
            • 4 years ago

            Ah, I see – guess Charlie was predicting an architectural disaster at Nvidia headquarters. Can’t imagine how I missed that….

          • MathMan
          • 5 years ago

          Your reading of the article is very likely to be quite wrong.

      • K-L-Waster
      • 5 years ago

      Well, part of the issue here is they are not in a dominant market position so they cannot enforce changes. If they developed new tech and attempted to patent or license it, there is a significant risk no one would adopt it or develop software and ancillary hardware to support it. This probably isn’t as much of a problem with HBM as it would be with Adaptive Sync, Mantle, etc.

      • HisDivineOrder
      • 5 years ago

      If AMD didn’t give it away, it wouldn’t become a standard. It’s just that simple.

        • nanoflower
        • 5 years ago

HBM may not fit into that category if it provides the sort of benefits that seem likely, especially if the additional cost over GDDR isn't that much. The problem is they don't own the technology. They worked with a partner to develop it, and it's not clear how much AMD did that could be patented. I suspect that they would have tried for licensing fees if they thought they had a shot at collecting them, since they clearly need new sources of income.

      • MathMan
      • 5 years ago

      It’s tempting to look at it this way, and that’s definitely what AMD marketing wants you to do.

      But these things don’t exist in a vacuum. In fact, of all the work that’s needed to make a system like HBM work, the vast majority is done by other parties.

The memory is done by Hynix, including the TSVs and the timing controller logic that's on the 5th die of the stack. The interposer is done by UMC. Packaging everything together is done by Amkor (or some other packaging specialist). What is really left for AMD? Making a memory controller that can use it. But the memory organization of HBM is very similar to GDDR5's.

      Not saying that AMD wasn’t involved, but the hardest parts are none of their speciality.

      It’s ironic that AMD has lost money by focusing on a technology too early (HBM is not very useful for 28nm), when they could have worked on making their architecture more efficient in terms of perf/mm2 and perf/W, like Nvidia did. Fiji will be a Volkswagen with Ferrari tires.

        • HisDivineOrder
        • 5 years ago

        Or AMD could have worked on improving their DirectX 11 drivers in obvious (if not easy) ways like adding/improving their multithreaded performance or doing something like nVidia did with Shadercache. This would have benefited every AMD user across every DX11 game. Instead, they went Mantle. Given that most games that come even for the next two years are going to be DX11, that doesn’t seem like the decision that will benefit most AMD users.

          • MathMan
          • 5 years ago

          I think Mantle is the one (and only?) thing where we really should credit AMD with moving the industry forward.

It's unfortunate for them that it never played out the way they wanted it to.

At least they got the Vulkan consolation prize out of it (which has been postponed long enough to allow the competition to catch up before anything meaningful hits the market). And thanks to DX12, they can now ignore optimizing their DX11 driver for concurrency. So that's a win on the resources front.

HBM would have happened no matter what. It's a logical progression of working around a physical bottleneck. The GDDR5 part makes me laugh.

            • HisDivineOrder
            • 5 years ago

DX12 doesn't replace DX11. It exists alongside it. There will still be plenty of DX11 games made when DX12's heavier requirements are too burdensome for most developers (i.e., any developer that made DX9 games in the last two years).

            • Ninjitsu
            • 5 years ago

            They were just the first to make noise about it. It’s been pretty clear that they just went shouting about their implementation of DX12 while Nvidia saw no need to do the same with theirs (instead worked on it).

            And as His Divine Order says, the DX11-esque abstraction layer will still exist in DX12.

            • Klimax
            • 5 years ago

There's a very good reason for the continuation of DX11. Otherwise AMD wouldn't be able to fix their drivers to manage memory better, transparently to games.

        • BigTed
        • 5 years ago

        [quote<]Fiji will be a Volkswagen with Ferrari tires.[/quote<] A Bugatti Veyron then?

          • jihadjoe
          • 4 years ago

          Dude.

      • Airmantharp
      • 5 years ago

I'd go a step further and say that if Nvidia weren't involved as a customer, AMD wouldn't have gotten anywhere with it.

        • flip-mode
        • 5 years ago

That is probably not the case. HBM benefits a lot more than just GPUs. Going by what the article stated, it'll show up everywhere but cell phones, and it has potentially big benefits for server chips.

      • UnfriendlyFire
      • 5 years ago

      I believe Rambus had a fast memory technology only a few years ago, and there were rumors that AMD was considering it.

Except they couldn't find anyone to fab it, because they had previously sued the memory makers over the DDR stuff. Whoops.

      • psuedonymous
      • 5 years ago

      [quote<] AMD makes an ENTIRE new standard[/quote<]Hynix doesn't get a look-in for actually implementing HBM? What about the other members of JEDEC (including Nvidia) who worked on the standard?

        • TopHatKiller
        • 5 years ago

SK Hynix's contribution was basic. This, again, is AMD's work.
Nvidia's fixated on their own margins, and Hung-less is deranged by obsessions with proprietary paths and is not interested in [if even capable of] contributing to industry-wide technical projects. Running an Empire takes all his energy, apparently.

          • chuckula
          • 5 years ago

          You go on believing that all that fanboy drek is true.

          Bonus points for making it seem like Nvidia’s ability to do this thing called “make a profit” shows that AMD is superior because they don’t engage in that kind of behavior.

          • NeelyCam
          • 5 years ago

          [quote<]SK Hynix's contribution was basic. This again, is AMD's work.[/quote<] How do you know that?

            • chuckula
            • 5 years ago

            Let me phrase it in a way that corresponds to logic:

            [quote<]SK Hynix's contribution was basic. [/quote<] Yeah, they basically built the memory and got it working. [quote<]This again, is AMD's work.[/quote<] It's true that AMD built a memory controller to actually use the memory that SK Hynix built.

            • TopHatKiller
            • 5 years ago

Regrettably, as I've sadly concluded from your posts [many of them, anyway], 'logic' and 'chuckula' are as uncommon bedfellows as 'hungry zombie next door' and 'sleeping peacefully'.

            • chuckula
            • 5 years ago

            Blah Blah Blah.

            Where were your histrionics in favor of Intel when — 2 freakin’ years ago — they actually did develop and manufacture their own eDRAM solution that has been sold in literally millions of systems. Intel sold more Crystalwell equipped chips in January of this year than AMD will sell of HBM equipped cards in 2015.

            Where were your histrionics about how Intel must have invented DDR4 when Intel put out a wide product line that supported DDR4 2 years before AMD? (and we’re still waiting on AMD)

            Where were your histrionics in favor of Nvidia’s memory efficiency when their 256-bit interface cards regularly beat AMD cards with 512-bit wide interfaces?

            • TopHatKiller
            • 5 years ago

[1] This is getting even more pointless than usual. I was not here two years ago …so?
Intel's attempt at eDRAM always confused me; 128-bit wide, ~50GB/s? What was the effing point in that? Never mind the unbearable cost; almost everyone in the industry avoided it like the plague.
[2] My statement, which you clearly have misread/misunderstood, is supportable by evidence; your claim is a shrill steam escape from fanboydom.
[3] Which cards, which year, which performance indicators? If you think bus width = performance then not only are you unfunny, but sad with it.
…Nope, still effing pointless.

            • derFunkenstein
            • 5 years ago

I don't think you've actually ever said that before. Maybe it was when you were here as spigzone, or maybe as S.T.A.L.K.E.R.

            • TopHatKiller
            • 4 years ago

I can assure you, Mr. Frank-enSTINE! STINE! [gene wilder],
I have never appeared on TR as anyone other than TopHatKiller. My wisdom is too rare and beautiful a thing to dilute between aliases. I'm confused about your comment. Have a nice day.

            • derFunkenstein
            • 4 years ago

            That’s the most oddly polite comment I’ve ever read. You, too, have a nice day.

            • TopHatKiller
            • 5 years ago

            Hands-up; as a gold-pressed fact I don’t.
            But, I’ve been following what has become known as ‘HBM’ since about 2012. And in all of that time amd was mentioned as the design party, sk-hynix was only mentioned, occasionally, as a manufacturing partner. Hence my conclusion.

          • Airmantharp
          • 5 years ago

          AMD-shill alert.

            • TopHatKiller
            • 5 years ago

            BUY BUY amd shares now. I’m not an amd employee. Is that shilling?

      • Bensam123
      • 5 years ago

      Hah… awesomely realistic post. I’m sure Intel will chip in as well as Nvidia.

      • K-L-Waster
      • 5 years ago

      So what’s the takeaway? “Everyone buy AMD so they don’t go out of business and leave us stagnant?”

Working on new standards is nice, but if it doesn't lead to products that are compelling solely on their own merits, it is always going to lead to losses. Banking on pity-purchases isn't a really good business strategy.

Having said that, *if* they deliver on the things they are talking about with Zen and the 390X, they may actually have compelling products this time around. Hopefully that turns out to be the case.

      • lilbuddhaman
      • 4 years ago

      their drivers still suck

      • torquer
      • 4 years ago

      Did they have a choice?

    • NeelyCam
    • 5 years ago

    Well, look at that. [url=https://techreport.com/news/21807/tuesday-shortbread?post=587661<]Right on time.[/url<]

      • chuckula
      • 5 years ago

      WRONG NEELY!

      It’s early. In order for it to be 5 years we would have to wait until October 2016.

        • NeelyCam
        • 5 years ago

        I don’t usually count months… but yes – if AMD gets this out to the market by the end of this year, NeelyCam was wrong

      • anotherengineer
      • 5 years ago

      I see this below your post LOL

      “dpaus
      Oct 11
      11:23 AM

      “Five years. Mark my words.”

      At which time you’ll post another “Was NeelyCam right?” message to drive your accumulated total of downvotes over (under?) the -10,000 threshold.”

        • sweatshopking
        • 5 years ago

        I MISS DPAUS 🙁

          • NeelyCam
          • 5 years ago

          me too.

          maybe i should start writing all my posts without capital letters

            • sweatshopking
            • 5 years ago

            DUH. OF COURSE YOU SHOULD.

      • Ninjitsu
      • 5 years ago

      Wow, that’s pretty cool.

      • f0d
      • 5 years ago

hybrid memory cube ISN'T high bandwidth memory
      [url<]http://www.extremetech.com/computing/197720-beyond-ddr4-understand-the-differences-between-wide-io-hbm-and-hybrid-memory-cube[/url<]

        • NeelyCam
        • 5 years ago

The only real difference is the signaling between the CPU/GPU and the memory – HMC has fewer lanes, each operating faster, while HBM has many more parallel lanes, each operating slower. Both rely on very similar memory-stacking configurations. HBM is like HMC v0.9.
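Ballpark math on that signaling difference, with the lane counts and per-lane rates being rough figures I'm assuming rather than anything from the article:

[code<]
# HMC pushes a few lanes very fast; HBM pushes a very wide bus slowly.
# The aggregate ends up in the same league either way.

hbm_stack_gb_s = 1024 * 1.0 / 8      # 1024 pins at ~1 Gbps       -> ~128 GB/s
hmc_link_gb_s  = 16 * 2 * 10.0 / 8   # 16 lanes each way, ~10 Gbps -> ~40 GB/s combined

print(f"HBM stack:           ~{hbm_stack_gb_s:.0f} GB/s over 1024 slow pins")
print(f"HMC full-width link: ~{hmc_link_gb_s:.0f} GB/s over 32 fast lanes")
print(f"HMC cube, 4 links:   ~{4 * hmc_link_gb_s:.0f} GB/s")
[/code<]

Same neighborhood of aggregate bandwidth, reached by very different pin counts and clock rates.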

    • Ryhadar
    • 5 years ago

    Great piece. Thanks for the interesting read.

    • Milo Burke
    • 5 years ago

    I really like the bit about engineers working on making better use of how graphics memory is used.

    – They said it happens in the OS and graphics drivers, not handled by game developers. Good.

    – It’s embarrassingly bad now. So even newbie gains should bring a lot of goodness.

– I wonder if this work will scale to other, non-HBM GPUs from AMD? And will it prompt similar work by Nvidia engineers?

    – I’m almost hoping there will be some drama regarding this after the 390x comes out. Maybe Nvidia will sling some mud over being limited to 4 GB. If they do, hopefully AMD will have covered some ground by then and can demonstrate how 4 GB managed correctly can outperform X GB managed incorrectly.

      • MathMan
      • 5 years ago

I'm very surprised by this mismanaged-memory remark. They claim that nobody has seriously looked at it before, but that sounds quite ridiculous to me.

      There’s only so much you can do: you can have smarter swapping algorithms between DRAM and local FB RAM, but that’s only going to help so much. And you’d think that such things were already done in the past to cover cases with lower-end GPUs that were already running out of memory.

      It’s going to be interesting to see what they come up with.

        • Zoomastigophora
        • 5 years ago

I think he's hinting at the fact that GPUs generally want to pad data up so that memory can be split and stored across as many GDDR chips as possible. This would allow memory accesses to hit as many channels as possible to achieve the highest bandwidth. GPUs also tile textures to improve memory coherency for access patterns common in rendering, and tiling usually works better when data can be evenly chunked into…tiles, which could lead to higher overhead as you deal with the rest of the mip chain of a texture. I also don't know the granularity of addressing in GDDR (how large a memory page is), so there could be wastage just in allocation. In general, it sounded like he was hinting that GPUs have arranged data to maximize throughput and minimize latency across memory channels by trading off memory utilization.
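To put a number on that kind of padding overhead, here's a toy sketch that walks a 1024x1024 texture's mip chain and rounds every level up to a tile and an allocation page; the 64-pixel tile and 64 KiB page are made-up values for illustration, not anything a real GPU necessarily uses.

[code<]
# Illustration only: how rounding mip levels up to a tile size (and allocations
# up to a page size) wastes a growing fraction of the small mips.

TILE = 64          # hypothetical tile edge in pixels
PAGE = 64 * 1024   # hypothetical allocation granularity in bytes
BPP = 4            # bytes per pixel

def round_up(x: int, m: int) -> int:
    return (x + m - 1) // m * m

def level_bytes(w: int, h: int) -> tuple[int, int]:
    """Raw bytes vs. bytes after tile- and page-padding for one mip level."""
    raw = w * h * BPP
    tiled = round_up(w, TILE) * round_up(h, TILE) * BPP
    return raw, round_up(tiled, PAGE)

w = h = 1024
raw_total = padded_total = 0
while w >= 1 and h >= 1:               # walk the full mip chain
    raw, padded = level_bytes(w, h)
    raw_total += raw
    padded_total += padded
    w //= 2
    h //= 2

print(f"raw mip chain:    {raw_total / 2**20:.2f} MiB")
print(f"padded mip chain: {padded_total / 2**20:.2f} MiB")
[/code<]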

        The part that concerns me is that the new changes for memory allocations are happening in the OS and drivers – AMD software engineers have a pretty poor track record of holding up their end of the bargain, and mixing in cooperation from Microsoft isn’t a recipe for success.

          • MathMan
          • 5 years ago

          The access granularity of HBM is the same or very similar to GDDR5, and pretty small (32 bytes?) I don’t expect anything to be gained there. The page sizes should be similar in size as well. The major difference between GDDR5 and HBM is the physical interface, but the internal organization is pretty much identical.

Whatever optimizations AMD has in mind are probably at the driver level, and should also work on GDDR5. If there really are any…
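As a sanity check on the granularity point: the minimum access size is just interface width times burst length, and with the commonly cited figures the two land in the same place (the HBM burst length of 2 is my assumption here, not a spec quote).

[code<]
# Minimum access size = interface width (bytes) x burst length.
# GDDR5: 32-bit device with a burst of 8; HBM: 128-bit channel with a burst of 2.

def access_bytes(bus_bits: int, burst_length: int) -> int:
    return bus_bits // 8 * burst_length

print("GDDR5 device:", access_bytes(32, 8), "bytes")   # 32 bytes
print("HBM channel: ", access_bytes(128, 2), "bytes")  # 32 bytes
[/code<]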

      • VincentHanna
      • 5 years ago

[quote<]- It's embarrassingly bad now. So even newbie gains should bring a lot of goodness.[/quote<] Why? As they also said, adding "ridiculous" amounts of RAM allowed them to basically ignore the problem. Memory bus size ballooned with every DRAM package, helping with latency and throughput, while the ungodly RAM behemoths were able to store everything on-card in a relatively inefficient, but still very effective, way. I don't think much "goodness" will come from dumping all the stuff that [i<]no longer fits[/i<] onto your system RAM. If anything, their decision to assign engineers to memory management, where before they simply ignored it altogether, tells me that they are treating this as a limitation of HBM, at least in the short term...

      • Klimax
      • 5 years ago

I'm pretty sure Nvidia has already done its homework, considering the configurations of its cards (bus and VRAM). No idea about Intel. (Performance is still problematic outside of Iris Pro, to be sure.)

Reminder: the 970 required highly advanced memory management, so at a minimum they would have had to solve it by then.

    • anotherengineer
    • 5 years ago

    The line that really caught my eye.

    ” AMD did much of the initial the heavy lifting, designing the interconnects, interposer, and the new DRAM type.”

    So no patents or royalty fees from others that want to use this design??

      • Milo Burke
      • 5 years ago

      No patents or royalty fees, but they did have to write off $100 million as a “one-time cost” related to patents and royalties.

        • anotherengineer
        • 5 years ago

        And so it begins, the getting old and tired “one-time cost” jokes.

          • K-L-Waster
          • 5 years ago

          Ridicule early and often 🙂

          All kidding aside, this does look in principle like a very promising development.

          • Milo Burke
          • 5 years ago

          We’ll stop joking about it when they stop doing it.

      • NoOne ButMe
      • 5 years ago

      Nope. AMD co-developed HBM iirc.

      • Pwnstar
      • 4 years ago

      No, AMD believes in open standards. That’s why they submitted HBM to JEDEC.

        • MathMan
        • 4 years ago

        Like Mantle! That was open too!

          • sweatshopking
          • 4 years ago

          It was/is open.

            • torquer
            • 4 years ago

            And completely abandoned now as it is completely unnecessary with DX12.

    • Unknown-Error
    • 5 years ago

    So it is limited to 4GB?

      • chuckula
      • 5 years ago

      Theoretically: No.
As a practical matter, in this part of 2015, given the chips currently available for products hitting the market: yes.

    • Tristan
    • 5 years ago

How do they make the connections through the interposer? Is the interposer monolithic or multilayer with TSVs?

      • chuckula
      • 5 years ago

      The interposer is a piece of silicon with metallic conductive traces that are formed in a similar manner to the metalization layers that are found in regular integrated circuits. The metallic conductors in the interposer can be routed in 3 dimensions through the interposer (although that’s not a strict requirement).

      In a 2.5D interposer like what you see here there are contacts for the chips on top of the interposer.

    • the
    • 5 years ago

[quote<]Macri points to server and HPC workloads as especially nice potential fits for HBM's strengths. Eventually, he expects HBM to move into virtually every corner of the computing market except for the ultra-mobile space (cell phones and such), where a "sister device" will likely fill the same role.[/quote<] I'd actually expect this to be used in mobile by companies like Apple. There are trade-offs in terms of cost, but Apple has generally been willing to accept them for an ultimately faster product. Power consumption would be higher than LPDDR4, but not radically so, especially if clock speeds/voltages can be further reduced on the memory bus. Other components like the modem can be brought onto the interposer to free up additional board space. Z-height should also be reduced compared to traditional wire-bond die stacking or package-on-package methods.

      • brucethemoose
      • 5 years ago

      Wide I/O is basically the mobile-optimized version of this tech.

        • chuckula
        • 5 years ago

        IIRC Wide I/O existed first and HBM is something of an extension of Wide I/O.

    • allreadydead
    • 5 years ago

    Sounds… Revolutionary, and promising.

Just like any other AMD-related tech that hasn't been released yet…
To be serious, it has potential. However, I think the costs will be reflected heavily in Fiji, as it will be an early adopter.
If NVIDIA follows one step behind in adopting HBM for its cards, the chips will most probably already cost much less and, again most probably, have been rid of the annoying bugs.

AMD will have the time advantage, but it looks like NVIDIA will adopt a more mature and cheaper HBM in 2016.

      • NoOne ButMe
      • 5 years ago

      HBM will bring higher costs, yes. That’s known. AMD has been working on and with HBM for a long time now, however. A lot longer than Nvidia will have been next year.

      I feel like HBM2 for Nvidia is going to be similar to how GDDR5 was for Nvidia. The first generation or two of devices they will stumble and be worse than AMD. Afterwards, they will reach parity. Of course, given all the excess bandwidth that HBM will provide, a stumble in terms of the controller and such could result in no difference for the consumer product! YAY!

    • the
    • 5 years ago

[quote<]HBM made its way onto Nvidia's GPU roadmap some time ago, although it's essentially a generation behind AMD's first implementation.[/quote<] Technically correct, but this statement can be read more than one way. It could mean that nVidia will be using HBM1 while AMD is using HBM2. From all indications, AMD and nVidia will be adopting HBM2 at nearly the same time. I think it'd be better to say that nVidia is just skipping HBM1 and going directly to HBM2, to make it a bit more clear.

    • NoOne ButMe
    • 5 years ago

The only thing to be wary of about GDDR5 is that the ~10GB/s-per-watt figure is likely taken at the best point on the power-performance curve.

Based on what AMD has claimed about the 290X's memory power consumption, 5Gbps appears to be above that point! 0.0

    • jjj
    • 5 years ago

You should have included at least a comparison with HMC (similar dumb storage and logic base layer stacks) and ofc pointed out that interposers and TSVs can be used to come up with better solutions too, so let's not proclaim HBM the winner just yet.

    There was also a Hot Chips presentation on HBM (pdf) [url<]http://www.hotchips.org/wp-content/uploads/hc_archives/hc26/HC26-11-day1-epub/HC26.11-3-Technology-epub/HC26.11.310-HBM-Bandwidth-Kim-Hynix-Hot%20Chips%20HBM%202014%20v7.pdf[/url<] and video [url<]https://www.youtube.com/watch?v=FyxzNVTQea4[/url<]

      • JMccovery
      • 5 years ago

      Well, since the information was a ‘deep dive’ provided by AMD, and since AMD isn’t using HMC, I don’t see why they would talk about it.

    • willmore
    • 5 years ago

    I’m really curious to see how this will help the APU side of things. It seems well established that AMD APU performance is pretty tightly bound by memory performance and that AMD hasn’t had a good performing memory controller since the stars cores–if you can even call that one good.

    If the GPU frame buffer memory optimization/minimization comes to fruition, then an APU with one HBM stack on it could be a very potent device–especially in the mobile space. There seems to be a big demand for ‘better than integrated’ graphics in laptops. Especially if the integrated graphics are Intel ‘not exactly industry leading in performance’ parts.

    Having to go to a discrete solution isn’t optimal for mobile because of the size and power costs associated with it and it leads to messes like Optimus.

    Of course, there’s nothing to stop Intel from using HBM, too. They certainly have shown that they know how to stick a high bandwidth memory chip on the same substrate as a CPU. 🙂

      • NTMBK
      • 5 years ago

      Supposedly HBM actually makes the memory controller a lot simpler. No need to drive signals off-package and across the board.

      • wimpishsundew
      • 5 years ago

Sticking HBM into APUs will finally let APUs reach their full potential. The APU's biggest problems have been bandwidth, latency, and power consumption. HBM seems to have solved all 3. The only problems left are cost and increasing their CPU IPC.

      Hopefully Zen cores are going to solve the CPU IPC problem and economy of scale + yield improvement will solve the cost problem. Then AMD still has a major issue of executing on time. Let’s hope they close the gap on CPU performance and increase the lead on GPU performance over Intel to make APUs finally a decent solution for laptops.

        • ImSpartacus
        • 5 years ago

        What do we know about hbm and latency?

        I’ve read about the power savings and the obvious bandwidth increase, but I missed the part about latency.

          • Ninjitsu
          • 5 years ago

          He didn’t have hard numbers in the article, but says it could be lower…tbh I don’t know why Intel would go to DDR4 if HBM had lower latency and higher bandwidth.

            • ImSpartacus
            • 5 years ago

            Cost? There’s got to be [i<]something[/i<]. Intel generally has their shit together.

            • Klimax
            • 5 years ago

            Well, eDRAM…

            • ImSpartacus
            • 5 years ago

            That’s not a sustainable long term solution. Too pricey.

      • UnfriendlyFire
      • 5 years ago

      “high bandwidth memory chip”

      That’s their L4 cache. It would be interesting to see HBM vs. something like +256 MB of L4 cache.

      • ImSpartacus
      • 5 years ago

      Yeah, I would love an apu with like a few gb of hbm and then buckets of ddr4. That’s the way to do it.

        • UnfriendlyFire
        • 5 years ago

        And then an OEM sticks it with the cheapest DDR3 or DDR4 they have in stock because the average consumer doesn’t know anything about RAM except for a vague understanding of memory capacity.

        I’ve seen A10 APUs get stuck with single-channel 1333mhz RAM. Bonus if they soldered that RAM in the name of chasing less than $1 of savings per shovelware laptop.

        EDIT: I also remember seeing a 2GB GT 610M. My dad almost fell for that, until I compared it to a 2008 netbook with 16 GB of RAM and asked him if the single-core Atom CPU was ever going to use the entire 16 GB of RAM.

          • jihadjoe
          • 5 years ago

          AMD’s APUs are budget parts though, so it’s really no surprise they get paired up with cheap RAM.

IMO there is no scenario, apart from notebooks maybe, where it makes sense to put 2133MHz DDR3 on an AMD APU, even if good RAM is what's needed to get the most out of it:

Since an APU's graphics memory is shared with the CPU, it really needs 16GB minimum to give the OS 8GB to itself, but the price difference between two 8GB 2133MHz DIMMs and their 1600MHz counterparts can buy a GDDR5-equipped R7 250: [url=http://i.imgur.com/4bStviR.jpg<]Microcenter search[/url<]. Drop down to a 2x4GB DDR3-1600 kit + disable the integrated graphics, and the R7 260X is on the cards, with a $10-$20 stretch to the R7 270 or 750 Ti.

            • UnfriendlyFire
            • 5 years ago

            Windows 7 through 10 are fairly decent at handling memory, assuming you’re not running some heavy applications such as photo shop.

There have been benchmarks showing APUs gaining at least several FPS from upgrading from 1333MHz to 1866MHz RAM, especially as the GPU gets stronger (and thus more bandwidth-starved).

          • ImSpartacus
          • 5 years ago

          You’re going to make me cry.

            • UnfriendlyFire
            • 5 years ago

            You’re not the only one. Imagine the disappointment over AMD’s Bulldozer and Intel’s “mobile-first” strategy.

            • ImSpartacus
            • 5 years ago

            At least Intel is actually killing it on the mobile front. Core M is a monster and Atom is very competitive.

            I want AMD to succeed so bad, but they keep churning out mediocrity. HBM could be a big thing for them, but the dual Tonga rumors make me nervous as hell.

      • _ppi
      • 5 years ago

Imagine the following APU, or rather SoC:

      300W power
      4 Zen Cores eating some 60W
      GPU for some 220W
      16 GB low-latency-HBM for CPU and 8GB high-bandwidth HBM for GPU
      SoC with integrated sound card, USB, NVMe and possibly LAN/WiFi

Attach an SSD to that NVMe interface.

Put it in a NUC form factor. Killer gaming PC, slightly larger than the Nvidia Shield console. This should be possible in 2017/18.

        • NoOne ButMe
        • 5 years ago

        “Let’s dissipate 300W+ of heat in a NUC form factor, that’s a good idea” said the non-engineer.

          • dodozoid
          • 5 years ago

I can imagine it being cooled by a closed-loop water cooler… the block should be relatively simple, since all that heat comes from a single package.

            • thor84no
            • 5 years ago

            Simple, yes. But it’d have to be way larger than the device itself. Water-cooling isn’t magic. Who wants a small hand-held device that can’t be used without being plugged in to a box twice the size just for cooling?

            • dodozoid
            • 5 years ago

I was thinking about the Shield box, not the Shield handheld… the radiator could even contribute to the “coolness” of the design.

          • _ppi
          • 5 years ago

          Agreed, NUC is probably too small. 1/4 of current XB1/PS4 (half width and depth, same height) is not unimaginable.

          • UnfriendlyFire
          • 5 years ago

          Just let it throttle. Or better yet, force a severe power restriction on it.

HP’s EliteBook 850 G1/G2 laptops have a tendency to throttle the Radeon 8750M even when the GPU is running at less than 72°C, which suggests HP couldn’t design the motherboard to handle the extra power draw.

        • Klimax
        • 5 years ago

While still losing considerably to Haswell and newer. Take the 40% figure they parade about and apply that scaling to benchmarks: at best it reaches Sandy Bridge, and that assumes the 40% holds across the board. Doubtful…

ETA: And it’s still highly uncertain whether GCN, without good driver support, can actually make good use of that much bandwidth…

          • _ppi
          • 5 years ago

But Intel just won’t be able to put that level of gaming performance on a single chip/package (they don’t have the GPU), and Nvidia won’t be able to do it either (they don’t have the CPU). Therefore, AMD could have a nice, clear small-form-factor home gaming niche here (with lighter versions going to consoles) and could ask a premium for the coolness factor. Maaaybe if Nvidia specifically designed a chipset (packed with their big GPU) for Intel CPUs, they could get sort of close in form factor (and probably exceed it in performance).

Zen is obviously the big unknown, but if it delivered Sandy Bridge-level single-threaded performance, it probably wouldn’t be the limiting factor in games. That level of single-threaded performance is more than enough for normal daily tasks (browsing, office, etc.), and more demanding workloads tend to be multi-core-friendly (including SMT) these days. I wouldn’t count on the GCN-Maxwell power/performance disparity lasting forever, either.

            • Klimax
            • 5 years ago

Iris Pro says hi. And with each generation of IGP that gap has gotten considerably smaller. (At high TDPs AMD has kept something of a lead; at the smaller TDPs that matter more for NUC-like boxes and notebooks, AMD loses.) AMD has an edge in nonexistent markets… (in other words, it won’t save them at all)

And don’t forget Skylake is coming. Since Intel has exhausted most of its CPU performance improvements, it’s focusing on GPUs.

As for Zen, that was likely an artificial benchmark, and in reality it will come nowhere near that figure. That was the best case.

            • _ppi
            • 5 years ago

If Iris Pro is so great, with Intel’s 14nm versus everyone else’s 28nm and an R&D budget bigger than AMD, Nvidia, and Qualcomm combined, please remind me why Intel doesn’t dominate the graphics card market already?

Likewise, in one post you say that AMD cannot catch up to Intel on the CPU side, but in another you say Intel’s CPU improvements have pretty much stalled. Yet somehow Intel will catch up to AMD in GPUs in very short order, while AMD transitions to 14nm and HBM (hardly without architectural improvements, since they haven’t released anything really new for two years). But okay… The only way I can see such a scenario actually happening is severe strain on AMD’s R&D resources.

            • chuckula
            • 5 years ago

“please remind me why Intel doesn’t dominate the graphics card market already?”

72% market share for Intel sounds pretty good for a company that supposedly doesn’t dominate in graphics: http://jonpeddie.com/publications/market_watch/

            • _ppi
            • 5 years ago

You know very well I didn’t mean graphics for running a browser and an office suite. And Solitaire.

In this respect, I have to give Intel credit for knowing its market: I recall that a couple of years ago Tom’s Hardware found that Intel GPUs were much better at accelerating the Windows UI than AMD’s (back then the 5xxx series or so), because AMD hadn’t bothered to enable acceleration of some UI objects in its drivers.

        • Pwnstar
        • 4 years ago

        AMD is actually working on that:
http://www.bit-tech.net/news/hardware/2015/03/31/amd-hpc-roadmap/1

    • Hattig
    • 5 years ago

Thank god, words. I hate the proliferation of videos on article-based websites. Can you make it clear in the future that this is an article, since ‘explained’ suggests a video?

      • dodozoid
      • 5 years ago

I agree, a written article is so much more comprehensive than an interview.

        • ImSpartacus
        • 5 years ago

        I think both are good. You can get some interesting little tidbits from an interview that won’t make it into an article, but are cool nonetheless.

For example, AnandTech had an interview with an ARM fellow who led the team that made the A53. Anand got him to explain his “ideal” big.LITTLE SoC using ARM cores (spoiler alert: four A53s and two A57s). It was a really cool part of the interview that absolutely would not have made it into a written article.

        So I think there are advantages and disadvantages to both. Both are worthwhile.

          • derFunkenstein
          • 4 years ago

          4 A53 + 2 A57 makes a lot of sense. That’s what the SoC in the LG G4 has (Snapdragon 808) and performance looks to be towards the top.

        • VincentHanna
        • 5 years ago

        I think the comprehensiveness of a covered topic is more or less up to the discretion/passion of the author in most cases…

However, I can read a three-, four-, or five-page article in less time than it takes to convey the same information in video format (and that’s before noting that TR’s videos have a bit of an unedited, stream-of-consciousness bent to them).

        More importantly though, I can find info again, without re-watching the video. Have you ever tried to hunt through a video for stats and figures that you know are there somewhere, after you haven’t seen it in a month or two?

      • jessterman21
      • 5 years ago

Needs a tl;dr at the end, though… Stuff’s way over my head.

        • Firestarter
        • 5 years ago

        yeah but in video form, words are hard

    • chuckula
    • 5 years ago

    Hey Guize! Next time can you put a warning in the title that this isn’t a video? I can’t be bothered to read words and I need a warning that the information won’t be in a video.

      • Anovoca
      • 5 years ago

I was going to read this article, but I think now I will wait for the audiobook. Hope it’s narrated by Dutrice.

      • September
      • 5 years ago

      I LOL’d, but I can’t decide whether to upvote or not.

        • chuckula
        • 5 years ago

        At least somebody managed to get the sarcasm…

          • Anovoca
          • 5 years ago

Well, maybe if you had prefixed your post with “Sarcasm statement:”

            • chuckula
            • 5 years ago

            I did start out the post by saying “Hey Guize!”
            If that isn’t enough of a flag, then I can’t be held responsible for the reading comprehension around here.

            • ImSpartacus
            • 5 years ago

            Those are the worst. Don’t ever do that.

      • Milo Burke
      • 5 years ago

      I up-voted this comment, but I don’t know if I agree or not because I didn’t read it. I only bother with video replies.

        • derFunkenstein
        • 5 years ago

http://www.youtube.com/watch?v=WAOxY_nHdew

      • HisDivineOrder
      • 5 years ago

      I want a music video version of this article narrated at times by a Morgan Freeman impersonator.

      • ggaky00
      • 5 years ago

Why not read an article instead of listening to it? You (not you in particular, I’m speaking in general) can really benefit from the grammar and learn how to use “you’re” instead of “your” and “their” instead of “there”. 😛

Still, it was a good article. Nice one, Scott.
