Samsung begins mass production of 4GB HBM2 memory chips

Just under a week ago, JEDEC updated the High-Bandwidth Memory standard with provisions for bigger, faster memory packages. Hot on JEDEC's heels, Samsung has taken the wraps off its mass-production 4GB HBM2 chips this evening.

The company says it's fabricating these chips on its 20-nm process. Each 4GB package comprises four 8Gb core dies stacked atop a buffer die at the base, linked with through-silicon vias (TSVs). Consistent with JEDEC's specifications, each of these HBM2 packages will offer 256 GB/s of bandwidth. For comparison, Samsung says that figure is a little over seven times the bandwidth of one of its 4Gb GDDR5 dies. The HBM2 chips are claimed to deliver twice the bandwidth per watt of Samsung's GDDR5 solutions, and the company notes that they come with ECC support built in, as well.
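
For readers who like to check the math, here's a minimal sketch (in Python) of how those figures hang together; the per-die GDDR5 bandwidth is inferred from Samsung's "a little over seven times" comparison rather than stated directly:

```python
# Back-of-the-envelope check of Samsung's HBM2 figures.
core_dies_per_stack = 4       # four core dies per package (from the article)
core_die_capacity_gbit = 8    # 8Gb per core die (from the article)
capacity_gbyte = core_dies_per_stack * core_die_capacity_gbit / 8
print(f"Package capacity: {capacity_gbyte:.0f} GB")        # 4 GB

hbm2_bw_gbs = 256             # GB/s per package (JEDEC/Samsung figure)
# "A little over seven times" a 4Gb GDDR5 die implies roughly:
print(f"Implied GDDR5 per-die bandwidth: ~{hbm2_bw_gbs / 7:.0f} GB/s")  # ~37 GB/s
```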

Samsung also plans to release 8GB HBM2 packages this year, a move it says will allow graphics card designers to enjoy space savings of up to 95 percent versus designing around GDDR5. None of this news will be surprising to anybody familiar with AMD's Radeon R9 Fury X and friends, but it's exciting to think about nonetheless.

The company expects to ramp up production of HBM2 over the course of the year to meet anticipated demand not only in the graphics card market, but also for applications like high-performance computing, network systems, and servers.

Comments closed
    • Arclight
    • 4 years ago

    HBM stems from HMC if I'm not mistaken, so will anyone try to use it as system RAM?

      • BlackDove
      • 4 years ago

      They're unrelated and somewhat competing technologies. Intel and Micron developed HMC.

      HMC is already being used as system RAM

    • Welch
    • 4 years ago

    Yes!!! ECC support ftw.

    Compute uses, professional-grade uses like CAD/3D animation, and hell… even games. With this sort of bandwidth, what is the few % or less performance hit of ECC really going to hurt compared to GDDR5? Absolutely nothing.

      • BlackDove
      • 4 years ago

      You can implement ECC with GDDR5 too.

    • Ushio01
    • 4 years ago

    If 2006/7 started the age of good enough CPU/GPU performance then 2015/16 is starting the age of more than enough memory and SSD storage.

    16GB DDR sticks, multi-terabyte SSD’s and we will soon know how much GPU memory.

      • Airmantharp
      • 4 years ago

      If these things all evolved in unison at an even pace, we’d always need ‘just a little more’ of each. But they don’t, and software doesn’t either. What we do have is enough for what we do with computers today; what will make our current technology look like a child’s toy is what we’ll be wanting to do in the future.

    • anotherengineer
    • 4 years ago

    I wonder if AMD can put these on its APUs?

    Always a drag when you have a dual-channel APU that will take DDR3-2133, and then the OEMs slap one stick of DDR3-1333 in there, and then people complain about it being slow! Ya, I wonder why!
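
    A rough sketch of the peak-bandwidth gap anotherengineer is describing, using theoretical maximums (real-world throughput will be lower):

    ```python
    # Theoretical peak bandwidth: transfer rate (MT/s) x 8 bytes per 64-bit channel.
    def ddr_peak_gbs(mt_per_s, channels=1, bus_bits=64):
        return mt_per_s * (bus_bits / 8) * channels / 1000

    print(ddr_peak_gbs(1333, channels=1))  # ~10.7 GB/s: one stick of DDR3-1333
    print(ddr_peak_gbs(2133, channels=2))  # ~34.1 GB/s: dual-channel DDR3-2133
    ```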

      • tipoo
      • 4 years ago

      I believe that is the plan, and it will be sweet.

    • MEATLOAF2
    • 4 years ago

    Is there any reason they couldn’t use HBM2 on an SSD as a sort of super high speed cache? Any reason they shouldn’t (besides cost)?

      • Chrispy_
      • 4 years ago

      There's no reason other than cost, but bog-standard DDR3 (or even DDR2, for that matter) is so much faster than the interface an SSD could use that it just wouldn't matter.

      It's very rule-of-thumb, but each jump down this list is (roughly) an order-of-magnitude increase in bandwidth (some ballpark figures are sketched after the list):

      Mechanical hard drives
      SSDs
      Old RAM like DDR and DDR2
      DDR4 and GDDR5
      HBM2
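
      Some assumed ballpark peak figures for that ladder, circa 2016, to make the rough orders of magnitude concrete (illustrative numbers, not measurements):

      ```python
      # Assumed ballpark peak bandwidths (GB/s) for the tiers listed above.
      ladder_gbs = [
          ("Mechanical hard drive",          0.15),    # ~150 MB/s sequential
          ("SATA SSD",                       0.55),    # SATA 6Gbps interface limit
          ("DDR2-800, single channel",       6.4),
          ("DDR4-2400, dual channel",        38.4),
          ("GDDR5, 384-bit card aggregate",  336.0),   # e.g. 7 GT/s x 384 bits
          ("HBM2, four-stack card",          1024.0),  # 4 x 256 GB/s
      ]
      for tier, bw in ladder_gbs:
          print(f"{tier:32s} ~{bw:7.2f} GB/s")
      ```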

        • derFunkenstein
        • 4 years ago

        yeah, when you’re running into the write speed limitations of MLC and TLC flash, the cache seems to lose its luster.

      • tipoo
      • 4 years ago

      Probably no benefit over the DRAM cache that's already there, on top of the cost. It has to go through a SATA or even PCIe/NVMe bus, which has an upper limit on speed, and DRAM is already orders of magnitude faster than NAND. And then, why not put that high-bandwidth memory nearer to the CPU/GPU anyway, where it could cache disk things too, i.e. the system memory file cache?

      (it’s almost like engineers somewhat know what they’re doing 😛 )
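
      A quick sketch of the bottleneck tipoo describes: even a plain DDR3 cache already outruns any drive interface by a wide margin, so a faster cache chip buys nothing. The interface and cache figures below are assumed peaks:

      ```python
      # The drive interface, not the cache chip, is the ceiling.
      bus_limit_gbs = {"SATA 6Gbps": 0.6, "PCIe 3.0 x4 (NVMe)": 3.9}
      cache_bw_gbs = {"DDR3-1600 cache, one channel": 12.8, "HBM2 stack": 256.0}

      for bus, bus_bw in bus_limit_gbs.items():
          for cache, cache_bw in cache_bw_gbs.items():
              ratio = cache_bw / bus_bw
              print(f"{cache} behind {bus}: ~{ratio:.0f}x the bus bandwidth")
      ```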

      • willmore
      • 4 years ago

      It would make more sense to put the high speed memory nearer the processor than near the storage.

      Keep asking questions, though, that’s a good way to learn.

    • Pancake
    • 4 years ago

    I’d like to see a low-power version of HBM. Fantasy island stuff – 4 or 8GB module, next-gen Atom or Core M, lots of next-gen Intel HD GPU cores to soak up the bandwidth… Surface 4… The power savings from not having to drive external RAM (I could live with 4GB but would be happy for years with 8GB in my travel device – currently using a T100 with 2GB) could be used for more GPU. Wouldn’t it suck if Apple were to do this first?

      • Beahmont
      • 4 years ago

      I doubt Apple is going to put any kind of HBM with an A9(X) or A10(X) any time soon. Though I suppose if they really wanted to, they could design and make an interposer for Intel or AMD chips themselves, put HBM on it, and design a motherboard that would take the new package themselves, or have MSI, ASUS, GIGABYTE, etc. do it for them.

      However, that's a crap-ton of cost to capitalize for just Apple's products. And Intel or AMD would both almost assuredly come up with their own solution quick, fast, and in a hurry, with the other motherboard manufacturers not far behind.

      Intel or AMD themselves don't really stand to make any money at all in the current environment by putting HBM on mobile chips. It's too costly all around; they'd likely never recoup design costs unless they made some very generic interposers that could be used across many successive CPU generations and iterations, but then they would likely lose too much design flexibility to do that.

      Edit: Grammar edit.

    • rems
    • 4 years ago

    “High Bandwidth Memory2 memory chips” sounds redundant just like “PIN number”, just saying.

      • Beahmont
      • 4 years ago

      Another oddity of English. The words in the acronym have little to no relation to the words outside the acronym when using the acronym as an adverb or adjective.

      HBM2 is the formal acronym of the tech. It’s being used as an adjective to describe memory chips.

      Technically both are correct. But people don’t tend to remember or associate the words in an acronym as individual words once they are part of the acronym. That’s actually the whole point of acronyms.

      • jessterman21
      • 4 years ago

      “I’ve gotta put my PIN number in this here ATM machine!”

    • f0d
    • 4 years ago

    I hope they are making smaller memory sizes also, as 4GB modules is just too much IMO.
    Doesn't Fury have 4 stacked dies? 4x4GB is too much, and any fewer dies means you lose out on performance.
    2GB modules x 4 for 8GB of memory should be plenty.
    With just 2 modules x 4GB there is only 512GB/s of memory bandwidth, which is what the new GDDR5X is supposed to be able to do on a 256-bit bus.

    I can't even use more than 2GB on my 290; nothing I play uses more than 2GB with the settings I use (120Hz monitor, so settings are set for a constant 120fps), and in most games where I push the settings to approach 4GB of usage, the GPU is just too slow for it and dips way under 60.

    I don't want to pay for all that memory that will never get used.
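
    For reference, a sketch of the stack math being debated in this subthread; per-stack HBM bandwidths come from the HBM1/HBM2 specs, while the GDDR5X figure assumes the 16 Gbps-per-pin rate f0d appears to have in mind (early GDDR5X parts were announced at lower speeds):

    ```python
    # Per-stack bandwidth: 128 GB/s for HBM1 (Fiji-era), 256 GB/s for HBM2.
    HBM1_STACK_BW, HBM2_STACK_BW = 128, 256  # GB/s

    print(f"Fury X, 4 x 1GB HBM1:  4 GB, {4 * HBM1_STACK_BW} GB/s")
    for stacks in (2, 4):
        print(f"{stacks} x 4GB HBM2 stacks:   {stacks * 4} GB, {stacks * HBM2_STACK_BW} GB/s")

    # GDDR5X on a 256-bit bus at an assumed 16 Gbps per pin:
    print(f"256-bit GDDR5X @ 16 Gbps/pin: {16 * 256 / 8:.0f} GB/s")
    ```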

      • JMccovery
      • 4 years ago

      HBM2 has double the bandwidth per stack, so two 4GB stacks give twice Fiji's capacity at the same bandwidth.

      I see a 4x4GB configuration as probably landing on a FirePro part before a Radeon part, though it is entirely possible that AMD could drop a 16GB Fury.

        • f0d
        • 4 years ago

        Two 4GB stacks is still about the same bandwidth as 256-bit GDDR5X and will probably cost more than GDDR5X, too, so what's the point?

        4GB x 4 is way more memory than is needed and will drive up the cost of whatever card it would be in. I haven't seen anyone able to use anywhere near the amount of memory a 390X has, let alone the 16GB that 4x4GB modules would provide.

        I like the idea of HBM, it's a big step forward, but it just seems to me there need to be smaller sizes of HBM2 (smaller than 4GB modules) for it to truly be ubiquitous.

    • mcnabney
    • 4 years ago

    Just thinking out loud. Is there any reason that the GPU and HBM memory could not be put on the same package? Make a quad or even 6x GPU on a card. Power might be a nightmare, but this kind of shrinkage could really change the rules in graphics.

      • chuckula
      • 4 years ago

      [quote<]Is there any reason that the GPU and HBM memory could not be put on the same package?[/quote<]

      Yes, for two technical reasons (not to mention economics):

      1. Large GPUs are already very close to the optical reticle limit for fabs. Basically, they are unable to make larger chips, and HBM would take a big chunk out of the GPU transistor budget.

      2. HBM needs to be stacked to get sufficient capacity to be interesting. A single integrated chip is going to be a planar affair, and stacking on top of that chip isn't going to be practical.

        • BlackDove
        • 4 years ago

        Also, they're on different nodes: 16nm GPU and 20nm RAM.

          • chuckula
          • 4 years ago

          That's another good reason. The physical sizes of the nodes aren't actually that different, but the "16nm" transistors are FinFETs — vital for high-speed dense logic — while the transistor & capacitor structures in the HBM are still planar and don't really have to be using fins.

        • mcnabney
        • 4 years ago

        Package, not die.

      • BlackDove
      • 4 years ago

      They are on the same package if you consider everything on the interposer to be the “package”.

      Intel's EMIB is another interesting packaging technology that allows mixed nodes in the same package (but not the same die).

      • kuttan
      • 4 years ago

      You mean GPU and HBM in one die? If that is the case, the GPU die would be too big to produce cost-effectively. Power efficiency and heat dissipation would be other troubles.

      • willmore
      • 4 years ago

      DRAM and logic use very different processes.

      You could make a combined process (there were some attempts at “smart DRAM”), but you have to compromise one or the other or use a process that is more complex than either alone. It ends up not being worth it.

    • ace24
    • 4 years ago

    Maybe a stupid question, but is there any reason these couldn't be used on CPUs? Dual-channel DDR4-2400 is 38.4 GB/s, which is nothing compared to HBM2's 256 GB/s (especially if they could be used dual-channel). If nothing else, it would give integrated graphics a real shot in the arm.
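
    The 38.4 GB/s figure checks out; here's a minimal sketch of the arithmetic, assuming 64-bit channels and theoretical peaks:

    ```python
    # Peak bandwidth = transfer rate x (bus width / 8) x channels.
    ddr4_2400_dual = 2400 * (64 / 8) * 2 / 1000    # GB/s
    hbm2_stack = 256                               # GB/s, per the article

    print(f"Dual-channel DDR4-2400: {ddr4_2400_dual:.1f} GB/s")          # 38.4 GB/s
    print(f"One HBM2 stack is ~{hbm2_stack / ddr4_2400_dual:.1f}x that")  # ~6.7x
    ```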

      • chuckula
      • 4 years ago

      [quote<]Maybe a stupid question, but is there any reason these couldn't be used on CPUs?[/quote<]

      They can be used with CPUs, but there will be cost issues (large silicon substrate with TSVs to the HBM stacks), and you need to take into account the capacity-vs-bandwidth tradeoffs.

      A single 4 or 8 GB stack acting as a high-speed intermediate cache backing a larger traditional set of RAM certainly is interesting, but just as we saw with the Fury X and with the eDRAM-enabled Intel CPUs, merely having high bandwidth doesn't necessarily mean the processor runs much faster for all workloads. There certainly are *some* workloads that benefit, however. As we have seen with the new-generation Xeon Phis that use similar HMC memory stacks and the next-generation GPUs, having a boatload of stacked RAM can be useful if your chip can actually use the bandwidth.

      • Laykun
      • 4 years ago

      I could see it doing well on an APU or for Intel's integrated graphics. The Iris Pro Intel integrated graphics already have on-package memory known as eDRAM, which is similar in concept but much smaller in capacity, and it can also act as an L4 cache due to its latency properties.

      If HBM can keep memory access latency low, then I could also see it being a very good L3 cache on any CPU, but I don't know if anyone has any latency numbers for HBM. Traditionally GDDR5 has had pretty high memory latency as a trade-off to get high bandwidth, but after doing a bit of reading, it seems these latency effects may be somewhat diminished with HBM. AMD has said they foresee HBM making its way into pretty much every corner of the computing world except mobile devices, so the possibility of it happening is there; time will tell, though.

      • Beahmont
      • 4 years ago

      There are purportedly chips in the works from AMD that do use HBM2. I'm not sure how HBM2 compares to eDRAM, or even basic SDRAM like DDR3 and DDR4, in granularity of access, however. My understanding of the tech was that it takes significantly longer to access individual bits in HBM, but that it makes up the bandwidth because it can access so many more bits in a transfer.

      That sounds all well and good in theory, but only if one needs to keep the CPU constantly fed with large amounts of ever-changing data. If you only need a small fraction of that capacity, or you can't parallelize the data sufficiently and get stuck needing large numbers of sequential accesses, then HBM would appear to be sub-optimal compared to eDRAM and possibly DDR3 and/or DDR4, depending on the timings of the chips and the cost differential of going to main memory through traces.

      Now if I'm wrong, somebody please tell me, because I'd really like to learn more about the specifics involved here. It would go a long way toward making me feel better about Zen's chances if I knew that the HBM would be a consistent improvement over traditional main memory.

      I was also under the impression that HBM had a higher degree of parallel access but a slower access rate, versus HMC's lower level of parallel access but higher rate of transfer. Overall they both have very large amounts of total bandwidth, but HMC was purportedly better at keeping 72 high-frequency, quad-threaded CPU cores fed, compared to hundreds of individual shaders all constantly needing new data.

    • shank15217
    • 4 years ago

    You can all thank AMD for DX12 and 32GB of 1TB/s memory goodness. You can thank Nvidia for GameWorks and over-tessellated flat surfaces.

      • hansmuff
      • 4 years ago

      Let's just wait and see what gets released. First of all, there will not be 32GB cards outside of crazy-expensive professional sectors, and the HBM bandwidth has done little for Fury, either.

      DX12 is not AMD, and Mantle is not DX12.

      Those opinions don't mean I love GameWorks.

      • Airmantharp
      • 4 years ago

      We can thank Nvidia for not using overpriced HBM1. AMD had nothing to do with HBM any more than Nvidia or Intel; they just jumped on the limited first version.

        • namae nanka
        • 4 years ago

        AMD had nothing to do with HBM, which is why this very site published an article on 'AMD's high-bandwidth memory explained', pointing out:

        [quote<]Making this sort of innovation happen was a broadly collaborative effort. AMD did much of the initial heavy lifting, designing the interconnects, interposer, and the new DRAM type. Hynix partnered with AMD to produce the DRAM, and UMC manufactured the first interposers. JEDEC, the standards body charged with blessing new memory types, gave HBM the industry's blessing, which means this memory type should be widely supported by various interested firms. HBM made its way onto Nvidia's GPU roadmap some time ago, although it's essentially a generation behind AMD's first implementation.[/quote<] [url<]https://techreport.com/review/28294/amd-high-bandwidth-memory-explained[/url<]

          • NoOne ButMe
          • 4 years ago

          Are you going to suggest AMD had nothing to do with GDDR5 next? Also your quote appears to contradict your statement?

            • chµck
            • 4 years ago

            namae was replying tongue-in-cheek to airman

            • Airmantharp
            • 4 years ago

            He/she tried and failed with their own quote ;).

            • Airmantharp
            • 4 years ago

            He's getting hung up on 'designed'. AMD should get some credit for producing the first product with HBM and getting it before a standards body; but the idea was well into the market by that time, and Nvidia and Intel (and whoever else) would be using it regardless of whether AMD used HBM1 with its 4GB limit.

        • USAFTW
        • 4 years ago

        Which is why Mr. Huang has decided they can do without HBM2 on Pascal. They even showed prototypes based on blistering-fast, much more power-efficient GDDR5. Wow, if Nvidia can get double GM204 performance with, er, GM204, AMD is AMDead.

      • Klimax
      • 4 years ago

      AMD is NOT responsible for DirectX 12. That would be Microsoft, Intel, and Nvidia. AMD outright denied that DX12 even existed, as a pretext to push proprietary Mantle.

      As for the rest, sour grapes… (It's AMD's fault their chips sucked at tessellation.)

        • xand
        • 4 years ago

        You’re absolutely right! Amazing!

        /s

          • Klimax
          • 4 years ago

          Unlike a lot of you, I actually do understand the things behind those pretty images you see.

          Tessellation is part of the ultimate solution to correct rendering of objects (or at least in the real world, where neither memory nor bandwidth scales infinitely). What we have currently are just fairly crude approximations.

          ETA: BTW, we have loads of evidence that DX12 predates the crap known as Mantle. There is no evidence it was jumpstarted by AMD or that they had much to do with it.

          Not that I care much about the idiocy known as low-level APIs, as they are brittle crap that is the wrong answer to the wrong question.

        • USAFTW
        • 4 years ago

        The least we know AMD did was encourage M$ to accelerate DX12 development.
        As for over-tessellation, Scott covered it time and time again, so I'll take his word for it.
        The green feces is strong in this one.

          • Klimax
          • 4 years ago

          Nope. It seems ignorance is strong in this one. I guess it makes people more susceptible to the BS out there. (And there are at least two like that…)

          ETA: And a personal attack? Well, I guess when you have nothing substantive to throw at me, all you're left with is an idiotic personal attack. It reflects badly on you, not me.

        • Klimax
        • 4 years ago

        BTW: People can downvote me all they want. It cannot change reality nor the correctness of my posts. (Too hard…)

      • jihadjoe
      • 4 years ago

      Dude, I’m sure that cinder block could do with about 30,000 more triangles!

      • USAFTW
      • 4 years ago

      Agreed.

    • DancinJack
    • 4 years ago

    So:

    1. The timing coincides pretty damn well with new GPU launch dates! (Green and Red)
    2. Built-in ECC is awesome (although I'm not sure there is a ton of application in GPUs? I may be wrong). I'm sure people will find proper use cases.
    3. BRING ON PASCAL yayyayaayayayyay!!!

    4. I guess bring on Big Polaris too

      • Mr Bill
      • 4 years ago

      The ECC support might be a nod to the use of GPUs in supercomputers.

        • willmore
        • 4 years ago

        Xeon Phi as well.

          • BlackDove
          • 4 years ago

            Xeon Phi uses HMC on EMIB, not HBM on an interposer.

            • willmore
            • 4 years ago

            Do’ah! Good point!

            • chuckula
            • 4 years ago

            They are both stacked memory technologies so there are some strong similarities.

            EMIB is an interesting potential substrate technology since it is supposed to be quite a bit cheaper than TSVs on a silicon wafer substrate.

      • BlackDove
      • 4 years ago

      The timing coincides (it's a little late, actually) with the release of Intel's Knights Landing, which is the main competitor for GP100.

      GPUs have supported ECC since GF100 or earlier, and pretty much all RAM has an ECC variant.

    • chuckula
    • 4 years ago

    16GB GPU to finally drive that 4K monitor… just a matter of waiting for the right products to actually ship now.

      • TwoEars
      • 4 years ago

      8GB should be plenty for 4k. Unless you want to future-proof like crazy.

      [url<]http://www.tweaktown.com/tweakipedia/90/much-vram-need-1080p-1440p-4k-aa-enabled/index.html[/url<]

        • chuckula
        • 4 years ago

        It’s more a function of the size of each memory stack (4GB looks like the “small” size) multiplied by the number of stacks (4 appears to be the high-end number).

        Of course, two stacks at 4GB each could come to 8 GB, and I’m sure it’s technically possible to only enable 2GB of RAM in a stack (not sure if that impacts bandwidth, however).
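
        A small enumeration of the configurations chuckula outlines, assuming every stack delivers its full 256 GB/s (whether a partially enabled stack keeps full bandwidth is, as he notes, unclear):

        ```python
        # Capacity and peak bandwidth for plausible HBM2 stack configurations.
        PER_STACK_BW = 256  # GB/s
        for stacks in (2, 4):
            for stack_gb in (4, 8):  # 8GB stacks are the ones Samsung promises later this year
                print(f"{stacks} x {stack_gb}GB stacks: {stacks * stack_gb:2d} GB, "
                      f"{stacks * PER_STACK_BW} GB/s")
        ```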

    • Fonbu
    • 4 years ago

    I wonder what HBM3 will bring? Cheaper than HBM1&2?

      • Dudeface
      • 4 years ago

      Smaller nodes, more stacks, higher clock speeds would be my guess. Maybe an even wider interface.

      • Hattig
      • 4 years ago

      Taller stacks and higher bandwidth per pin probably. Maybe efficiency improvements in the protocol. Maybe finer pitched interposer requirements (45nm instead of 65nm) and higher density TSV could double the bus width, but that probably won’t be necessary.

      I wouldn't expect miracles; they will probably stick with good-enough HBM2 for a couple of years at least, and HBM1 was really a first-generation 'look, it works' testbed which was easily improved upon.

        • Airmantharp
        • 4 years ago

        HBM1 was only really limited because of cost and because it was stuck at 4GB, which was just too little too late. Had it been designed with 8GB in mind (if it were feasible), it would likely have taken off and seen wider use, likely both in Nvidia products as well as Xeon Phi products.

          • Pwnstar
          • 4 years ago

          It was stuck at 1GB, not 4.

    • Duct Tape Dude
    • 4 years ago

    Suddenly 16GB and 32GB GPUs really don’t seem so far-fetched. I didn’t realize it would be 8-giga[b<]byte[/b<] chips, I thought it'd be 8 giga[b<]bit[/b<]. That's an awesome capacity bump after years of stagnation.

      • shank15217
      • 4 years ago

      Man, AMD doesn’t innovate…

        • ImSpartacus
        • 4 years ago

        I know you’re joking, but you’re not that far off of the rumor mill for the 2016 releases.

        The current going rumor is that Pitcairn and Hawaii will get replaced while Fiji remains top dog into 2016 (speculation suggests that a legacy HBM mode might get it to 8GB of HBM with modest engineering effort).

        AMD needs a Pitcairn replacement because Pitcairn is old, its performance level is now laptop territory, and AMD needs to compete in laptops. This rumor was substantiated by AMD's recent CES demo.

        AMD needs a Hawaii replacement because it's a very expensive chip (512-bit bus?!) that's currently being sold at razor-thin margins against a much more profitable GM204 part.

        But AMD doesn't have the R&D to replace every chip (neither does Nvidia), so Fiji is rumored to hang around (remember that we haven't seen Gemini yet; AMD isn't going to instantly make Gemini outdated right after it's released). However, it'll likely get a healthy refresh a la Grenada.

        Note these aren't my predictions, just the most popular rumors at this moment. I think they make sense, though. These companies aren't magicians (Nvidia included). They have limits.

      • ImSpartacus
      • 4 years ago

      We've known for months that there will be a 32GB Pascal-based GPU. I think it was an official statement.

      And even before then, we've known what the limits of HBM2 are with respect to capacity density. None of this info is new. This is just an announcement that Sammy has publicly acknowledged production.

      • tipoo
      • 4 years ago

      Not far-fetched, but when will they be useful?

      I mean, I'm not one to ever deny the train of ever-increasing memory demands, but I think 16-32GB will be about as wise as that one time I got an X1650 Pro with 512MB, lol. At up-to-4K workloads, that's still going to be excessive for a while for the performance level of the chips attached to them. In a few more years and GPU architectures, maybe it will matter.

        • Duct Tape Dude
        • 4 years ago

        That's a good point. Everyone was whining about the Fury only having 4GB of memory, but in benchmarks it kept pace up until the very highest-resolution textures at 4K. 6GB seems sensible even for unoptimized high-res gaming, 8GB for future-proofing, but 16GB and 32GB gaming GPUs are straight-up marketing. Perhaps more dies mean more bandwidth, and huge capacities are a nice side effect.

        For professional and compute-oriented cards though, the extra memory sounds more welcome.
