Evidence points to Radeon R9 380X with high-bandwidth memory

New evidence suggests AMD is working on a Radeon R9 380X GPU with high-bandwidth memory. The latest nuggets come from an unlikely source: the LinkedIn profiles of two AMD staffers apparently linked to the project. References to the GPU were spotted by a member of the 3D Center forums.

The 380X is mentioned by name in the profile of Ilana Shternshain, an ASIC physical design engineer "responsible for full-chip timing methodology and closure for AMD's most exciting chips." Those chips include the PlayStation 4's SoC, the Radeon R9 290X discrete GPU, and the R9 380X, which is described as the "largest in 'King of the hill' line of products."

The second reference doesn't call out the 380X by name, but it does mention high-bandwidth memory specifically. AMD system architect manager Linglan Zhang claims to have had a hand in developing "the world’s first 300W 2.5D discrete GPU SOC using stacked die High Bandwidth Memory and silicon interposer."

Entries in online résumés fall short of official confirmation, but they do seem more reliable than the usual rumors. We shouldn't have to wait too long to see if this information is accurate. AMD is expected to introduce the R9 380X early this year.

0 responses to “Evidence points to Radeon R9 380X with high-bandwidth memory”

  1. and if the performance per watt ISN’T there, they’ll MAKE hot cakes. #rimshot #hashtagsdontworkonTR

  2. The theory is sound but your conclusion doesn’t match the data HardOCP presented. If it was truly CPU dependent, you’d find performance to plateau to a similar value regardless of the video card used. If you actually look at HardOCP’s data, you’ll see that each video card performs differently when the settings are disabled (i.e. the control group). In other words, even with everything disabled, the base settings they used are still GPU dependent.

    The other thing to note is that [i<]both[/i<] the R9 290X and GTX 780 perform relatively poorly with particular effects. The new tessellation engine in the Tonga based R9 285 is superior to the R9 290X for example. Tonga takes a distinctly smaller performance hit (10%) compared to the R9 290X (20%) even though ultimately the R9 290X is faster in absolute terms.

  3. sorry nvidia fanboy but I really think you missed the point. It’s not about one side or the other. It’s about the 290x and the 285 which is a different architecture than the 200 series. Maybe you missed that? If the new cards are just rebranded rebrands they won’t bring the new architecture to the next release. Not good for you or me and not worth purchasing. And I bet nvidia will be doing some of the same. Just like they always do. Look how far they stretched the 8800 series.

  4. HardOCP morons are clearly AMDfanboys (or better way of saying it, dollarboys), and if you doubt it, see them in the AMD event launch of the R9 285.

    About what you say: Far Cry 3 IS a GAME EVOLVED title, not a TWIMTBP title, and it had nothing of GameWorks or other nvidia functions.

    In fact, many titles had and have AMD optimizations like GI “AMD flavor” (with horrendous performance outside AMD hardware, at least at the release of a game), crippled tessellation like the “silhouette tess” of Tomb Raider (to hide the bad performance of AMD cards with a tweaked and partial implementation of tess with objects). Supersampling combined with multisampling and/or postprocess antialiasing (redundant and horrendous from the developer’s perspective), to displace the dependency of requirements in the graphic card to the memory subsystem, a memory-leak technique that is known as “TressFX” (or how you can misuse 300-400 MB of VRAM with only one scalp), etc.

    And, are you saying something about these abundant titles? NO.

    You are complaining about one title, because it used GameWorks techniques. HardOCP said stupidities: when you disable many powerful techniques in a game, you are displacing the dependency of performance from the GPU to the CPU, and all the graphics cards tend to have similar fps with downgraded graphics.

    Far Cry 4 is CPU-dependent: you need a very good CPU for this game, and with medium to high cards, if you have a medium-class CPU (like a top AMD CPU or a low-frequency Intel quad, not OCed), then your CPU is a burden for the performance.

    So, if you downgrade the GPU part of the dependency, even though you have an excellent CPU, your system tends to run with similar fps, because the CPU is now more relevant to the fps equation than before.

    HardOCP are a group of clowns. If you want to prove that GameWorks is biased toward nvidia cards, clearly, you can’t test this with only 3-5 cards. It’s anti-scientific.

    Well, all its reviews are anti-scientific, when they didn’t make a clear testing protocol, without subjective and biased resolutions about the graphic conf. It’s full of crap.

  5. Well, this is the highend performance model, who knows what the “mainstream” model(s) would be like.

    Sounds like for your needs highend would not be the market segment you’d wanna look at – not for me either since I go for silent PC builds.

    All that means is our needs are different from what the highend model’s aiming for, nothing new there.

  6. They just announced that a batter mix comes up with every card to make your own hot cakes!

  7. I wonder if you can run waterblocks through a couple of these in CrossFire and into a shower heating system. Remote dial starts up instances of [s<]furmark[/s<] Folding@home to control water temp.

  8. I’m willing to bet the second entry, since it says SOC, is probably a Seattle die with onboard, discrete-level performing Radeon graphics. So a server based APU that has shared HBM with a Radeon R7 class GPU.

  9. “The performance difference lessens quite a bit between the video cards with these lesser, non-GameWorks features turned on. The GTX 980 is now only 5% faster than the R9 290X. This proves that the NVIDIA GameWorks features are a more heavy burden for the AMD GPUs. However, NVIDIA GPUs are still faster even with those features disabled, just not as much.”

    had to sign up to say this. use a non-nvidia gameworks title to compare your gpus! even far cry 3 was terrible on amd gpus

    actually, just avoid ubi titles altogether, far cry 4 and every other one of their releases last year are still buggy as far as I know

  10. I just went to the page and I can’t find the bit about the memory.

    Looks like it was taken down.

    I’m thinking it’s legit.

  11. I don’t like the idea of sucking gobs of energy for my videogames.

    It’s not foolish or shortsighted or invalid.

  12. 300 watts and no new process node for 2015.

    yep. nothing to match the 970 looks like.

    anybody know if the coil whine is off the EVGA 970s yet?

  13. [quote=”HisDivineOrder”<] But you can't just come out and TELL people about product too far ahead of time because... [/quote<] It used to be called "Osborning" yourself. Look it up.

  14. I would have agreed with you as recently as two months ago, but with summer dropping the hammer down here, I’m starting to see the virtues of a video card that doesn’t double as a space heater…

  15. That's not 165W. When running Crysis, GTX 980 is pulling 230W+, and probably pulls even more with a maximum stress test. When you consider a maximum-draw test on the GTX 980, it turns out that the [url=<]GTX 980 is barely any better than the R9 290X[/url<] (around 30W savings), while the R9 290x has higher GPU Compute (at 5.6 TFlops). According to the Furmark tests from Anandtech, GTX 980 and R9 290x are about dead-even actually. 10% lower watts for 10% less TFlops. I'm betting that NVidia put a lot of effort into average-case performance. Shutting down parts of the chip that aren't being used, or other such tricks. Because really, the GTX 980 under maximum load isn't much better than the status-quo. GTX 980 under average load (aka: gaming load) is much better though, so in practice you end up saving a lot of power.

  16. “So TDP has gone up ~6x which isn’t really a big deal given the availability of 1000W+ PSU and performance is still growing exponentially.”

    Interesting corollary, we’ve had ~6 process shrinks from the 9700 (150nm to 130, 90, 65, 45, 32, 28* etc.). 2.6 GFlops * 2^6 = 166 GFlops. 6x TDP = ~1 TFlops. NVIDIA pointed out that GPU compute is growing faster than node scaling, and thus GPUs are beating moore’s law (better perf than node scaling alone).

    Except as a GPU programmer I take slight issue with that.

    The 9700 Pro had 20 GB/s of bandwidth. If we had the same scaling as GFlops, we’d have 1280 GB/s (*not* even accounting for TDP increase, with TDP…it’s ~7680 GB/s).

    The most brutal discrepancy tho, is that the 9700 Pro had 2.6 GPixels of fill-rate. The 290X has 64 GPixels. Sounds pretty good until you do the same GFlops scaling-math: 2.6 * 2^6 * 6 = ~998 GPixels of fill-rate if we saw the same scaling for all GPU functions.

    I’m not saying the shading performance isn’t welcome, but the fill/texture/memory perf would be as useful for a lot of modern rendering techniques.
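
The scaling arithmetic in the comment above can be reproduced with a short sketch. It only restates the commenter's own figures (six shrinks, roughly 6x TDP, and the quoted 9700 Pro and 290X numbers); the helper name `scaled` is hypothetical, and the model of "one doubling per shrink" is the commenter's simplification, not a measured trend.

```python
# Figures are the ones quoted in the comment; nothing here is measured data.
SHRINKS = 6      # process shrinks counted from the 9700 Pro onward
TDP_FACTOR = 6   # approximate TDP growth quoted in the comment


def scaled(baseline, shrinks=SHRINKS, tdp_factor=TDP_FACTOR):
    """Double the baseline once per shrink, then multiply by the TDP growth."""
    return baseline * 2 ** shrinks * tdp_factor


projected_gflops = scaled(2.6)     # 9700 Pro shader throughput, GFLOPS
projected_bandwidth = scaled(20)   # 9700 Pro memory bandwidth, GB/s
projected_fill = scaled(2.6)       # 9700 Pro fill rate, GPixels/s

actual_fill_290x = 64              # GPixels/s, as quoted for the 290X
fill_shortfall = projected_fill / actual_fill_290x

print(f"projected compute:   {projected_gflops:.1f} GFLOPS")
print(f"projected bandwidth: {projected_bandwidth} GB/s")
print(f"projected fill rate: {projected_fill:.1f} GPixels/s")
print(f"fill-rate shortfall vs. 290X: {fill_shortfall:.1f}x")
```

Under these assumptions the projected fill rate lands more than an order of magnitude above the quoted 290X figure, which is the discrepancy the comment is pointing at.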

  17. Per watt performance efficiency is great and all but total performance is still what matters most to most people.

    With 1440p monitors becoming more affordable and commonplace while 4k monitors become less expensive niche products the extra single card performance is much needed.

  18. Even a ‘mere’ 30% performance advantage would be a pretty big deal and would probably sell the card like hotcakes.

    It’s been a long time since we’ve seen a performance jump that big between generations.

  19. If Nvidia can do it with the 900 series then AMD should be able to do it as well. Time for an attitude change.

  20. [quote<]Once again it's Linked-in (not a technical paper) but: "stacked die High Bandwidth Memory and silicon interposer."[/quote<] Linked-in is better; that's where people accidentally leak juicy secrets that don't get mentioned in technical papers

  21. I [i<]think[/i<] wire bonding would work to get a more traditional multi-chip module. The catch like every MCM is that you'll pay in terms of cost (still expensive), power and additional die space for larger pads. Of course this would only apply between the HBM and GPU. The stacking used within the HBM would still need to be through silicon vias.

  22. Actually, TSMC couldn’t get high performance 20nm to work without finFETs. They had no choice but to move on.


  23. 280x? Rumor says the 380x will challenge nVidia’s 980. So that is a fair bit faster than the 280x.

  24. It very well could be at 4K resolutions. Memory bandwidth at that setting is a bottleneck and HBM could easily eliminate it.

    While nVidia isn’t going to go with HBM/stacked memory until Pascal, they do have the option of releasing additional Maxwell chips with a wider memory interface: 384 bit and 512 bit wide are very real options and can be done on 28 nm. The real question is: will it be enough to be competitive?

    The other thing that nVidia can do is play on price as I suspect HBM based cards are going to be expensive.

  25. Tonga has less memory bandwidth than the vanilla R9 280. It got a new compression scheme to compensate despite having the same number of ROPs and shaders.

    Then again, it is heavily speculated that Tonga has a 384 bit wide memory bus and what we’ve seen so far hasn’t been the true potential of the chip. It has been confirmed that the die has 2048 shaders, of which 1792 have been enabled on the R9 285.

    It wouldn’t surprise me to see Tonga become the R9 370 and R9 370X alongside the launch of the R9 380 & R9 380X.

  26. Well remember the Bitboys hype of yesteryear? Well the memory technology for that type of high bandwidth and low latency implementation is now practical.

    Depending on how many HBM modules are in the package, we could be seeing a GPU with over 1 TB/s of memory bandwidth, a three fold increase over the R9 290X. Combine that with the new compression and memory efficiency of the R9 285, and you have significantly reduced the memory bottleneck. The limiting factor will come down to how many ROPs and shaders they can cram into the die. I’d expect 4K resolutions to become playable with a single card.

  27. I thought the rumor went like this:

    TSMC dropped 20nm entirely in favor of a quick move to 16nm because of delays and little reason not to go on and move lower since the delays had mucked things up pretty bad already. Partly due to this and partly due to their ongoing inability to meet their overall chip requirements for CPU/APU’s to GloFo, AMD switched their discrete GPU’s to GloFo and got access to a 20nm process. nVidia stuck with TSMC, which means they’ll be waiting until 16nm is ready. This means nVidia is going to be sitting on 28nm for another year, so nVidia will be using redesigned Maxwell for 28nm (from the 20nm originals) while they wait until the GPU after that and 16nm.

    Also, both 20nm and 16nm will be dominated by mobile SOC’s (including nVidia’s Tegra) for the early part of their first runs, so expect discrete GPU’s to follow after that (ie., mid-2015 for 20nm AMD GPU releases, late 2015/early 2016 for nV GPU’s).

  28. All GPUs are bandwidth limited at some time during their operation. They continuously switch between states that are sometimes BW limited, sometimes shader limited etc.

    The extent by which they do that depends on the application.

    So the question whether or not they are BW limited is not the right one. It should be: how much of the time are they BW limited, and to what extent is it limiting.

    If they are BW limited 100% of the time, but, in that state, a second unit is also very close to being the limiter, then it doesn’t make sense to increase BW relative to everything else: you’d simply have a very well balanced architecture.

    If you’re BW limited 30% of the time, and during that time, your other units are sitting completely idle, your architecture is very unbalanced, and increasing the BW will give you large benefit.

    With HBM, the BW will likely disappear completely as a limiter, which is great as long as it frees up a lot of units that were mostly idle before. For compute, that’s probably going to be the case for many workloads. For games/graphics, only Nvidia/AMD/Intel know.
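
The balanced-versus-unbalanced argument above can be illustrated with a toy bottleneck model. This is only a sketch: the workloads, time units, and function names are all hypothetical, and real GPUs overlap work in ways a simple per-phase `max` does not capture.

```python
def frame_time(phases, bw_speedup=1.0):
    """Total time when each phase is limited by its slowest unit.

    phases: list of (bandwidth_time, other_unit_time) pairs, arbitrary units.
    bw_speedup: factor by which memory bandwidth is increased.
    """
    return sum(max(bw / bw_speedup, other) for bw, other in phases)


def gain(phases, bw_speedup):
    """Overall speedup from raising bandwidth by bw_speedup."""
    return frame_time(phases) / frame_time(phases, bw_speedup)


# Hypothetical workload A: BW limited 100% of the time, but a second
# unit is almost as slow in every phase (a well balanced design).
balanced = [(1.0, 0.95)] * 10

# Hypothetical workload B: BW limited in 3 of 10 phases, with the other
# units nearly idle during those phases (an unbalanced design).
unbalanced = [(1.0, 0.1)] * 3 + [(0.2, 1.0)] * 7

for name, phases in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(f"{name}: {gain(phases, 4.0):.2f}x faster with 4x bandwidth")
```

In this toy setup the workload that is bandwidth limited all the time gains far less from extra bandwidth than the one that is limited only 30% of the time, because in the first case another unit takes over as the limiter almost immediately.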

  29. [url=<]Rumor has it[/url<] there won't be any 20nm GPUs released from AMD.

  30. It depends on what that buys you in terms of performance and features etc.

    A single metric doesn’t say much without context.

  31. [quote<][b<]Nudge, nudge,[/b<] wink, wink[b<], say no more![/b<][/quote<] Fixed that for you

  32. after just getting done reading HardOCP’s review on FarCry4 performance I don’t think I would want my new cards associated with the 290x in the same article. The 285’s architecture is way ahead if you read the story. Hopefully these really are NEW cards..??..!!

    what I did find interesting were the vram usages. ( even at 1080p ) Nobody anywhere will ever be able to say that 2gigs is enough anymore.

  33. Is that a rhetorical way of saying memory bandwidth is a limiting factor?

    I asked because I don’t know!

    What I’m trying to gauge is if this memory switch will be “nice to have”, like swapping out one modern SSD for another that’s still limited by SATA 3, or swapping out high speed DDR3 for low speed DDR4.

    Or if this memory switch AMD is probably doing will be significant, in the way that Sandy Bridge was significant for instructions-per-clock and Haswell and Maxwell were significant for efficiency-per-watt.

  34. As a petrolhead who has to also drive his car every day: yeah… so? I like to build small systems with SFX power supplies. I can wedge a 95w i5 or i7 and a 225W GPU into that power budget, but 300W is a stretch.

    Besides, power = heat = noise, and if AMD continues this trend of needing ~30% more juice to get the same performance as NVidia, it’s going to be a tough sell. I’ll go for the cooler, quieter card any day of the week.

  35. King of the power consumption too, from the sound of it.

    That kind of power consumption would seem to indicate it is still 28nm, or a really fat 20nm core… the kind that AMD said themselves they were done making. Even so I still can’t wait to see what the card can do with HBM.

  36. HBM has thousands of pins, an interposer would make routing those signals – at silicon voltages/currents – far easier than in a package (which is more like a tiny multilayer PCB in many ways).

  37. This attitude always amuses me, it conjures up images of a petrolhead complaining about fuel efficiency.

  38. Looks like Linglan Zhang already changed her profile (unless I’m missing something).

    If only they were more specific and gave us dates…

  39. I don’t think the problem is a lack of performance/watt improvement, but an issue of comparing it to other available cards.

    GeForce GTX 980: [url=<]5 TFLOPS[/url<] / [url=<]165W[/url<] = 30.3GFLOPS/W

  40. Perhaps.

    Sometimes, I think “leaks” like this are done purposefully to get a whisper or two out there to do the job of making press where a company is coasting on older product and trying to convince the faithful to hold off on abandoning you.

    But you can’t just come out and TELL people about product too far ahead of time because… well, often it’s just cause. Sometimes, it’s because people take too much information too soon without actual product on the shelf as a really bad thing. Especially for sales. Lots of people want to buy asap and making them wait a month or more for product to ship is a great way to dampen enthusiasm that could have led to impulse buying from your faithful (ie., R9 290X).

    So you want enough talk that something’s coming to keep the AMD Faithful true to AMD (and not swayed by the Geforce 970/980 that’s making the rounds), but you don’t have the product to even do a reveal yet.

    You leak and you rumor. The most useful part of this strategy is that the leaks don’t even have to be factually correct (ie., Bulldozer’s early word of mouth via indirect AMD sources). They just have to be circumspect and mostly unofficial.

  41. In this case it means that there is the GPU chip and a silicon interposer that connects the GPU to external chips (in this case memory chips). The GPU and memory both sit on top of the interposer, which is the source of the “2.5D” term.

    The “silicon interposer” is just a wafer of silicon that has conductive traces and contact points setup to accept both the GPU and the memory chips. They sit side-by-side on the interposer in most configurations.

    A silicon interposer is a much more efficient and compact way to interconnect different chips, but it’s also a heck of a lot trickier to manufacture than mounting parts on a traditional PCB with conductive traces.

  42. So where does it say they’re both talking about the same GPU?

    Both stories seem to be plausible based on other “leaks”, but I’m not too sure about both talking about the 380X. Would the 390X then be the only GPU using the rumored 20nm node and thus not have to go much over 300W while still performing significantly better? Or does this and the rumored reference watercooler both suggest the 390X is actually gonna use significantly more than 300W?

    On another note; wouldn’t posting this kind of stuff actually break some kind of contract/NDA?

  43. [quote<]We shouldn't have to wait too long to see if this information is accurate. AMD is expected to introduce the R9 380X early this year.[/quote<] You know, if I wanted to hint at something I already had been told via NDA-based sources, reporting on a rumor and suggesting people'd find out soon in a "Your Honor, I did not say it directly" kinda way that's still relatively suggestive is the way I'd do it. Wink, wink.

  44. We have the PSU’s to handle it and to put some things in perspective check out just how far we have come:

    The ATI 9700 PRO, arguably one of the best GPUs of its time, had a TDP of roughly 45W-50W with a measly 2.6GFLOPS
    The AMD 290X has a TDP of 275W, will perform less than or equal to a 380X, and has a whopping 5.9TFLOPS.

    That’s an efficiency boost from .057GFLOPS/W to 21.4GFLOPS/W or 37543% increase.

    So TDP has gone up ~6x which isn’t really a big deal given the availability of 1000W+ PSU and performance is still growing exponentially.

    Now get off my lawn and enjoy your new insanely fast GPUs.
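
The efficiency comparison in the comment above can be checked with a few lines. This sketch uses the commenter's quoted figures and assumes the low end (45 W) of the stated 45W-50W range for the 9700 Pro; the function name is hypothetical.

```python
def gflops_per_watt(gflops, watts):
    """Compute efficiency as throughput divided by board power."""
    return gflops / watts


# Quoted figures; 45 W is an assumed point in the stated 45-50 W range.
old_eff = gflops_per_watt(2.6, 45)      # ATI 9700 Pro
new_eff = gflops_per_watt(5900, 275)    # AMD 290X, 5.9 TFLOPS

efficiency_ratio = new_eff / old_eff
increase_pct = (efficiency_ratio - 1) * 100
tdp_ratio = 275 / 45

print(f"9700 Pro: {old_eff:.3f} GFLOPS/W")
print(f"290X:     {new_eff:.1f} GFLOPS/W")
print(f"efficiency ratio: {efficiency_ratio:.0f}x ({increase_pct:.0f}% increase)")
print(f"TDP ratio: {tdp_ratio:.1f}x")
```

With unrounded inputs the ratio comes out a little lower than the comment's figure, which was computed from the rounded .057 and 21.4 values, but the order of magnitude is the same: efficiency improved by a few hundred times while TDP grew by about six.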

  45. I remember postings like that back in the late 90’s and thinking to myself “you really think you’re going to get one of the people who designed it?”

  46. [quote<]Speaking of which, can someone find me a software developer with at least 15 years of HSA development experience to fill an open position?[/quote<] Lol, I remember seeing job postings like that with Java. It was only out for a couple of years and postings were asking for 5+ years java coding experience.

  47. Hmmm make an R9 370x with a single 6 pin connector 1280-1440 shaders and about 195$ and I might bite.

  48. Let’s assume for a moment this is fact and not rumor. What does this mean for the 380X? What can we expect it will do better than the 280X?

  49. Once again it’s Linked-in (not a technical paper) but: “stacked die High Bandwidth Memory and silicon interposer.”

    That is interesting. The available documentation is a little scarce, but I’m not sure that it’s a requirement to use HBM on an interposer (as opposed to just using a PCB with traces). The silicon interposer definitely provides a more compact package and better signaling, but getting interposers working reliably in mass-produced parts is also a bit tricky.

  50. “discrete GPU SOC” –> Ah linked-in, where meaningless buzzwords take the place of real information.

    Speaking of which, can someone find me a software developer with at least 15 years of HSA development experience to fill an open position?