Spitballing the performance of AMD’s Radeon Vega Frontier Edition graphics card

AMD's Radeon Vega Frontier Edition reveal yesterday provided us with some important pieces of the performance puzzle for one of the most hotly-anticipated graphics chips of 2017. Crucially, AMD disclosed the Frontier Edition card's pixel fill rate and some rough expectations for floating-point throughput—figures that allow us to make some educated guesses about Vega's final clock speeds and how it might stack up to Nvidia's latest and greatest for both gaming and compute performance.

Dollars and sense
Before we dive into my educated guesses, though, it's worth mulling over the fact that the Vega Frontier Edition is launching as a Radeon Pro card, not a Radeon RX card. As Ryan Smith at AnandTech points out, this is the first time AMD is debuting a new graphics architecture aboard a professional-grade product. As its slightly green-tinged name suggests, AMD's Frontier Edition strategy roughly echoes how Nvidia has been releasing new graphics architectures of late. Pascal made its debut aboard the Tesla P100 accelerator, and the market's first taste of Nvidia's Volta architecture will be aboard a similar product.

AMD's Vega Frontier Edition cards

These developments suggest that whether they bleed red or green, gamers may have to accept the fact that they aren't the most important market for these high-performance, next-gen graphics chips any longer.

Though gamers might feel disappointed after yesterday's reveal, this decision makes good business sense. As I mused on Twitter a few days ago, it doesn't make any sense for AMD to sell Vega chips on Radeon RX cards just yet when there's strong demand for this GPU's compute power elsewhere. In turn, AMD can ask much more money for Vega compute accelerators than it can for the same chip aboard a Radeon gaming card. Yesterday's Financial Analyst Day made it clear that AMD is acutely aware of the high demand for GPU compute power right now, especially for machine learning applications, and it wants as big a piece of that pie as it can grab.

Radeon Technologies Group head Raja Koduri put some numbers to this idea at the company's analyst day by pointing out that the high end of the graphics card market could represent less than 15% of the company's sales volume, but potentially as much as 66% of its margin contribution (i.e., profit). Nvidia dominates the high-end graphics card market regardless of whether one is running workstation graphics or datacenter GPU computing tasks, and AMD needs to tap into the demand from these markets as part of its course toward profitability. Radeon RX products might make the most noise in the consumer graphics market, but Vega compute cards could make the biggest bucks for AMD, so it only makes sense that the company is launching the Frontier Edition (and presumably the Radeon Instinct MI25) into the very highest end of the market first.

Sizing up Vega
Now, let's talk some numbers. AMD says the Vega GPU aboard the Frontier Edition will offer about 13 TFLOPS of FP32 and about 25 TFLOPS of FP16 performance, as well as a pixel fill rate of 90 Gpixels/s. AMD also says the chip will have 64 compute units and 4096 stream processors, and that FP32 TFLOPS figure suggests a clock speed range of about 1450 MHz to 1600 MHz. I propose this range because AMD seems to have used different clock rates to calculate different peak throughput rates. I'm also guessing the Vega chip in this card has 64 ROPs, given the past layout of GCN cards and the way the numbers have to stack up to reach that 90 Gpixels/s figure.
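For readers who want to check the arithmetic, here is a minimal sketch of how clock speeds can be backed out of AMD's quoted peak rates. It assumes each stream processor retires one fused multiply-add (two FLOPs) per clock for FP32, that packed FP16 doubles that rate, and that each ROP writes one pixel per clock. The 64-ROP count is the guess from the paragraph above, not an AMD figure, and the function names are only illustrative.

```python
# Back-of-the-envelope clock estimates from AMD's quoted peak rates.
STREAM_PROCESSORS = 4096  # confirmed by AMD
ROPS = 64                 # a guess, not an AMD-confirmed figure


def clock_mhz_from_tflops(tflops, stream_processors, flops_per_sp_per_clock):
    """Clock (MHz) needed to reach a peak arithmetic rate."""
    flops_per_clock = stream_processors * flops_per_sp_per_clock
    return tflops * 1e12 / flops_per_clock / 1e6


def clock_mhz_from_fill_rate(gpixels_per_s, rops):
    """Clock (MHz) needed to reach a peak pixel fill rate at one pixel per ROP per clock."""
    return gpixels_per_s * 1e9 / rops / 1e6


fp32_clock = clock_mhz_from_tflops(13.0, STREAM_PROCESSORS, 2)  # FMA counts as 2 FLOPs
fp16_clock = clock_mhz_from_tflops(25.0, STREAM_PROCESSORS, 4)  # assumes double-rate packed FP16
fill_clock = clock_mhz_from_fill_rate(90.0, ROPS)

print(f"FP32-implied clock:      {fp32_clock:.0f} MHz")
print(f"FP16-implied clock:      {fp16_clock:.0f} MHz")
print(f"Fill-rate-implied clock: {fill_clock:.0f} MHz")
```

Because AMD's figures are rounded ("about 13", "about 25"), the three implied clocks do not agree exactly, which is consistent with the idea that different clock rates sit behind different peak numbers.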

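| Card | GPU base clock (MHz) | GPU boost clock (MHz) | ROP pixels/clock | Texels filtered/clock | Shader processors | Memory path (bits) | Memory bandwidth | Memory size | Peak power draw |
|---|---|---|---|---|---|---|---|---|---|
| GTX 970 | 1050 | 1178 | 56 | 104 | 1664 | 224+32 | 224 GB/s | 3.5+0.5 GB | 145 W |
| GTX 980 | 1126 | 1216 | 64 | 128 | 2048 | 256 | 224 GB/s | 4 GB | 165 W |
| GTX 980 Ti | 1002 | 1075 | 96 | 176 | 2816 | 384 | 336 GB/s | 6 GB | 250 W |
| Titan X (Maxwell) | 1002 | 1075 | 96 | 192 | 3072 | 384 | 336 GB/s | 12 GB | 250 W |
| GTX 1080 | 1607 | 1733 | 64 | 160 | 2560 | 256 | 320 GB/s | 8 GB | 180 W |
| GTX 1080 Ti | 1480 | 1582 | 88 | 224 | 3584 | 352 | 484 GB/s | 11 GB | 250 W |
| Titan Xp | 1480? | 1582 | 96 | 240 | 3840 | 384 | 547 GB/s | 12 GB | 250 W |
| R9 Fury X | 1050 | - | 64 | 256 | 4096 | 4096 | 512 GB/s | 4 GB | 275 W |
| Vega Frontier Edition | ~1450? | ~1600? | 64? | 256? | 4096 | ??? | ~480 GB/s | 16 GB | ??? |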

Regardless, that clock-speed range and the resulting numbers suggest that AMD will meet or exceed its compute performance targets for its first Vega products. The company touted a 25 TFLOPS rate for FP16 math when it previewed the Radeon Instinct MI25 card, and the Vega Frontier Edition could potentially top that already-impressive figure with 26 TFLOPS or so at the top of its hypothetical clock range. Assuming those numbers hold, the raw compute capabilities of the Vega FE for some types of math will top even the beastly Quadro GP100, Nvidia's highest-end pro graphics card at the moment. These are both high-end pro cards with 16GB of HBM2 on board, so it's not far-fetched to compare them.
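As a rough check on those figures, the sketch below runs the calculation in the forward direction: peak throughput at both ends of the hypothetical 1450-1600 MHz range. It assumes 4096 stream processors, two FLOPs per stream processor per clock for FP32, and double that for packed FP16. The helper name `peak_tflops` is purely illustrative.

```python
def peak_tflops(stream_processors, clock_mhz, flops_per_sp_per_clock):
    """Theoretical peak arithmetic rate in TFLOPS."""
    return stream_processors * flops_per_sp_per_clock * clock_mhz * 1e6 / 1e12


VEGA_SPS = 4096  # confirmed by AMD

# Both ends of the guessed clock range; neither clock is an official spec.
vega_rates = {}
for clock_mhz in (1450, 1600):
    fp32 = peak_tflops(VEGA_SPS, clock_mhz, 2)  # one FMA per SP per clock
    fp16 = peak_tflops(VEGA_SPS, clock_mhz, 4)  # assumes double-rate packed FP16
    vega_rates[clock_mhz] = (fp32, fp16)
    print(f"{clock_mhz} MHz: {fp32:.1f} TFLOPS FP32, {fp16:.1f} TFLOPS FP16")
```

Under these assumptions, the top of the range is where the "26 TFLOPS or so" FP16 figure comes from, while the bottom of the range falls a little short of AMD's "about 25 TFLOPS" claim.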

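| Card | Peak pixel fill rate (Gpixels/s) | Peak bilinear filtering int8/fp16 (Gtexels/s) | Peak rasterization rate (Gtris/s) | Peak shader arithmetic rate (TFLOPS) |
|---|---|---|---|---|
| Asus R9 290X | 67 | 185/92 | 4.2 | 5.9 |
| Radeon R9 295 X2 | 130 | 358/179 | 8.1 | 11.3 |
| Radeon R9 Fury X | 67 | 269/134 | 4.2 | 8.6 |
| GeForce GTX 780 Ti | 37 | 223/223 | 4.6 | 5.3 |
| Gigabyte GTX 980 Windforce | 85 | 170/170 | 5.3 | 5.4 |
| GeForce GTX 980 Ti | 95 | 189/189 | 6.5 | 6.1 |
| GeForce GTX 1070 | 108 | 202/202 | 5.0 | 7.0 |
| GeForce GTX 1080 | 111 | 277/277 | 6.9 | 8.9 |
| GeForce GTX 1080 Ti | 139 | 354/354 | 9.5 | 11.3 |
| GeForce Titan Xp | 152 | 343/343 | 9.2 | 11.0 |
| Vega Frontier Edition | ~90-102? | 410?/205? | 6.4? | 13.0 |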

Taking AMD's squishy numbers at face value, the 25 TFLOPS of FP16 the Vega FE claims to offer will top the Quadro GP100's claimed 20.7 TFLOPS of FP16 throughput. In turn, AMD claims the Vega FE can deliver about 26% higher FP32 throughput than the Quadro GP100: 13 TFLOPS versus 10.3 TFLOPS. The GP100 might deliver higher double-precision math rates, but we can't compare the Vega FE card's performance on that point because AMD hasn't said a word about Vega's FP64 capability. Even so, the $8900 price tag of the Quadro GP100 gives AMD plenty of wiggle room to field a competitor in this lucrative market, and it seems the performance will be there to make Vega a worthy compute competitor (at least until Volta descends from the data center).

The things we still don't know about the Vega chip in the Frontier Edition are facts most relevant to the chip's gaming performance. AMD hasn't talked in depth about the texturing capabilities or geometry throughput of the Vega architecture yet, but it's simply too tantalizing not to guess at how this Vega chip will stack up given its seeming family resemblance to Fiji cards. Beware: wild guesses ahead.

Assuming Vega maintains 256 texture units and GCN's half-rate throughput for FP16 textures (and this is a big if), the card might deliver as much as 410 GTex/s for int8 textures and 205 GTex/s for bilinear fp16 filtering. For comparison, the GTX 1080 can deliver full throughput for both types of texturing. Even so, that card tops out at 277 GTex/s for both int8 and fp16 work. The Vega FE's impressive texture-crunching capabilities might be slightly tempered by that 90 GPix/s fill rate, which slightly trails even the GTX 1070's theoretical capabilities.
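The texture and pixel estimates above follow from simple units-times-clock arithmetic, sketched below. Everything here is a guess layered on a guess: 256 texture units, 64 ROPs, a 1600 MHz peak clock, one int8 texel per texture unit per clock, half that rate for fp16, and one pixel per ROP per clock. The variable and function names are illustrative only.

```python
TEXTURE_UNITS = 256    # guess, carried over from Fiji
ROPS = 64              # guess
PEAK_CLOCK_MHZ = 1600  # top of the hypothetical clock range


def giga_ops_per_s(units, clock_mhz):
    """Peak rate in giga-operations per second for units that each do one op per clock."""
    return units * clock_mhz / 1000


vega_int8_gtex = giga_ops_per_s(TEXTURE_UNITS, PEAK_CLOCK_MHZ)
vega_fp16_gtex = vega_int8_gtex / 2  # assumes GCN's half-rate fp16 filtering
vega_gpix = giga_ops_per_s(ROPS, PEAK_CLOCK_MHZ)

print(f"int8 filtering:  {vega_int8_gtex:.1f} GTex/s")
print(f"fp16 filtering:  {vega_fp16_gtex:.1f} GTex/s")
print(f"pixel fill rate: {vega_gpix:.1f} GPix/s at {PEAK_CLOCK_MHZ} MHz")
```

Note that 64 ROPs at 1600 MHz would give a fill rate somewhat above AMD's quoted 90 GPix/s, so the official figure appears to assume a lower clock than the one implied by the FP32 number.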

Either way, none of these dart throws suggest the eventual RX Vega will have what it takes to unseat the GeForce GTX 1080 Ti atop the consumer graphics-performance race, as some wild rumors have postulated recently. I'm willing to be surprised, though. We also can't account for the potential performance improvements from Vega's new primitive shader support or its tile-based Draw Stream Binning Rasterizer, both of which could mitigate some of these theoretical shortcomings.

All of those guesses square pretty nicely with my seat-of-the-pants impressions of Vega's gaming power during AMD's demo sessions, where the card delivered performance that felt like it was in the ballpark with a GeForce GTX 1080. I gleaned those impressions from AMD demo darling Doom, of course, and other games will perform differently. It's also possible that the Radeon RX Vega will use a different configuration of the Vega GPU, so AMD Vega FE numbers may not be the best starting point. Still, if it's priced right, the Radeon RX Vega could be the high-end gaming contender that AMD sorely needs. We'll have to see whether my guesses are on the mark or wide of the mark when Radeon RX Vega cards finally appear.

This article initially speculated, without sourcing, that AMD would include 4096 SPs on the Vega FE GPU. The company did, in fact, confirm that the Vega GPU on this card would include 4096 SPs on a separate product page that I overlooked. While this new information does not affect any of the guesses put forth in this piece, I do regret the error, and the piece has been updated to include numbers from AMD's official specs.

Responses to “Spitballing the performance of AMD’s Radeon Vega Frontier Edition graphics card”

  1. The RBEs/ROPs being made part of the coherent cache network (per-CU L1Ds and memory-controller attached L2s) is the thing that’s supposed to help most with deferred rendering, since intermediate render target buffers will be able to be fed to post-processing stages without requiring higher-level synchronization barriers.

    Tiling/binning rasterizers could in theory improve multi-pass rendering if the G-buffers pieces can be kept resident in L2 between generation and lighting stages, but that’s probably a lot easier to say than to do.

  2. I personally base Vega perf on the way each new arch gets 50-70% improvement… Vega is the follow-on to Fury so 50-70% faster than Fury is VERY FAST… And that’s per clock… Vega is set to be 50% higher clocks…

    The FE got 60-70 fps and Raja’s AMA says there are gaming features specific to RX Vega… He also says some RX chips will be faster than FE…

    I don’t hope they’re faster or slower… I just hope I can go 4K with an 1800X… You can also look at what MS did with Scorpio, programming the GPU with DX12 which cuts CPU usage by up to 50%…

    We’ll all see next week I guess…

  3. Don’t know where your info is from about the GTX 1080 @ 50 fps 4K/ultra, but here’s my GTX 980 Ti on Battlefront 4K/ultra. I have a GTX 1080 and it obviously outperforms my 980 Ti.


  4. I’m glad that made sense, but despite those issues, I think HBM2 is ultimately a good idea on AMD’s large chips due to the slight power savings.

    As you pointed out, their gaming part(s) will almost certainly come “pre-overclocked” and that means the power budget suddenly matters. Even if HBM2 saves 10-20W over an equivalent GDDR subsystem, that’s still valuable.

    And based on the Vega FE’s 2×8-pin connectors (confirmed by Raja on r/AMD yesterday), it looks like its TDP could very well be at or near 300W (I don’t expect higher due to losing the PCIe spec).

  5. The AMD hardware has always been stellar for compute and everyone knows this. But actually putting it to use is so much more complicated. nVidia provides a framework for deep-learning that doesn’t even require you to write a single line of code. On the other extreme end, AMD provides a “translator” for CUDA code (meaning that you get the pleasure of debugging the output so you have to understand both CUDA and OpenCL/GCN) and a compiler/assembler so that you can code your own low-level convolution kernels in assembly. It really doesn’t get more complicated than this.

    Anyway, AMD does have their own “CUDA”, the ROCm package. Amusingly, it does not support the Bulldozer architecture, only Xeons, Intel Core i3/5/7 and Ryzen, so I haven’t been able to test it yet. Now that I have a Ryzen, I should be able to make it run with my RX480.

  6. This is a new GCN iteration, specifically optimized for higher frequency. They are supposed to clock higher, but I really think that 1600 is pushing them way past their sweet-spot, just like Polaris.

  7. I think you summarized it quite well. I don’t dissociate technical and supply problems (=price/availability) because the end result is a product that must sell at a given price and generate profit. In no particular order:
    1. HBM incurs a very big latency. Even though this is tolerable in GPUs, it requires a departure from the traditional design and is a significant strain on the driver team and possibly even on the developers. Despite the massive amount of compute resources, Fury could not compete very successfully. The transition is not trivial.
    2. HBM is restricted to the densities you mention. Although I feel 8GB is OK for almost anyone, it does seem inadequate against the 11GB 1080Ti. On the other hand, 16GB is a huge overkill and adds meaningless cost.
    3. A product with HBM must be priced above a certain level because the supply is limited and prices are high. To be priced at that level, it must perform well, otherwise it doesn’t make sense.

    The move to pro solves all these problems: 16GB is a natural fit and an actual advantage in applications, latency is easy to hide in predictable workloads like deep-learning and price is not a concern in this market. However, AMD is sorely lacking the driver infrastructure, especially for deep learning. Until ROCm and especially miOpen are debugged and competitive with CUDA and cuDNN, buying AMD is not an option. The price of the cards is not a concern in this segment, but hiring a developer to do GCN assembly coding is simply a no-go unless you’re a huge institution.

    Will they be able to “salvage” Vega for gaming? Probably. I would guess an 8GB part, clocked way beyond the frequency/power sweet spot (like the RX 580) and selling below 1080 prices but above 1070 prices. If they can manage a performance that is clearly above 1070 and not much below 1080 (90-95%) I might be tempted to buy one. At best, they will push it to 105% of a 1080 (nonTi) with the occasional (Doom) win against the Ti and call it a win, at the cost of obscene power consumption. Whatever the exact positioning, Vega will not disrupt the gaming landscape. However, if it does disrupt the compute landscape, AMD investors will be happy.

  8. Yes, the problem is AMD is a follower in the GPGPU space, always seeming to be trailing what Nvidia does. What’s worse is that they come with good hardware but don’t seem to do much to tackle the software side of things and depend on open source and others for that. This simply won’t get them much of that market they’re after IMO.

    I can’t help but think that they should have bit the bullet years ago and done their own CUDA to go along with GCN.

  9. I expect moderately better than 1080 performance assuming a clock speed of 1.6Ghz.

  10. Do you think hbm was a mistake for supply reasons or actual technical restrictions?

    Supply issues on fast near-2 Gbps HBM2 are all-but-confirmed as a serious issue. That’s obviously a problem when you planned for 2 Gbps to get to 512 GB/s.

    And the other technical restrictions are troubling, e.g. 4, 8 or 16 GB config limits could become a problem in a world where Nvidia can improve GP102’s cost and market segmentation by disabling a controller to drop bandwidth and capacity by a relatively granular ~8%. AMD can just double capacity or cut it in half.

    But I think the efficiency that HBM brings is valuable to AMD. They are clearly an underdog in efficiency, being forced to effectively pre-overclock their products and destroy their perf/W. Things like HBM and closed loop coolers are very helpful in extracting every last drop from the power budget. It’s not a cheap way to do that, but if your architecture isn’t doing it for you, you do what you gotta do.

  11. The helpful folks at r/AMD pointed out that AMD has confirmed the Vega FE GPU will have 4096 SPs on a separate spec page I overlooked in the rush from AMD's analyst day news. While that information doesn't change any of the conclusions I reached in this piece, I'm glad for the opportunity to use official sources rather than guessing.

  12. Don’t get me wrong, I think the RX480 is a fantastic card and am very happy with it. However, I believe that AMD made the wrong bet, again: they chose HBM too early and just now realized it doesn’t make sense for anything other than pro. This is a huge strategic mistake and they will pay for this.

    The turn to pro is a sensible move, at least they realized it would not work anywhere else. Just don’t hold your breath for the gaming versions. At least I hope they don’t rebrand the 580 to 680, again.

  13. I’m gonna say it nicely, but your calculated guesstimate is not even WCCFTECH worthy. You got your basics all wrong. You could have said “I think VEGA is slower than 1080Ti” and gone home.

  14. I think this is far too harsh an assessment. AMD doesn’t have the resources to build a balls-out compute chip and a balls-out graphics chip like Nvidia does right now. If Vega is as powerful at compute tasks as it appears it will be, and the price is right, AMD might be able to undercut Nvidia in some parts of the highly lucrative compute market while also delivering enough gaming performance on the desktop to at least remain in the high-end game. Balancing those two competing demands for performance is hard work and Raja has likely done the best he can under the circumstances.

  15. Looking at AMD’s target for Moar Margins, and ‘big Vega’ apparently featuring some as-yet-unannounced 8-stack HBM2 dies which are presumably pretty pricey and may not have the best yields, I wonder if instead of a Fury/X like branding and pricing structure we’ll instead see AMD launch something similar to Nvidia’s Titan cards, trying to straddle the very upper echelons of consumer pricing with ‘hey, this is totes nearly a pro card’ marketing. Maybe ‘lower’ Vega chips will even drop back to GDDR5X or GDDR6, as it would be a minimal bandwidth hit – if any hit at all, with Vega’s halved interconnect width – and a good cost saving, while targeting the same sort of pricing as the Fury/X did.

  16. The idea is that it reduces the bandwidth cost of overdraw by discarding overdrawn fragments before they get written to VRAM. Those fragments still get shaded to whatever extent the pipeline is forward, they just don’t leave the die. Deferred renderers at this stage are (mostly) shader-light and bandwidth-heavy, and saving that overdraw bandwidth is a really big deal. Forward renderers don’t tend to have any bandwidth issues at this stage in the first place, and they still have to pay the full shading cost for overdraw regardless.

  17. Intuitively, without doing any math, I have to agree with Jeff’s estimate. I expect 1080 performance. At best, slightly higher than 1080, at worst slightly lower than 1080.

    This is easily sufficient for 95% of the market, but AMD will be forced to price them relatively low, and they are expensive to make. This is why they are targeting the pro market first. They can’t really sell this card very profitably in the gaming market. If they could, they would be doing it already.

    Raja failed and must be thrown out.

  18. They haven’t really mentioned how this works but they’ve said it is more beneficial for deferred rendering workloads.

  19. Wait, why can’t the drawstream binning rasterizer help out with a forward renderer? You can still have (massive) overdraw issues on a forward renderer.

  20. And AMD has their work cut out for them creating software to tempt deep learning customers.

    After all, Nvidia went through the trouble of creating DIGITS, and adding support for their products’ compute features in many 3rd-party apps.

    Those sweet margins don't come for free. They come when you're the first to open up a new market, something that's pretty much alien to AMD.

  21. While the ALU/texturing/fillrate etc. performance estimates look great, I suspect game performance will be seriously hampered by the four shader engine limit. For comparison, Polaris 10 (RX 480/580) has four SEs. It's a fundamental limit in GCN and its ability to break up work. The Vega Linux patches have indicated it's still 4 SEs in Vega. Navi can't come soon enough for AMD. :-\

    From the Linux patch:

        case CHIP_VEGA10:
        +	adev->gfx.config.max_shader_engines = 4;
        +	adev->gfx.config.max_tile_pipes = 8; //??
        +	adev->gfx.config.max_cu_per_sh = 16;
        +	adev->gfx.config.max_sh_per_se = 1;
        +	adev->gfx.config.max_backends_per_se = 4;
        +	adev->gfx.config.max_texture_channel_caches = 16;
        +	adev->gfx.config.max_gprs = 256;
        +	adev->gfx.config.max_gs_threads = 32;
        +	adev->gfx.config.max_hw_contexts = 8;

    (edited with patch notes)

  22. I think we’re putting too much stock in Doom demos entirely. Being a forward renderer, it doesn’t give the DSBR a chance to do anything (which also seems like a plausible reason it was the demo getting shown early, assuming the DSBR was still being worked on).

  23. “Radeon RX products might make the most noise in the consumer graphics market” You could also say they're quite hot at the moment.

  24. The Frontier Edition cards have two 8-pin power connectors and one of the two versions is water cooled, so I think 275+W is very reasonable as far as expectations go.

  25. Even in late February it was still at least 4 months away from release. What games and settings? Are you sure that Ryzen’s gaming performance wasn’t the bottleneck? Even AMD’s FAD yesterday acknowledged that Ryzen is still slower than Intel even at 4k, even taking into consideration the platform’s performance improvements over the last 3 months.

  26. I thought Vega was demoed on games other than Doom though? I’m sure I’ve seen video footage of Battlefront running at 4K/Ultra/constant 60fps on Vega somewhere. If not on TR, perhaps on a linked article from TR.

    The 1080 manages ~50fps at 4K/Ultra. The 1080Ti about 85fps, A quick google shows that the Vega 4K/Ultra Battlefront demos FRAPSing at 65-70fps. Problem is, the videos are never of the same level, so it’s not really an apples-to-apples comparison.

  27. Poor sentence structuring on my part. I was referring specifically to the “stable” release drivers rather than the operating system.

  28. I tried a Vega ES at AMD’s Ryzen demos in late February and I wouldn’t say the performance was wildly different from my experience with it in December.

  29. #shotsfired “won’t be running a variant of Windows, since stability is such a concern.”

  30. Jeff I think you are putting far too much stock in the performance of the DOOM demo of a card 6-9 months before it will be released. If the card was ready back then, they would have released it a long time ago. It has probably been respun twice since then and who knows what speed the memory was running at or what state the drivers were in. Why they showed anything so far in advance was a highly questionable marketing decision but they did.

  31. The thing that no one has called them out on is the fact that margins are so high in these segments as Intel and Nvidia literally have ~100% market share in these segments so they can gouge their customers. It is a very natural assumption that competition in these segments will drive the margins of the segment down potentially significantly. So this is no silver bullet for AMD IMO.

  32. People running these devices for compute/deep learning likely aren’t going to be running the monthly release drivers that we as enthusiasts typically use (I know, they aren’t actually monthly anymore); rather they will be using WHQL drivers, or the equivalent since I’m sure most of these customers won’t be running a variant of Windows, since stability is such a concern.

  33. Those clock rates are pretty interesting considering how Polaris guzzled power going past ~1200… I’m a little behind on my rumor feed, are these made by GloFo too?

  34. Nope, 384-bit bus, 11.4 GT/s GDDR5X.

    EDIT: Oh, 587. Yes, it should be 547.

  35. Is the Titan Xp bandwidth figure a typo? 587 GB/s would need GDDR running at 12.2 Gbps. Nvidia has the bandwidth crown for consumer gaming cards, but I doubt it’s quite that impressive.

  36. If they’re going after compute and halo graphics, they really need to tighten up the drivers. Those segments will literally remove and discard a working part if the supporting software trashes expensive work. It only takes one clearly driver-caused crash or error to burn a lot of goodwill and get your brand blacklisted at a site/under an admin.