The curtain comes up on AMD’s Vega architecture

2016 will be remembered for a lot of things. For graphics cards, last year marked the long-awaited transition to next-generation process technologies. It was also the year that the graphics card arguably came into its own as a distinct platform for compute applications. Not an Nvidia presentation went by last year wherein Jen-Hsun Huang didn’t tout the power of the graphics processor for self-driving cars, image recognition, machine translation, and more. The company’s various Pascal GPUs set new bars for gaming performance, too, but it’s clear that gaming is just one job that future graphics cards will do.

A block diagram of the Vega architecture.

AMD is, of course, just as aware of the potential of the graphics chip for high-performance computing. Even before ATI’s merger with AMD and the debut of graphics cards with unified stream processor architectures, the company explored ways to tap the potential of its hardware to perform more general computing tasks. In the more than ten years since, graphics chips have been pressed into compute duty more and more.

An unnamed Vega chip

AMD’s next-generation graphics architecture, Vega, is built for fluency with all the new tasks that graphics cards are being asked to do these days. We already got a taste of Vega’s versatility with the Radeon Instinct MI25 compute accelerator, and we can now explain some of the changes in Vega that make it a better all-around player for graphics and compute work alike.

Memory, memory everywhere

In his presentation at the AMD Tech Summit in Sonoma last month, Radeon Technologies Group chief Raja Koduri lamented the fact that data sets for pro graphics applications are growing to petabytes in size, and high-performance computing data sets to exabytes of information. Despite those increases, graphics memory pools are still limited to just dozens of gigabytes of RAM. To help crunch these increasingly enormous data sets, Vega’s memory controller—now called the High Bandwidth Cache Controller—is designed to help the GPU access data sets outside of the traditional pool of RAM that resides on the graphics card.

The “high-bandwidth cache” is what AMD will soon be calling the pool of memory that we would have called RAM or VRAM on older graphics cards, and on at least some Vega GPUs, the HBC will consist of a chunk of HBM2 memory. HBM2 has twice the bandwidth per stack (256 GB/s) that HBM1 does, and the capacity per stack of HBM2 is up to eight times greater than HBM1.  AMD says HBM stacks will continue to get bigger, offer higher performance, and scale in a power-efficient fashion, too, so it’ll remain an appealing memory technology for future products.

HBM2 is only one potential step in a hierarchy of new caches where data to feed a Vega GPU could reside, however. The high-bandwidth cache controller has the ability to address a pool of memory up to 512TB in size, and that pool could potentially encompass other memory locations like NAND flash (as seen on the Radeon Pro SSG), system memory, and even network-attached storage. To demonstrate the HBCC in action, AMD showed a Vega GPU displaying a photorealistic representation of a luxurious bedroom produced from hundreds of gigabytes of data using its ProRender backend.
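
To make "treating local memory as a cache" a bit more concrete, here is a minimal page-caching sketch in Python. The 2 MiB page size, the LRU eviction policy, and the backing-store interface are our own illustrative assumptions; AMD hasn't said how the HBCC actually manages residency.

from collections import OrderedDict

PAGE_SIZE = 2 * 1024 * 1024              # hypothetical page granularity
HBM_PAGES = (8 * 1024**3) // PAGE_SIZE   # assume 8 GB of local HBM acting as the cache

class HighBandwidthCache:
    """Toy model: local memory holds only the hot pages of a much larger backing store."""
    def __init__(self, backing_store):
        self.backing = backing_store     # system RAM, NVMe, or network storage (assumed API)
        self.resident = OrderedDict()    # virtual page number -> page data, kept in LRU order

    def access(self, virtual_addr):
        page = virtual_addr // PAGE_SIZE
        if page in self.resident:
            self.resident.move_to_end(page)                     # hit: data is already local
        else:
            if len(self.resident) >= HBM_PAGES:
                victim, data = self.resident.popitem(last=False)
                self.backing.write_back(victim, data)           # evict the least-recently-used page
            self.resident[page] = self.backing.read_page(page)  # miss: page it in from afar
        return self.resident[page]

The point is simply that the GPU's local memory holds the hot working set of a far larger address space, with misses serviced from slower storage behind the scenes.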

 

Geometry processing gets more flexible

Today’s Radeon GPUs retain fixed-function geometry-processing hardware in their front ends, but the company has observed that more and more developers have been doing geometry processing in compute shaders. Koduri notes that some of today’s games can have extremely geometrically complex scenes. He cited parts of the Golem City section of Deus Ex: Mankind Divided (which we incidentally use in our own graphics-card testing) to prove his point.

The middle portion of that benchmark run has over 220 million polygons, according to AMD, but only about 22 million might need to be shaded for what the gamer actually sees in the final frame. Figuring out which of those polys need to be shaded is a hugely complicated task, and achieving better performance for that problem is another focus of the Vega architecture.

To accommodate developers’ increasing appetite for migrating geometry work to compute shaders, AMD is introducing a more programmable geometry pipeline stage in Vega that will run a new type of shader it calls a primitive shader. According to AMD corporate fellow Mike Mantor, primitive shaders will have “the same access that a compute shader would have to coordinate how you bring work into the shader.” Mantor also says that primitive shaders will give developers access to all the data they need to process geometry effectively.

AMD thinks this sort of access will ultimately allow primitives to be discarded at a very high rate. Interestingly, Mantor expects that programmable pipeline stages like this one will ultimately replace fixed-function hardware on the graphics card. For now, the primitive shader is the next step in that direction.
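
For a sense of what discarding primitives at a high rate involves, here is a rough sketch of the kinds of per-triangle tests (back-face, tiny-triangle, and off-screen culling) that a compute-style culling pass can run before rasterization. It's purely illustrative; AMD hasn't detailed what work its primitive shaders will actually perform.

import numpy as np

def should_cull(v0, v1, v2, viewport):
    """Return True if a triangle (2D screen-space vertices, in pixels) can be discarded."""
    # Back-face / zero-area test: a non-positive signed area means the triangle faces away.
    if np.cross(v1 - v0, v2 - v0) <= 0.0:
        return True
    lo = np.minimum(np.minimum(v0, v1), v2)
    hi = np.maximum(np.maximum(v0, v1), v2)
    # Small-primitive test: a bounding box that snaps to a single pixel row or column
    # is very unlikely to cover any sample, so compute-culling passes commonly drop it.
    if np.any(np.round(lo) == np.round(hi)):
        return True
    # Viewport test: triangles entirely off-screen never contribute to the frame.
    if hi[0] < 0 or hi[1] < 0 or lo[0] > viewport[0] or lo[1] > viewport[1]:
        return True
    return False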

To effectively manage the work generated by this new geometry-pipeline stage, Vega’s front end will contain a new “intelligent workgroup distributor” that can consider the various draw calls and instances that a graphics workload generates, group that work, and distribute it to the right programmable stage of the pipeline for better throughput. AMD says this load-balancing design addresses workload-distribution shortcomings in prior GCN versions that were highlighted by console developers pushing its hardware at a low level.

 

Higher clocks and better throughput with the NCU

To achieve higher performance in certain workloads, Vega will be the first AMD GPU with support for packed math operations. Certain workloads, like deep learning tasks, don’t need the full 32 bits that GPUs offer for single-precision data types. Prior AMD GPUs, including Fiji and Polaris, have included native support for 16-bit data types in order to benefit from more efficient memory and register file usage, but the GCN ALUs in those chips couldn’t produce the potential doubling of throughput that some Nvidia chips, like the GP100 GPU on the Tesla P100 accelerator, enjoy.

All that changes with Vega and its next-generation compute unit design, called the NCU. (What the N really stands for remains a mystery.) The NCU will be able to perform packed math, allowing it to achieve up to 512 eight-bit ops per clock, 256 16-bit ops per clock, or 128 32-bit ops per clock. These numbers rely, of course, on the fact that a GCN ALU can perform up to two operations per cycle in the form of a fused multiply-add.
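
Those figures fall straight out of the arithmetic if we assume the NCU keeps GCN's 64 ALUs per compute unit (our assumption, not something AMD has confirmed):

ALUS_PER_CU = 64     # assumed carry-over from GCN's 64-shader compute units
OPS_PER_FMA = 2      # a fused multiply-add counts as two operations

for bits, packed_lanes in ((32, 1), (16, 2), (8, 4)):
    print(f"{bits}-bit: {ALUS_PER_CU * OPS_PER_FMA * packed_lanes} ops per clock")
# -> 128, 256, and 512 ops per clock, matching AMD's per-NCU figures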

AMD also emphasizes that the single-threaded performance of the compute unit remains a critical part of its engineering efforts, and part of that work has gone into optimizing Vega’s circuitry on its target process to push clock speeds up while maintaining or lowering voltages. AMD isn’t talking about the clock speeds it expects Vega to hit yet, but corporate fellow Mantor says that one source of IPC improvement is that Vega’s enlarged instruction buffer lets operations run “continuous at rate,” especially with three-operand instructions.

More efficient shading with the draw-stream binning rasterizer

AMD is significantly overhauling Vega’s pixel-shading approach, as well. The next-generation pixel engine on Vega incorporates what AMD calls a “draw-stream binning rasterizer,” or DSBR from here on out. The company describes this rasterizer as an essentially tile-based approach to rendering that lets the GPU shade pixels more efficiently, especially in scenes with extremely complex depth buffers. The fundamental idea of this rasterizer is to perform a fetch for overlapping primitives only once, and to shade those primitives only once. This approach is claimed to both improve performance and save power, and the company says it’s especially well-suited to deferred rendering.

The DSBR can schedule work in what AMD describes as a “cache-aware” fashion, so it’ll try to do as much work as possible for a given “bundle” of objects in a scene that relate to the data in a cache before the chip proceeds to flush the cache and fetch more data. The company says that a given pixel in a scene with many overlapping objects might be visited many times during the shading process, and that cache-aware approach makes doing that work more efficient. The DSBR also lets the GPU discover pixels in complex overlapping geometry that don’t need to be shaded, and it can do that discovery no matter what order that overlapping geometry arrives in. By avoiding shading pixels that won’t be visible in the final scene, Vega’s pixel engine further improves efficiency.
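
As a rough illustration of the binning idea, the toy sketch below sorts screen-space triangles into tiles and then shades each tile's surviving pixels once. The tile size and the visibility and shading callbacks are stand-ins of our own; this is not a description of how the DSBR is actually built.

TILE = 32  # hypothetical tile size in pixels

def bin_triangles(triangles, width, height):
    """Assign each screen-space triangle to every tile its bounding box touches."""
    bins = {}
    for tri in triangles:
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        for ty in range(max(0, int(min(ys))) // TILE, min(height - 1, int(max(ys))) // TILE + 1):
            for tx in range(max(0, int(min(xs))) // TILE, min(width - 1, int(max(xs))) // TILE + 1):
                bins.setdefault((tx, ty), []).append(tri)
    return bins

def render(bins, resolve_visibility, shade):
    # Work one tile at a time: resolve depth for the whole bin first, then shade each
    # surviving pixel exactly once while the tile's data is still cache-resident.
    for tile, tris in bins.items():
        visible = resolve_visibility(tile, tris)   # pixel -> front-most fragment only
        for pixel, fragment in visible.items():
            shade(pixel, fragment)

The payoff of working a tile at a time is that each tile's depth and color data can stay on-chip until every primitive touching that tile has been resolved.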

To help the DSBR do its thing, AMD is fundamentally altering the availability of Vega’s L2 cache to the pixel engine in its shader clusters. In past AMD architectures, memory accesses for textures and pixels were non-coherent operations, requiring lots of data movement for operations like rendering to a texture and then writing that texture out to pixels later in the rendering pipeline. AMD also says this incoherency raised major synchronization and driver-programming challenges.

To cure this headache, Vega’s render back-ends now enjoy access to the chip’s L2 cache in the same way that earlier stages in the pipeline do. This change allows more data to remain in the chip’s L2 cache instead of being flushed out and brought back from main memory when it’s needed again, and it’s another improvement that can help deferred-rendering techniques.

The draw-stream binning rasterizer won’t always be the rasterization approach that a Vega GPU will use. Instead, it’s meant to complement the existing approaches possible on today’s Radeons. AMD says that the DSBR is “highly dynamic and state-based,” and that the feature is just another path through the hardware that can be used to improve rendering performance. By using data in a cache-aware fashion and only moving data when it has to, though, AMD thinks that this rasterizer will help performance in situations where the graphics memory (or high-bandwidth cache) becomes a bottleneck, and it’ll also save power even when the path to memory isn’t saturated.

By minimizing data movement in these ways, AMD says the DSBR is its next thrust at reducing memory bandwidth requirements. It’s the latest in a series of solutions to the problem of memory-bandwidth efficiency that AMD has been working on across many generations of its products. In the past, the company has implemented better delta color compression algorithms, fast Z clear, and hierarchical-Z occlusion detection to reduce pressure on memory bandwidth.

 

So how’s it play?

High-level architectural discussions are one thing, but everybody wants to know what Vega silicon will look like in shipping products. AMD consistently demurred on actual implementation details of its Vega chips last month, but we’ve speculated that the Vega-powered Radeon Instinct MI25 could pack 4096 stream processors running at around 1500 MHz, given its quoted 25 TFLOPS of FP16 power.
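
That guess is simple arithmetic: divide the MI25's quoted FP16 rate by the presumed shader count and the operations each one can retire per clock.

fp16_flops = 25e12             # the Radeon Instinct MI25's quoted FP16 rate
stream_processors = 4096       # presumed shader count
ops_per_sp_per_clock = 2 * 2   # an FMA counts as two ops, and packed FP16 doubles that again

clock_hz = fp16_flops / (stream_processors * ops_per_sp_per_clock)
print(f"{clock_hz / 1e6:.0f} MHz")   # -> roughly 1526 MHz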

AMD did have an early piece of Vega silicon running in its demo room for the press to play with. In the Argent D’Nur level of Doom‘s arcade mode, that chip was producing anywhere between 60 and 70 FPS at 4K on Ultra settings. Surprisingly, the demo attendant let me turn on Doom‘s revealing “nightmare” performance metrics, and I saw a maximum frame time of about 24.8 ms after large explosions. I also noted that the chip had 8GB of memory on board, though of course we couldn’t say what type of memory was being used.

Though that performance might not sound so impressive, it’s worth noting that all of the demo system’s vents (including the graphics card’s exhaust) were taped up, and it’s quite likely the chip was sweating to death in its own waste heat. By my rough estimate, that puts the early Vega card inside the demo box somewhere between a GTX 1070 and a GTX 1080 in Doom. If that’s an indication of where consumer Vega hardware will end up, we could have a competitive year to look forward to from AMD. Until we learn more about Vega, though, we’re left with only this tantalizing taste ahead of the card’s first-half-of-2017 release.

Comments closed
    • Ph.D
    • 3 years ago

    That Logitech K120 keyboard at the end though…
    Now there’s a goddamn product!

    • jts888
    • 3 years ago

    Random speculation on the High Bandwidth Cache controller:

    It just occurred to me that it could actually be just what it sounds like: a massive block of CAMs that allow the HBM to be accessed as an enormous L3 cache. It wouldn’t be possible with 8+ GBs of normal 64B cache lines, but if host transfers were done on, say, whole 4-16 kiB pages, Vega would only need a few million on-die table entries (several dozen register bits plus dedicated XOR matching tree each), which would be an extravagant but not completely insane expenditure of transistors.

    I think this approach would work best with read-only data, since tracking coherency state per 64B host-native cache line would be costly and be PCIe-choking to update multi-kB lines every time a single bit flip caused a remote invalidation. However, this overall system would be the most robust way I can think of to have perhaps terabytes of assets available in a completely software transparent manner.

    Even if the entire HBM pool isn’t treated as cache to save die space (i.e., dedicated purely local buffers don’t need tracking), the cache pool doesn’t need to be much larger than the working set for geometry/texture assets for an individual frame, which can’t practically exceed a few GBs and can’t fundamentally exceed 8 GB on a 500 GB/s bus in 60+ fps situations.

    • Klimax
    • 3 years ago

    Now things are getting interesting. The biggest difference I see between Nvidia and AMD is the approach to memory. While Nvidia keeps HBM(-like) memory only on GP100 for pure compute, AMD puts it in the regular high-end as well. AMD is also promoting size quite a lot more.

    But as there are differences there are similarities or outright near-identical things. Like going after geometry and its (in)efficiency. Including similar approaches.

    Well, comparisons will be fun. Bigger fun will be seeing how many DX12/Vulkan games get updated for Vega, and how those that don’t will fare. Remember, there is a whole new set of problems with low-level APIs. The future will bring testing, testing, and more testing.

    Expect fun times ahead and even more fun arguing!

    • rudimentary_lathe
    • 3 years ago

    Yay for higher clocks. Double yay for more performance per core.

    I wonder if AMD is using its flagship in these demonstrations. Hopefully not, because even with beta drivers and possible thermal issues from the lack of ventilation, performance between a 1070 and 1080 would be disappointing. Especially with the 1080 Ti on the way soon. Hopefully they’re demoing the 1080 competitor, and have the 1080 Ti competitor waiting in the wings to wow us all.

    • vaultboy101
    • 3 years ago

    Dear Damage,

    I hope you are reading this!

    Can you or anyone confirm with clarity please.

    Is Vega part of the GCN architecture, or has AMD totally discarded GCN for Vega’s NCU in the same way VLIW was dumped for GCN?

    As an RX480 Polaris owner I am slightly alarmed at the implications for driver support for pre VEGA GPUs going forward.

    Clarification from you, Raja Koduri, or even Jeff Kampman would be most reassuring.

      • jihadjoe
      • 3 years ago

      It looks like it might be a break away from GCN. From [url=https://www.reddit.com/r/Amd/comments/5m5eep/spoilers_exclusive_amd_vega_presentation/dc0yb9j/<]a reddit comment[/url<] on the Vega details: [quote<]2. NCU confirmed new Compute Units (CU). Not based on GCN as the end-notes are distinctly bringing up the fact that GCN is composed of CUs with 64 Shaders (4x 16 ALU SIMDs). NCU is not going with this config anymore. The config is flexible and per their patents, it can be 1x 16 ALU + 3-5 x 8/4/2 ALU SIMDs & 2x Scalar ALU (twice as many as GCN, really fast for certain ops). From the looks of their diagram, I would guess, twice the # of CUs, hence, twice the Ops per clock when compared to GCN (which AMD is comparing against Fiji/Fury X). This means per CU, ALU count will drop to 32. The derived SIMD config will be potentially: 1x 16 ALU & 1x8 & 1x4 & 1x2 + 2 Scalar ALU = 32.[/quote<]

      • Srsly_Bro
      • 3 years ago

      -casual

    • synthtel2
    • 3 years ago

    [url=https://techreport.com/r.x/amdvegapreview/oldl2.png<]This slide[/url<] and the associated text saying textures also don't get to use the L2 explain a bit about Fiji's ill performance and why AMD didn't want to use anisotropic filtering in their own Fiji benches. That many TMUs with only a tiny L1 between them and high-latency HBM sounds like quite a problem. This also does a lot to generally explain why GCN has always seemed to need so much more memory bandwidth than equivalent Nvidia parts. It isn't quite the problem I've been predicting, but it's pretty close.

    Vega is sounding fairly awesome for both compute and gaming, if this all works out. Packed math is a big deal, it sounds like they've been doing good work on their rasterization weaknesses (and that texturing thing), and various stuff here bodes well for power efficiency.

    The big thing that strikes me as conspicuously absent is something to compete with Nvidia's SMP. It mostly isn't a huge deal outside of VR, and in fixing up rasterization they've done a lot to reduce the penalty of not having it, but it is still a pretty big deal for VR.

      • jts888
      • 3 years ago

      I think only the ROPs/RBEs are considered the “pixel engine” in this explanation.
      I.e., texture samplers in CUs go through the local L1 to the global striped L2 and have full coherence, while packed RBE writes bypassed the L2 cache, since there have traditionally been few occasions to read that data again any time soon. However, bypassing the L2 also meant bypassing any sort of coherence controls, so shaders that need to read output buffers have to employ costly higher-level synchronization barriers.

      The new setup doesn’t even need to necessarily cache writes, maybe just handle snooping/invalidation broadcasts to make some types of shading effects cheaper than they were previously.

        • synthtel2
        • 3 years ago

        That’s what I had been thinking and what the slide alone implies, but the text below it, not so much:

        [quote<]In past AMD architectures, memory accesses for textures and pixels were non-coherent operations, requiring lots of data movement for operations like rendering to a texture and then writing that texture out to pixels later in the rendering pipeline.[/quote<] It sounds a lot like TMUs are counted under "pixel engine" for that purpose, and that would seem to explain some things nicely.

          • jts888
          • 3 years ago

          In GCN, texture address generation and loading is done in the CUs (4 per CU I think?), and compute shaders have access to them in addition to the ALUs but not to the RBEs etc.

          The formerly missing coherence between the “compute engine” and the ROP/RBE “pixel engine” comes from here I think. Thanks for highlighting the text info though.

            • synthtel2
            • 3 years ago

            That’s why I hadn’t considered this before – TMUs are a pretty integral part of the CUs, unlike all the rasterization stuff. I had assumed that this meant they use the cache hierarchy in the same way as the rest of the CU, but I don’t think there’s any reason it has to be that way.

            • jts888
            • 3 years ago

            Modern TMUs are really just independent address generators bolted on to load/store units, which need to be coherent for the sake of “typical” compute shaders anyway.

            • synthtel2
            • 3 years ago

            A lot of their work is actually done by the associated SPs these days, but is it really that much? It seems like a kind of problem that could still give chips with more fixed function hardware for it a notable perf/W advantage.

            I need to go scan through AMD’s publications at some point to figure out more FFU implementation details. I assume what you’re referencing is in there somewhere?

            • jts888
            • 3 years ago

            Just look for the slides from Layla Mah’s GCN presentation. It’s the best architecture intro IMO.

            • synthtel2
            • 3 years ago

            I’ve read [url=http://www.slideshare.net/DevCentralAMD/gs4106-the-amd-gcn-architecture-a-crash-course-by-layla-mah<]that one[/url<] before and don't recall too much on this (though it is a good presentation). Reading it again, it looks like the most relevant slide is 49, which says TMUs still do a lot more than you were just saying.

            • jts888
            • 3 years ago

            I glossed over the decompression and format conversion bits, but I’m pretty confident in my general interpretation/assertion that the TMUs are just clients of the memory hierarchy via the L1D$s the same as “pure” ALU shaders.

            Similarly, I still think that “pixel engine” = just RBEs, which in GCN are known to have coalesced writes not cached via the MC (i.e., L2), as well as private depth/color caches. It would be nice for Jeff to weigh in, but I believe that his statement only meant that TMU and RBE writes are non-coherent [i<]with respect to each other[/i<] since the RBEs ignore the L2, and that there was no intended implication that TMUs were considered part of the "pixel engine". Again, TMUs have always been available in compute shaders, so it would seem rather odd to me for AMD to suddenly start saying they weren't part of the "compute engines".

            • synthtel2
            • 3 years ago

            It would make more sense internally that way, but I’m still skeptical because of how perfectly it would explain various performance results we’ve seen. I suppose I could arrange to test this, though I won’t have the proper resources handy for a while yet. Actually, if I ever get my hands on Fiji for some reason, there’s a lot I’d like to test about it.

            • jts888
            • 3 years ago

            I’m a more general computer engineer, uninvolved in any way professionally with GPU or SIMT design, and I’ve only been speculating on what I think sensible design would be combined with public semi-technical details.

            If you’re into (GP)GPU programming at either the professional or enthusiast levels, I’m sure you can verify actual artifacts far better than I can reason from first principles and no personal data, so I’d still be delighted to hear whatever you can find in your own digging, since that just provides more clues to ponder over…

            • synthtel2
            • 3 years ago

            Hey, I’m mostly guessing too, and different angles on this sort of thing are always interesting. I’m a game developer / programmer with a lot of interest in graphics and hardware, but I wouldn’t call myself a GPU programmer quite yet. The plan for work is that I’ll be spending a lot more time this year hands-on with graphics stuff, and in the process I should get the skills I need to investigate this sort of thing first-hand (which will be great fun). If I find out anything solid on this, I’ll definitely post about it over in the forums, and I could ping you about it too.

      • juzz86
      • 3 years ago

      Don’t want to intrude on the chain but it’s been enlightening listening to you both – thanks for sharing your thoughts.

        • tootercomputer
        • 3 years ago

        I concur with your sentiment. The only concern I’ve had reading this back and forth discussion is that I have absolutely no idea what they are talking about.

          • juzz86
          • 3 years ago

          Rare to see such substance out here!

    • Mat3
    • 3 years ago

    [quote<]By my rough estimate, that puts the early Vega card inside somewhere between the performance of a GTX 1070 and a GTX 1080 in Doom [/quote<] Isn't Fury X already doing that?

      • ronch
      • 3 years ago

      Well, this is the next generation approach to getting that sort of performance. So um, yeah.

      • DoomGuy64
      • 3 years ago

      Yes. From what I’ve heard, the current performance is slightly above or at the 1080, but drivers are beta and the doom benchmarks were done with a mostly unoptimized Fiji driver while running debug software. The Vega optimizations haven’t been completely patched in yet, so we should see further performance boosts at the official launch.

      Sub-par launch drivers seem to be a typical AMD problem, so I don’t doubt this rumor too much. It may take AMD an extra month or so to work everything out, but seeing as performance is already on par with the 1080, I’m not too worried about it.

    • Chrispy_
    • 3 years ago

    Great to see more about Vega, but after reading that I have only one question:

    Why tape up the vents?

      • DPete27
      • 3 years ago

      They tape the vents so nosy tech press can’t take pictures of the card inside. Seems like an oversight when they could’ve just used a case with a front door (taped shut) and no side/top vents.

      • meerkt
      • 3 years ago

      To hide their real trump card against Nvidia: RGB*W* LEDs!

    • AnotherReader
    • 3 years ago

    I wish we had some performance data to go with the architectural details. Some thoughts about this are below:
    [list<]
    [*<]ROPs having access to the L2 cache should help in deferred renderers; such games are increasingly common.[/*<]
    [*<]I wonder if AMD’s shader compiler can add the primitive shader transparently, or if it has to be explicitly called by the game.[/*<]
    [*<][url=http://images.anandtech.com/doci/11002/Raja_575px.jpg<]This image[/url<] suggests that at least one variant of Vega has only 2 HBM2 stacks. That would limit it to having the same bandwidth as Fiji at best.[/*<]
    [/list<]

      • RAGEPRO
      • 3 years ago

      Unless the HBM stacks run at a higher clock rate.

        • AnotherReader
        • 3 years ago

        HBM2 is supposed to top out at [url=http://www.anandtech.com/show/9969/jedec-publishes-hbm2-specification<]twice the bandwidth of HBM1 per stack[/url<]. I suspect that the new bandwidth saving techniques will allow this to be sufficient if it is the top end model. Do we know if there are multiple Vega die sizes like Polaris?

        • ImSpartacus
        • 3 years ago

        That’d be out of spec for HBM available for purchase today.

        Given the limited production nature of HBM and how GP100 actually [i<]underclocked[/i<] their HBM by over 25%, I wouldn't expect AMD to factory [i<]overclock[/i<] their HBM. The extra validation/binning would add even more expense to an already-pricey technology. And besides, 512 GB/s is already plenty. Comparatively, GP102 churns out 480 GB/s in the Titan X, and the top Vega is anticipated to perform in the same ballpark.

          • RAGEPRO
          • 3 years ago

          Just offering up a possibility guys.

      • Ninjitsu
      • 3 years ago

      That would make sense, since memory bandwidth was not Fiji’s problem.

        • jts888
        • 3 years ago

        Fiji had several problems, and its realizable bandwidth was actually surprisingly low.

        IIRC, it peaked at only ~65% of its theoretical limit (333 GB/s of 512 GB/s), while better GDDR5 designs seem to reach ~75%.

      • ptsant
      • 3 years ago

      The bandwidth is the same at 512 GB/s. It is more than competitive. Cost is much lower, and you also get 8GB instead of 4GB. I don’t know how latency fares vs. HBM1, but it should be lower.

        • ImSpartacus
        • 3 years ago

        I think their only problem is that they won’t have a 16GB option unless they get a bigger interposer and cram two hbm chips per controller “butterflied” (or whatever the term is).

        Meanwhile, the 1080 Ti could easily sport 12GB of vram.

        We’re getting away from the times where people judge graphics solely on vram capacity, but it’s not hard to find folks like that in the wild.

    • deruberhanyok
    • 3 years ago

    Underwhelming for a reveal. I guess expecting some concrete info on Vega-based products was unrealistic, but maybe it’s going to be much later in “1H 2017” than we thought?

    (I was under the impression they were targeting “Q1” for consumer products, was that wrong?)

    (Also, if performance falls about where you guys are guessing, how late can they push this before NVIDIA decides to do a Pascal refresh / launch Volta, and then they’re playing catch-up again?)

      • RAGEPRO
      • 3 years ago

      I think it’s worth mentioning that you may not want to discard the possibility that final Vega silicon will be competitive with a refreshed Pascal, or even Volta.

        • deruberhanyok
        • 3 years ago

        Agreed, although I was just basing that on the (very early) reported performance shown by that Doom test mentioned in the article. And there’s no way of knowing how final that is – could have been running at half speed and sitting happily between 1070 and 1080 performance, in which case they’d be in good shape even with new NVIDIA parts later this year.

        But since they didn’t really announce any products here, just told people, “hay guyz, Vega is this thing we are working on” (which we already knew) “here is some detailz” (which we didn’t know), “oh hai one benchmarkz” (which TR got to play with for only a tiny bit), we’re just left to speculate. And AMD has a history of lots of talk without solid details leading to ultimately underwhelming product.

        If they wanted to drum up excitement among gamers that would be big champions of the platform, this isn’t doing it.

        • Wild Thing
        • 3 years ago

        This is a hardcore NV fanboy website.
        Of course they won’t consider such a “preposterous” outcome.

          • derFunkenstein
          • 3 years ago

          You’re hilarious, because “he” is part of the “they” and here he is, considering a “preposterous” outcome.

            • RAGEPRO
            • 3 years ago

            Nevermind my name. 🙂

            • deruberhanyok
            • 3 years ago

            I think that was directed at me. I’m a hardcore NVIDIA fanboy now. That’s cool.

            I am a little annoyed that he seems to think preposterous is a made-up word that requires quotation marks, though.

          • I.S.T.
          • 3 years ago

          Are you serious? If it has a bias, it’s closer to AMD/ATi historically. The reviews have felt more even the last few years, even with Damage going to work at AMD.

          However, the key word in that second sentence is if. I don’t think it has one, I just think the reviews were slightly too praising of AMD/ATi at times. Which, well, can happen without company-wide bias. It just means you like a product to such a high degree that you praise it a bit too much. That’s fine and normal human behavior. Nothing to feel guilty about, and [i<]certainly without a shadow of a doubt[/i<] nothing to yell fanboy over.

          This isn't 2003, with people praising the 5800 for anything other than its technically superior filtering (if it wasn't for driver ****ery it would have been better, due to the angled AF solution that both companies wound up using for many, many years; AMD used it for about a decade!) and its better support of OGL and old games. Other than those traits, it was a useless POS. Sites that put it above the Radeon 9700 Pro and the 9500 with the physical hack applied were biased, or, given how ****ing shady Nvidia was until the last few years (and it's still shady, just not nearly as much; I don't trust them one bit), I would stake 30 bucks on them buying off a few sites.

          I say all of this as someone who uses Nvidia products exclusively for their usually better and more consistent drivers, their better performance in somewhat older games, and unfortunately being locked into PhysX 'cause I own like four or more PhysX-related games and there's more that I want to play. At least five.

          AMD's main advantage is they've always balanced their products for future games much, much, [b<]much[/b<] better than Nvidia ever has and probably ever will. Even when their GPUs are better than Nvidia's (the first GCN high-end cards wound up faster than the 680 with proper driver support in the backend and the higher clock rates of the GHz Edition as well as the later re-releases), they still wind up far, far better off several years later than Nvidia cards of the same gen and same tier. It's happened so many times that it has become verifiable fact.

          So, please, drop the bias talk. There are objective facts involved, and often Nvidia is better than AMD in the short to medium term, and for years they had much better frame pacing (AMD caught up, and that was the best thing they've done since GCN by far). They also have often strongarmed companies into giving better graphics for Nvidia cards. It's immoral, but it is an advantage and makes the games look better. At least the GameWorks crap works on AMD cards, just way slower for reasons that should be obvious.

          Drop the fanboyism. It's not healthy for the mind. It leads to anger. Trust me, I have been there and done that, and I've since become much better and had a healthier relationship with the subjects I was hardcore fanboying over. Less anger is usually a good thing unless it's time for a righteous fury (think civil rights, etc. Not video cards).

            • Spunjji
            • 3 years ago

            Bloody well said, sir.

        • Klimax
        • 3 years ago

        That would be wildly optimistic and unbacked by anything either in recent past or current. Very improbable.

          • RAGEPRO
          • 3 years ago

          I didn’t say it’s probable. 🙂 For what it’s worth I don’t think it’s especially likely.

          Still, it is a completely new GPU design. It certainly isn’t as far-fetched as other recent events.

      • Pholostan
      • 3 years ago

      The only release window has been H1 2017, ever since their investor presentation back... I don’t remember when.
      So June probably.

    • odizzido
    • 3 years ago

    neat.

    • Ninjitsu
    • 3 years ago

    why is “HBC” a thing. Yay more confusion.

      • jihadjoe
      • 3 years ago

      They should introduce a product with a low-bandwidth cache, just to emphasize the difference.

        • Ninjitsu
        • 3 years ago

        lol. But I mean, it’s not a cache, is it? It’s VRAM, just HBM? Or that’s what i understood from the article.

          • RAGEPRO
          • 3 years ago

          The idea is that they’re trying to use terminology to change the way people think about “video RAM.” Since the cards are no longer just “video cards” but in fact multi-function compute accelerators, they’ve created a system such that the card can use resources from anywhere, including local or networked storage.

          In that context, the local memory is now “high-bandwidth cache” for a much larger dataset.

            • Voldenuit
            • 3 years ago

            Then system RAM is just as misleading, because pagefile.sys.

            • meerkt
            • 3 years ago

            AKA “marketing”. 🙂 Also the CPU’s RAM is actually “a cache” to resources such as local and remote storage…

            • RAGEPRO
            • 3 years ago

            It’s a little different, I think.
            [quote=”Jeff Kampman”<]HBM2 is only one potential step in a hierarchy of new caches where data to feed a Vega GPU could reside, however. The high-bandwidth cache controller has the ability to address a pool of memory up to 512TB in size, and that pool could potentially encompass other memory locations like NAND flash (as seen on the Radeon Pro SSG), system memory, and even network-attached storage. To demonstrate the HBCC in action, AMD demonstrated a Vega GPU displaying a photorealistic representation of a luxurious bedroom produced from hundreds of gigabytes of data using its ProRender backend.[/quote<]This reads to me like the GPU can access such resources directly, as if they were local memory; perhaps some kind of shared addressing scheme or something. I don't really know enough about the low-level of this stuff to talk about it knowledgeably though. Point is, I don't think it's 100% just "marketing".

            • jts888
            • 3 years ago

            There might be logic to facilitate or automatically trigger local loading, but even host DRAM accesses over PCIe take a few tenths of a microsecond, so remote accesses absolutely cannot, in general, be handled in a completely transparent fashion.

            • RAGEPRO
            • 3 years ago

            Sure, of course. You clearly know more about it than me, but judging by your other post in this comment chain I think you have the right idea. In any case it’s not simple “marketing”, heh.

            • meerkt
            • 3 years ago

            That’s just good ol’ virtual memory, page fault handling, etc.

          • jts888
          • 3 years ago

          I’m guessing they want to lean more on their asynchronous IO model, so that entire hierarchies of non-volatile and even networked data can be viewed in a transparent flat address space.

          Seems to mesh with the new 49b virtual address space.

            • Tirk
            • 3 years ago

            I agree with you about it working nicely with asynchronous IO model.

            Also, I see this move as another step towards HSA’s goals. If they try to compete by always having more on chip cache than their competitors they will lose to anyone who starts using a smaller node first. They need a heterogeneous memory environment so they can find cheaper ways to compete. This seems to allow them memory stretching room by having a chip that can handle multiple memory types much more easily. E.G. a discrete GPU with HBM and GDDR5x both included on board. Or an APU that better balances having HBM and additional DDR4 system memory. Intel’s Iris Pro graphics only start to compete with AMD’s APUs graphics wise when they shove on huge on chip caches. AMD doesn’t have the same real estate on chip to devote to such large caches. I see this as one potential solution.

            If AMD tries to compete merely by trying to emulate their competitors strategies they will lose. They have a smaller research budget and market share that has thus far prevented that sort of brute strength strategy. Out of the box strategies have been what has kept AMD afloat and I hope it continues to do so.

          • willyolio
          • 3 years ago

          The point is that the shaders (or compute units) on the GPU don’t need to worry about how much memory is available and how fast. The HBC takes everything – on-board VRAM, system ram, SSD, on-board SSD (like the weird video card AMD put out a few months ago) and automatically figures out what needs to be stored where, what needs higher priority, what the CUs are going to need next, etc.

          thus a 4GB video card isn’t limited to 4GB, it sort of behaves as if it has unlimited memory and the HBC takes care of the background work and shuttles things between all available memory pools more efficiently.

    • chuckula
    • 3 years ago

    The estimated die size for that GPU is on the order of 500 – 550 mm^2. That definitely puts it in the high-end class of GPUs. For reference, the GP104 die for a GTX 1080 is about 315 mm^2, and AMD’s biggest current-generation GPU is Fiji at about 596 mm^2.

    Incidentally, the GP102 die for the Titan X is only 471mm^2.

      • the
      • 3 years ago

      It is in the same range as GP100 though and smaller than the massive 601 mm^2 that the GM200 used in the original TitanX.

      • AnotherReader
      • 3 years ago

      How was the die size estimated?

      Edit: I read that one HBM2 stack from SK Hynix has dimensions of [url=http://www.anandtech.com/show/9969/jedec-publishes-hbm2-specification<]7.75 mm × 11.87 mm[/url<]. So a photograph of both the Vega die and the HBM stacks would allow us to estimate die size.

        • chuckula
        • 3 years ago

        Using the photograph that is provided in the article (which shows the long edge of an HBM2 die by that guy’s finger) I get 572 mm^2 using pixel counting. That is likely a somewhat high estimate, but this chip is definitely big and 550 mm^2 is certainly a reasonable ballpark estimate.

          • AnotherReader
          • 3 years ago

          Thanks for taking the time to reply and take the measurements. That sounds reasonable. It would have to be the long edge; if it were the small edge, it would be too small to be a 1080 competitor. Even doubling Polaris 10 would increase die size to 464 mm^2, and this should be smaller with the simpler HBM PHYs. I suspect that the changes to accommodate higher clock speeds and more fp64, fp16, and int8 units have made it enormous.

      • AnotherReader
      • 3 years ago

      As AMD doesn’t have the marketshare of the GPGPU market that Nvidia has, it can’t afford to design two separate GPUs for gaming and GPGPU. Therefore, we should compare Vega’s die size to GP100 rather than GP102.

      • Srsly_Bro
      • 3 years ago

      It’s Titan XP, stop being a casual. The p is for Pascal.

      • ptsant
      • 3 years ago

      What is the GPU portion and what is the HBM2 portion of the 500mm2?

        • ImSpartacus
        • 3 years ago

        The gpu is in the ~500mm2 range.

        [url<]http://wccftech.com/amd-vega-gpu-pictures-hbm2-official/[/url<] They generally know the size of an hbm chip, so they use that to estimate the size of the gpu on the same package. So yeah, if this isn't very competitive with GP102, then AMD is going to have a bad day.

          • jts888
          • 3 years ago

          GP102 isn’t really Nvidia’s strongest showing either. It’s 50% bigger than GP104 (and who knows how much more costly to manufacture), but after yield salvaging cuts and downclocks in TXP form is only ~30% faster than a stock 1080, which overclocks much better.

          Both the 1080 Ti and bigger Vega should strive to match or exceed the perf/area of the 1080.

            • ImSpartacus
            • 3 years ago

            The 1080 Ti will almost certainly be based on gp102, so I hope it’s a “strong” enough showing to do battle with big Vega (and every indication is that it is).

            You’re right that GP104 is that sweet spot for efficiency. But it’s been that way for a couple years now (980, 680, etc), so I don’t expect much change in that arena.

            I think it’s short sighted to judge gp102 based on the 2016 Titan X. That was just a quick grab at a “halo” card because Nvidia seems to love their single gpu performance crown. I expect that it’s cut down and “underclocked” nature were a function of time to market.

            My humble prediction is that the 2017 Titan X (if that’s its name) will be a fully enabled GP102 with higher clocks (slightly higher than whatever the 1080 Ti has) and 24GB of vram (potentially 11Gbps gddr5x if the timing works). This would be similar to how the Titan Black used unlocked cores, higher clocks and “butterflied” vram to make itself into an upgrade to the original Titan despite both of them using gk110.

            Nvidia has released one titan per year every year since the first titan. So it’s looking like there’s got to be some kind of titan for this calendar year.

            • jts888
            • 3 years ago

            I’m not expecting uncut GP102 to become a new Titan so much as to keep selling as even more pricey workstation cards. There’s a very narrow band of potential Vega performance levels where a better GP102 SKU is both necessary and sufficient.

    • cmrcmk
    • 3 years ago

    Why were the demo box’s vents sealed? Were they trying to make a point or did someone not unpack it correctly?

      • Goty
      • 3 years ago

      Probably more to stop nosey journalists with cameras from trying to sneak a peek.

        • chuckula
        • 3 years ago

        They have a point. Using a cheap smartphone camera picture taken at a bad angle of components that are covered by heatsinks in a dark box, I could easily reverse engineer Vega’s design down to the transistor level using my CSI enhance button.

          • tipoo
          • 3 years ago

          [url<]https://www.youtube.com/watch?v=hkDD03yeLnU[/url<]

            • derFunkenstein
            • 3 years ago

            I can also create a gooey interface using virtual basic

            • tipoo
            • 3 years ago

            I love (aka internally cringe) when people say Graphical User Interface Interface

            • derFunkenstein
            • 3 years ago

            Yeah, that goes with PIN number and ATM machine on the Redundant Department of Redundancy Department’s list of great acronyms.

            • Welch
            • 3 years ago

            HDD drive…

            • Mr Bill
            • 3 years ago

            ICBM missile
            LCD display

            • Srsly_Bro
            • 3 years ago

            That is common in many commonly used acronyms. You must cringe a lot.

            • derFunkenstein
            • 3 years ago

            I know I do!

          • Goty
          • 3 years ago

          I never claimed it made any sense!

      • Ninjitsu
      • 3 years ago

      Maybe it was too noisy? 😮

      • TEAMSWITCHER
      • 3 years ago

      The only reason to do this is to prevent people from noticing that it is not an engineering sample.

        • Krogoth
        • 3 years ago

        Occam’s Razor my friend.

        It is more likely that it was sealed up to prevent journalists from taking a sneak peek at [b<]confidential[/b<] IP that is still under NDA. That has been common practice among the big hardware guys at trade shows in recent years.

        • jts888
        • 3 years ago

        A shot from the last event didn’t have everything taped up quite perfectly, and it looked like a tower-style CPU cooler might have been bolted onto the card. The chip may or may not have been final silicon, but it was clearly not very near a shippable complete product.

          • Voldenuit
          • 3 years ago

          A tower cooler on a GPU, hehe.

          I mean, it makes sense*. GPUs have been drawing more power than (most) CPUs (AMD’s FX line notwithstanding) for a while now, yet are constrained by the size limitations of the PCIe spec.

          This could suggest that Vega is not going to be a low-power card, although that’s a bit obvious (it’s a high end card, after all). Whether it is much better, about the same, or much worse at fps/watt compared to a 1080 is probably too early to tell.

          *EDIT: makes sense for an engineering sample/tech demo, that is. I’m not suggesting that AMD would sell GPUs with 165mm-tall heatsinks on them.

        • ImSpartacus
        • 3 years ago

        savage af, son

      • gerryg
      • 3 years ago

      I think it’s to keep the hardworking graphics gnomes inside from escaping. No way it could have been real hardware in there.

      • ptsant
      • 3 years ago

      Industrial espionage is real. I can imagine someone with a miniscule camera taking photos of the interior and deducing some key variables (say, board design, power consumption, cooling needs, whatever). It is quite impressive what people deduce from fuzzy photos of the die on the internet.

      Anyway, I don’t think AMD is sealing the vents for fun.

      • Umbral
      • 3 years ago

      Why? Because they’re running it on the best Intel CPU they can get their hands on.

      • Mr Bill
      • 3 years ago

      Interesting don’t ya think, that it ran fine smothered in heat? Wonder how much faster it is when cooling is good.

    • DPete27
    • 3 years ago

    Interesting that Vega includes tile-based rasterization after the David Kanter article. Although that was only 5 months ago, so AMD was probably already headed down that path with Vega.

      • DPete27
      • 3 years ago

      Oh, some are saying Vega is still ~6 months out. So maybe AMD only found out about Nvidia’s tile-based rasterization from the Kanter article. I guess the same way AMD was only alerted to frame-time consistency by Scott’s Inside the Second article. Oh well, I guess it’s easier to follow in the footsteps of others.

        • tipoo
        • 3 years ago

        I don’t think even a year’s notice is enough to implement that in a new GPU architecture. It seems to be at the foundational level of how it works. And AMD and Nvidia no doubt do their own investigations of each other’s hardware in more depth than any of us read, so maybe they knew Nvidia did it well before us.

        • deruberhanyok
        • 3 years ago

        There was some guesswork when we saw the performance increase of Maxwell over Kepler about how they did it on the same process node, and the Kanter article confirmed what some suspected but didn’t have the means to confirm.

        Anyways, TBR has been around for well over a decade – it was in PowerVR Kyro cards I reviewed back in 2001-ish – and the performance benefits it provided were pretty obvious even then. They’ve been using it in mobile GPUs ever since, since it helps with lower bandwidth.

        Honestly I’m just surprised it has taken them this long. I figured it would be a feature rolled into GCN since that was a big architecture change from the VLIW5 they’d used prior (and the very short-lived VLIW4), but it never came along. Some stuff like fast z-clear and hi-z-buffer used some of the same principles, but Vega might have been the first “big enough” change where AMD had opportunity to implement it.

      • jihadjoe
      • 3 years ago

      Just goes to show how on-point David Kanter is! I hope we hear/see him again on another podcast once Ryzen and Vega are out.

      • Andrew Lauritzen
      • 3 years ago

      Nah, way too short a time window. And honestly, the whole industry was heading towards a hybrid tiling IMR sorta place long before Maxwell even shipped. What NVIDIA did is neat, but I doubt it was a shock to any of the main GPU IHVs. I certainly know several folks who knew about this going on soon after Maxwell launched too 🙂

      • mesyn191
      • 3 years ago

      Supposedly NV and AMD have had some form of tile based rasterization for efficiency purposes for years. I think AMD has had it since at least the 4000 series and NV since the GF3 series GPU’s.

      Its very much a low level thing though and the way both are implementing it is very different from how a proper hardware TBDR like the Kyro II would actually work. Both AMD’s and NV’s GPU’s are still primarily IMR’s, they just use some tiling methods to boost efficiency for certain things.

        • Voldenuit
        • 3 years ago

        Are you thinking of early Z culling? That’s not the same as a tile based renderer.

          • mesyn191
          • 3 years ago

          It’s a form of tiling though. And while it isn’t the same thing as a proper TBDR, as I pointed out already in the post you replied to, it’s clearly using some of the concepts.

          Not even the new NV GPUs that Kanter did a (great) article on a while ago are proper TBDRs. It’s still an IMR. It is, however, using TBDR techniques to improve performance. But that is presented on various sites as if it’s something totally new, and it really isn’t. Both NV and AMD have been cribbing from the TBDR concept in various ways for years to improve efficiency and/or performance.

          Neither has actually implemented a proper TBDR because for key things they’re either slower than an IMR or very difficult to make as fast as an IMR (not sure which really, but I’ve seen both reasons given at times), hence what I guess is now being called a “hybrid” approach by some.

            • Mat3
            • 3 years ago

            A true TBDR simply isn’t possible for a desktop performance part. The storage cost needed to bin all the triangles before rasterization is too high.

    • NovusBogus
    • 3 years ago

    I’m in the market for something one notch above a 1060/480, so I’m eagerly awaiting this one. All signs point to a continuation of their long-overdue refocus on efficiency and optimization.

      • jts888
      • 3 years ago

      There are rumors currently of a big and small Vega being released, but this ~550 mm^2 chip seems to be well over 2x as powerful as the RX 480, not exactly “one notch”.

      A ~3/4 size variant using 384b GDDR5(X) is the best shot the new architecture has at a “mid-high” range product, unless a 1x HBM2 or 256b GDDR5(X) version could outstrip the RX 480 by a surprising degree.

    • ronch
    • 3 years ago

    He-Man + Skeletor = MANTOR !!!

    • ronch
    • 3 years ago

    Back in the ’90s and up to the PlayStation 2, hardware makers were throwing around polygon performance quite often, but since the early 2000s those numbers seem to have become harder and harder to come by. Any idea how today’s graphics chips do in polygon pushing? I bet my lowly HD7770 can chew up my Voodoo3 3000* and spit it out like a boss.

    * = capable of a whopping 7 million triangles in 1999!!

      • derFunkenstein
      • 3 years ago

      It’s kinda variable these days thanks to GPU boost clocks, but the triangles per clock * the number of Hz = theoretical triangles.

      The full GP104 (GTX 1080) can do [url=https://techreport.com/review/30281/nvidia-geforce-gtx-1080-graphics-card-reviewed<]four triangles per clock[/url<] for around 6.5-7 billion per second, based on a 1.6-1.8GHz max boost speed. One of those apparently got disabled on the GTX 1070, which can [url=https://techreport.com/review/30413/nvidia-geforce-gtx-1070-graphics-card-reviewed<]only do three[/url<]. Polaris 11 only draws [url=https://techreport.com/review/30328/amd-radeon-rx-480-graphics-card-reviewed<]two per clock[/url<] so you're maxing out around 2.4-2.5 billion. The numbers are more concrete in other reviews. The GTX 1060 can do [url=https://techreport.com/review/30404/a-quick-look-at-nvidia-geforce-gtx-1060-graphics-card<]3.2 billion[/url<]. That's probably based on a 1.6GHz boost clock so it's most likely two triangles per clock.

        • Anonymous Coward
        • 3 years ago

        So… we are talking about a 1000x speedup since Voodoo3, measured in theoretical triangles per second. That seems too high, but maybe…

          • derFunkenstein
          • 3 years ago

          We’re talking about 18 years intervening, and look at how graphics chips are the biggest chips you can buy (by mm^2), and graphics work is so inherently parallel that it’s easy to do.

          • tipoo
          • 3 years ago

          2^9 is 512, for 18 years / 2 for Moore’s general Guideline. Allowing plus or minus one shrink in there for a margin of error, it seems about right (especially as it’s 18 months, not 2 years). I actually had the opposite reaction reading the first part of your comment and went “only 1000x in 18 whole years!?”, lol.

            • BobbinThreadbare
            • 3 years ago

            At 18 months, it’s 2^12 for 4096x speed up. So looks like we’re lagging behind 2 generations.

            • tipoo
            • 3 years ago

            True, that again sounds about right with Moores general guideline slowing way down recently.

      • Meadows
      • 3 years ago

      That is mostly due to the fact that during the mid-to-late 2000’s the crowd’s ooo’s and aaah’s shifted from polygon count to “pixel shaders”.

      Interestingly, when polygon counts returned with the advent of standardised tessellation, nobody seemed to want to bring back the actual numbers anymore. Couldn’t tell you why.

    • torquer
    • 3 years ago

    poop.

      • danny e.
      • 3 years ago

      [url<]https://www.amazon.com/gp/product/B01LWN4SNN/[/url<]

      • albundy
      • 3 years ago

      you are giving it far too much credit. this is chip porn at its finest with a side of melodrama.

    • ronch
    • 3 years ago

    All this tech mumbo jumbo is nice but in the end it’s the performance/watt that will tell if an architecture rocks. And then that’ll determine the price. I just hope Vega rocks in the efficiency department; AMD graphics have performed fine these past few years but you have to admit their efficiency leaves something to be desired.

      • tipoo
      • 3 years ago

      Going tile based and the new memory setup sound like big steps towards it. Whether it overshoots Nvidia or not, we’ll only know in half a year (sigh).

    • tipoo
    • 3 years ago

    ” In past AMD architectures, memory accesses for textures and pixels were non-coherent operations, requiring lots of data movement for operations like rendering to a texture and then writing that texture out to pixels later in the rendering pipeline. AMD also says this incoherency raised major synchronization and driver-programming challenges.”

    How is this set up on Maxwell 2 or Pascal, anyone?

      • mczak
      • 3 years ago

      Nvidia has unified L2 cache (including ROPs) since Fermi.
      Unifying ROP and L2 cache is something I suspected was going to happen for every new GCN revision since GCN 1.0, so I’m really glad it’s finally happening :-).
      I would hope this also means that radeons no longer have to do a manual decompress pass of compressed framebuffer (both color and depth) data if that data is accessed for texturing (in particular with msaa, without it gcn 1.2 improved things a bit since the texture units could access compressed framebuffer data at least for color but afaik still not for depth, albeit of course a rop cache flush was still necessary even for color). Something the green team could do since Fermi (and, aside from some binning rasterizer optimizations Maxwell also has, one of the reasons why Nvidia chips were more bandwidth efficient).

    • tipoo
    • 3 years ago

    “AMD is significantly overhauling Vega’s pixel-shading approach, as well. The next-generation pixel engine on Vega incorporates what AMD calls a “draw-stream binning rasterizer,” or DSBR from here on out. The company describes this rasterizer as an essentially tile-based approach to rendering that lets the GPU more efficiently shade pixels,”

    “Three, I really need to tile my bathroom floor, but we’ll defer that for later.”
    [url<]https://techreport.com/review/27969/nvidia-geforce-gtx-titan-x-graphics-card-reviewed[/url<] Guess Scott finally got around to tiling his bathroom floor 😉

      • Damage
      • 3 years ago

      Bah dum bum tsh

        • Chrispy_
        • 3 years ago

        Ohhai

    • derFunkenstein
    • 3 years ago

    Need a pool on what the N in NCU will stand for. Since “neural networks” are all the rage, I’ll go with Neural Compute Unit. It’s just dorky enough.

      • CampinCarl
      • 3 years ago

      Ahh, but you forget, good sir, that AMD is both dorky AND silly. My guess is “Next Compute Unit”.

        • derFunkenstein
        • 3 years ago

        Darn it, you’re probably right.

        • dodozoid
        • 3 years ago

        Next is lame
        All in for NEW Compute Unit

        • natesland
        • 3 years ago

        Next-generation Compute Unit

        • DoomGuy64
        • 3 years ago

        Nexteral. Some side effects may occur, ask your doctor if NCU is right for you.

      • ozzuneoj
      • 3 years ago

      The N stands for Nvidia because they are secretly merging to take over the world with a GPU powered “Neural Network” AI called the VeGeforce Rageon GTX 1990 TI Pascolaris RX XTX XT.

      I heard this while eavesdropping in a bathroom stall at CES about 20 minutes ago.

        • RAGEPRO
        • 3 years ago

        I mean, real talk: I’d buy a GPU called Rageon.

        • dodozoid
        • 3 years ago

        Vege Force?
        I guess I should eat my steak while I still can.
        Vegan terminators incoming.

          • ozzuneoj
          • 3 years ago

          Exactly. I suspect the Vegan graphics card idea is mostly an Nvidia thing, with them being the “Green Team”.

        • meerkt
        • 3 years ago

        Isn’t Pascolaris a disease?

          • ozzuneoj
          • 3 years ago

          It will be! =0

      • Mr Bill
      • 3 years ago

      NP-complete computing unit!

      • gerryg
      • 3 years ago

      Nether Compute Unit, due to the dark magics at work within…

      • Mr Bill
      • 3 years ago

      [url=https://en.wikipedia.org/wiki/Nat_(unit)<]Nat Compute Unit[/url<]

      • Srsly_Bro
      • 3 years ago

      Next few AMD isn’t good enough?

      • Mr Bill
      • 3 years ago

      IF… GCN = Graphics Core Next
      Then… NCU = Next Compute Unit

    • Kretschmer
    • 3 years ago

    It’s hard to believe that Vega is still ~6 months out. Best of luck, AMD.

    • tipoo
    • 3 years ago

    Well folks, it’s official:

    [url<]http://i.imgur.com/sdNX40R.png[/url<]

      • chuckula
      • 3 years ago

      The Wassoning is upon us!

        • CScottG
        • 3 years ago

        You are too late!

        [url<]https://www.youtube.com/watch?v=tm-xiztBsFA[/url<]

    • chuckula
    • 3 years ago

    Was that demo box also running a zen CPU on an X370 board?
    Since Vega is clearly farther out than Zen’s launch, are there more Zen demo systems on display at CES?
