AMD opens up machine learning with Radeon Instinct

Whether in self-driving cars or personal assistants, deep learning and artificial intelligence are some of the fastest-growing markets for GPUs right now in the form of compute accelerators. Nvidia’s beastliest Pascal chip, GP100, isn’t even available on a traditional graphics card yet; its home is on the Tesla P100 accelerator. That decision highlights just how important powerful GPUs are to deep-learning tasks like training and inference right now. With its Pascal Titan X and its family of Tesla accelerators, all running CUDA deep-learning programs supported by its cuDNN machine-learning libraries, Nvidia has established a dominant position in this exploding corner of high-performance computing.

Last week in Sonoma, California, AMD laid out an alternative path for companies and institutions looking to take advantage of the power of machine learning, and today, the company is revealing its hand with a new initiative called Radeon Instinct. Instinct is a hardware and software stack built around the Radeon Open Compute Platform, or ROCm. Formerly known as the Boltzmann Initiative, ROCm provides the foundation for running HPC applications on heterogeneous accelerators powered by AMD graphics cards. With the release of Radeon Instinct, ROCm can accelerate common deep-learning frameworks like Caffe, Torch 7, and TensorFlow on AMD hardware. All that software runs atop a new series of Radeon Instinct compute accelerators that we’ll discuss in a moment.
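To make that pitch concrete, here's a minimal sketch of what framework-level code looks like. Nothing in it is vendor-specific; whether the matrix math underneath lands on CUDA and cuDNN or on a ROCm and MIOpen backend is decided by the TensorFlow build you install. We're assuming a ROCm-enabled build as AMD describes it, and the snippet uses the TensorFlow 1.x API of the day.

    # A toy one-layer model. The same graph code runs on any backend the
    # TensorFlow build supports; the ROCm-enabled build AMD describes is an
    # assumption here, not something we've tested.
    import numpy as np
    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 784])         # input batch
    w = tf.Variable(tf.random_normal([784, 10], stddev=0.1))  # weights
    b = tf.Variable(tf.zeros([10]))                           # biases
    logits = tf.matmul(x, w) + b                              # one dense layer

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        batch = np.random.rand(32, 784).astype(np.float32)
        print(sess.run(logits, feed_dict={x: batch}).shape)   # (32, 10)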

On top of ROCm, deep-learning developers will soon have the opportunity to use a new open-source library of deep learning functions called MIOpen that AMD intends to release in the first quarter of next year. This library offers a range of functions pre-optimized for execution on Radeon Instinct cards, like convolution, pooling, activation, normalization, and tensor operations. AMD says that convolution operations performed with MIOpen are nearly three times faster than those performed using the widely-used “general matrix multiplication” (GEMM) function from the standard Basic Linear Algebra Subprograms specification. That speed-up is important because convolution operations make up the majority of program run time for a convolutional neural network, according to Google TensorFlow team member Pete Warden.
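For context, the GEMM approach AMD is comparing against usually lowers a convolution to a single big matrix multiply by unrolling input patches, the classic im2col trick. The NumPy sketch below illustrates that lowering; it isn't MIOpen code (the library isn't out yet), just a picture of the baseline that hand-tuned direct convolution kernels aim to beat.

    # GEMM-based convolution: unroll patches with im2col, then do one matrix
    # multiply. Stride 1, no padding, single image, for brevity.
    import numpy as np

    def im2col(x, kh, kw):
        """x: (C, H, W) -> (C*kh*kw, out_h*out_w)."""
        c, h, w = x.shape
        out_h, out_w = h - kh + 1, w - kw + 1
        cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
        row = 0
        for ci in range(c):
            for i in range(kh):
                for j in range(kw):
                    cols[row] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                    row += 1
        return cols, out_h, out_w

    def conv2d_gemm(x, weights):
        """weights: (K, C, kh, kw) -> output (K, out_h, out_w)."""
        k, c, kh, kw = weights.shape
        cols, out_h, out_w = im2col(x, kh, kw)
        return (weights.reshape(k, -1) @ cols).reshape(k, out_h, out_w)

    img = np.random.rand(3, 32, 32).astype(np.float32)    # C x H x W
    filt = np.random.rand(8, 3, 3, 3).astype(np.float32)  # K filters
    print(conv2d_gemm(img, filt).shape)                   # (8, 30, 30)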

Deep-learning applications built atop ROCm and MIOpen will be accelerated by three new Radeon Instinct cards for use in servers. All three of these accelerators are passively cooled, and each one has a specific niche. The Radeon Instinct MI6 uses a Polaris GPU with 16GB of RAM to deliver up to 5.7 TFLOPS of FP16 or FP32 throughput. It’ll also offer up to 224 GB/s of memory bandwidth. All that performance comes in a 150W TDP. This card is meant for inference work. Meanwhile, the Radeon Instinct MI8 employs a Fiji GPU equipped with 4GB of HBM RAM to offer 8.2 TFLOPS of FP16 or FP32 throughput alongside 512 GB/s of memory bandwidth. AMD expects this card to be useful as an inference accelerator or for more general HPC tasks.

The most exciting Radeon Instinct card, the MI25, echoes the debut of Nvidia’s GP100 GPU on the Tesla P100 accelerator. This card is the first product with AMD’s next-generation Vega GPU on board. Most of the details of Vega and its architecture remain secrets for the moment, but we can say that the Vega GPU on board will offer support for packed FP16 math. That means it can achieve twice the throughput for FP16 data compared to FP32, an important boost for machine-learning applications that don’t need the extra precision. The Polaris and Fiji cards in the Radeon Instinct line support native FP16 data types to make more efficient use of register and memory space, but they can’t perform the packed-math magic of Vega to enjoy a similar performance speedup.
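As a rough illustration of what "packed" means here: two half-precision values fit in one 32-bit lane, and Vega can operate on both halves each clock, which is where the doubling comes from. The NumPy snippet below is purely conceptual, not GPU code.

    # Two FP16 values packed into one 32-bit word. Vega's packed math operates
    # on both halves per clock; Polaris and Fiji store FP16 this compactly but
    # execute each half at FP32 rate.
    import numpy as np

    pair = np.array([1.5, -2.25], dtype=np.float16)  # 2 x 2 bytes
    packed = pair.view(np.uint32)[0]                 # one 32-bit lane
    print(hex(packed))

    unpacked = np.array([packed], dtype=np.uint32).view(np.float16)
    print(unpacked)                                  # [ 1.5  -2.25]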

The MI25 will slot into a TDP under 300W. AMD isn’t talking about the amount or type of memory on board this card yet. You’ll have to hold tight for Vega’s official announcement to learn what the “High Bandwidth Cache and Controller” are, as well. Sorry.

We can’t resist using these early Vega specs to make some guesses about whatever fully-enabled consumer Radeon card will come with this graphics chip on board, though. If the MI6 and MI8 take their names from their peak FP16 throughput, we can work back from the name and peak FP16 performance of the MI25 to make some educated guesses. If the Vega GPU on board this card features similar raw specs to AMD’s last big chip, the Fiji GPU on the R9 Fury X, a roughly 1500-MHz clock speed and a theoretical 4096-stream-processor shader array would shake out to about 12.5 TFLOPS of FP32 performance. AMD hasn’t confirmed (and wouldn’t confirm) anything about real-world implementations of Vega, though, so take our guesses for what they are.
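For the curious, the arithmetic behind those guesses is just peak FLOPS = 2 (one fused multiply-add per clock) x stream processors x clock. The Fiji row uses known numbers; the 2304-SP Polaris 10 figure for the MI6 and everything about Vega are our assumptions, not anything AMD has confirmed.

    # Back-of-the-envelope peak FP32 throughput. The Polaris 10 SP count and
    # the Vega specs are speculation; only Fiji's 4096 SPs are a known quantity.
    def peak_fp32_tflops(stream_processors, clock_ghz):
        return 2 * stream_processors * clock_ghz / 1000.0

    print(peak_fp32_tflops(2304, 1.24))  # MI6-class Polaris 10: ~5.7 TFLOPS
    print(peak_fp32_tflops(4096, 1.00))  # MI8 (Fiji at ~1 GHz): ~8.2 TFLOPS
    print(peak_fp32_tflops(4096, 1.52))  # speculative Vega:     ~12.5 TFLOPS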

If Radeon Instinct products have any hope of competing in the marketplace, they need to perform, and it seems as though they will. AMD teased a machine-learning benchmark result for two of its cards using the DeepBench GEMM test. Compared to the Maxwell Titan X as a baseline, the Pascal Titan X and the Radeon Instinct MI8 both offer about a 1.3x speedup for this bench, at least going by AMD’s numbers. The MI25 achieves nearly two times the monster Maxwell’s performance. According to DeepBench’s maintainers, the program uses vendor-supplied libraries where possible, so it should be at least somewhat representative of the performance one could expect to achieve using cuDNN or MIOpen-accelerated code.
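For reference, the GEMM portion of DeepBench boils down to timing dense matrix multiplies at sizes pulled from real network layers and reporting effective throughput. The sketch below is a rough CPU-side stand-in using NumPy; the real harness calls vendor BLAS libraries (cuBLAS on Nvidia hardware, and presumably whatever AMD supplies under ROCm for Radeon Instinct).

    # Time an FP32 GEMM and report effective TFLOPS (2*M*N*K FLOPs per multiply).
    # This NumPy version runs on the CPU; it only illustrates the measurement.
    import time
    import numpy as np

    def time_gemm(m, n, k, repeats=10):
        a = np.random.rand(m, k).astype(np.float32)
        b = np.random.rand(k, n).astype(np.float32)
        a @ b                                      # warm-up
        start = time.perf_counter()
        for _ in range(repeats):
            a @ b
        elapsed = (time.perf_counter() - start) / repeats
        return 2.0 * m * n * k / elapsed / 1e12

    print("%.3f TFLOPS" % time_gemm(1760, 128, 1760))  # a DeepBench-style size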

Radeon Instinct accelerators will also support hardware virtualization through the company’s MxGPU feature. In this case, MxGPU will present multiple virtual Radeon Instinct slices to guest operating systems performing deep-learning tasks. Nvidia touts its Tesla accelerators for virtual desktop infrastructure work, but it doesn’t offer similar flexibility for deep-learning acceleration as far as we’re aware.

While high-performance computing and machine learning may not be TR’s regular cup of tea, our feeling is that AMD’s Radeon Instinct salvo could offer some much-needed competition in a space that has been dominated by Nvidia’s hardware and CUDA programming tools so far. Whether AMD’s trio of accelerators and a more open software stack are enough to tempt companies to make the switch remains to be seen, but if they do, Instinct could help AMD establish an important foothold in the exploding, lucrative HPC market being driven by machine learning and artificial intelligence. We’ll find out when Instinct products make their debut in the first half of next year.

Comments closed
    • TheMonkeyKing
    • 3 years ago

    Vega…

    For my wife, who is much younger than me, a 90s child, Vega means:
    [url<]https://www.youtube.com/watch?v=VZt7J0iaUD0[/url<]

    For me, Vega means:
    [url<]http://www.popularmechanics.com/cars/a6424/how-the-chevy-vega-almost-destroyed-gm/[/url<]

      • Krogoth
      • 3 years ago

      For Astronomy geeks like myself, it means Alpha Lyrae.

      [url<]https://en.wikipedia.org/wiki/Vega[/url<]

    • JosiahBradley
    • 3 years ago

    Finally, a single card capable of replacing my current setup. Only problem is, to take advantage of it I’ll need 2.

    • jts888
    • 3 years ago

    For anyone who’s not seen it yet, there is a little leaked footage from the private Vega showing last week:
    [url<]https://www.youtube.com/watch?v=4_oI1K6rt48[/url<]

    It's just D44M running at 4k, but it seems to be doing pretty well, and it's something maybe a little more familiar to a lot of people here.

      • chuckula
      • 3 years ago

      So we’re almost exactly at the same point with Vega today as we were this January with Polaris and Raj’s demonstrations.

      When it launches in May I expect it to be noticeably faster than the GTX-1080.

        • jts888
        • 3 years ago

        This is clearly AMD’s most favored benchmark, but 70-75 fps in outdoor areas of Doom in 4k Ultra is actually on par with Titan X. The 1080 gets maybe 50-55.

        I do agree it’s rather unlikely we’ll see Vega on shelves in Q1, but it would be nice to see it and the 1080 Ti before May…

        • ultima_trev
        • 3 years ago

        Being that it was only about 10% ahead in a benchmark that favors AMD, I expect Vega to pull slightly ahead of the GTX 1080 in DX12/Vulkan but slightly behind in DX11.

        Hopefully they won’t price it to match the GTX 1080, however; price it in line with the GTX 1070, similar to what they did with the R9 290X release. And, more hopefully still, they won’t be using a god-awful reference cooler this time.

    • Meadows
    • 3 years ago

    In your face, Valor and Mystic.

    • ronch
    • 3 years ago

    I honestly doubt this will get very far.

    • Tristan
    • 3 years ago

    Vega perf/W is the same as Polaris. Is this a redesigned CU?

    • liquidsquid
    • 3 years ago

    Stupid as this comment sounds, gaming could certainly use a little learning “AI” to make some games a bit more challenging as they learn to counteract your moves, not just spew out more projectiles. On top of some great graphics, games really do need some underlying architecture to keep things fun.

    Imagine a game like Doom that learns the mistakes it made when you went through last time, and changes things up to try and beat you.

      • wingless
      • 3 years ago

      You should play against the Terminator-like AI in Arma3!

      • blastdoor
      • 3 years ago

      I guess I’ve always assumed that for games like Doom, the challenge in designing the AI is to make it good enough to make the game interesting but not so good as to make the game impossible (and therefore not fun and not profitable). If FPS designers wanted to make an AI that could utterly destroy every human opponent, I suspect they could have done it 15 years ago.

      The greater challenge is a game like Civ.

      • Marios145
      • 3 years ago

      Nice try, skynet.

      • albundy
      • 3 years ago

      I think that’s called never passing the 1st level. If it’s always counteracting your moves, you’ll spend more time waiting for the level to reload than playing. I can, however, see them adding this to the difficulty options, but who would use it?

      • dodozoid
      • 3 years ago

      With AI in games, you don’t really need “smarter” foes. You need them to make more believable mistakes. It is super easy to make a bot shoot you in the face as soon as there is direct LOS. It is equally easy to make it go directly your way and find you immediately. It’s hard to make it lifelike, and that might use some machine learning.

      • BaronMatrix
      • 3 years ago

      That would be cool but would require SERIOUS storage and bandwidth… A complex game would never stop growing…

      • ptsant
      • 3 years ago

      I found DOOM AI quite good. What I liked is how every opponent has his own personality and attack style. Sure, it’s not revolutionary, but I did thoroughly enjoy the game.

      • SHOES
      • 3 years ago

      Not stupid at all. A.I. that can self adjust to be a challenge but playable would be a really cool feature imo.

    • Srsly_Bro
    • 3 years ago

    Great article, Jeff. There is a double “of” on one of the URLs. Are you planning to attend AMD’s event tomorrow?

    Thanks, bro.

    • Unknown-Error
    • 3 years ago

    Until the independent reviews are out, AMD’s own slides and data should be taken with extreme skepticism. Perhaps even disregarded altogether.

      • shank15217
      • 3 years ago

      That would apply to any vendor-released benchmark, thanks Capt. Obvious

        • Unknown-Error
        • 3 years ago

        Actually no. When Intel shows slides on their new CPUs, or nVidia on their GPUs, or Apple with their Ax SoCs prior to the independent reviews, we can afford to be somewhat optimistic about their ability to deliver on those promises. That does not apply to AMD. We were optimistic about AMD’s ability to deliver during the K7, K8, and even K10 (Phenom II) days, but that is long gone. Of course nothing is certain until the independent reviews are out, irrespective of the company, but expectations and optimism about who can deliver on their promises vary vastly between AMD and their rivals.

          • Pwnstar
          • 3 years ago

          You’re wrong. Remember “CEO math”? They all do it.

          • freebird
          • 3 years ago

          Y don’t you get sucked back into your “black hole” of negativity and pop out of some Qua(ck)sar… into the “unknown”… you sound like one of those butt-sore losers that was shorting AMD this year and lost his shirt.

          FACT: if you prefer 32-bit OSes and CPUs without DX12 capabilities, fine, don’t go with AMD. AMD delivered in SPADES on 64-bit CPUs and pushed the whole environment towards DX12/Vulkan via Mantle… In my opinion M$ was dragging its feet on DX12 and AMD knew its hardware wasn’t being taken full advantage of… Nvidia wasn’t pushing DX12 because even Pascal doesn’t do Async Compute well.

          If it was up to Intel, WinDoze & Nvidia, we’d still be running a 32-bit Pentium 19, 32-bit Windows 2016 (based off WfWG 3.11) & DX-9 in all its glory or gory…

          AMD haters: “You CAN’T HANDLE the TRUTH!!!” 🙂 🙂 🙂
          [url<]https://www.youtube.com/watch?v=9FnO3igOkOk[/url<]

            • derFunkenstein
            • 3 years ago

            I don’t agree with Unknown-Error’s point, but holy crap you’re an asshat

      • beck2448
      • 3 years ago

      Lots of hype and vaporware. I’ll wait for the quarterlies.

    • YukaKun
    • 3 years ago

    Is that “Vega” sample image true to its projected size?

    There’s a lot of speculation one can do from that picture 😛

    Cheers!

    • mkk
    • 3 years ago

    Maybe NSA could gobble up a few thousand of these cards. Whatever brings in some cash to AMD is fine with me… 😉

    • NTMBK
    • 3 years ago

    High Bandwidth [i<]Cache[/i<]? Very interesting...

    I wonder if this will have HBM2 cache, combined with GDDR5(X)/DDR4 for higher capacity. Similar to what Intel did with Knights Landing's HBM cache.

      • AnotherReader
      • 3 years ago

      Yeah that is reminiscent of Knights Landing.

      [b<][Wild Speculation][/b<]

      Now, continuing on from Jeff's speculation about Vega, we can estimate 24.5 to 25.5 FP16 TFLOPS for Vega, which gives us a clock speed range from 1329 MHz for a 72 CU part to 1556 MHz for a 64 CU part. I think the 1330 MHz, 72 CU configuration is more likely.

      Now, FirePros are usually clocked lower than the related consumer AIBs. The FirePro W9100 has a boost clock of 930 MHz compared to 1000 MHz for the 290X. So we can expect a 72 CU gaming Vega to clock about 100 MHz higher than the MI25.

      [b<][/Wild Speculation][/b<]
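      A quick check of that arithmetic in code, with the CU counts being pure speculation on my part: packed FP16 gives four FLOPs per stream processor per clock, and each GCN CU has 64 stream processors.

          def clock_mhz_for_fp16_tflops(tflops, cus):
              # 4 FP16 FLOPs per SP per clock (packed FMA), 64 SPs per CU
              return tflops * 1e12 / (4 * cus * 64) / 1e6

          print(clock_mhz_for_fp16_tflops(24.5, 72))  # ~1329 MHz for 72 CUs
          print(clock_mhz_for_fp16_tflops(25.5, 64))  # ~1556 MHz for 64 CUs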

        • Magic Hate Ball
        • 3 years ago

        Or maybe even an 80 CU max core size for the enterprise, high-margin market that runs at 1200 MHz?

        And then the 72 CU part harvested for the desktop environment?

      • ImSpartacus
      • 3 years ago

      Nah, I think AMD is just trying to be cagey about Vega. I seriously doubt it’ll have complex multi-tier vram at this stage.

      But in a year or two? I wouldn’t be surprised.

    • Tristan
    • 3 years ago

    300W passive cooling
    Interesting

      • chuckula
      • 3 years ago

      It’s “passive” in as much as there are no fans on the card itself.

      However, in a server rack there tend to be arrays of fans in the case that are forcing quite a bit of air over the fins of the heatsink, so it’s still pretty much an “active” cooling system. The difference is that you don’t buy the fans with the card.

        • NTMBK
        • 3 years ago

        Yup, same idea as this passive Tesla: [url<]http://images.anandtech.com/doci/7521/NVIDIA_Tesla_K40_GPU_Accelerator_3Qtr1.jpg[/url<]

          • jihadjoe
          • 3 years ago

          The Tesla P100 is ‘passive’ too:

          [url=http://images.nvidia.com/content/tesla/pdf/nvidia-tesla-p100-datasheet.pdf<]Datasheet[/url<]

            • derFunkenstein
            • 3 years ago

            How is what you wrote different than what he posted?

    • Freon
    • 3 years ago

    Smart move; this is likely the best path forward for both Nvidia and AMD. In a post Moore’s law world, machine learning and AI are the way forward.

      • chuckula
      • 3 years ago

      [quote<] In a post Moore's law world, machine learning and AI are the way forward.[/quote<]

      Do you do movie trailers?

        • Growler
        • 3 years ago

        Needs more "[b<]BWOOOM[/b<]" sounds.

        I'm not sure I'm comfortable with machine learning, because eventually it becomes machine judging. I don't need computers to be disappointed with me, too.

          • RAGEPRO
          • 3 years ago

          Tags are case-sensitive brogham.

            • Growler
            • 3 years ago

            Thanks. It looks better now.

            • Srsly_Bro
            • 3 years ago

            A simple bro will suffice!

        • derFunkenstein
        • 3 years ago

        In a world where technology stagnates, doing the best we can with what we have is the only way forward.

      • blastdoor
      • 3 years ago

      [quote<]In a post Moore's law world, machine learning and AI are the way forward.[/quote<]

      I don't understand this statement. Machine learning and AI require gobs of transistors. If anything, the sputtering of Moore's Law strikes me as good news for anyone worried about Skynet.

        • chuckula
        • 3 years ago

        Yes, but nobody wants to see the movie with the tag line: “In a post Dennard scaling world, massively parallelizable workloads are the way forward.”

          • blastdoor
          • 3 years ago

          Didn’t that movie come out 10 years ago? And haven’t we been trying to parallelize workloads ever since?

          I think the tag line now is: “In a post Moore’s law world, [url=https://en.wikipedia.org/wiki/Khan_Noonien_Singh<]genetically engineered super-human intelligence[/url<] is the way forward."

            • Waco
            • 3 years ago

            Khaaaaaaaaaaaaaaaaaaaaaaan!

            • Krogoth
            • 3 years ago

            [url<]https://youtu.be/wRnSnfiUI54[/url<]

        • Tumbleweed
        • 3 years ago

        You’re right, you really DON’T understand that statement.

        If you can’t scale on a single chip, you scale with MANY CHIPS. Hence, machine learning in the cloud.

          • NTMBK
          • 3 years ago

          If you can’t scale down the transistors and their power consumption, then your cloud is still going to be power (and space) constrained. Supercomputers have been dealing with this for literally decades.

          The cloud isn’t magic. It’s just a big bunch of servers, in a datacenter.

          • derFunkenstein
          • 3 years ago

          but what if what we really need isn’t inherently parallel in nature?

    • RAGEPRO
    • 3 years ago

    CUDA is a closed compiler with black box libraries that requires proprietary hardware. Frankly speaking, no one sane should have touched it with a twenty-foot stick.

    I realize the historical context (they were really the first to offer any real tools for GPU compute), but looking at it now it’s really depressing how popular CUDA is.

    I hope this new compute push from AMD bears fruit. MI25 looks like a beast.

      • Ninjitsu
      • 3 years ago

      OTOH if you’re guaranteed performance, consistent results, and support services, I don’t [i<]really[/i<] see why it’s all that bad.

      [quote<]CUDA is a closed compiler with black box libraries that requires proprietary hardware. Frankly speaking, no one sane should have touched it with a twenty-foot stick.[/quote<]

      Can probably replace CUDA with iOS/MacOS as well 😛

        • AnotherReader
        • 3 years ago

        It is also used for accelerating science codes, and for that purpose, we should have complete documentation. There is a reason there’s no Kazushige Goto for CUDA.

        • beck2448
        • 3 years ago

        Nvidia has entrenched GPGPU solutions that have become the standard in many industries. Look at the bloated mess called Word: it still dominates because once a solution is accepted, it’s extremely difficult to get buyers to change horses.

      • ptsant
      • 3 years ago

      nVidia was smart enough to lure users into the walled CUDA garden with fanatical support. I know a colleague who does Deep Learning and he got a Titan for free simply by sending a cool description of his current work to nVidia. They are marketing very aggressively.

        • SHOES
        • 3 years ago

        Hard to argue with performance. If AMD can maintain a solid lead for an extended period of time they just might get this off the ground.

      • stefem
      • 3 years ago

      Did you ask yourself why CUDA is so pervasive?
      [quote<]CUDA is a closed compiler with black box libraries that requires proprietary hardware. Frankly speaking, no one sane should have touched it with a twenty-foot stick.[/quote<]

      Frankly, this statement sounds opinionated at best. The NVIDIA CUDA compiler is based on LLVM; you can build your own compiler that supports your own programming language if you want. Google developed an alternative open-source CUDA compiler last year, for example, and I bet you know AMD's "Boltzmann initiative"...

      It's not a big deal porting your code from CUDA to OpenCL: 99% of the work you did rewriting your code to take advantage of a CUDA GPU will be good for other GPUs and frameworks (it even happens that the new implementation performs better on x86 than the old one). In fact, you don't even need to learn a new programming language; you need to learn how to rethink your algorithm to adapt to a throughput-oriented processor.

    • chuckula
    • 3 years ago

    [quote<]This card is the first product with AMD's next-generation Vega GPU on board. Most of the details of Vega and its architecture remain secrets for the moment, but we can say that the Vega GPU on board will offer support for packed FP16 math. That means it can achieve twice the throughput for FP16 data compared to FP32, an important boost for machine-learning applications that don't need the extra precision. [/quote<]

    It's like 1991 in reverse. Back then it was a big deal to go from 8 bits to 16 bits on your Nintendo. Now it's a big deal to go from 32 bits down to 16 bits on your HPC accelerator.

      • Krogoth
      • 3 years ago

      It depends on the workload in question. Not every HPC workload needs 32-bit and 64-bit floating-point precision.

      Nvidia does the same thing for their HPC product line-up as well.

    • derFunkenstein
    • 3 years ago

    Anything that can use AMD’s current tech to get into high-margin spaces, infuse the company with cash, and fund R&D is a good thing in my book. Too bad they didn’t come up with all this a year earlier, though betting on HBM on the high end probably reduces Fiji’s viability thanks to its small memory capacity. Edit: I know the MI8 uses it, but it seems like these HPC apps really need lots of RAM. Maybe not.
