AMD proudly reveals the Vega-powered Instinct MI25 accelerator

Late last year, AMD let us peek behind the curtain at its upcoming Radeon Instinct compute accelerators and Radeon Open Compute platform (ROCm). These accelerators are designed to challenge Nvidia in the fast-growing deep learning and machine intelligence markets. At the time, AMD was a bit coy about the Radeon Instinct MI25, an accelerator based on the highly-anticipated Vega architecture. Now that these cards are close to shipping, AMD opened up about the MI25's tech specs.

Like the MI6 and MI8 compute accelerators we learned about last December, the MI25 is a passively-cooled dual-slot card intended for deep learning and HPC applications. It boasts 64 compute units and 4,096 stream processors, and has 12.3 TFLOPS of FP32 and 24.6 TFLOPS of FP16 peak performance on tap. AMD didn't officially announce the GPU's clock rates, but using the above numbers, a little napkin math puts the highest clock at about 1500 MHz.
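That napkin math is easy to reproduce: peak FP32 throughput is the stream-processor count times two FLOPs per clock (one fused multiply-add per cycle, which is the usual assumption for GPU peak figures), times the clock rate. A quick sketch using only the numbers above:

```python
# Back out the implied clock from AMD's quoted peak FP32 figure.
# Assumes 2 FLOPs per stream processor per clock (one fused multiply-add).
STREAM_PROCESSORS = 4096
PEAK_FP32_TFLOPS = 12.3

clock_ghz = PEAK_FP32_TFLOPS * 1e12 / (STREAM_PROCESSORS * 2) / 1e9
print(f"Implied peak clock: ~{clock_ghz * 1e3:.0f} MHz")  # ~1501 MHz
```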

That kind of theoretical performance deserves potent memory, and AMD doesn't seem to be cutting corners in that regard. The MI25 will ship with 16GB of HBM2 ECC memory. With its 2048-bit memory interface, the MI25 should deliver 484 GB/s of memory bandwidth. AMD is once again teasing us regarding the card's High Bandwidth Cache Controller (HBCC), apparently wanting to keep details about it under its hat for just a little longer.
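For reference, Vega 10 gets its bandwidth from two HBM2 stacks with a 1024-bit interface each. The per-pin data rate below is our assumption, chosen to match the quoted 484 GB/s figure:

```python
# Peak bandwidth = bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte.
BUS_WIDTH_BITS = 2 * 1024   # two HBM2 stacks, 1024 bits each
DATA_RATE_GBPS = 1.89       # assumed per-pin rate, matching the quoted figure

bandwidth_gbs = BUS_WIDTH_BITS * DATA_RATE_GBPS / 8
print(f"{bandwidth_gbs:.0f} GB/s")  # 484 GB/s
```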

Source: AMD

The company also touts the MI25's power efficiency. The accelerator's 300W TDP might sound hefty to those used to reading power consumption figures for consumer graphics cards, but power-hungry hardware can still be efficient if it can get things done quickly. Using the MI25's peak FP16 performance to determine per-watt performance, AMD points out that the card ought to outstrip Nvidia's competing Tesla products.
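The per-watt figure AMD is working from is straightforward to check. A sketch using only the specs above; keep in mind this is peak throughput per watt of TDP, not measured efficiency on a real workload:

```python
# Peak FP16 performance per watt of TDP.
PEAK_FP16_TFLOPS = 24.6
TDP_WATTS = 300

gflops_per_watt = PEAK_FP16_TFLOPS * 1e3 / TDP_WATTS
print(f"{gflops_per_watt:.0f} GFLOPS/W")  # 82 GFLOPS/W
```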

AMD plans to release the Radeon Instinct compute accelerators in the third quarter of this year. These units aren't likely to show up on the shelves of your local Microcenter, however. Instead, AMD will ship units to hardware partners including Boxx, Colfax, Exxact Corporation, Gigabyte, Inventec, and Supermicro.

There's also software to go along with new hardware. AMD's ROCm platform is evolving at a rapid pace, and the company just announced that version 1.6 is ready for a June 29 rollout. The big news for this release is the inclusion of MIOpen 1.0, AMD's GPU-accelerated deep learning library. MIOpen offers support for multiple frameworks including Caffe, Google's TensorFlow, and Torch.

Comments closed
    • ptsant
    • 2 years ago

    Something that is not often discussed is the use of HBM as cache, allowing the use of all system memory. Deep learning is limited not only by computation speed, but also by dataset size. For example, two days ago I tried to make a huge dataset only to realize that it wouldn’t fit in VRAM (8GB), or even in main RAM (32GB).

    If the HBM cache scheme allows the transparent use of tens (or hundreds) of GB of main RAM with a small performance penalty, this will be a huge advantage over the competition on real-life datasets.

    Also, the MIOpen library could make or break the product: hand-optimized kernels can bridge the gap between AMD and nVidia, especially considering that this wasn’t even available until now. nVidia has provided cuDNN for a very long time and it is a major competitive advantage.

    Anyway, I can see the Instinct doing relatively well. Just like in the server market, AMD’s penetration in the AI/deep-learning market is effectively zero, so it can only improve.

      • tipoo
      • 2 years ago

      Would 64GB of NAND strapped right to the GPU be of interest?

      [url<]https://techreport.com/news/30435/radeon-pro-solid-state-graphics-keeps-big-data-close-to-the-gpu[/url<]

    • ronch
    • 2 years ago

    24.6 TFLOPS. I’d love to see someone match that with a calculator.

      • Mr Bill
      • 2 years ago

      I can spell words on android HP 15C calculator app. Does that count?

      • tipoo
      • 2 years ago

      That’s about every person on earth doing 3514 operations per second at once

    • ronch
    • 2 years ago

    BTW, be careful how you read that efficiency graph, folks. Look at the numbers. Typical bar graph deception.

    • Leader952
    • 2 years ago

    Typical AMD marketing sleight of hand: touting their unreleased card against a year-old competitor’s card while ignoring the competitor’s new offering, which should also be available soon.

      • Krogoth
      • 2 years ago

      It is because GV100 parts aren’t commercially and publicly available yet. Nobody has their hands on them.

      Nvidia marketing does the same thing as well with their product launches. They always compare it with solutions that exist in the market at time of the launch/announcement.

        • Leader952
        • 2 years ago

        [quote<]It is because GV100 parts aren't commercially and publicly available yet.[/quote<] Neither is the MI25. [quote<]Nobody has their hands on them.[/quote<] Same can be said about the MI25.

          • Krogoth
          • 2 years ago

          MI25 is about to become available and AMD marketers want to gauge it against existing solutions on market for prospective buyers.

          This is marketing 101 stuff.

          Nvidia does the same bloody thing too. GV100 is still a year away and Nvidia hasn’t made any formal launches for it yet.

            • psuedonymous
            • 2 years ago

            [quote<]MI25 is about to become available and AMD marketers want to gauge it against existing solutions on market for prospective buyers.[/quote<] [url=https://techreport.com/news/32128/nvidia-readies-up-a-pcie-version-of-the-tesla-v100<]Erm[/url<]... [quote<]The PCIe Tesla V100 is arriving roughly at the same time as AMD's Vega Frontier cards, and some comparisons are inevitable despite the difference in architecture. [/quote<]

      • SHOES
      • 2 years ago

      What should they be comparing it to?

    • thibaud
    • 2 years ago

    Seriously…
    Let me correct that graph for AMD:
    [url<]http://i.imgur.com/eOjMMm9.png[/url<] Non-zero-axis graphs should be automatically flagged.

      • stefem
      • 2 years ago

      Agree, and I would also add “theoretical” at the beginning of the graph description

      • spiketheaardvark
      • 2 years ago

      As a postdoc, I know most people do not share my strong feelings about data presentation. But non-zero-rooted axes should almost never be used. (Log graphs are an exception, but that data had better be log-normal.) Any software that makes graphs should make you click a box acknowledging and accepting your shame any time you change the root of a graph from zero.

        • jensend
        • 2 years ago

        Agreed. And Tufte’s [url=https://www.edwardtufte.com/tufte/books_vdqi<][u<]The Visual Display of Quantitative Information[/u<][/url<] ought to be mandatory reading for every single journalist of any description, every marketing major, and of course every STEM major.

          • Wirko
          • 2 years ago

          [quote<]every marketing major[/quote<] So they learn early enough what to *avoid*.

            • jensend
            • 2 years ago

            Of course.

          • DPete27
          • 2 years ago

          Most AMD marketing graphs I can recall share this non-zero-axis tomfoolery. Regardless of whether you [i<]KNOW[/i<] about it, using a non-zero axis certainly paints your product in a much better light than "real life" does.

        • torquer
        • 2 years ago

        NERD ALERT

    • deruberhanyok
    • 2 years ago

    Would love to see a gaming-related Vega reveal from AMD one of these days.

    Gamyng enthusyasts wyll be especyally ynterested.

      • Waco
      • 2 years ago

      The gaming version is going to be “Vyga”. 😛

        • Krogoth
        • 2 years ago

        “Epic” cannot be trademarked. That’s why AMD went with the somewhat silly “Epyc” moniker which itself is a tongue-in-cheek reference to Intel’s EPIC computing model.

          • Waco
          • 2 years ago

          Oh, I know, but I still think the name is a bit too…”gamer” for datacenters.

            • Krogoth
            • 2 years ago

            I kinda miss the “Opteron” brand name myself. I suppose the only reason AMD changed brand names is because they wanted to distance themselves from Bulldozer/Excavator-based Opterons.

    • kcarlile
    • 2 years ago

    If they’ve decided to actually put some resources into good Linux drivers, that would probably help them out here…

    • Voldenuit
    • 2 years ago

    300W and “passively cooled”?

    That does not compute.

      • EzioAs
      • 2 years ago

      I’ve no knowledge on the use of these cards, but if I had to guess I’m thinking they’re probably used in cases where cooling comes from the chassis fans and external fans.

      • general_tux
      • 2 years ago

      I would imagine that these are cooled by the airflow from the server chassis. It is passive since there is no fan on the card itself but there is a ton of airflow provided by the server.

      • Chrispy_
      • 2 years ago

      Almost everything in a rackmount server is “passively” cooled.
      Rack Servers have their own front-to-back airflow systems and every card is required to conform to it or duct accordingly.

      This is likely a massive vapor-chamber design with fully-open front and back that uses the powerful airflow of a server. A typical 2U server will probably have 4-5 80mm*38mm focused-flow 7000RPM fans. It will deafen you under load but there are probably more CFMs of airflow through this MI25 than any consumer card.

      • Krogoth
      • 2 years ago

      What they mean is that the card itself is not equipped with active cooling. It is entirely dependent on chassis cooling (wind tunnel), which is commonplace for enterprise-tier GPGPU solutions.

      • Waco
      • 2 years ago

      All server parts are “passively cooled” and have very stringent guidelines on inlet temperatures and linear feet of flow per minute.

      • deruberhanyok
      • 2 years ago

      Actually, I believe compute is the entire point.

      • ronch
      • 2 years ago

      Oh but it does! And it does so while being passively cooled too!

      • chuckula
      • 2 years ago

      Forced air in servers can get the job done. It’s “passive” in as much as the fans aren’t on the card (they’re in the chassis).

        • Voldenuit
        • 2 years ago

          Yeah, yeah, even the CPUs are passively cooled in rack servers, but I don’t think any get to 300W TDPs. Even the POWER8s and POWER9s are under 200W unless they’re run in turbo mode, in which case they get to 247W, and that’s usually with bigger fins than the Nvidia and AMD compute cards.

          • NTMBK
          • 2 years ago

          That’s why the 300W part has a bigger heatsink strapped to it.

      • Srsly_Bro
      • 2 years ago

      Go read the V100 article and they are passively cooled, also. smh @ you bro

      • ImSpartacus
      • 2 years ago

      These are bound for a data center, which would have its own cooling fans in big racks (much easier to replace one big fan on the outside of the rack than a tiny fan attached to the damn GPU heatsink).

      You’re not supposed to put it in a desktop.

      EDIT Just noticed everyone else already commented on that, lol.

      • droid126
      • 2 years ago

      If it’s intended for a server it does. High performance rack mount servers move tons of air. Far more than a desktop computer or the fans on a consumer GPU. My 970’s fans spin up to 4500rpm, I’ve seen 10000rpm fans in Dell poweredge servers.

    • ronch
    • 2 years ago

    That efficiency graph is probably what would make or break this product. As with server chips, these accelerators need to be real efficient if IT dude is gonna deploy quite a few of them. Ok I’m no expert but nonetheless the Radeon lineup really needs to pull alongside Nvidia in terms of efficiency. They’ve been lagging behind long enough.

    Edit – BTW, crazy how this thing is about 10x more powerful than my good ol’ Pitcairn.

      • ptsant
      • 2 years ago

      Although I usually agree, it appears that these are very, very expensive cards, and the nVidia Volta is apparently also ridiculously expensive (they sell a “workstation” with several of them for ~$100k).

      Consider electricity cost:
      5kw @ 24/7 is something like $8700 (local prices)
      4kw @ 24/7 is $7000

      My point is that if you are paying $150k for 10 of these cards, then a $2k gain (+20% efficiency!) × 3 years of electricity is small change. If the cards were significantly cheaper, then the running costs would be a major item. But at these prices, the price of acquisition seems to dominate the total cost over time.

      Then again, I don’t have a farm of Deep Learning hardware, so I might be horribly wrong.
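      A quick sanity check of the arithmetic above. The ~$0.20/kWh rate is our inference from the quoted figures, not something stated in the comment:

```python
# Yearly electricity cost for a continuously-loaded system.
# PRICE_PER_KWH is assumed (~$0.20/kWh, implied by the figures quoted above).
PRICE_PER_KWH = 0.20
HOURS_PER_YEAR = 24 * 365

def yearly_cost_usd(kilowatts):
    return kilowatts * HOURS_PER_YEAR * PRICE_PER_KWH

print(f"${yearly_cost_usd(5):.0f}/yr")  # $8760/yr, close to the quoted ~$8700
print(f"${yearly_cost_usd(4):.0f}/yr")  # $7008/yr, close to the quoted ~$7000
```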

        • psuedonymous
        • 2 years ago

        Remember that in large installations, you usually have hard limits on building power intake and a closely coupled limit on maximum cooling capacity. It’s no point saving a bunch of theoretical money on components by throwing more watts at the problem, if you have no more watts to throw.

        Most HPC centres are at power limit upon construction.

    • tsk
    • 2 years ago

    The Tesla V100 would be leading that graph.
    Edit: 112 GFLOPS/W for the Tesla.

      • Waco
      • 2 years ago

      Yep. I imagine the price difference between the two is going to be pretty dramatic, though.

      • BryanC
      • 2 years ago

      Actually, it’s (120000/300) = 400 GFlops/W for the V100 if you’re using tensor cores. Which you would be if doing deep learning in mixed precision. And half of those 400 GFlops/W are actually 32-bit adds (important for those long accumulations), so V100 has higher accuracy as well as higher efficiency.

      • ptsant
      • 2 years ago

      The V100 is an impressive product. When is it coming to market? And are they priced similarly?

        • stefem
        • 2 years ago

        We don’t know the price, but it will be released before the end of the year.

      • ImSpartacus
      • 2 years ago

      Yeah, and AMD rescaled the bars so the difference appears a lot bigger.

      In reality, 82 is less than 10% more than 75, but that’s not what those bars suggest, lol.
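      The distortion is easy to quantify. If the y-axis starts at, say, 70 instead of 0 (the 70 here is a hypothetical baseline for illustration, not the chart's actual value), the bar heights exaggerate the real ratio considerably:

```python
def visual_ratio(a, b, axis_start=0):
    """Apparent bar-height ratio when the y-axis starts at axis_start."""
    return (a - axis_start) / (b - axis_start)

true_ratio = visual_ratio(82, 75)        # ~1.09: 82 is <10% more than 75
skewed_ratio = visual_ratio(82, 75, 70)  # 2.4: the 82 bar looks 140% taller
```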
