AMD FirePro S9300 X2 is a Radeon Pro Duo for the HPC set

AMD's Radeon Pro Duo combines a pair of Fiji GPUs for 16 TFLOPS of workstation-class computing power on the desktop. Today, the red team is taking similar power to the data center with its FirePro S9300 X2. This card combines two fully-enabled Fiji packages, each with 4GB of non-ECC HBM RAM, in a server-friendly form factor. Each GPU runs at 850 MHz. That's good for 13.9 TFLOPS of theoretical FP32 compute performance in a 300W power envelope, according to AnandTech.
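The math behind that figure is straightforward. Here's a back-of-the-envelope sketch (assuming 4096 stream processors per Fiji GPU and one fused multiply-add, i.e. two FLOPs, per stream processor per clock):

```python
# Back-of-the-envelope FP32 throughput for the FirePro S9300 X2.
# Assumes two fully-enabled Fiji GPUs (4096 stream processors each)
# and one fused multiply-add (2 FLOPs) per SP per clock at 850 MHz.
gpus = 2
stream_processors = 4096
flops_per_clock = 2          # one FMA counts as two FLOPs
clock_hz = 850e6

tflops = gpus * stream_processors * flops_per_clock * clock_hz / 1e12
print(f"{tflops:.1f} TFLOPS")  # → 13.9 TFLOPS
```

The Radeon Pro Duo's 16-TFLOPS figure comes from the same silicon running at a higher boost clock.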

As high-performance computing cards go, the S9300 X2 is targeted at applications that need the highest single-precision (or FP32) performance available. FP32 performance is apparently especially important to modeling applications in the oil-and-gas industry. In a wave-equation modeling performance demonstration, AMD says the S9300 X2 can deliver as much as 3.5 times the performance of Nvidia's Tesla K40 and twice that of the Tesla K80. Other applications that might benefit from the S9300 X2 include deep learning.

The S9300 X2 will be available in the second quarter of 2016 for a suggested price of $6000. It'll complement the FirePro S9100 series, AMD's existing lineup of high-performance computing cards. Those Hawaii-based cards offer HPC-critical features like ECC RAM and fast double-precision performance.

Comments closed
    • AJSB
    • 4 years ago

    Meanwhile…

    [url<]http://www.bitsandchips.it/images/2016/03/31/zhash1.png[/url<] [url<]http://www.bitsandchips.it/images/2016/03/31/zmem1.png[/url<] [url<]http://www.bitsandchips.it/images/2016/03/31/zraytrace1.png[/url<] :")

      • synthtel2
      • 4 years ago

So about half the per-core IPC of Skylake at ray-tracing, memory bandwidth still has some issues, and I have no idea what to think of hashing. Looks like dedicated hardware, but why would they use dedicated hardware for hashing of all things? Is there something out there that benefits that much from being able to hash at 17 GB/s instead of 5?

        • AJSB
        • 4 years ago

        Don’t worry…my April’s Fools day victim here 😀 :p :”)

          • synthtel2
          • 4 years ago

          Dammit, I walked right into that one. I dodged all the rest, I swear! xD

            • AJSB
            • 4 years ago

            That’s what they always say :p

Granted, the trap was well set: the chipset codename & RAM speed were legit, and even the relatively modest clock of the CPU seemed legit too….then, not winning all the categories but having decent values was the cherry on top of the cake 😉

    • ronch
    • 4 years ago

    OK, now it should come with a fierce Asus-like cooler on it. /s

    • ronch
    • 4 years ago

    I wonder how many of these Pro-grade GPUs AMD and Nvidia really sell, and how much of the pie each has.

      • MathMan
      • 4 years ago

      AMD doesn’t have a history of selling a lot of these. Nvidia does. In the last quarter, Nvidia reported $97M in revenue for their datacenter business (Tesla and Grid) and $203M for Quadro.

      AMD doesn’t break out between CPU and GPU anymore, but their total GPU business is unlikely to be more than $300M (vs $1.2B for Nvidia.)

        • ronch
        • 4 years ago

So AMD can boast of making everything from CPUs to graphics chips to embedded stuff to forks and knives… It’s just that they don’t sell enough of each kind of product to make up for R&D costs and be profitable.

    • TheMonkeyKing
    • 4 years ago

    Yes but will it bitcoin. That’s still a thing, bitcoin. Right?

    • ultima_trev
    • 4 years ago

I didn’t realize Fiji was such a non-compute-focused card. Probably for the best; the power consumption with decent FP64 support would be too damn high.

      • Laykun
      • 4 years ago

They had to; it’s the only way they could get more life out of 28nm for gaming cards: ditch DP for the gaming-friendly SP units.

    • Neutronbeam
    • 4 years ago

    I don’t think it can play Crysis, so I’m Krogothing it. 🙂

      • kuttan
      • 4 years ago

You may be right, this HPC monster comes without a display connector 😛

        • xeridea
        • 4 years ago

        You could probably use the power of the card with DX12/Vulkan to play other games.

      • Srsly_Bro
      • 4 years ago

      -krogothed

      • tipoo
      • 4 years ago

      You’d probably need some sort of neural net for a computer to play Crysis. This can *run* Crysis just fine though. The issue is outputting it to a screen 😛

FirePros can actually run games fine, though I guess there’s an impression otherwise. My FirePro M5100 performs similarly to an M370X in games, for example, with only a small performance hit for the more pro-oriented driver.

    • jts888
    • 4 years ago

    So Fiji doesn’t have half-rate fp64 like Hawaii then?

    Also, I don’t think
[quote<]Hawaii-based cards offer HPC-critical features like ECC RAM[/quote<] is technically correct. Last I heard, HPC GPGPUs support ECC only at the data-structure level instead of in the memory hardware. GDDR5 modules have 32b data interfaces, which makes independent memory interfaces using the popular and efficient (72, 64) SEC-DED Hamming code pretty much impossible. You'd have to have memory chips, card PCBs, and GPU ASICs with 36b physical data paths, and sell both 32b and 36b variants of the memory chips, the latter of which would have a minuscule market.
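For anyone curious, the (72, 64) code mentioned above falls out of simple parity-bit arithmetic. A quick sketch (my own illustration; `secded_check_bits` is a made-up helper name, not anything from a real library):

```python
def secded_check_bits(data_bits: int) -> int:
    """Minimum check bits for a SEC-DED Hamming code on `data_bits`
    bits of data: r parity bits give single-error correction when
    2**r >= data_bits + r + 1, plus one extra overall-parity bit
    for double-error detection."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1

# 64 data bits need 8 check bits: the classic (72, 64) code, whose
# 9/8 ratio maps cleanly onto 72b-wide (or 2 x 36b) memory buses.
print(secded_check_bits(64))  # → 8

# Protecting a lone 32b interface would need 7 check bits, a (39, 32)
# code that fits no standard chip width at all.
print(secded_check_bits(32))  # → 7
```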

      • chuckula
      • 4 years ago

[quote<]So Fiji doesn't have half-rate fp64 like Hawaii then? [/quote<] It's 1/16th rate, which leads to somewhat better performance than an equivalent Maxwell part, but it is nowhere near a full-bore 64-bit compute card. For comparison, even the old single-chip Xeon Phis that have been out for 3 years have roughly the same FP64 performance as this two-chip part, although obviously it has superior FP32 performance.

        • jts888
        • 4 years ago

        Yeah, I think AMD’s reigning FP64 champ is the FirePro S9170, with 5.2/2.6 TFLOPS using Grenada XT.
~16/1 TFLOPS (FP32/FP64) for the Pro Duo/S9300 is nothing to sneeze at either, but I wasn’t initially sure whether that was a gimped consumer-level FP64 rate or intrinsic to the chip.

      • ImSpartacus
      • 4 years ago

Nah, Fiji was up against the reticle limit at 28nm, so the FP64 units were one of the last things that could be removed to get more gaming-useful stuff onto the chip.

    • chuckula
    • 4 years ago

    [quote<]As high-performance computing cards go, the S9300 X2 is targeted at applications that need the highest-possible single-precision (or FP32) performance available. [/quote<] Considering the double precision performance is nothing to write home about, it's understandable that AMD would play up FP32, just like Nvidia played up games that also rely on 32-bit floating point calculations when it launched Maxwell.

      • jts888
      • 4 years ago

      The thing that I still can’t fully grasp is Nvidia’s playing up of half-precision/FP16 performance in Pascal.

A 1b sign / 5b exponent (bias=15) / 10b fraction format yields only about three decimal digits of precision, which you really have to be careful with, or have extremely sloppy simulation constraints.

      The last time they tried FP32/FP16 splitting IIRC was the NV30/FX 5800/DustBuster, where ATI’s R300/9700 went with pure 24b and soundly trounced it.

      Is there going to be any consumer benefit whatsoever for the feature this time?
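That precision cliff is easy to demonstrate: Python's struct module supports the IEEE binary16 ('e') format, so a quick round-trip sketch (the `to_fp16` helper is just for illustration) shows where FP16 falls apart:

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value
    by packing and unpacking it through the binary16 ('e') format."""
    return struct.unpack('e', struct.pack('e', x))[0]

# The fraction is 10 bits, so the spacing just above 1.0 is 2**-10:
# anything within ~0.0005 of 1.0 collapses back onto 1.0.
print(to_fp16(1.0004))   # → 1.0

# Above 2048 the spacing grows to 2, so odd integers vanish entirely.
print(to_fp16(2049.0))   # → 2048.0

# Roughly three decimal digits survive a round-trip:
print(to_fp16(3.14159))  # → 3.140625
```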

        • ImSpartacus
        • 4 years ago

Certain use cases don’t need much precision. So for them, they’re getting extra FP32 units for free, using space that would’ve otherwise been full of big FP64 units. When you’re up against the reticle limit on 28nm, you have to make those compromises.

        Obviously the trade-off was made for gaming, but if certain hpc use cases can take advantage, then why not?

        • MathMan
        • 4 years ago

        FP16 was important for their mobile chips because it saves power and because most other mobile chips have it.

        FP16 is crucial for deep learning, where FP32 is very much overkill in terms of precision, and where bandwidth and memory size are always a factor. With FP16 you double the BW (in terms of numbers per second) and you double the memory (in terms of numbers per GB.)

        In many cases even FP16 is overkill for neural networks, and there are papers out there where they still get good results with 8 bit neural coefficients.

          • ImSpartacus
          • 4 years ago

How does that work? Are they just iteratively trying to predict better coefficients for a linear model in some kind of inductive fashion? I keep hearing about deep learning, but I’m not in an industry that uses that kind of thing.

            • MathMan
            • 4 years ago

            It’s not a linear model.

You have a bunch of inputs which are combined in some linear fashion and then fed through a non-linear activation function. That’s one layer. The outputs of that layer go into the next layer, and so forth. You typically have around 10 layers for a deep neural network.

            The final layer gives a solution.

You feed the network thousands or hundreds of thousands of examples, and when the output is wrong, you backtrack and adjust the coefficients to nudge the answer in the right direction. Repeat ad nauseam until the answers start being correct.

            There are many nuances to it, especially new types of layers that work extraordinarily well on GPUs (convolutional layers), but it’s an incredibly powerful technique and all tech giants (Google, Baidu, Facebook …) are deploying tools left and right that use it. Google is even transitioning their search algorithm towards neural networks.

            There’s also AlphaGo which uses neural networks (in combination with other techniques) to evaluate the strength of a board.

Right now, Nvidia has cornered this market completely by releasing cuDNN, a library that optimizes neural-net operations. AMD currently doesn’t play at all, which is a big miss, because their hardware should be very good at it.
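The layer structure described above maps to just a few lines of code. Here's a toy sketch in plain Python (random weights for illustration only; a real network would learn these via the backpropagation step mentioned earlier):

```python
import random

random.seed(0)

def relu(x):
    """A common non-linear activation function: max(0, x)."""
    return max(0.0, x)

def layer(inputs, weights, biases):
    """One fully-connected layer: a linear combination of the inputs,
    fed through a non-linear activation, exactly as described above."""
    return [relu(sum(w * v for w, v in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# A toy network: 4 inputs -> 5 hidden -> 5 hidden -> 1 output.
sizes = [4, 5, 5, 1]
params = [([[random.uniform(-1, 1) for _ in range(n_in)]
            for _ in range(n_out)],
           [0.0] * n_out)
          for n_in, n_out in zip(sizes, sizes[1:])]

# Forward pass: feed each layer's outputs into the next layer.
x = [0.5, -0.2, 0.1, 0.9]
for weights, biases in params:
    x = layer(x, weights, biases)
print(x)  # the final layer's (one-element) output
```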

        • synthtel2
        • 4 years ago

        Plenty of framebuffers (the majority even?) in modern games are RGBA16F. If you’re doing a big string of operations on one and don’t know that it’s safe, you probably want to do 32-bit math just because of floating point shenanigans, but there’s no shortage of cases where 16-bit math would be fine (and with Polaris/Pascal, much faster). I’m sure we will see some visual artifacts in some game or another when someone messes this up, though.
