Nvidia DGX-1 uses eight Tesla P100s to speed up deep learning

Holy mother of GPUs. If you've been reading the news today, then you know Nvidia has launched the Tesla P100, the HPC equivalent of the Schwerer Gustav mega-cannon. But really, who wants just one of those cards for crunching tons and tons of deep-learning data? Take a good look at Nvidia's DGX-1 Deep Learning System.

Nvidia claims the DGX-1 is the "world's first deep-learning supercomputer in a box." It includes eight Tesla P100 cards, all meshed together with the NVLink interconnect. According to Nvidia, the DGX-1 could be 75 times faster than a dual-socket Xeon E5-2697 v3 CPU system when running neural-network training tasks, and it can perform up to 170 TFLOPS of half-precision (FP16) math, a claimed 56 times the performance of the same pair of Xeons. The company says deep-learning users can expect as much as a 12-fold speedup in neural-network training tasks with the DGX-1 compared to a server with four Maxwell chips inside.

There are some other meaty numbers to go with those claims. The DGX-1's Tesla P100s each have 16GB of HBM2 RAM to work with. The main system is powered by two 16-core Intel Xeon E5-2698 v3 CPUs and 512GB of DDR4 RAM. A RAID-0 array of four 1.92TB SSDs and dual 10GbE ports round out the main specs. The unit's maximum power draw is rated at a whopping 3200W. Nvidia says interested parties can order a DGX-1 today for $129,000.
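The headline numbers follow from simple per-GPU multiplication. This is a minimal sketch that assumes Nvidia's launch peak figures for the NVLink Tesla P100 (21.2 TFLOPS FP16, 10.6 FP32, 5.3 FP64). The article does not list those figures, and they are theoretical peaks rather than sustained throughput.

```python
# Assumption: Nvidia's launch peak figures for the NVLink Tesla P100, in TFLOPS.
# These are theoretical peaks; real workloads sustain less.
P100_PEAK_TFLOPS = {"fp16": 21.2, "fp32": 10.6, "fp64": 5.3}
GPUS_PER_BOX = 8        # from the article
HBM2_GB_PER_GPU = 16    # from the article
SSD_COUNT = 4           # from the article
SSD_TB_EACH = 1.92      # from the article


def box_peak_tflops(precision: str) -> float:
    """Aggregate peak TFLOPS for one DGX-1 at the given precision."""
    return GPUS_PER_BOX * P100_PEAK_TFLOPS[precision]


def raid0_capacity_tb(count: int, size_tb: float) -> float:
    """RAID-0 stripes across every drive with no redundancy, so capacity is the plain sum."""
    return count * size_tb


if __name__ == "__main__":
    for p in ("fp16", "fp32", "fp64"):
        print(f"{p}: {box_peak_tflops(p):.1f} TFLOPS")
    print(f"HBM2 total: {GPUS_PER_BOX * HBM2_GB_PER_GPU} GB")
    print(f"RAID-0 scratch: {raid0_capacity_tb(SSD_COUNT, SSD_TB_EACH):.2f} TB")
```

Eight GPUs at 21.2 TFLOPS FP16 works out to 169.6, which rounds to the quoted 170 TFLOPS. The same box offers roughly 85 TFLOPS at FP32 and 42 TFLOPS at FP64, along with 128GB of HBM2 in total and 7.68TB of striped SSD storage.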

Comments closed
    • Mr Bill
    • 3 years ago

    Mr Slippery will be thrilled to find out that house AI computing modules are starting to get into the affordable range.

    • blastdoor
    • 4 years ago

    Does anybody know to what extent IBM’s True North (or a future derivative thereof) might be a competitor to this?

    http://arstechnica.com/science/2016/03/ibms-brain-inspired-chip-finds-a-home-at-livermore-national-lab/

    IBM's approach seems much closer to actually being a neural network as opposed to just emulating one in software. And IBM's approach appears to be *much* more efficient in terms of power.

      • chuckula
      • 4 years ago

      It’s the typical tradeoff of a circuit that is ultra-customized for exactly one task beating a more general-purpose processor at that one task.

        • blastdoor
        • 4 years ago

        That’s clearly true… but I don’t think True North is customized for exactly one task. My (admittedly limited) understanding is that True North is customized to run, in hardware, a specific neural networking package/language. So it’s more that True North is customized for a specific class of problems that may be narrower than the class a GPU can address, but it’s not exactly one task.

        What I don’t understand is just how narrow the class of problems that True North can solve is. Is it so narrow that True North is not economically viable?

        My uninformed hunch is that a derivative of True North will beat the pants off of GPUs for the types of pattern recognition problems that human brains currently excel at relative to computers. For example, my hunch is that a commercialized self-driving car will be run by something that bears a closer resemblance to True North than to a GPU.

        But I admit my hunch is uninformed. I’d love to hear from someone who knows more about this stuff.

    • Anovoca
    • 4 years ago

    looks like I found my 4k plex server

      • brucethemoose
      • 4 years ago

      Does that monster even have a hardware video encoding/decoding block? Seems like one wouldn’t be very useful for compute stuff.

        • Terra_Nocuus
        • 4 years ago

        I’m pretty sure it has the grunt to do it all in software

    • DeadOfKnight
    • 4 years ago

    I think I found my next gaming PC. I guess I’ll have to keep saving for the new Ferrari. 5K gaming here I come!

    • Fractux
    • 4 years ago

    This is exactly what Tay was looking forward to.

    • albundy
    • 4 years ago

    pretty hot deal if it indeed can perform as much as the dual xeons. having that many dual or even quad socket blades will probably net you 3 times that cost on infrastructure, let alone the substantial electricity and cooling costs.

    • anotherengineer
    • 4 years ago

    “The unit’s maximum power draw is rated at a whopping 3200W. ”

    pffffff that’s it!!

    My clothes dryer has it beat at 5200W.

      • Leader952
      • 4 years ago

      “My clothes dryer has it beat at 5200W.”

      But can it play Crysis?

        • Jigar
        • 4 years ago

        It can bake it, I guess …

        • Ninjitsu
        • 4 years ago

        Someone has done this properly after a long, long time. 😀

      • Beahmont
      • 4 years ago

      Hmm… New business idea! Beahmont’s HPC Center & Laundromat!

      • Deanjo
      • 4 years ago

      You have an AMD Bulldozer-powered clothes dryer?

        • anotherengineer
        • 4 years ago

        Nope.

        But I wish I did; then I could fold and dry at the same time 😉

          • Captain Ned
          • 4 years ago

          He’ll be here through Thursday. Try the veal.

          • Deanjo
          • 4 years ago

          Ya but it would still probably be quicker to fold by hand. ;D

            • ImSpartacus
            • 4 years ago

            That really puts the “home” in Folding@home…

      • prb123
      • 4 years ago

      https://what-if.xkcd.com/35/

    • sreams
    • 4 years ago

    Tesla P100 for $129,000? Sounds about right. But I’d prefer a P100D for that second motor, and I’d definitely order the Ludicrous Speed option.

      • the
      • 4 years ago

      Can’t forget the plaid color chassis.

      • Beahmont
      • 4 years ago

      $129k for eight Tesla P100s and a lot of other very expensive stuff. 1.9-terabyte PCI SSDs are likely a couple grand each. An interposer for a 610mm^2 GPU die is also going to be stupidly expensive. And the 4GB HBM2 stacks that go on that interposer are not going to be cheap either, considering that only 2 of the 3 manufacturers have recently started mass production. Add in a “Shiny New Toy” premium and yeah. $129k per box.

      And yet, even at that price these things will sell as quickly as they can be made and then some, because 5 of these boxes put you on the Top500 list. 10 boxes would put you firmly in the lower 200s. 20 boxes could put you inside the top 100.

      $1.29 mil for a lower-200s supercomputer and $2.6 mil for a potential top-100 supercomputer with more or less 848 TFLOPS of FP64 power? They won’t be able to make these things fast enough. Even some of the poorer colleges around the world can find 2-3 mil for a supercomputer that fast.
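Beahmont's cost and throughput figures can be checked the same way. This is a minimal sketch that assumes 42.4 TFLOPS of FP64 peak per box (eight GPUs at Nvidia's quoted 5.3 TFLOPS each, a figure not given in the article) and the $129,000 list price with no volume discount. The `cluster` helper is hypothetical.

```python
from __future__ import annotations

PRICE_USD = 129_000              # DGX-1 list price from the article
FP64_TFLOPS_PER_BOX = 8 * 5.3    # assumption: 8 GPUs x 5.3 TFLOPS FP64 peak each


def cluster(boxes: int) -> tuple[int, float]:
    """Return (total price in USD, aggregate FP64 peak in TFLOPS) for N boxes.

    Assumes peak throughput simply adds up across boxes, which real clusters never quite achieve.
    """
    return boxes * PRICE_USD, boxes * FP64_TFLOPS_PER_BOX


if __name__ == "__main__":
    for n in (5, 10, 20):
        cost, tflops = cluster(n)
        print(f"{n:>2} boxes: ${cost:,}  ~{tflops:.0f} TFLOPS FP64 peak")
```

Ten boxes come to $1.29 million and twenty to $2.58 million, with 848 TFLOPS of FP64 peak, which matches the comment. These are peak figures, though. The Top500 ranks systems by measured Linpack performance (Rmax), and that lands below theoretical peak (Rpeak).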

        • Waco
        • 4 years ago

        If only Linpack meant anything besides a number to brag about…

          • Deanjo
          • 4 years ago

          Could be worse, think of all the energy and resources wasted doing meaningless coin mining data over the years.

            • Waco
            • 4 years ago

            True, but in a datacenter doing HPC (well, simulation-type HPC workloads), peak FLOPS rarely mean much. Linpack, in particular, very, very rarely means anything at all.

            Granted, it’s a great stress test, but beyond that? Not many real problems map similarly. It’s almost entirely compute-bound, which describes almost zero workloads in the wild.

    • synthtel2
    • 4 years ago

    Am I reading this wrong, or would it take only four or five of these to get on the Top500 (http://www.top500.org/list/2015/11/?page=5)? (85 TFLOPS FP32 per DGX-1) Edit: I was reading it wrong by a factor of two.

      • Ushio01
      • 4 years ago

      Isn’t the Top500 FP64? And yes, 5 of these would get you onto the Top500, so if you have $645,000 to buy them, I’m sure Tech Report would be happy to test them for you.

      Edit: Even better, a single one of these boxes would have made the Top500 as recently as June 2011, at position 440, and would have placed 6th in June 2006.

        • synthtel2
        • 4 years ago

        The descriptor I saw was “full” precision, which I interpreted to mean FP32 (in the usual half/single/double –> FP16/32/64 terminology). If it’s FP64, then eight or nine boxen, which is still very impressive.

        Edit: nope, it’s FP64, so 8 or 9 boxen it is. Good catch.
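The factor-of-two confusion above comes from the P100's FP64 peak being half its FP32 peak. The count shifts again once you account for Linpack efficiency. This is a minimal sketch: `boxes_needed` is a hypothetical helper, and the 200 TFLOPS threshold and 0.75 efficiency in the example are placeholders. They are not the real November 2015 cutoff or a measured DGX-1 Linpack result.

```python
import math

# Assumption: Nvidia's launch peak figures for the NVLink Tesla P100, times 8 GPUs per box.
PER_BOX_PEAK_TFLOPS = {"fp32": 8 * 10.6, "fp64": 8 * 5.3}


def boxes_needed(threshold_tflops: float, precision: str, efficiency: float = 1.0) -> int:
    """Smallest number of DGX-1 boxes whose combined throughput reaches threshold_tflops.

    efficiency is a hypothetical Linpack Rmax/Rpeak ratio; 1.0 means theoretical peak.
    Assumes throughput scales linearly across boxes.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    per_box = PER_BOX_PEAK_TFLOPS[precision] * efficiency
    return math.ceil(threshold_tflops / per_box)


if __name__ == "__main__":
    bar = 200.0  # hypothetical entry threshold, not the actual Top500 cutoff
    print("FP32 peak:", boxes_needed(bar, "fp32"))
    print("FP64 peak:", boxes_needed(bar, "fp64"))
    print("FP64 at 75%:", boxes_needed(bar, "fp64", efficiency=0.75))
```

With the hypothetical 200 TFLOPS bar, FP32 peak suggests 3 boxes and FP64 peak suggests 5. FP64 at a hypothetical 75% Linpack efficiency pushes the count to 7.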

    • odizzido
    • 4 years ago

    Wonder what the dimensions are on this? 3200W sounds like a lot to deal with.

      • arbiter9605
      • 4 years ago

      My guess is it would be a 3U rack-mount server, since racks are likely where they expect these to be mounted. Most likely 3U. Just a guess, so I could be wrong.

      Edit: Did some quick thinking, using the USB ports on the left as a rough gauge of scale. The chassis is about 10 of them tall, and they are about half an inch each, so if that’s the case, it would be 3U.

      Edit 2: TPU says in its story that it’s a 3U rack unit, so I’d guess that is what it will be.
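arbiter9605's estimate converts to rack units directly, because one rack unit (1U) is 1.75 inches. This is a minimal sketch. The height of 10 ports at half an inch each is the commenter's eyeball estimate from the photo, not a published dimension, and `rack_units` is a hypothetical helper.

```python
import math

INCHES_PER_U = 1.75  # one rack unit, per the EIA-310 rack standard


def rack_units(height_in: float) -> int:
    """Round a chassis height in inches up to whole rack units."""
    return math.ceil(height_in / INCHES_PER_U)


if __name__ == "__main__":
    # Commenter's estimate: about 10 USB ports tall at roughly half an inch each.
    est_height = 10 * 0.5
    print(est_height, "in ->", rack_units(est_height), "U")
```

Five inches divided by 1.75 is about 2.86, which rounds up to 3U. The photo estimate is sensitive, though: one extra port-height (5.5 inches) would push the answer to 4U, which may be why other readers see 4U.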

        • curtisb
        • 4 years ago

        The picture looks like 4U. It’s still a ton of power to cram into that small of a chassis.

        • odizzido
        • 4 years ago

        ahh good catch. I did not see those USB ports.

      • Leader952
      • 4 years ago

      Where are the air vents?

      Or is this liquid cooled?

        • moose17145
        • 4 years ago

        See those oval shaped things on either side?

        Probably those.

        It could also be that the front panel sits slightly off the actual air intake vents, leaving an air gap behind it with the outer edges open for airflow.

        Granted, this is probably not the best solution for airflow… that being said, with the kind of forced airflow that happens inside a server chassis, I doubt it will be an issue. Keep in mind there is absolutely no restriction on how much noise these things can make.

      • Meadows
      • 4 years ago

      A lot of server rooms already have to remove a lot more than that constantly, so I doubt this would tip their scales too much.

        • cygnus1
        • 4 years ago

        The cooling probably won’t be a problem but I think the average rack PDU might only power 2 or 3 of these, definitely not a full rack of them. That power draw is going to necessitate some extra planning if people are buying more than a few to install together.
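Both the PDU worry and the cooling point come down to arithmetic on the 3200W rating. This is a minimal sketch built on three assumptions: each box is budgeted at its full rated draw, a US-style 80% continuous-load derating applies, and the PDU ratings in the code are illustrative rather than anything the commenters specified. `pdu_usable_watts`, `boxes_per_pdu`, and `cooling_tons` are hypothetical helpers.

```python
import math

WATTS_PER_BOX = 3200       # DGX-1 maximum rated draw, from the article
BTU_HR_PER_WATT = 3.412    # 1 W of electrical load dissipates about 3.412 BTU/hr of heat
BTU_HR_PER_TON = 12_000    # one ton of refrigeration


def pdu_usable_watts(volts: float, amps: float, three_phase: bool, derate: float = 0.8) -> float:
    """Usable PDU capacity. For three-phase feeds, volts is the line-to-line voltage.

    derate models a continuous-load limit (80% under the US NEC); adjust for your site.
    """
    phase_factor = math.sqrt(3) if three_phase else 1.0
    return volts * amps * phase_factor * derate


def boxes_per_pdu(usable_watts: float) -> int:
    """How many boxes fit if each one is budgeted at its full rated draw."""
    return int(usable_watts // WATTS_PER_BOX)


def cooling_tons(boxes: int) -> float:
    """Heat load of N boxes at full draw, in tons of refrigeration."""
    return boxes * WATTS_PER_BOX * BTU_HR_PER_WATT / BTU_HR_PER_TON


if __name__ == "__main__":
    # Hypothetical example PDU ratings, not taken from the thread.
    for label, v, a, three in [("208V/30A 1-ph", 208, 30, False),
                               ("208V/30A 3-ph", 208, 30, True),
                               ("208V/60A 3-ph", 208, 60, True)]:
        print(label, "->", boxes_per_pdu(pdu_usable_watts(v, a, three)), "boxes")
    print(f"Heat per box: {cooling_tons(1):.2f} tons")
```

Under those assumptions, a 208V/30A single-phase PDU carries one box, a 30A three-phase PDU carries two, and a 60A three-phase PDU carries five. Each box also adds a bit under one ton of refrigeration load.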

    • orik
    • 4 years ago

    but can it run crysis?

      • chuckula
      • 4 years ago

      That box looks tough enough to run over Crysis.

        • jihadjoe
        • 4 years ago

        I remember when this joke was about Doom, then Quake.

        Back in college my buddies and I were huge Quake nerds. There was one time when a female friend was telling us about one of the “cool guys” in their department, and one of my friends replied sarcastically, “Yeah, but can he play Quake?” loool

      • cmrcmk
      • 4 years ago

      Forget running Crysis, this box has enough horsepower to disassemble and recode Crysis so that even Intel Extreme HD graphics could run it.

      • Neutronbeam
      • 4 years ago

      That was my first question. ☺

      • gc9
      • 4 years ago

      But you’ll have to imagine the view.

      (I assume compute cards use binned chips with no video out, and may be missing other fixed-function graphics blocks such as rasterizers, hardware decoders, etc. Wafer defects can show up anywhere.)
