Intel crams 100 GFLOPS of neural-net inferencing onto a USB stick

It's been a couple of years since Intel floored an old friend of ours with a computer miniaturized to the size of a disappointingly small candy bar. Today, Intel announced a new type of compute stick, but this one isn't designed to replace your desktop computer. Instead, the Movidius Neural Compute Stick provides deep neural network processing inside a self-contained, low-power package.

As the product's name suggests, the Neural Compute Stick was made possible by technology from Movidius, a company that Intel acquired last September. The device contains one of Movidius's Vision Processing Units, the chips that power the company's low-power machine-vision products. The Neural Compute Stick consumes so little power that a USB 3.0 port is all it needs for both data and power, yet Intel claims that it provides more than 100 gigaflops of performance. That combination lets the device handle AI inference applications without being connected to a network.

Developers using the Neural Compute Stick will still need to do algorithm training on much more powerful equipment, but these diminutive sticks will let them perform inference anywhere that real-world data needs to be interpreted. With a convolutional neural network trained in a deep-learning framework like Caffe, users can quickly put machine-learning capabilities into low-power devices. Judging from its promotional materials, Intel seems to think that one of these sticks would do interesting things inside of a drone. And if one stick alone doesn't provide enough processing power, a simple USB hub can harness the capabilities of a couple more.
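
Getting a model onto the stick is meant to be a short trip from a trained Caffe network to a self-contained inference loop. The snippet below is only a rough sketch of what the host-side code might look like, loosely patterned on the mvnc Python bindings in Movidius's SDK as we understand them; the exact function names, the precompiled "graph" file, and the FP16 input format are assumptions on our part rather than documented behavior.

```python
# Hypothetical sketch of running inference on the Neural Compute Stick.
# The mvnc calls below follow the Movidius NCSDK-style Python bindings as we
# understand them; treat every name here as an assumption, not gospel.
import numpy as np
from mvnc import mvncapi as mvnc

# Find and open the first attached stick.
devices = mvnc.EnumerateDevices()
if not devices:
    raise RuntimeError("No Neural Compute Stick found")
device = mvnc.Device(devices[0])
device.OpenDevice()

# Load a network compiled offline from a Caffe model into on-stick memory.
with open("graph", "rb") as f:
    graph = device.AllocateGraph(f.read())

# Push one image and pull the result; the VPU works with FP16 tensors.
image = np.random.rand(224, 224, 3).astype(np.float16)  # stand-in for a camera frame
graph.LoadTensor(image, "frame-0")
output, user_obj = graph.GetResult()
print("Top class:", int(np.argmax(output)))

graph.DeallocateGraph()
device.CloseDevice()
```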

Intel plans to show off the Neural Compute Stick at the upcoming Conference on Computer Vision and Pattern Recognition (CVPR), which starts on Saturday. Those interested can already purchase the device through one of Intel's partners for the strikingly reasonable price of $79.

Comments
    • NoOne ButMe
    • 2 years ago

    I just realized, this isn’t x86.

    I’m amazed it’s making it to market!

      • tipoo
      • 2 years ago

      It’s little, but hopefully a sign they’re past the “x86 or bust” post-Itanium theology.

        • frenchy2k1
        • 2 years ago

        This wasn’t developed by Intel, either.
        They bought the startup it comes from (Movidius) about a year ago.
        This is part of their return on that investment, basically monetizing the products Movidius already had until Intel can integrate them better.

    • cynan
    • 2 years ago

    [quote<]Intel seems to think that one of these sticks would do interesting things inside of a drone[/quote<] Intel is literally trying to become [b<]Sky[/b<]net.

      • CuttinHobo
      • 2 years ago

      What’s the worst that could happen? 😛

    • tipoo
    • 2 years ago

    If you’re pondering why 100 GFLOPS would be of interest at all – forget PCs, think more mobile, like drones (or robot vacuums or mini cars or a bunch of things). At 100 GFLOPS, it has enough performance to run basic networks on its own, without the help of a server. At 1 watt, small batteries can power it without substantial runtime loss.

    It could have interesting applications for sure. Very niche, as it’s meant for developers, but interesting.

    Also, that’s 1 watt for 100 GFLOPS – ASICs like this will clearly scale past GPUs in efficiency, which is why even Nvidia is making dedicated silicon for neural nets now. Look at how dedicated silicon ousted GPUs in Bitcoin mining.

      • ptsant
      • 2 years ago

      For inference (not training), 100 GFLOPS should be able to run a simple NN in about 200 ms, which is OK for a lot of stuff (face recognition, pattern recognition). It also depends on the efficiency of the software stack, which is probably optimized to the max for this platform.
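
      (That 200 ms figure is just work divided by throughput; assuming a network that costs roughly 20 GFLOP per image, which is a made-up but plausible number, the arithmetic works out like this:)

```python
# Back-of-the-envelope check of the ~200 ms claim.
# The 20 GFLOP-per-image cost is an assumed example, not a measured figure.
peak_gflops = 100.0           # claimed throughput of the stick
gflop_per_inference = 20.0    # hypothetical cost of a simple CNN per image

latency_ms = gflop_per_inference / peak_gflops * 1000.0
print(f"~{latency_ms:.0f} ms per inference at full utilization")  # ~200 ms
```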

    • Chrispy_
    • 2 years ago

    $79 for 100 GFLOPS?

    Am I missing something, or is this not pathetic compared to the 400 GFLOPS of the low-power Intel HD 620 IGP in that laptop?

    Or, you know – for $79 you could pick up a used Radeon HD 7950 with 30x more performance.

    I am definitely missing the point: this offers less than 1/4 the performance of the typical $0-extra IGP that people already have in their laptops and 1/100th the performance of a modern $500 GPU. Perhaps if people were making USB farms it would make sense, but even that seems to fail, because a USB farm of 100 of these things would cost $8,000, which would buy you an Nvidia Tesla or two.

    What is the point?

      • chuckula
      • 2 years ago

      The point is plugging that thing into a Raspberry Pi and having it offload visual object recognition inferencing in a low power envelope.

      There are other solutions that could do it too but if your GPU is spending all its time running neural networks then its resources are taken away from doing other things like producing smooth graphics. Maybe OK on a huge GPU, but in situations where a 1 Watt device is being used you probably aren’t using a huge GPU.

        • Chrispy_
        • 2 years ago

        Ah, okay. So things like drones, route-planners, roombas, toys etc. The second picture was very misleading with a full laptop and multiple devices plugged into a hub giving the impression that you’d stack them to boost a laptop’s FLOPS performance, which is just daft.

        • willmore
        • 2 years ago

        Too bad the SDK is x86 only. So, no RPI love there.

          • Drachasor
          • 2 years ago

          RPI couldn’t use it anyway since it only has USB 2, but there are non-x86 systems that could have, which sucks.

          • chuckula
          • 2 years ago

          The SDK being x86 only doesn’t mean you have to use these things with a PC.

          It just means that you need a PC to program the neural network, which isn’t surprising or really a big limitation when you think about it. After the neural network is in place on the device, anything that uses Caffe — and there is a framework for the Raspberry Pi — can use the programmed neural network.

          [url<]https://github.com/benjibc/caffe-rpi[/url<]
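
          (For reference, host-side Caffe inference from Python looks roughly like the sketch below; the file names and the "data" input blob are placeholders for whatever network you actually deploy, and this runs on the host CPU rather than on the stick.)

```python
# Minimal pycaffe inference sketch. File names and the 'data' blob name are
# placeholders for your own deployed network.
import numpy as np
import caffe

caffe.set_mode_cpu()
net = caffe.Net("deploy.prototxt", "model.caffemodel", caffe.TEST)

# Fake 1x3x224x224 input standing in for a preprocessed camera frame.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
net.blobs["data"].reshape(*image.shape)
net.blobs["data"].data[...] = image

outputs = net.forward()  # dict: output blob name -> ndarray
for name, blob in outputs.items():
    print(name, blob.shape)
```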

    • Drachasor
    • 2 years ago

    I’m interested in whether this could be used to build a lower-power but high-quality digital assistant in the home or on the go. Right now that stuff requires a lot of processing power. The open-source one I’ve seen still requires a pretty powerful server, and even then there can be significant delays while it processes requests. It would be nice to be able to make a smart home that didn’t require an internet connection to stay functional and could act as a digital assistant and be…well…smart.

    • DPete27
    • 2 years ago

    I’ve always wanted to try an Intel Compute Stick. Does anyone have experience with them? I picture the perfect use case being Netflix and local movie streaming.

      • juzz86
      • 2 years ago

      The early ones were a bit under-powered for streaming.

      I now have a Cherry Trail one in the bedroom and it’s leaps and bounds better.

      • tsk
      • 2 years ago

      Although I’d recommend a Chromecast instead for Netflix, they work great for local streaming.

    • NoOne ButMe
    • 2 years ago

    … yes, that’s like, less than almost any PC you would plug it into already?

    I feel I am missing the point.

      • Redocbew
      • 2 years ago

      Assuming that PC has a discrete GPU, and you can get your algorithm to run on it.

      I’m thinking these little widgets would be most useful for things that are not PCs anyway.

        • chuckula
        • 2 years ago

        As you can see from the drone in the video, these things are probably not being used with PCs outside of testing/configuration. After that it’s an embedded device.

        • synthtel2
        • 2 years ago

        A dual-core Haswell or later only has to run at 1.6 GHz to do 100 GFLOPS of FP32. Admittedly, that’s a lot of wasted precision and it’s probably still less efficient per FLOP at inferencing than dedicated hardware, but 100 GFLOPS really isn’t very much.

        If it’s cool, it’s because it can do that in a tiny power envelope for $80. It probably beats Atoms, cat cores, and ARMs pretty easily.

          • ptsant
          • 2 years ago

          How did you calculate that number?

          LINPACK benchmarks I found quickly show that with the Intel MKL and AVX2 a Haswell 4770K (the fastest quad-core) would get ~ 182 GFLOPS at 3.5GHz (https://www.pugetsystems.com/labs/hpc/Linpack-performance-Haswell-E-Core-i7-5960X-and-5930K-594/). You would need a high-end Haswell running at least ~2GHz to approach 100GFLOPS. A Ryzen 1800X (not particularly good for HPC, lacks AVX512) gets to 220 GFLOPS, maybe a bit more with software tuning (need to install ACML). Will test this at some point, out of curiosity.

          To put this into perspective, the Radeon Instinct is rated for 25 TFLOPS while doing neural net training and Volta (a $$$$ card) can approach 100TFLOPS if using the “tensor cores”.

            • synthtel2
            • 2 years ago

            A Haswell or later core can do two 256-bit FMAs per cycle – counting the mul and add in the FMA as an op each (as is typical), that’s 32 FP32 ops per cycle per core. 64 ops/cycle (two cores) times 1.6 GHz is 102.4 GFLOPS. Even things like Linpack can’t make full use of that, but whenever I’ve seen a marketing dept throwing around figures like this, it’s always the theoretical maximum.

            Nominally, 4C Haswell at 3.5 should be able to do 448, so it looks like Linpack is getting [s<]~40%[/s<] efficiency there. Zen’s got both half the L1/L2 bandwidth and half the vector ALU per core (a quarter of both compared to a 7900X / AVX-512), so at 3.6 it’s a nominal 460 and [s<]~48%[/s<] efficiency. The 7900X with AVX-512 skews all the charts with a nominal 2.1 TFLOPS, or almost as much as my GTX 960. o_O

            I haven’t been paying that much attention to neural nets and could be wrong, but AFAIK they make pretty good use per FLOP of generic FMA-capable hardware, and most of the improvements from dedicated hardware come from cutting out circuitry (including precision) that isn’t useful to NNs.

            Edit: As chuckula said and I forgot, Linpack is FP64, so the real efficiencies there are 80% and 95%.
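
            (Spelling out that peak-rate arithmetic with the same core counts and clocks quoted above; the only extra assumption is Zen’s 2x128-bit FMA layout:)

```python
# Theoretical peak FP32 rate = cores * FMA units * FP32 lanes per unit * 2 ops * GHz.
def peak_gflops(cores, fma_units, fp32_lanes, ghz):
    return cores * fma_units * fp32_lanes * 2 * ghz  # mul + add per FMA lane

print(peak_gflops(cores=2, fma_units=2, fp32_lanes=8, ghz=1.6))  # 2C Haswell: 102.4
print(peak_gflops(cores=4, fma_units=2, fp32_lanes=8, ghz=3.5))  # 4C Haswell: 448.0
print(peak_gflops(cores=8, fma_units=2, fp32_lanes=4, ghz=3.6))  # 8C Zen 1:   460.8
```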

            • ptsant
            • 2 years ago

            You are right about the numbers. I was thinking about double-precision, although of course this is not required for NN.

            The difference between my Ryzen 1700X and my RX 480 while training a net is approximately 15x. This is a very rough estimate which, given the RX 480 at 5 TFLOPS, would put the 1700X at 200-400 GFLOPS. Although I don’t remember whether the CPU ran single or double precision…

            • chuckula
            • 2 years ago

            Those numbers from LINPACK are 64-bit double-precision while these devices generally operate at much lower precision levels (even 32 bit is considered too much for many applications).

            However I do agree with your overall point.

            • ptsant
            • 2 years ago

            You are right. For single-precision float it would be approximately double.

            • synthtel2
            • 2 years ago

            I knew I was forgetting something about LINPACK. Thanks.

        • ptsant
        • 2 years ago

        Raspberry Pi would be an obvious application.

        Also, don’t forget the ecosystem. If the thing is plug and play, it might be much more convenient, even for a PC user. Currently, running a full deep-learning stack on your PC is not trivial: typically you’d need a Linux installation with a bunch of SDKs and gigabytes of software (compilers, scripting languages, a database, etc.). That is OK if you are training networks or doing research, but if you just need inference for a specific application, you are much better off with a self-contained system.

          • Drachasor
          • 2 years ago

          RPi doesn’t have USB 3.0, but there are a few similar sub-$100 devices that do. The ODROID-XU4 is one I know off the top of my head.

        • NoOne ButMe
        • 2 years ago

        For the cost, anyone doing this on a PC will find a cheaper CPU is faster.

        The power draw is much higher, probably, but for people who are just playing around with this stuff, I imagine power isn’t important, and for serious users it obviously doesn’t matter either. For startups, raising enough money for an Nvidia ‘personal’ supercomputer or AMD’s Vega-based FP16 cards seems like better performance per dollar.

        I like the concept, and a version that uses 5 W, or even 20 W+ over USB-C, could be much more interesting.

          • Drachasor
          • 2 years ago

          It looks like this would use a lot less energy. If you connected it to something Raspberry Pi-like (there are similarly-sized systems with USB 3.0 under $100), then you do have something cheap and low power. That means a drone or portable device can run using it for a long time.

          It also means that if you have it running, say, a smart home AI, then it can run off backup battery power for a long time in case of a power outage — and a lot of IoT devices could theoretically be quite low power at least (and some are).

          Also, traditional CPUs are not suited to neural network work. That benefits from lots and lots of threads, which is why you have GPUs and similarly designed devices doing this work. So that 100GFlops is actually not directly comparable with a CPU at all.

          So theoretically this could be very nice when you want low-power applications. Granted, if you want a lot more performance and can supply more power, then a GPU ought to be better: you’d need about 100 of these to equal a GeForce 1080, assuming they are roughly equally efficient with their flops on the given task.

          So, seems like whether it will be useful really depends on how much power it uses.

    • DavidC1
    • 2 years ago

    They just announced they are going to completely abandon wearables, which were about the only thing they were showing at the last IDF.

    Then, a little before that, they gave up on hobbyist IoT boards like Galileo. How long will it take before they abandon this one?

    Anyone doubting the CanardPC claim that Brian Krzanich is crazy, abandons projects without putting much effort into them, and makes other insane decisions should note these cancellations. They spent $100 million on wearables. What’s more irresponsible than treating other people’s money like it’s garbage?

    Last year they gave up on mobile just like that. Brian can’t handle the pressure, so he just gives up. You can’t help but wonder whether these behavioral problems at the top come from the same no-strings, no-consequences relationship thinking. I’m pretty sure that if they didn’t make these irrational decisions and actually committed, they’d get somewhere outside their core CPU business.

      • DancinJack
      • 2 years ago

      Yeah, I’m sure they’re giving up market segments “just like that.”

      C’mon

        • DavidC1
        • 2 years ago

        At the last IDF they had skateboarders flying over the CEO’s head with Intel-powered sensors, and drones flying everywhere on top of that. Other times it was all about watches.

        If a year later it disappears entirely, it is “just like that”.

        They spent multiple years giving companies money to use their chips, and then suddenly announced they won’t make them at all. That’s also “just like that.” At least give it more of a chance, or have the forethought not to make stupid decisions like throwing $10 billion at it over a couple of years in the first place.

        I don’t believe companies and research teams are hiding some alien-like technology and dribbling it down to consumers to maximize profit. However, most tend to slack when there’s zero threat to them. They try to squeeze suppliers and customers in any way possible.

        On one hand, it seems like these companies need competition so they don’t do stupid things like that. On the other hand, when you look at certain comparisons, you can’t help but notice that REAL advancement would come from true cooperation. I think it’s ridiculous that we couldn’t have Fury’s HBM and Maxwell’s efficient uarch all in one product. They go about it in quite different ways but end up with mostly the same result. The problem is, I think, that society breeds us to believe stepping on others’ backs is the only way to success.

          • frenchy2k1
          • 2 years ago

          You realize that Nvidia fits its top-end processors with HBM2, right?
          So you DO have HBM plus the latest uArch in GP100 (last year) and GV100 (last month). In a sense you are right, though: no Maxwell with HBM, only its two successors.

          Nvidia only fits HBM memory on their top, HPC oriented cards, as this is still quite expensive and hardly necessary for consumers.

          I agree with you that Intel has been dropping the ball lately. They have retreated to their core markets and dropped their experiments very quickly. Many investors are starting to complain about that lack of leadership. As said, Intel needs high-margin products and has trouble moving away from server and client computing without cannibalizing those high-revenue markets. This is the main reason they failed in mobile (they *could* not release a high-performing, low-power, cheap chip without killing their higher-margin client computing business, so their efforts were capped from the start).

      • brucethemoose
      • 2 years ago

      Maybe it’s a margin thing? Like, if they can’t get the insane profits they have for servers or laptops soon, it’s not worth it anyway?

      Not saying that’s reasonable, but I can picture execs falling into that mode of thinking.

        • CuttinHobo
        • 2 years ago

        I suspect you’re right, but just about anyone would agree that diversification is rarely a bad thing.

          • w76
          • 2 years ago

          I’d disagree, it can be a distraction from a core competency. Intel makes outstanding processors and does an outstanding job in chip fabrication. Leave wearables and other fashion fads to other companies like Apple that have an eye for those things.

          That said, this seems a lot closer to their core competencies than some of their other abandoned ventures.

            • CuttinHobo
            • 2 years ago

            That’s a good point about distraction. I do think they can diversify while still not straying too far from their core. It doesn’t need to be a $7.7 billion albatross like McAfee was. They could, for example, offer a massive compute solution other than Xeon Phi by gluing (as they put it) a dozen Iris GPUs together and putting them on a PCIe card. They wouldn’t so much be competing with themselves as offering customers a solution for needs that aren’t well served by Phi. That’s relatively low-hanging fruit, since the Iris development has already been done but is offered in so few models that it seems poorly monetized.

            As you touched on, in some ways they’re like Google and cook up these odd, billion dollar products in search of a home before dropping them like a hot potato once things get challenging.

      • cynan
      • 2 years ago

      “I’m inclined to take CanardPC’s claims with a healthy load of salt on that one, Jim. But then again, if it looks like a duck and sounds like a duck…”

      • blastdoor
      • 2 years ago

      All good points.

      But I can see how they are in a tough spot. Their rise to prominence correlated directly with the decline of the vertically integrated computer company. Intel does not make products that consumers buy. They have to lead from behind — they can only have a hit product if the OEMs who buy their chips have hit products. Their fate is not really in their hands.

      And really, for a very long time, they didn’t even have to lead from behind. Microsoft did it for them.

      Now we are in a world where Google and Microsoft might be designing their own CPUs, and of course Apple already does that to great success.

      Intel’s best bet might be to abandon all of these efforts to drive from the back seat and accept what their comparative advantage is and always has been: manufacturing. Sure, milk x86 for all it’s worth while you can, but Intel’s best hope for the future might be as a foundry.

        • DavidC1
        • 2 years ago

        The sad part may be that the advantage might have been due to being an IDM. A close collaboration between manufacturing and design teams yielding world-class products. As soon as they become a foundry, they might lose that.

        And they were GOOD at it. A failure of execution in one area tends to carry over into others, so I’m afraid going all-in on the foundry business isn’t going to make things any better for them.

          • blastdoor
          • 2 years ago

          It might be more accurate to say that by the time they become a foundry, they will have lost it.

          If, two years ago, Intel had aggressively and sincerely pursued Apple’s business using their 14 nm process, they would most likely have won that business. How could they not have? Their 14 nm process was the best by far.

          But now? Even if they tried, which we have no evidence that they are, it’s not clear that they could lure Apple away from TSMC.

          I know Intel has dipped a toe in the foundry business, but it’s too little, too late. They have given TSMC too much time to catch up.

            • DavidC1
            • 2 years ago

            I don’t think gaining Apple as a mobile customer would have been beneficial at all. It might have made sense in the short term, but it’s quite clear at this point that Apple absolutely squeezes its suppliers.

            Let’s say Paul Otellini had agreed with Steve Jobs and gotten their chip into the iPad and iPhone. How long would that have lasted? Would it have been a positive for them if the mobile chips inside Apple phones and tablets were cannibalizing their desktop lines? At the moment that’s not directly possible because the OS and ISA are completely different. And if Apple figured they could make better mobile chips than Intel, they would have dumped them on the spot!

            Same with being a foundry, except you make even less money than you do selling chips for your own devices. It’s very well known that Apple switches fab suppliers almost every generation. If Intel had won Apple’s business and we fast-forwarded to 2017, we might well have heard they had lost it.

            Their execution problems would have troubled them either way. I will tell you who’s successfully branching out of the PC-only business by making GOOD products: Nvidia. Intel could have done the same. It would have taken time, but everything does; not everything can be an instant success.

            • blastdoor
            • 2 years ago

            The numbers don’t back you up at all — as in, not one little tiny bit.

            TSMC is doing very well as a foundry as is Samsung. Both compete aggressively for Apple’s business because Apple provides huge volume for expensive SOCs that fund deployment of new fab processes.

            And Intel would have done even better with Apple as a customer because Intel had a significant technical lead.

            The only way that it makes sense for Intel not to sell to companies like Apple is if they can make more money doing something else. But they can’t. That’s the whole point.

        • Delta9
        • 2 years ago

        They have top of the line foundries AND a huge $ advantage. That was most obvious when they announced the Core Duo chips while still having 2-3 P4 revisions to go, and then dumped the chip on the market while throwing away 80% of the wafers for the first 6 months of the chip’s availability. They have the ability to use brute force when it comes to fabrication of their own products to keep market share. I doubt they would take that kind of hit for a 3rd party. The amount of fab space they have is ridiculous.
