ARM DynamIQ makes big.LITTLE more flexible

ARM's big.LITTLE scheme pairs clusters of hefty CPU cores with groups of less capable, more power-efficient cores. The company is now taking the concept to the next level with its DynamIQ technology. The new tech grants chip designers increased flexibility in grouping cores of varying characteristics within a single SoC. This move should net increased performance, power efficiency, and reliability. ARM says that some of DynamIQ's features are useful for artificial intelligence, machine learning, and VR scenarios.

The company's big.LITTLE technology previously required CPU cores to be arranged in clusters of up to four cores each in both the big and LITTLE portions of the SoC. DynamIQ allows chip designers more flexibility when setting up core configurations, like 1+3, 1+7, or 2+4. ARM's new tech also offers "substantially more granular and optimal control" over the cores. Each cluster can contain between one and eight cores with distinct power and performance characteristics.
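
The difference between the two cluster rules can be sketched in a few lines of Python. This is only an illustration of the limits quoted above, not ARM's actual configuration logic: the function names are hypothetical, and treating a DynamIQ "1+7" layout as one mixed cluster of eight cores is an assumption drawn from the article's wording.

```python
# Hypothetical checker; the limits come from the article's wording,
# not from any ARM specification.
BIG_LITTLE_MAX_PER_CLUSTER = 4  # "clusters of up to four cores each"
DYNAMIQ_MAX_PER_CLUSTER = 8     # "between one and eight cores"


def fits_big_little(big: int, little: int) -> bool:
    """One big cluster plus one LITTLE cluster, each holding 1 to 4 cores."""
    return all(1 <= n <= BIG_LITTLE_MAX_PER_CLUSTER for n in (big, little))


def fits_dynamiq_cluster(big: int, little: int) -> bool:
    """A single mixed cluster holding 1 to 8 cores in total (assumption)."""
    return big >= 0 and little >= 0 and 1 <= big + little <= DYNAMIQ_MAX_PER_CLUSTER


for big, little in [(1, 3), (1, 7), (2, 4)]:
    print(
        f"{big}+{little}: big.LITTLE={fits_big_little(big, little)}, "
        f"DynamIQ single cluster={fits_dynamiq_cluster(big, little)}"
    )
```

Under these assumptions, 1+3 and 2+4 pass both checks, while 1+7 only fits the DynamIQ rule because seven LITTLE cores exceed the four-per-cluster limit.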

A feature called "autonomous CPU memory power management" dynamically adjusts the amount of memory that's available to a CPU depending on the application. A game or augmented reality application might require all available memory, while a music streaming application could get by with less RAM, thereby saving power.

DynamIQ also adds special instructions designed specifically for machine learning and artificial intelligence. ARM didn't go into a lot of detail on this matter, but the company says users can expect up to a "50x boost in AI performance over the next three to five years relative to previous systems."

ARM says DynamIQ-based chips will be ready for use in advanced driver assistance systems for autonomous vehicles. The company says the SoCs will be compliant with ASIL-D, the most stringent level of automotive safety measures. ARM says a "low-latency port" lets the CPU communicate with the outside world with "up to 10x quicker response", potentially allowing a machine-learning application to make safety-critical decisions quickly. Meanwhile, DynamIQ's improved power efficiency should prove a boon for hybrid and battery-powered vehicles, where every watt counts.

According to The Verge, ARM has already licensed DynamIQ to select customers. The first wave of SoCs packing the new technology is expected to come to market in early 2018.

Comments closed
    • DPete27
    • 3 years ago

    I feel like 2+4 would be the optimal configuration.

    • ronch
    • 3 years ago

    This should make for some interestingly (and, for the common folks, confusingly) spec’d SoCs.

    • tipoo
    • 3 years ago

    Oh boy, I can’t wait for the first Chinese SoC paired with 3 A53s paired with 7 A53s paired with 9 A53s paired with 1 A53 paired with 13 A53s!

    • f0d
    • 3 years ago

    I don't get it – I thought big.LITTLE didn't have to be four cores per cluster? How does the 2x A72 and 4x A53 in the Snapdragon 650 in my Xiaomi Note 3 Pro work then?

    "The company's big.LITTLE technology previously required CPU cores to be arranged in clusters of four cores each in both the big and LITTLE portions of the SoC. DynamIQ allows chip designers more flexibility when setting up core configurations, like 1+3, 1+7, or 2+4."

      • tipoo
      • 3 years ago

      I think 2+4 was probably not supposed to be in that list. Current HMP big.LITTLE seems fine with any factors of 2.

      • willmore
      • 3 years ago

      The wording here is incorrect–as other reports have done correctly. It should read “arranged in clusters of *up to* four cores each”.

      In practice, I haven’t seen a chip with only 1 core in a cluster, but that’s most likely due to layout issues–the cluster interconnect overhead has to be there even if you only have one core, so you might as well have two as the additional die area is very small. But that’s speculation on my part.

        • morphine
        • 3 years ago

        Thanks for the heads-up on that detail, fixed.

    • NoOne ButMe
    • 3 years ago

    50x the AI performance… for only 100-150x the cores!

    What is an A53 core with L2 on 28nm with 256kb of cache? Maybe 1.5mm^2 at most?

    Sub 1mm^2 on 16/14 FinFET.

    Here comes a 512 core A53 SOC… at 1Ghz and ~250 watts.

    Edit: 250W is high actually. <75W for 1Ghz. 250W would be somewhere in the 2-2.9Ghz range

      • tipoo
      • 3 years ago

      Yeah, it’s crazy how small the A53 is. You can sprinkle that everywhere!

      I wonder if a 128 core (or multiples thereof) dev board of A53s would make for a good embarrassingly parallel compute learning device. Though I guess that’s still the domain of GPGPU.

      …And now I’m wondering if that wouldn’t be a better Larrabee.

        • chuckula
        • 3 years ago

        "...And now I'm wondering if that wouldn't be a better Larrabee." You answered your own question: "that wouldn't be a better Larrabee"

          • tipoo
          • 3 years ago

          butwhy.png. Ok, so Larrabee was partly sold on x86 compatibility, but apart from that, the lack of the ucode ROM and larger x86 decode block would let more ARM cores fit in.

          Not saying the concept would be good anyways, but ARM would likely be a better fit for Larrabee than x86, ironically.

        • NoOne ButMe
        • 3 years ago

        A35 cores probably would be better. You could get over 1000 on either Finfet process.

        And probably in 1.5Ghz+ for 200 watts including uncore and memory controllers.

        Be an interesting thing to see. I suspect it could compete with GPUs on a workload basis.

          • dodozoid
          • 3 years ago

          Don't forget that chip design isn't just slapping a truckload of cores on a slice of silicon. Those cores need to be fed and coordinated somehow.

            • Waco
            • 3 years ago

            Yeah – the cores may be small, but the communication network and off-chip access will take up a good amount of real estate and power.

            • NoOne ButMe
            • 3 years ago

            I looked into numbers:
            Using the Samsung 7420 as the baseline for the A53, then:

            A35 core by itself about .4mm^2 on 14LPP. Before L2 cache. Where A53 is about .48mm^2
            At 1.5Ghz power draw should be about 200mW excluding L2.
            ARM claims <100mW at 1Ghz on 28nm themselves.
            A thousand cores is a slight exaggeration. But fitting 700 or 800 into an under 600mm^2 die at <300W including RAM should be possible.
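
The per-core figures in this comment can be scaled up with a quick back-of-the-envelope script. It is a rough sketch that uses only the commenter's estimates (about 0.4 mm^2 and about 200 mW per A35 core at 1.5 GHz, both excluding L2); the function name is hypothetical, and L2, uncore, and memory are deliberately left out.

```python
# Per-core estimates quoted in the comment above (not measured values).
A35_CORE_AREA_MM2 = 0.4  # per core on 14LPP, before L2 cache
A35_CORE_POWER_W = 0.2   # per core at 1.5 GHz, excluding L2


def core_budget(n_cores: int) -> tuple[float, float]:
    """Return (area in mm^2, power in W) for the cores alone."""
    return n_cores * A35_CORE_AREA_MM2, n_cores * A35_CORE_POWER_W


for n in (700, 800, 1000):
    area, power = core_budget(n)
    print(f"{n} cores: {area:.0f} mm^2, {power:.0f} W (cores only)")
```

With these inputs, 800 cores come to roughly 320 mm^2 and 160 W, which leaves about 280 mm^2 and 140 W of the stated 600 mm^2 and 300 W envelope for L2, uncore, and memory.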

        • southrncomfortjm
        • 3 years ago

        You guys are exceedingly “big” little chip nerds. I don’t mean that in a pejorative way. Mostly impressed actually.

        • NTMBK
        • 3 years ago

        The Xeon Phi has a pair of massive 512-bit vector units strapped to each of its “little” cores. The wide SIMD lets it get a better FLOPS/W ratio by reducing the amount of overhead in scheduling instructions relative to useful work. Plus it reduces the number of nodes on the ringbus/mesh interconnect, simplifying the cache coherency and again saving power.

        • BobbinThreadbare
        • 3 years ago

        It would probably be bad at it. You couldn’t have the memory bandwidth to keep them all loaded.

          • NoOne ButMe
          • 3 years ago

          550 A53: 212mm^2
          L2: 165mm^2 @ 96kb per core
          HBM2x4: 18mm^2
          Uncore: 205mm^2
          600mm^2, recoverable 500+ core CPU.

          Numbers based on ARM’s claims of 25% denser A35 versus A53 ISO process with A53 being .48 per AnandTech 7420 deep dive.
          L2 based on .8mm^2 for 256kb L2 per same source
          HBM2 based on GP100 die shot
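
The area budget in this comment can be roughly reproduced from its stated inputs. The sketch below makes two assumptions that are not spelled out in the comment: "25% denser" is read as dividing the A53 core area by 1.25, and the L2 reference size is taken to be 256 KB. The HBM2 and uncore figures are copied from the comment as given, and all variable names are illustrative.

```python
# Inputs taken from the comment above; the two marked lines are assumptions.
A53_CORE_AREA_MM2 = 0.48     # A53 core area cited from the Exynos 7420 analysis
DENSITY_GAIN = 1.25          # assumption: "25% denser" means area / 1.25
L2_REF_KB = 256              # assumption: reference L2 size is 256 KB
L2_REF_AREA_MM2 = 0.8        # area of that reference L2 block
HBM2_AREA_MM2 = 18.0         # figure given in the comment
UNCORE_AREA_MM2 = 205.0      # figure given in the comment

cores = 550
l2_kb_per_core = 96

# Core area after applying the assumed density scaling.
core_area = cores * A53_CORE_AREA_MM2 / DENSITY_GAIN
# L2 area scaled linearly from the reference block.
l2_area = cores * l2_kb_per_core / L2_REF_KB * L2_REF_AREA_MM2
total = core_area + l2_area + HBM2_AREA_MM2 + UNCORE_AREA_MM2

print(f"cores: {core_area:.1f} mm^2")
print(f"L2:    {l2_area:.1f} mm^2")
print(f"total: {total:.1f} mm^2")
```

Under these assumptions the cores come to about 211 mm^2, the L2 to 165 mm^2, and the total to just under 600 mm^2, in line with the comment's figures. The core area only matches when the 25% density scaling is applied, so the 550 cores appear to be sized as A35-class cores.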

    • chuckula
    • 3 years ago

    Pshaw!! You thought a 32 Core Naples or Xeon server was cool?

    32 COARZ IN YO’ PHONE!
    — Mediatek

      • NoOne ButMe
      • 3 years ago

      And in such small area! All 32 cores + cache are smaller than AMD’s or Intel’s 4C+cache!!!

        • ronch
        • 3 years ago

        AMD might as well resurrect SeaMicro and put together a 64-core server SOC and sell it for 50 bucks!! Should fluster Intel for an entire tick-tock cycle!

      • ronch
      • 3 years ago

      64 x Cortex A8 FTW!

      • TheMonkeyKing
      • 3 years ago

      Oh, put a SoC in it!
