David Kanter dissects 64-bit ARMv8 architecture

ARM’s practice of licensing its low-power CPU cores to practically anyone who wants to build a chip has been wildly successful in the past five years. It’s had the effect of flinging computing power from a PC at the center of the network out to every node, from printers to routers to NAS boxes, smart phones, and tablets. As a result, ARM has become the biggest threat to Intel’s dominance that we’ve seen in a generation. However, in order for ARM-based CPUs to step into even more roles, they have some catching up to do. For starters, they’re going to have to transition to 64-bit memory addressing.

Plans are already in place to do so, of course, and David Kanter has taken a long, hard look at the proposed ARMv8 architecture. His overview is worth reading, if you care about such things. Kanter mostly likes what he sees. Allow me to lift a bit from his conclusions:

Like x86, ARMv7 had a fair bit of cruft, and the architects took care to remove many of the byzantine aspects of the instruction set that were difficult to implement. The peculiar interrupt modes and banked registers are mostly gone. Predication and implicit shift operations have been dramatically curtailed. The load/store multiple instructions have also been eliminated, replaced with load/store pair. Collectively, these changes make AArch64 potentially more efficient than ARMv7 and easier to implement in modern process technology.

Among other things, ARMv8 should have a very direct impact on Nvidia’s project Denver and, of course, the future of Apple’s iDevices.

    • ronch
    • 7 years ago

    It’s somewhat unfortunate that x86 became the de facto standard for modern-day computing, particularly in the desktop/laptop space, considering it’s probably one of the most inefficient architectures ever created. ARM is obviously gaining traction. Who would have thought they would be the ones to ultimately challenge x86? Not PowerPC, not SPARC, not Itanium. I suppose the market has a way of adjusting itself toward a more efficient architecture whose main developer/pioneer will (at least for now) not try to monopolize the industry the way it happened with x86. Intel tried (and continues to try) to monopolize the x86 industry by blocking anyone who dares challenge it. I’m glad to see a more efficient alternative architecture rise up to challenge Intel and its monopolistic behavior.

      • chuckula
      • 7 years ago

      How exactly is x86 “inefficient” again? I hear this line over and over again from people whose main gripe is that x86 assembly looks “ugly”. Well, none of these people make a living writing assembly (be it for x86 or ARM), so frankly I’m not sure what the big deal is.

      As you have seen from the article, ARM has quite a bit of cruft that is being removed from the newest architecture too, but lots of ARM chips will still have to support the older instruction sets for legacy purposes. Sound familiar? The modern version of x86 is actually x86-64 + extensions like AVX, FMA, etc. The modern x86-64 also dropped a lot of older cruft, but compatibility is maintained with earlier versions because it is cheap to do and provides a lot of value for real people running real software out in the wide world.

      Here’s an “efficiency” question for you: Calxeda recently did a completely skewed benchmark between one of its brand-new quad-core ARM systems that is completely tuned for low-power operation vs. an off-the-shelf quad-core Sandy Bridge system. Of course Calxeda skewed the results to make it look like the ARM system had some massive power-consumption advantage, but even using their own numbers, they basically said that their quad-core ARM running 100% flat out is still substantially slower than a standard Sandy Bridge running at… wait for it… 15% load. That means that the one “inefficient” x86 core substantially outperforms a fully utilized quad-core ARM chip while still having room to spare. Oh, and the benchmark involved a stupidly simple static web-page loading test that is about as best-case as you can get for the ARM board. Try a benchmark with real memory I/O or complex computations and the ARM board will get run over. So where is the supposedly superior efficiency of ARM when even an intentionally rigged benchmark still looks good for x86?

      The hilarious thing is that many of the same people who rip x86 and praise ARM also vacillate between loving and ripping on EPIC (the architecture behind Itanium). When it comes to theoretical beauty, EPIC makes ARM look like a kludged-together 8th grade science fair project. It is the apotheosis of academic design theory from the 1990’s and early 2000’s…. but look at how it has failed to light the world on fire. Don’t think that Intel intentionally sabotaged its own chips either. Believe me, the Intel of circa 2000 would have *loved* for everyone in 2012 to be using some sort of EPIC-derived chip that would cut AMD out of the picture. However, theoretical beauty has to meet the real world, and that is where x86 has truly shined.

        • ronch
        • 7 years ago

        You’re comparing microarchitectures, not architectures. x86 is where it is only because of the extensive development Intel has poured into it. Do not compare Sandy Bridge to a much simpler microarchitecture using the ARM architecture. By itself, x86 has too many instructions that have to be laid down in silicon each time a new microarchitecture is created. It has to deal with a painful memory addressing scheme. It originally had too few registers. It had been relying on a floating-point stack when its RISC peers were already using far more, and far larger, registers. If it weren’t for the complex decoders that modern x86 processors employ, it would probably still be executing instructions of wildly differing lengths. Etc. Etc. Etc. Now, if Intel had used a more efficient architecture from the very start and poured as much money into it, where do you think we’d be today? Smaller die sizes? Less power consumption? Intel had to deal with x86’s inherent deficiencies with more transistors and die space. Imagine all of Sandy Bridge’s microarchitectural tricks implemented on the ARM architecture or some other RISC architecture.

        I hope you understand my point now.

          • chuckula
          • 7 years ago

          I understand your point, and you are actually agreeing with something I’ve been saying for a long time. For example: [quote<]Imagine all of Sandy Bridge's microarchitectural tricks implemented on the ARM architecture or some other RISC architecture.[/quote<] You know what you would get? A CPU that is about as fast as Sandy Bridge and uses about the same amount of power as Sandy Bridge. Seriously.

          I have been saying for a *very* long time that Intel can scale Atom down to the same power levels as ARM, but in the process they'll end up with a CPU that isn't going to destroy ARM in performance. With Medfield, Intel has basically proven the point.

          The inverse of Intel scaling down is also true for ARM scaling up. Could an ARM licensee spend a *HUGE* amount of money coming up with an ARM-instruction-set-based system with resources comparable to Sandy Bridge? Sure, but guess what: you won't be massively outperforming SB if you stick to SB's power envelope, and your performance will suffer compared to SB if you keep the power envelope much lower. If anything, ARM has a *much* longer way to go in scaling up than Intel does in scaling down. Intel has already gotten into the range it wants to compete in, which is the high-end ARM devices that power smartphones, tablets, etc., but neither ARM nor any ARM partner has really shown anything that comes close to the performance per watt of a notebook/desktop/server x86 system.

          Note the difference between *total power usage* and performance per watt. ARM fits the low-power profile, but it is not winning the performance-per-watt war in the slightest. The vast majority of ARM's sales are not in these markets anyway, despite the hype. Instead, ARM makes its really big sales in tiny chips for embedded applications where Intel isn't really going to push x86. Maybe at those insanely low power levels ARM has an advantage, not because of an instruction set but because ARM focuses on low power consumption at the expense of everything else. Don't confuse a tiny microcontroller with the chips in tablets, though; in that market Intel is more than capable of competing.

          • dkanter
          • 7 years ago

          It’d help if your point made any sense. ARMv7 is crufty, but not quite as bad as IA32. x86-64 is actually quite good.

          Sure ARMv8 is more efficient, but it’s really a marginal difference. Maybe it’s theoretically 5-20% more efficient…that doesn’t matter. Only a third of most chips are CPU cores, and 20% * 30% = 6% net advantage (best case).

          DK
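
          As a quick back-of-the-envelope check of DK’s arithmetic, here is a minimal Python sketch; the 20% best-case ISA gain and the roughly one-third core-area share are the figures stated above, nothing more:

          isa_gain = 0.20     # best-case efficiency advantage of the new ISA (DK's upper bound)
          core_share = 0.30   # CPU cores as a share of total die area ("only a third of most chips")

          net_gain = isa_gain * core_share
          print(f"Net chip-level advantage: {net_gain:.0%}")  # -> "Net chip-level advantage: 6%"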

          • chuckula
          • 7 years ago

          [quote<]You're comparing microarchitectures, not architectures[/quote<] You can't fully separate the two unless you are engaged in a meaningless exercise of simulating a CPU with pencil and paper and declaring victory for the one that is easiest to simulate using the fewest sheets of paper. Every single point you just threw out could be turned on its head. Intel spent years building high-performance chips, so therefore ARM is better? WTF? What if I just said that ARM spent decades (and it has) building power-efficient designs, and that if somebody would just put the same effort into x86... just imagine what would be possible!! Of course, both statements are stupid conjecture, but in the ARM fanboy world there's a one-way street where anything touched by ARM becomes magically super while it is physically impossible to improve x86 because Intel is not cool or something.

          Floating-point stack? REALLY? You do realize that ARM is *vastly* inferior to modern x86 in floating point? It's so bad that the ARM fanboys intentionally disparage any workload or benchmark that includes floating point as being "cheating" since x86 destroys it. You need to get your head out of 1982 and realize that x86 has had vastly superior floating-point processing capabilities using SSE going back a full decade. Oh, and AVX is light-years ahead of anything on ARM, including that joke called "NEON". Once again, you are using the standard ARM fanboy argument that because versions of x86 from 1982 are not as nice as versions of the ARM instruction set from the year 2015, ARM is inherently magical and Intel has literally just overclocked an 8088 and called it Sandy Bridge.

          Memory addressing modes? If x86 were so hopeless at addressing memory, I'm pretty sure it never would have been able to take over in the server world. Guess what: compilers handle the different x86 memory addressing techniques just fine, and when you look at the I/O advantages that even Atom has over competing ARM parts, I'd try barking up another tree.

          Extra die size? WTF? You do realize that a full-scale Ivy Bridge desktop die is smaller than the A5X chip used in the iPad 3, right? Sure there are tiny ARM chips... they are tiny because they don't do a whole lot and are used in markets where Intel isn't particularly interested in competing. You get up to the cellphone/tablet market and those ARM chips really aren't any smaller than competing Medfields. Hate to burst your bubble, but the laws of physics are not suspended simply because the ARM marketing department spoonfed you talking points.

          You keep using the word "RISC"... you don't seem to understand what that word means. I know to ARM fanboys it is a magical word that removes any need to present evidence or reasoning to back up your arguments, but the modern versions of ARM are *NOT* RISC by any real definition. You need to get it in your head that RISC is not just a substitute for "not x86", because that's not what it's about. Basically any advancements to the ARM ISA will be making it *less* RISC-like. Case in point: the expansion of the vector instructions in ARMv8? Guess what: vector instructions ain't RISC.

            • ronch
            • 7 years ago

            I don’t know what your problem is. You seem so riled up over defending x86. I don’t think you really understand what my point was and instead choose to focus on trying to prove your point while being blind to mine.

            x86 may be superior to ARM right now because it has evolved much farther at this point. However, that does not change the fact that x86 was a kludged-up architecture, and I would rather a more efficient and elegant architecture had become predominant. Perhaps it’s not ARM. Perhaps it’s PowerPC or SPARC. Now, your Highness, I’m not as all-knowing as you when it comes to CPU architectures, as you seem to know everything and challenge those who have a different view from you, but as I said and as most folks would probably agree, x86 is nowhere near being the most efficient way of computing.

            Mind your manners, boy. Just because this is the Internet doesn’t mean you can go around treating other people’s opinion like they’re crap. I hope you’re better in real life.

            • djgandy
            • 7 years ago

            You’re wasting your time. The people who say x86 is inefficient have excellent sources for their information. (BSoN, FudZilla, etc.)

            They don’t seem to understand that legacy instructions really have no silicon impact, precisely because they are legacy. They’re not designed to be super fast, because the applications that use them are so old you could run them on a wristwatch faster than they were expected to execute when the software was developed.

            Whatever “x86 architecture” actually means half the time people around here use it, I don’t know either. x86 is an instruction set. It will have some impact on how you design your CPU architecture, but back in the real world, CPUs are designed around things such as new SIMD instructions and memory bandwidth, and legacy instructions piggyback on the new features (see paragraph 2). Also see Qualcomm: guess what, they use the ARM ISA but don’t take an ARM core.

            And we haven’t even touched on pipelines, caches, or OOE yet. Apparently these things are irrelevant to a CPU’s performance, because having to support an instruction from 1990 that 0.01% of applications use will destroy your CPU performance. All I can infer from these stupid arguments is that CPUs in the ’90s must have been measured in square metres, seeing how supporting things that were trivial back then is supposedly so hard on silicon now that transistors have shrunk over 1,000-fold since that time.

          • BobbinThreadbare
          • 7 years ago

          Right because no one else has spent any money on CPU architectures to compete with Intel.

          [quote<]If it weren't for the complex decoders that modern x86 processors employ, it would probably still be executing instructions of wildly differing lengths.[/quote<] And if water wasn't wet, we wouldn't be alive. What's your point? Intel and AMD have made great decoders, and it turns out "kludgy" x86 with a decoder is faster than Power, Cell, ARM, or just about anything else out there.

            • ronch
            • 7 years ago

            Yes. Obviously Intel did enough to make x86 as good as it can be compared to other architectures. All I’m saying is that I’d rather Intel (or some other company with vast amounts of R&D money to throw around) had come up with a far more efficient and elegant architecture than x86. AMD64 may be better, but it still carries a lot of old baggage. Perhaps a native RISC architecture (not necessarily ARM, as I’ve stressed to chuckula) is what x86 should have been. That alone would avoid today’s complex x86-to-RISC decoders.

          • WaltC
          • 7 years ago

          There’s virtually no resemblance between the “x86” of today (which really means “x86-compatible”) and the x86 of 25 years ago. Actual, so-called “x86 circuitry” makes up a very small fraction of the die in current x86 chip topology. Back in the early ’90s, it was assumed that RISC would become the architecture of the immediate future (at least in popular mags like PC World, et al.), shouldering x86 out of the way. Then came the Pentium, then came AMD’s K7-A64, and the dynamics once again shifted back to “x86”; even the original x86 CISC architecture is all but gone inside today’s x86 CPUs. RISC is truly a “been there, done that” argument that makes little sense today; current x86 CPUs, while technically not RISC themselves, have more in common with RISC than CISC. In a sense, RISC did win, just not in the purest sense some of its early supporters hoped for: that it would supplant everything else, including x86. It didn’t do that. Obviously. And it’s a good thing, too, as I value backwards compatibility and like continuity.

          How popular or useful would RISC or CISC be today if every new generation meant you had to throw the baby out with the bathwater? If every new generation meant you’d have to buy all new software and developers would have to start all over again? Oh, yeah, we’d get “better” hardware out of it, but the commercial software environment would be crap, and as you probably know, it is because of software, and what it enables you to do with your hardware, that the hardware market exists at all as it does today.

        • shaurz
        • 7 years ago

        x86 has an incredibly complex instruction encoding, with many layers of extensions patched on. ARM, although it may have a little cruft, has always had a simple encoding format, and so does ARMv8.

          • chuckula
          • 7 years ago

          [quote<]x86 has an incredibly complex instruction encoding[/quote<] Complex, sure, but "incredibly" is a bit over the top. The amount of silicon needed to do instruction decode in a modern x86 processor is tiny compared to the overall transistor budget. In exchange, x86-64 has instruction density that is at least as good as even the ultra-compressed ARM Thumb instructions, meaning that x86 processors get an effective boost in the efficiency of their instruction caches.

          There's no perfect "right" or "wrong" between the ARM and x86 instruction sets. Instead, there are tradeoffs between two different approaches.

    • Anomymous Gerbil
    • 7 years ago

    Possibly dumb question – why does memory addressing need to jump from 32-bit all the way to 64-bit? Even 48 bits is a huuuuge addressable space.

      • Kurlon
      • 7 years ago

      Memory addressing only jumped to 48 bits with ARMv8: the registers and so on are 64-bit, but the actual memory addressing range is 48-bit.

      • pragma
      • 7 years ago

      With UNIX roots dominant in today’s computing, maybe it doesn’t. There are systems, however, such as IBM’s System i, where “everything is an object” and a big address space is necessary to cover everything.

      P.S. Just 64 4 TB disks already map onto a full 48-bit address space. In a few more years, those 48 bits may become limiting.
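
      The arithmetic works out almost exactly; here is a quick Python sketch, assuming binary terabytes (TiB) for simplicity:

      TiB = 2 ** 40
      total = 64 * 4 * TiB            # 64 drives of 4 TiB each
      print(total == 2 ** 48)         # True: exactly fills a 48-bit byte-addressable space
      print(f"{2 ** 48 // TiB} TiB")  # 256 TiB reachable with 48 address bits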

        • Anomymous Gerbil
        • 7 years ago

        Good point, I forgot to do the maths! Thanks for the replies, all.

      • Stranger
      • 7 years ago

      While the x86-64 spec technically defines 64-bit addressing, most modern implementations fuse off the upper couple of bits, allowing only the lower 50-some bits to be functional and forcing the unimplemented upper bits to zero. This allows the standard to expand in the future if needed.
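
      On the virtual-address side, x86-64 uses the “canonical address” rule: with 48 implemented bits, the unused upper bits must all mirror bit 47 rather than simply be zero. A minimal Python sketch of that check, assuming the common 48-bit case:

      def is_canonical(addr, va_bits=48):
          # Bits [63:va_bits] must all be copies of bit (va_bits - 1).
          sign = (addr >> (va_bits - 1)) & 1
          upper = addr >> va_bits
          return upper == (0 if sign == 0 else (1 << (64 - va_bits)) - 1)

      print(is_canonical(0x00007FFFFFFFFFFF))  # True: top of the lower half
      print(is_canonical(0xFFFF800000000000))  # True: bottom of the upper half
      print(is_canonical(0x0000800000000000))  # False: falls in the non-canonical hole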

    • MadManOriginal
    • 7 years ago

    You know what I hate about ARM core nomenclature? The apparent randomness of the commonly used number for the core type. The *v# nomenclature makes sense and advances one at a time, but the ARM## (I mean Cortex A## here) nomenclature makes no freaking sense, with higher numbers coming out much earlier and being worse than lower numbers.

      • Flying Fox
      • 7 years ago

      It has always been the ARMv#. What are you talking about?

      Cortex A## is different.

      [url<]http://en.wikipedia.org/wiki/ARM_architecture#ARM_cores[/url<]

      So it looks like ARMv# is the instruction set architecture. The Cortex A##'s are just code names for the different cores that may implement the same ISA.

        • MadManOriginal
        • 7 years ago

        Yeah I meant Cortex A#.

          • Malphas
          • 7 years ago

          Well, what’s so difficult about that? The Cortex A# nomenclature goes from the A5 at the low end to the A15 at the high end, with the A7, A8, and A9 in between in increasing order of performance. It’s pretty straightforward and consistent.

      • NeelyCam
      • 7 years ago

      Asomethingmeaningless..?

      • Hattig
      • 7 years ago

      It’s fairly simple.

      ARM A5 < A7 < A8 < A9 < A15

      They’re numbered in order of performance (complexity, and probably cost to license), not release date. Quite why the numbers are what they are is another matter.

      The ‘A’ stands for Application Processor.

      I guess that the ARMv8 cores will start at ARM A20 or thereabouts.

        • shaurz
        • 7 years ago

        Not really. A7 is supposed to be faster than A8.

          • Malphas
          • 7 years ago

          It’s not.

      • Anonymous Hamster
      • 7 years ago

      As people have explained, it’s not too bad once you understand the things being designated (v# vs. A#). However, it can indeed be a problem when people play a little too loose with their specifications.

        • willmore
        • 7 years ago

        Like the MIPS chips coming out of China being called A10s? Yeah, that’s not meant to confuse people.

    • jdaven
    • 7 years ago

    Given GloFo and TSMC plans for 20 nm half-node next year, does it look likely that 20 nm ARMv8 chips will significantly overlap the time period of Intel’s 22 nm efforts in the low power arena?

      • chuckula
      • 7 years ago

      Not even close. Intel will have 14 nm in full production before the 20nm ramp is complete and Atom is going to be the first production part that rolls off in 14 nm. Remember, GloFo is still *hoping* to have 28nm mobile parts in large scale production by the end of this year.

      There’s a reason that brand-new and even yet-to-be-launched tablets from Lenovo, Asus, Microsoft, etc. are all using 40 nm Tegra 3s, and that’s because you can at least get decent supplies of those chips. Apple is just barely putting its toes in the water with Samsung’s 32 nm process, used to fab a die-shrunk version of its ARM chip for the iPad 2, while the newer iPad 3 is still using 40 nm parts.
      Basically, 2013 is the year of the 28nm ARM chip, even though some leaked out in 2012.

      If everything goes perfectly for these guys they’ll have parts out about 1 year after Intel has rolled out the first 14nm Atoms. If things go like they did for 28 nm, then don’t hold your breath.

      • DavidC1
      • 7 years ago

      Remember, add at least a year to claims on the Tablet and Phone world.

      Intel’s 22nm Silvermont-based chips are coming in late 2013, but should be announced earlier than that. That means having 14nm a year later puts 14nm Atoms in late 2014, while Broadwell is early to mid 2014 depending on the parts. Silicon-wise, Atom might be earlier, but products will be immediately available for the Core chips, unlike the tablet/smartphone Atoms, where it usually takes an additional year.

      What about 20nm for the foundries? Well, since they claim production next year, products come a year after that. Look over previous generations and you’ll see each full jump (in the case of foundries, it’s from “half-node” to “half-node”) takes slightly longer than 2 years. The first 28nm phones were available near summer of 2012. That means the first 20nm phones will arrive, at the earliest, in summer 2014.

      • dkanter
      • 7 years ago

      Foundry production and products in end-user hands are totally different things.

      Process technology is on a 2 year cadence. The first 28nm products came out in early 2012…why would you expect 20nm any earlier than 2014?

      DK

        • jdaven
        • 7 years ago

        I ask because it seems that Intel will need a huge process lead in order to make their SoCs competitive against ARMv8 SoCs in the low power market (< 5W). If there is significant overlap of 22 nm Intel and 20 nm ARM SoCs, I don’t see how Intel will compete against ARMv8 at 20 nm.

    • codedivine
    • 7 years ago

    I remember reading Project Denver was supposed to hit in 2013, which would put it in the same timeframe as the HSA-enabled Kaveri APUs, as well as Haswell. Anyone have any speculation to offer about Project Denver?

      • Damage
      • 7 years ago

      It will produce an ARMv8-compatible CPU? 🙂

      Seriously, Nvidia has been incredibly quiet about it, other than to confirm they’re still working on it.

        • Beelzebubba9
        • 7 years ago

        Do we have any idea what verticals nVidia is targeting with Project Denver? Will it be an A15/Krait-class mobile CPU or something more powerful?

      • chuckula
      • 7 years ago

      My prediction for Denver (for what it’s worth): Denver is not Nvidia’s attempt to come out with a desktop/notebook ARM CPU + integrated GPU. Instead, Denver is going to effectively be a self-hosted video card type of device much along the lines of the Xeon Phi.

      Unlike a regular video card or even the “compute” Tesla parts, Denver is not just going to be a peripheral that is integrated as part of a standard computer*, but instead will be a video-card-like device that is completely capable of running an operating system (basically Linux) and self-hosting. In some ways Denver will be the antithesis of Sandy Bridge, where Intel took a powerful CPU and threw on just enough GPU for window dressing. Denver is just enough ARM CPU to let the Linux system load and perform the basic OS tasks you expect fast enough to keep the large onboard “GPU” fed with data for number crunching. The “GPU” is in quotes since it will probably spend most or all of its time doing computation and not so much playing games.

      Don’t let the 64-bit part of Denver fool you into thinking that Nvidia wants to build a high-performance ARM CPU that can go head to head with Haswell. Nvidia isn’t that dumb (if they are that dumb then I guarantee Denver fails). The 64 bit part of Denver is simply there to make sure the OS can address > 4 GB of RAM which will be mandatory by the time Denver makes it to market.

      So what is Denver? A big self-hosting number crunching card basically. I’m sure it’s theoretically possible for it to come in different form factors, but that’s my bet as to what Nvidia will sell. BTW, I expect the OS and application software to be loaded onto an on-board SSD so don’t expect these things to come with traditional motherboards, DIMM slots, SATA ports, etc. etc. Denver has a CPU, but is not meant to be a PC.

      * (That’s not to say that Denver can’t be plugged into a PCI slot to get power and for providing an I/O backplane. I’m sure there will be versions that drop into a regular PC, it’s just that Denver will be kind of a PC inside of a PC similar to how the Xeon Phi can operate.
      I would also expect there to be some sort of customized cluster hardware from Nvidia that just provides power & I/O backplanes to a whole bunch of self-contained Denver cards in a cross between a blade server and a big stack of GPU cards)

        • MadManOriginal
        • 7 years ago

        Hmm..so you think Denver is intended to be a ‘complete’ GPU compute part, to replace whatever traditional GPU compute NV has coming?

          • chuckula
          • 7 years ago

          Basically yes. Nvidia will go with what it knows. In much the same way that Intel isn’t slamming the competition with discrete video cards (or even beating AMD in IGPs), Nvidia is not about to run out and make an ARM version of an Ivy Bridge, Haswell or even Trinity grade desktop chip. I could see a Tegra-on-steroids solution for low-end notebooks and HTPCs in the future, but the noise that Nvidia made about Denver didn’t seem consistent with an amped-up Tegra.

          What Nvidia does know pretty well is CUDA, and you can bet that Jen-Hsun gets steamed every time a Tesla card goes into a high-end Intel based server. He wants to cut out the x86 component and have more control over the platform. Whether or not this is a good idea in the long run is something else, but, despite Jen-Hsun’s ego, you’ll probably still have the option to slot these things into standard x86 boxes.

      • DavidC1
      • 7 years ago

      “I remember reading Project Denver was supposed to hit in 2013, which would put it in the same timeframe as the HSA-enabled Kaveri APUs, as well as Haswell. Anyone have any speculation to offer about Project Denver?”

      Unlikely. Add at least a year to whatever the manufacturers want you to believe.

      There’s a difference between:
      -Production vs. Ramp
      -Foundry claims vs companies that use the service
      -Silicon vs. Product

      However on the PC, the year they say it’ll be out is the year it is out.
