
Cortex-A35 is ARM’s smallest, most efficient 64-bit CPU core yet

ARM is bringing 64-bit computing to its smallest, most efficient Cortex A-series CPU core yet with the introduction today of the Cortex-A35. The A35 is a licensable core, like the more familiar Cortex-A57, so its availability means that ARM's partners can begin the process of integrating this CPU core into their upcoming chips.

ARM expects chips based on the Cortex-A35 to go to a number of different places. Most prominent among them is the burgeoning market for low-cost smartphones ranging from under $200 to as low as $25. The firm says demand for such devices is growing at roughly eight percent per year.

The Cortex-A35 fits into ARM's portfolio as a successor to the ultra-low-power Cortex-A7. By ARM's estimates, the A35 consumes about 20% less power than a typical Cortex-A7 implementation built with the same 28-nm process tech. Half of that reduction comes from potential improvements in design flows, and the other half comes from microarchitectural improvements specific to the A35.

The Cortex-A53 may be a more familiar comparison, since the A53 acts as a "little" core in the big.LITTLE scheme used in some of today's most popular mobile SoCs. Compared to the A53, ARM says the Cortex-A35 is 25% smaller, consumes 32% less power, and is 25% more energy efficient on the same 28-nm fab process.
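Those three percentages can be combined into a rough estimate of how the A35's performance compares to the A53's. The sketch below assumes that "energy efficient" means performance per watt, which ARM's figures as reported here do not spell out, so treat the result as a back-of-envelope reading rather than an ARM claim. The function and variable names are illustrative only.

```python
# Back-of-envelope reading of ARM's A35-vs-A53 figures (same 28-nm process).
# Assumption (not stated in the article): "energy efficiency" = performance per watt.

def implied_relative_performance(power_ratio: float, efficiency_ratio: float) -> float:
    """Performance relative to the baseline, given relative power and relative perf/W."""
    return power_ratio * efficiency_ratio

a35_power_vs_a53 = 1.0 - 0.32       # "consumes 32% less power"
a35_efficiency_vs_a53 = 1.0 + 0.25  # "25% more energy efficient"

a35_perf_vs_a53 = implied_relative_performance(a35_power_vs_a53, a35_efficiency_vs_a53)
print(f"Implied A35 performance relative to A53: {a35_perf_vs_a53:.2f}x")
```

Under that assumption, the A35 would deliver roughly 85% of the A53's performance at about two-thirds of the power, which is consistent with its positioning below the A53.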

The most remarkable thing about the A35 may be its ability to squeeze a full-featured implementation of the ARMv8-A ISA into such a tiny core. The A35 supports the same instructions as its bigger siblings, with 64-bit registers, cryptography acceleration, double-precision floating-point math, and vector processing via ARM's Neon extensions. Meanwhile, the A35 retains backward compatibility with existing 32-bit software in the AArch32 state. In this state, the core can execute legacy code without requiring any modifications, yet it also exposes new instructions for faster floating-point math and crypto.

None of these capabilities are particularly new if you've been following the latest Cortex A-series offerings, but the A35 brings them to the smallest footprint yet. Since it is fully compatible with ARMv8-A programs, the Cortex-A35 is capable of acting as the "little" core in a big.LITTLE pairing with other 64-bit ARM cores.

The A35 is relatively simple compared to the larger cores we're used to seeing. It has an eight-stage main pipeline with "limited" dual-issue capability. Instructions are executed in program order. ARM's presentation makes it sound like the Cortex-A7 was the starting point for the A35 design, with a few key modifications made in order to improve performance. That portrait has to be something of an oversimplification, though; adding support for the ARMv8-A instruction set and 64-bit registers is no minor tweak.

Regardless, ARM cites a few key areas where the A35 has been particularly improved over the A7. The CPU's front end has been completely redesigned, with more accurate branch prediction and a more efficient mix of instruction fetch and queuing resources. The memory subsystem has been overhauled with an eye toward better handling of streaming data access patterns, apparently a common sight in mobile applications. Both the L1 and L2 cache subsystems should better handle streaming writes, in particular.

Another big area of improvement is the Neon pipeline used for SIMD and floating-point math. The A35 has a fully pipelined double-precision multiplier and more SIMD execution units. In theory, it should deliver two times the single-precision flops of the Cortex-A7 and five times the double-precision flops.

In order to accommodate this relatively beefy floating-point hardware, the A35 places the Neon execution units into their own separate power domain, and it adds some new low-power operating modes, as illustrated above. A single power gate will allow the SoC to switch off power to the CPU core and Neon units entirely, but both domains can also go into a retention state where they draw less power but are ready to wake up fairly quickly, if needed. Crucially, the Neon units can drop into retention while the main CPU core remains in active operation. ARM exposes control over these states to the SoC's power management logic via a standard known as Q-channel.

Performance claims for IP that hasn't yet been baked into a consumer chip are always tricky, but ARM has offered a few hints about how the A35 ought to perform compared to its predecessor.

In a web browsing test, the firm says the A35 performs 16% better than the A7 at the same 1.2GHz clock speed in 28-nm silicon. However, the A35 is capable of higher-frequency operation, and it can be 84% faster when running at 2GHz.
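As a quick sanity check on those two numbers, the sketch below compares the 2GHz claim against what perfect clock scaling would predict. It assumes the 84% figure is measured against the same 1.2GHz Cortex-A7 baseline as the 16% figure, which ARM's numbers as reported here do not make explicit. The helper name is hypothetical.

```python
# How close is ARM's 2GHz claim to perfect clock scaling?
# Assumption: both speedups are relative to a Cortex-A7 running at 1.2GHz.

def linear_scaling_speedup(speedup_at_base: float, base_ghz: float, target_ghz: float) -> float:
    """Speedup expected if performance scaled perfectly with clock frequency."""
    return speedup_at_base * target_ghz / base_ghz

iso_clock_speedup = 1.16  # "16% better than the A7 at the same 1.2GHz"
claimed_speedup = 1.84    # "84% faster when running at 2GHz"

ideal_speedup = linear_scaling_speedup(iso_clock_speedup, base_ghz=1.2, target_ghz=2.0)
scaling_efficiency = claimed_speedup / ideal_speedup

print(f"Ideal speedup at 2GHz:   {ideal_speedup:.2f}x")
print(f"Claimed speedup at 2GHz: {claimed_speedup:.2f}x")
print(f"Fraction of ideal:       {scaling_efficiency:.0%}")
```

Perfect scaling would give about a 93% gain, so the claimed 84% works out to roughly 95% of ideal. A shortfall of that size is plausible for a workload like web browsing, since memory latency generally does not improve along with the core clock.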

In other tests, a single A35 core ranges between 6% and 40% faster than the A7 at the same clock speed.

Although the image above cites both higher performance and lower power from the A35, ARM didn't provide energy consumption numbers for these tests. We do know that ARM expects the A35 to be capable of operating at very low power levels: under 90 mW at 1GHz and, remarkably, under six mW at 100MHz. Both of those estimates assume a 28-nm manufacturing process. Presumably, the A35 could operate at even lower power levels with the benefits of newer 16/14-nm FinFET processes.

That six-mW figure comes from a particular implementation of the Cortex-A35, one that includes only a single core with an 8K L1 cache, no L2 cache, and no Neon or crypto hardware—so it's not exactly typical. The fact that such a configuration is possible, though, illustrates the flexibility of ARM's offering. The firm points out that this smallest configuration occupies only one-tenth the silicon area that a full-on quad-core implementation of the A35 with larger L1 and L2 caches and Neon/crypto units requires.
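To put the two power figures on a common footing, the sketch below converts each into energy per clock cycle. Both inputs are ARM's upper bounds ("under" 90 mW and "under" six mW), and the two numbers may describe different configurations, since the article only specifies the configuration behind the six-mW figure. The results are rough ceilings and not a like-for-like comparison. The function name is illustrative.

```python
# Convert ARM's quoted power ceilings into energy per clock cycle.
# Caveat: the figures are upper bounds and may come from different core configurations.

def energy_per_cycle_pj(power_mw: float, freq_mhz: float) -> float:
    """Average energy per clock cycle in picojoules (1 mW / 1 MHz = 1 nJ per cycle)."""
    return power_mw * 1000.0 / freq_mhz

at_1ghz_pj = energy_per_cycle_pj(power_mw=90.0, freq_mhz=1000.0)  # "under 90 mW at 1GHz"
at_100mhz_pj = energy_per_cycle_pj(power_mw=6.0, freq_mhz=100.0)  # "under six mW at 100MHz"

print(f"At 1GHz:   up to {at_1ghz_pj:.0f} pJ per cycle")
print(f"At 100MHz: up to {at_100mhz_pj:.0f} pJ per cycle")
```

The 100MHz point comes out lower per cycle (about 60 pJ versus about 90 pJ). That fits a stripped-down core running at a low clock, but the numbers alone cannot separate the effect of the smaller configuration from the effect of the lower operating point.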

Some of the chipmakers in the ARM ecosystem move quickly. ARM has already licensed the Cortex-A35 to "multiple customers," and it expects devices containing the A35 to hit the market by the end of 2016. I'd expect those first entries to be low-cost phones and tablets sold outside of North America and Europe, but the A35 likely has a bright future ahead in a multitude of different sorts of devices.

0 responses to “Cortex-A35 is ARM’s smallest, most efficient 64-bit CPU core yet”

  1. According to your source that A7 core has 32K of L1, has NEON, and has a VFP. The 0.4mm^2 figure in this article for the A35 has no NEON nor crypto hardware and only 8K of L1.

    Those aren’t comparable cores.

    If the diagram in this article is to scale, then the quad core A35 with L2, etc. is vastly larger than 0.4mm^2. Then again, the quad core A7 in the raspberry pi 2 with L2 is going to be well larger than the 0.45mm^2 figure from the source you quote.

    So, while I disagree with NoOne ButMe, the facts you have presented do not refute what they are saying.

    A quad of A35 are likely to be a reasonable amount larger than a quad of A7’s, but I don’t think they will be sufficiently larger as to deter their use as a replacement. That decision will more likely be made by core to I/O interfacing issues–does the A35 support AMBA, VPB, etc. for hooking to the existing I/O IP that Broadcom has.

  2. Cortex A7 = 0.45mm2 in 28nm (Source)
    Cortex A35 <= 0.4mm2 in 28nm
    Raspberry Pi 2 CPU = quad-core Cortex-A7

    NoOne ButMe wrote: "The core is too large to use for a Raspberry Pi 3 without increasing cost or cutting things elsewhere."

    Mmmmmhhhkay...

  3. The core is too large to use for a Raspberry Pi 3 without increasing cost or cutting things elsewhere.

  4. No, my problem is definitely with the hardware. The standard PC style BIOS which all PCs have but which ARM lacks is a big deal. Device Tree is not a substitute for this, despite what some of its advocates say.

  5. The Moto G is pretty zippy with only Cortex A7 cores, this should be a bit faster than them. I mean, it’s not winning any benchmark contests, but with modern Android you can fly through most phone uses quickly on only quad A7s.

  6. Microsd slot, hmm… Cheap DIY option for one of those car video camera things that can do dual duty as a GPS. No cell service needed with offline maps.

  7. That’s misdirected frustration; your problem is with linux, not the hardware, since you didn’t mention performance but rather compatibility and ease of use. Even Valve struggles to smooth those rough edges.

  8. Maybe.

    For sure, it will power the vast majority of next-generation smartwatches which are currently stuck in the 32 bit era.

  9. Is an A7 ultra slow? Nope, not based on my experience with first gen Moto G.

    Fast enough for everything I would do.

    This is 10-20% faster at the same clock and can clock much higher. Also, it targets emerging markets. This would be the first experience with a personal computing device for the cost these will target. It will be plenty fast enough for that.

  10. As someone who owns a 1st gen Raspberry Pi and a CuBox-i, I’ve had it with ARM. They still have a long way to go to catch up with the x86 PC world in terms of ease of getting an OS up and running and supporting the full capabilities of the SoC their CPU cores are integrated into. Give me a 14 nm Atom-based MinnowBoard any day. When the hell are those coming?

  11. They won’t. The market that will use these cores will likely use either one cluster of 4 as part of an SoC (like MediaTek’s X10 4xA53, 4xA57, 2xA72) or they will be making 1-4 cores with pretty much no addons for developing markets.

    And selling the SoCs for 2-3 dollars maybe.

    Also, they would use 28 or 32 I think 😉

  12. I honestly hope SoC makers won’t put 30 of these A35 cores and market it as “world's 1st, greatest, fastest, Triaconta-core SoC for your phone!” Long live moar-cores!!!

  13. Oh, for reference of how small that is, you could put two clusters of these in the area of Silvermont core + l2.

    If you moved Silvermont to 28nm, then a cluster of 4 of these would probably be smaller than one Silvermont core without cache. Quite impressive.

  14. Well. Here comes the next wave of sub $2 phone chips from some Chinese company that will have a healthy margin.

    Until someone undercuts them selling a chip for $1 with near zero profit.

  15. got myself a huawei y550 recently for $50 that came with $50 vodafone credit and its a pretty good cheap phone with all the stuff you need
    4.5inch screen
    Qualcomm MSM8916 Snapdragon 410
    Quad-core 1.2 GHz Cortex-A53 (64bit)
    Adreno 306
    microsd slot
    front and rear camera
    2000mAh battery

    depending on usage it can last over a week with low usage and over 2 days with very high usage

    i dont see the point in spending loads on a phone nowdays when you can get ones like this for $50

  16. More likely an ODROID. But, whatever has it, I want one. 🙂 That single core version looks wonderful. Someone (I’m looking at you, Philips) put that in a 28 or 40 pin DIP!

  17. "20% smaller?"

    Don't you mean 20% less power than the equivalent A7 on the same process? That's pretty cool considering all the extra functionality. In terms of smaller... it's 25% smaller than the A53, so it's not all that small really, certainly bigger than the A7.

  18. It does seem closer to the A7 after looking through it. Curious that ARM wants to call it closer to the A9, which had OoO, and then link the A7 to the A35. Maybe they mean pure end performance wise, but core wise the A7 seems like the base.

  19. That’s not quite accurate. The smaller cores never supported OoO execution.
    Albeit ARM tries to sell this as a successor to the A7, it looks a lot more like an optimized and slightly trimmed down A53 to me. A lot of the “new” stuff (like the neon retention for instance) was already sold as new with the A53. The A35 also inherits the much improved memory pipeline from the A53 (compared to the A7). Better branch prediction, data prefetch etc. also seem like taken straight from the A53.
    I think listing the differences to the A53 would have been much more interesting than those to the A7… It might be more configurable, though if used in smartphones that’s meaningless as you definitely need all the optional blocks anyway. Also, max L2 cache size is 1 MB only (down from 2MB), but otherwise I’m not sure. Maybe some queue sizes got smaller, it might also be possible it can’t quite dual-issue as much as A53 can but that’s just speculation.

  20. They didn’t clip it, since the chip’s predecessors (Cortex A5 and A7) never had it to begin with. In fact, even the Cortex A53 is in-order.

  21. Crazy how you could get a supercheap Android phone these days for something like $22. $22. Good grief. But they’re understandably 32-bit only.

    This core will sell by the boatloads.

  22. 25% smaller? Everyone get ready for 12 A35 core Mediatek SoCs.

    And a further butchery of the big.LITTLE concept with A35 as the little 4, A35 as the big 4, and A35 as the…Middle 4?

  23. Quite off topic, but when the TR page appeared and I saw what appeared to be a CPU block diagram, I felt a bit of excitement for a nanosecond thinking AMD has released Zen’s architectural details.

    Then I saw Neon… Cortex A35… smallest… Zzzzz…

  24. So they clipped OoO to get the die size down? wonder how much IPC is lost in a well optimized risc world ..