ARM is bringing 64-bit computing to its smallest, most efficient Cortex A-series CPU core yet with the introduction today of the Cortex-A35. The A35 is a licensable core, like the more familiar Cortex-A57, so its availability means that ARM's partners can begin the process of integrating this CPU core into their upcoming chips.
ARM expects chips based on the Cortex-A35 to go to a number of different places. Most prominent among them is the burgeoning market for low-cost smartphones ranging from under $200 to as low as $25. The firm says demand for such devices is growing at roughly eight percent per year.
The Cortex-A35 fits into ARM's portfolio as a successor to the ultra-low-power Cortex-A7. By ARM's estimates, the A35 consumes about 20% less power than a typical Cortex-A7 implementation built with the same 28-nm process tech. Half of that reduction comes from potentially improved design flows, and the other half comes from microarchitectural changes present only in the A35.
The Cortex-A53 may be a more familiar comparison, since the A53 acts as a "little" core in the big.LITTLE scheme used in some of today's most popular mobile SoCs. ARM says that, compared to the A53 on the same 28-nm fab process, the Cortex-A35 is 25% smaller, consumes 32% less power, and is 25% more energy efficient.
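Those power and efficiency figures are related. As a rough sanity check, and assuming "energy efficiency" here means performance per watt (ARM's exact definition isn't spelled out), one can back out the performance the A35 would deliver relative to the A53. The function name and the normalization of the A53 to 1.0 are illustrative choices, not anything ARM has published.

```python
def implied_relative_performance(power_ratio, efficiency_ratio):
    """Back out relative performance from relative power and efficiency.

    Assumes efficiency = performance / power, so
    performance = efficiency * power. All values are ratios
    against a baseline core (here, the Cortex-A53 = 1.0).
    """
    return efficiency_ratio * power_ratio


# ARM's quoted figures for the A35 versus the A53 on 28 nm:
a35_power = 1.0 - 0.32       # 32% less power
a35_efficiency = 1.0 + 0.25  # 25% more energy efficient

a35_perf = implied_relative_performance(a35_power, a35_efficiency)
print(f"Implied A35 performance relative to A53: {a35_perf:.2f}x")
```

Under that assumption, the A35 would land at roughly 85% of the A53's throughput, which is consistent with a core that trades some speed for lower area and power.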
The most remarkable thing about the A35 may be its ability to squeeze a full-featured implementation of the ARMv8-A ISA into such a tiny core. The A35 supports the same instructions as its bigger siblings, with 64-bit registers, cryptography acceleration, double-precision floating-point math, and vector processing via ARM's Neon extensions. Meanwhile, the A35 retains backward compatibility with existing 32-bit software in the AArch32 state. In this state, the core can execute legacy code without requiring any modifications, yet it also exposes new instructions for faster floating-point math and crypto.
None of these capabilities are particularly new, if you've been following the latest Cortex A-series offerings, but the A35 brings them to the smallest footprint yet. Since it is fully compatible with ARMv8-A programs, the Cortex-A35 is capable of acting as the "little" core in a big.LITTLE pairing with other 64-bit ARM cores.
The A35 is relatively simple compared to the larger cores we're used to seeing. It has an eight-stage main pipeline with "limited" dual-issue capability. Instructions are executed in program order. ARM's presentation makes it sound like the Cortex-A7 was the starting point for the A35 design, with a few key modifications made in order to improve performance. That portrait has to be something of an oversimplification, though; adding support for the ARMv8-A instruction set and 64-bit registers is no minor tweak.
Regardless, ARM cites a few key areas where the A35 has been particularly improved over the A7. The CPU's front end has been completely redesigned, with more accurate branch prediction and a more efficient mix of instruction fetch and queuing resources. The memory subsystem has been overhauled with an eye toward better handling of streaming data access patterns, apparently a common sight in mobile applications. Both the L1 and L2 cache subsystems should better handle streaming writes, in particular.
Another big area of improvement is the Neon pipeline used for SIMD and floating-point math. The A35 has a fully pipelined double-precision multiplier and more SIMD execution units. In theory, it should deliver two times the single-precision flops of the Cortex-A7 and five times the double-precision flops.
In order to accommodate this relatively beefy floating-point hardware, the A35 places the Neon execution units into their own separate power domain, and it adds some new low-power operating modes, as illustrated above. A single power gate will allow the SoC to switch off power to the CPU core and Neon units entirely, but both domains can also go into a retention state where they draw less power but are ready to wake up fairly quickly, if needed. Crucially, the Neon units can drop into retention while the main CPU core remains in active operation. ARM exposes control over these states to the SoC's power management logic via a standard known as Q-channel.
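The relationship between the two power domains can be sketched as a small state model. This is a toy illustration only: the `PowerState` and `CorePower` names, the method names, and the rule that Neon is only switched on its own while the CPU is active are all hypothetical. It does not represent the Q-channel protocol or any real power-controller interface.

```python
from enum import Enum


class PowerState(Enum):
    ON = "on"                # fully powered
    RETENTION = "retention"  # reduced power, quick to wake
    OFF = "off"              # power gated


class CorePower:
    """Toy model of one core with separate CPU and Neon power domains."""

    def __init__(self):
        self.cpu = PowerState.ON
        self.neon = PowerState.ON

    def gate_off(self):
        # A single power gate covers both domains, so they go off together.
        self.cpu = self.neon = PowerState.OFF

    def wake(self):
        # Bring both domains back to full power.
        self.cpu = self.neon = PowerState.ON

    def core_retention(self):
        # Both domains drop into retention (assumed to happen together here).
        if self.cpu is PowerState.OFF:
            raise RuntimeError("core is power gated; wake it first")
        self.cpu = self.neon = PowerState.RETENTION

    def neon_retention(self):
        # Neon alone drops into retention while the CPU keeps running.
        if self.cpu is not PowerState.ON:
            raise RuntimeError("CPU must be active")
        self.neon = PowerState.RETENTION

    def neon_wake(self):
        # Assumption: Neon is only woken on its own while the CPU is active.
        if self.cpu is not PowerState.ON:
            raise RuntimeError("CPU must be active")
        self.neon = PowerState.ON


core = CorePower()
core.neon_retention()
print(core.cpu.value, core.neon.value)  # CPU stays on, Neon is in retention
```

The case the model is meant to highlight is the last one: integer-only code can keep running on the CPU while the idle floating-point and SIMD hardware sits in its low-power state.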
Performance claims for IP that hasn't yet been baked into a consumer chip are always tricky, but ARM has offered a few hints about how the A35 ought to perform compared to its predecessor.
In a web browsing test, the firm says the A35 performs 16% better than the A7 at the same 1.2GHz clock speed in 28-nm silicon. However, the A35 is capable of higher-frequency operation, and it can be 84% faster when running at 2GHz.
In other tests, a single A35 core ranges between 6% and 40% faster than the A7 at the same clock speed.
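The two web browsing figures invite a quick check on how well performance tracks clock speed. The sketch below assumes the 84% figure compares the A35 at 2GHz against the A7 at 1.2GHz, which ARM's numbers as relayed here don't state explicitly, and the function name is purely illustrative.

```python
def linear_extrapolation(speedup_at_base, base_ghz, target_ghz):
    """Speedup if performance scaled perfectly with clock frequency."""
    return speedup_at_base * (target_ghz / base_ghz)


same_clock_speedup = 1.16  # A35 vs. A7, both at 1.2 GHz
quoted_speedup = 1.84      # A35 at 2 GHz vs. A7 (assumed to be at 1.2 GHz)

ideal = linear_extrapolation(same_clock_speedup, 1.2, 2.0)
scaling = quoted_speedup / ideal

print(f"Ideal linear speedup: {ideal:.2f}x")
print(f"Quoted figure as a share of ideal: {scaling:.0%}")
```

If that reading is right, the quoted gain falls a little short of perfect linear scaling, which would not be surprising for a workload that spends part of its time waiting on memory.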
Although the image above cites both higher performance and lower power from the A35, ARM didn't provide energy consumption numbers for these tests. We do know that ARM expects the A35 to be capable of operating at very low power levels: under 90 mW at 1GHz and, remarkably, under six mW at 100MHz. Both of those estimates assume a 28-nm manufacturing process. Presumably, the A35 could operate at even lower power levels with the benefits of newer 16/14-nm FinFET processes.
That six-mW figure comes from a particular implementation of the Cortex-A35, one that includes only a single core with an 8KB L1 cache, no L2 cache, and no Neon or crypto hardware, so it's not exactly typical. The fact that such a configuration is possible, though, illustrates the flexibility of ARM's offering. The firm points out that this smallest configuration occupies only one-tenth the silicon area required by a full-on quad-core implementation of the A35 with larger L1 and L2 caches and Neon/crypto units.
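One way to put the two power estimates on a common footing is to convert them to energy per clock cycle by dividing power by frequency. This is back-of-the-envelope arithmetic rather than an ARM figure. Both inputs are upper bounds, and the two data points may come from different configurations, so the results shouldn't be read as a like-for-like comparison.

```python
def picojoules_per_cycle(power_mw, freq_mhz):
    """Energy per clock cycle in picojoules.

    mW / MHz works out to nJ per cycle, so multiply by 1000 for pJ.
    """
    return power_mw / freq_mhz * 1000.0


# Upper bounds quoted by ARM for 28-nm silicon:
at_1ghz = picojoules_per_cycle(90.0, 1000.0)   # under 90 mW at 1 GHz
at_100mhz = picojoules_per_cycle(6.0, 100.0)   # under 6 mW at 100 MHz

print(f"1 GHz point:   under {at_1ghz:.0f} pJ per cycle")
print(f"100 MHz point: under {at_100mhz:.0f} pJ per cycle")
```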
Some of the chipmakers in the ARM ecosystem move quickly. ARM has already licensed the Cortex-A35 to "multiple customers," and it expects devices containing the A35 to hit the market by the end of 2016. I'd expect those first entries to be low-cost phones and tablets sold outside of North America and Europe, but the A35 likely has a bright future ahead in a multitude of different sorts of devices.