We covered some of the basics of the chip known as Avoton, Intel’s low-power and low-cost system-on-a-chip based on the next-generation Atom CPU core, earlier this summer. Today, Intel is officially unveiling the Atom C2000-series products based on Avoton, so we have the opportunity to offer a little more detail about this distinctive new SoC.
Although Avoton is based on a low-power CPU core, its mission is not to power a new generation of mobile devices. Intel has another SoC, code-named Bay Trail, for that market. Instead, Avoton is aimed at various spots in the data center where Intel’s Xeon processors are either too big or too expensive to serve. Among them: the emerging breed of rack-based systems known as microservers and enterprise storage applications. Avoton also has a sister chip, known as Rangeley, that’s based on very similar silicon but is intended for networking and communications devices.
Intel produces this SoC on a custom-tuned variant of its 22-nm fabrication process, which has some of the finest geometries in the industry and is the first process to adopt a “3D” or FinFET-style transistor structure. We’ve already seen quite a few bigger cores manufactured at 22 nm, but the benefits of this process are arguably most notable for low-power chips like Avoton. Intel is making full use of its celebrated manufacturing lead here.
Avoton is a true system on a chip, with everything one might expect from a traditional server-class system integrated into a single die. Nevertheless, Intel has a separate name for the platform on which Avoton resides: Edisonville. Thanks to Avoton’s extensive integration, the Edisonville platform’s footprint is about the size of a credit card, including memory and external I/O connections.
You can expect to see fully functional Avoton-based computing nodes mounted on compact cards that will plug into microserver enclosures. Part of the appeal of such systems is the ability to cram lots of nodes into a single rack. With the right balance of resources and the right application, such a deployment has the potential to offer higher compute density and better power efficiency than a rack of traditional Xeon-based servers.
The block diagram above is the most granular look we have from Intel at the layout of the Avoton SoC. You can probably pick up the basics just by scanning it, but we’ll cover some of the highlights in a little more detail.
Each of Avoton’s eight CPU cores is based on the brand-new Silvermont microarchitecture, which Intel outed earlier this year. Silvermont is the first true reworking of the Atom microarchitecture since its beginnings, and it is a new-from-the-ground-up design with a renewed emphasis on per-thread performance. Gone is the simultaneous multithreading used in the old core; Silvermont extracts instruction-level parallelism via out-of-order execution instead.
Of course, Silvermont retains full compatibility with Intel’s x86 ISA, including support for newer instructions up to the level of the big Westmere core, and it’s capable of true 64-bit addressing. Intel likes to emphasize both of those attributes, an indicator that ARM-based SoCs are the true competitive target for Avoton and Rangeley.
Silvermont’s cores are grouped into dual-core modules with 1MB of shared L2 cache, as shown above. Avoton has four of these modules, and each one talks to the rest of the world via a bit of glue known as the Silvermont system agent. The SA enables multi-core Atom SoCs like Avoton; it is a modular design that can scale up and down as needed. (We can presumably expect the Silvermont SA to make an appearance in SoCs like Bay Trail, too.) The system agent coordinates among the modules’ four L2 caches, maintaining coherency, and also routes requests to the rest of the system.
The front-side bus used in prior Atoms is now well and truly gone, replaced by an Intel-developed interconnect known as IDI. This point-to-point link has been used in Intel’s larger cores since Nehalem, and in Avoton, it delivers up to 25.6 GB/s of bandwidth via a crossbar-style fabric.
That 25.6 GB/s figure is no accident; it’s also the peak amount of throughput possible via Avoton’s dual-channel memory interface, which supports DDR3 and DDR3L DRAM at speeds up to 1600 MT/s. Robust data integrity protection is offered, including ECC (SEC-DED), for external memory. When configured with two DIMMs per channel, a single Avoton node can support up to 64GB of physical memory.
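For those who want to check the math on that 25.6 GB/s number, here is a quick sketch of the arithmetic. It assumes the standard 64-bit data path per DDR3 channel and leaves the extra ECC bits out of the count, since they don't carry payload. The function name is our own invention for illustration.

```python
def peak_dram_bandwidth_gb_s(channels: int, transfers_mt_s: int,
                             bus_width_bits: int = 64) -> float:
    """Peak theoretical DRAM bandwidth in decimal GB/s.

    bus_width_bits defaults to 64, the data width of a standard DDR3
    channel (assumption: ECC bits are excluded from the payload width).
    """
    bytes_per_transfer = bus_width_bits // 8
    return channels * transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


# Avoton as described above: two channels at 1600 MT/s.
avoton_peak = peak_dram_bandwidth_gb_s(channels=2, transfers_mt_s=1600)
print(f"{avoton_peak:.1f} GB/s")  # prints "25.6 GB/s"
```

This is a peak theoretical rate, of course. Delivered bandwidth in something like Stream will land below it.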
Avoton’s integrated south bridge, which Intel has dubbed a “south complex,” packs in quite a bit of connectivity, including 16 lanes of PCI Express Gen2. The Avoton team chose the older PCIe Gen2 standard at least in part to speed time to market. I suspect the 64 Gbps of effective bandwidth provided by those lanes in each direction should suffice for the vast majority of roles this chip will play. The chip has four separate PCIe roots, each of them with four lanes, that can be combined into a single x16 connection, a dual x8 config, and so on.
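The 64 Gbps figure falls out of the PCIe Gen2 signaling rate and its line coding. Here is a minimal sketch of that calculation, assuming the standard Gen2 numbers of 5 GT/s per lane with 8b/10b encoding. It counts one direction only and ignores packet and protocol overhead, so real-world payload throughput will be somewhat lower. The names are ours, chosen for illustration.

```python
GEN2_GT_PER_S = 5.0  # raw PCIe Gen2 signaling rate per lane, per direction


def pcie_gen2_effective_gbps(lanes: int) -> float:
    """Effective per-direction bandwidth in Gbps after 8b/10b coding.

    8b/10b sends 10 bits on the wire for every 8 bits of data, so only
    80% of the raw rate carries data. Protocol overhead is not modeled.
    """
    return lanes * GEN2_GT_PER_S * 8 / 10


x16 = pcie_gen2_effective_gbps(16)  # all four roots combined
x4 = pcie_gen2_effective_gbps(4)    # a single four-lane root

print(f"x16: {x16:.0f} Gbps per direction")
print(f"x4:  {x4:.0f} Gbps per direction")
```

Each four-lane root on its own works out to 16 Gbps per direction by the same math.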
That PCIe bandwidth shouldn’t be needed for Ethernet networking, since Avoton’s south complex also includes four GigE connections. In fact, those connections support an early pseudo-standard Ethernet speed of 2.5 Gbps, so they can offer 10 Gbps in aggregate across four ports when connected to a switch that supports that data rate.
The communication-focused Rangeley variant of the SoC includes an extra bit of logic, as well, represented in the diagram above as “QAT accel” and better known as Intel QuickAssist Technology. This is a hardware block dedicated to the acceleration of a host of popular data encryption algorithms, to allow for higher throughput while relieving the CPU cores of the burden. Intel says it provides an API for making use of this hardware and has already enabled direct access to the acceleration hardware via open-source frameworks. Although this is the first iteration of QuickAssist of which we’re aware, the technology is a scalable “building block” and could be expanded in future implementations in order to achieve higher throughput.
Perhaps the most intriguing part of the Avoton south complex, though, is the interconnect that ties everything together. Called IOSF, for Intel On-Chip System Fabric, its presence is a clear indication of how far Intel has moved toward the SoC style of modular chip design. This common communication fabric enables the company to re-use functional blocks across multiple chip designs. Any component that speaks this common language should, in theory, drop into a new SoC layout with relative ease. Today’s Haswell client chips make use of IOSF, as do all of Intel’s platform controller hubs. Going forward, Intel says “everything being developed now” will employ IOSF.
IOSF fully supports PCI Express headers and ordering rules, and it looks like a PCIe device to software, so it shouldn’t require any special support. We don’t have all of the details, but IOSF is apparently a fairly wide interface that runs at clock frequencies up to 400MHz. It can be scaled back to save power when needed, and in Avoton, it has been. In fact, two separate speeds of IOSF fabric are deployed in the south complex.
In addition to PCIe, Ethernet, and SATA, the south complex supports a host of different types of legacy PC I/O. The block that facilitates this legacy I/O also includes the chip’s power management controller. Having the power control unit on-chip should allow for faster state transitions and more granular power gating than would be possible with the external PMIC used by most SoCs.
In fact, Avoton truly can “shift power around” on the die depending on the current usage scenario, to make full use of its power envelope. For instance, the chip can take better advantage of its Turbo clock speed headroom when fewer I/O connections are in use.
Speaking of power management, although Avoton is a low-power solution, Intel makes some sharp distinctions between this chip’s dynamic voltage and frequency scaling behavior and the way a chip for mobile devices might be tuned. The Avoton team wasn’t as willing to trade additional wake-time latency for incremental savings in idle power. For instance, capturing a savings of 50 mW at the expense of 100 microseconds of wake latency might work well for a phone, but it can mean dropped packets in a server. Avoton’s DVFS policies are very similar to the Xeon’s, in order to avoid such problems.
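To get a feel for why 100 microseconds matters, consider how much traffic can pile up while the chip is waking. The sketch below is a rough worst-case estimate, not anything Intel provided. It assumes traffic arrives at full line rate for the entire wake period, uses the 10 Gbps aggregate of Avoton's four 2.5 Gbps Ethernet ports, and picks 1500-byte frames purely for illustration. Whether any of that traffic is actually dropped depends on how much buffering sits in front of the cores, which we don't know.

```python
def bytes_arriving_during_wake(link_gbps: float, wake_latency_us: float) -> float:
    """Bytes that arrive at line rate while the chip is waking up.

    Gbps times microseconds gives thousands of bits (1e9 * 1e-6 = 1e3),
    so multiply by 1000 to get bits, then divide by 8 for bytes.
    """
    bits = link_gbps * wake_latency_us * 1000
    return bits / 8


FRAME_BYTES = 1500  # assumption: full-size Ethernet payloads, for illustration

backlog = bytes_arriving_during_wake(link_gbps=10.0, wake_latency_us=100.0)
frames = backlog / FRAME_BYTES

print(f"{backlog:.0f} bytes, or roughly {frames:.0f} full-size frames")
```

Under those assumptions, something on the order of 125KB has to be buffered somewhere during a single wake event, which helps explain the team's reluctance to chase the last few milliwatts of idle power.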
Unlike the Xeon, though, the Avoton SoC is intended only for single-socket systems. When asked about the prospects for multi-socket systems, Intel’s Brad Burres allowed that multi-socket SoCs of this class are possible in the future. He cast doubt on their prospects, though, by pointing out that the socket-to-socket interconnect burns 5-10W of power, a cost that is “easy to amortize” on a big Xeon but more difficult to justify in this class of chip.
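Burres's point about amortization is easy to see with a little arithmetic. The sketch below compares the 5-10W interconnect cost against the 20W top of the Atom C2000 power range and against a big Xeon. Note that the 95W Xeon figure is our own assumption for the sake of illustration, not a number from Intel's presentation.

```python
def interconnect_share(link_watts: float, tdp_watts: float) -> float:
    """Fraction of a chip's power envelope consumed by a socket-to-socket link."""
    return link_watts / tdp_watts


ATOM_C2000_MAX_TDP_W = 20.0  # top of the C2000 range cited in this article
XEON_TDP_W = 95.0            # assumption for illustration only

for link_w in (5.0, 10.0):
    atom = interconnect_share(link_w, ATOM_C2000_MAX_TDP_W)
    xeon = interconnect_share(link_w, XEON_TDP_W)
    print(f"{link_w:.0f}W link: {atom:.0%} of the Atom's budget, "
          f"{xeon:.0%} of the Xeon's")
```

On the fastest Avoton parts, the link would eat a quarter to half of the whole power envelope, and the picture only gets worse for the lower-power models.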
The Atom C2000 series
Intel is offering a host of Atom C2000-series products based on Avoton and Rangeley. As you can see, the power envelopes range from 6W to 20W, with four to eight CPU cores. All of these models are based on the Avoton and Rangeley dies, which natively have eight cores. Those with lower core counts just have one or two dual-core modules disabled. The fastest versions have base clocks of 2.4GHz and Turbo peaks just a smidge higher, at 2.6GHz.
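The relationship between disabled modules and core counts can be sketched in a few lines. This is a simple model built from the figures above: four modules per die, two cores per module, and 1MB of L2 per module. It assumes that turning off a module also takes its L2 cache out of play, which seems likely but isn't something Intel spelled out for us. The class and property names are hypothetical.

```python
from dataclasses import dataclass

TOTAL_MODULES = 4      # dual-core modules on the native die
CORES_PER_MODULE = 2
L2_MB_PER_MODULE = 1   # shared by the two cores in a module


@dataclass(frozen=True)
class DieConfig:
    """One way of fusing off modules on the eight-core die."""
    disabled_modules: int

    @property
    def active_modules(self) -> int:
        return TOTAL_MODULES - self.disabled_modules

    @property
    def cores(self) -> int:
        return self.active_modules * CORES_PER_MODULE

    @property
    def l2_mb(self) -> int:
        # Assumption: a disabled module's L2 is disabled along with it.
        return self.active_modules * L2_MB_PER_MODULE


for disabled in (0, 1, 2):
    cfg = DieConfig(disabled_modules=disabled)
    print(f"{disabled} modules off: {cfg.cores} cores, {cfg.l2_mb}MB L2")
```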
We haven’t tested an Atom C2000 ourselves (yet?), but Intel has provided a few performance numbers that offer a sense of what to expect. The rise in Stream performance compared to the older “Centerton”-based Atom S1260 is substantial. I expect the gain comes in part from Avoton’s dual channels of memory at 1600 MT/s and in part from architectural changes, with more cores and more internal bandwidth via the system agent and IDI.
Since these performance numbers are only relative, we can’t compare bandwidth to Xeons and Opterons running Stream, unfortunately.
Here’s a look at integer computation performance. Again, the improvement from the prior generation is over 4X. Obviously, the Marvell SoC based on quad ARM Cortex-A9s is overmatched, partially because it simply can’t accommodate enough RAM to run four threads simultaneously.
I suppose that’s a big part of the story here: by delivering Avoton-based products today, Intel is well out ahead of its competition in a market where it has perceived a threat to its business. The interesting question now is whether Avoton’s apparent advantages in terms of compatibility, performance, and availability will be enough to head off the threat from a host of ARM-based SoCs that will surely be inexpensive, power efficient, and tailored exceptionally well for specific uses. At the very least, Intel isn’t making it easy for them.