A brand-new x86 processor microarchitecture doesn’t come along every day, but today, we have just that. We visited the offices of VIA Technologies’ processor subsidiary, Centaur Technology, Inc., yesterday to learn about its new x86-compatible processor architecture. Remarkably, Centaur President Glenn Henry and his team of fewer than 100 people have created a thoroughly modern x86 processor microarchitecture from scratch over the course of the last four years. The resulting design, which bore the code name CN during its development and is also known as the VIA Isaiah microarchitecture, is a superscalar 64-bit processor with speculative, out-of-order execution.
As a new design, Isaiah’s overall set of capabilities and features reads more like Intel’s Core microarchitecture than anything else, but Centaur has aimed Isaiah at the same set of targets its C7 and prior CPUs have sought. That means low power consumption and low cost come first, with performance taking a back seat. Yet by moving from the C7’s simple, in-order architecture to a brand-new core with speculative, out-of-order execution, Centaur has the potential to deliver quite a bit more performance within its chosen set of constraints. Henry says the firm set the goal of delivering twice the C7’s performance at the same clock speed and within the same power envelope. VIA now claims Isaiah has two to four times the C7’s per-clock throughput, depending on the application.
That order of performance gain, achieved by what Henry described as “real man’s architecture” rather than process technology optimizations or a move to multiple cores, may sound promising, but Centaur is careful about setting expectations for the performance of Isaiah-based processors. The first implementations will be single-core chips topping out at around 2GHz, mainly intended for embedded applications, ultra-mobile PCs, very low cost desktop PCs, and so-called mobile Internet devices. Isaiah’s mission is to add new capabilities and new instruction sets (like x86-64, SSE3, and virtualization extensions compatible with Intel’s VM provisions) in order to enable such devices to run newer applications competently.
Centaur isn’t shy about taking on the competition in the markets it serves, though. In our meeting, Henry stated flatly that, based on what little we know about it, Intel’s Silverthorne processor won’t be as fast as Isaiah since it’s an in-order design. He observed with seeming bemusement that Intel was developing an in-order architecture for this market just as Centaur was moving to an out-of-order design.
Regardless of its target market, the Isaiah microarchitecture’s feature set impresses. Henry has authored a reasonably accessible architecture brief on Isaiah, even (quite comically) mapping Isaiah microarchitectural features to Intel marketing names like “Wide Dynamic Execution” in a series of footnotes. This brief reveals almost everything VIA has chosen to disclose about Isaiah to date, and I’d encourage reading it if you want the full scoop on this architecture. We will, however, discuss some of the highlights of the design. Here’s a quick logical overview of Isaiah:
From this altitude, Isaiah looks very much like any modern x86 processor. Isaiah can decode three x86 instructions per cycle, which it translates into micro-ops for internal execution. Like Intel’s Core, Isaiah can fuse multiple micro-ops into one, and it can combine multiple x86 instructions (like a compare-jump pair) into a single micro-op.
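To make the compare-jump fusion idea concrete, here is a toy sketch in Python. It is not Centaur's decoder: the tuple-based instruction format, the `fuse_compare_jump` name, and the list of conditional jumps treated as fusible are all assumptions made for illustration. The brief only says that a compare-jump pair can become a single micro-op.

```python
# Toy model of fusing an adjacent compare + conditional jump into one micro-op.
# Instructions are hypothetical tuples: (mnemonic, operand, ...).

# Assumption: which conditional jumps may fuse. The real rules are not public.
CONDITIONAL_JUMPS = {"je", "jne", "jz", "jnz", "jl", "jle",
                     "jg", "jge", "jb", "jbe", "ja", "jae"}

def fuse_compare_jump(instrs):
    """Return a micro-op list in which each adjacent cmp + jcc pair is merged."""
    fused = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if cur[0] == "cmp" and nxt is not None and nxt[0] in CONDITIONAL_JUMPS:
            # One fused micro-op stands in for both x86 instructions.
            fused.append(("cmp+jcc", cur, nxt))
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused

program = [
    ("mov", "eax", "1"),
    ("cmp", "eax", "ebx"),
    ("jne", "loop"),
    ("add", "eax", "2"),
]
micro_ops = fuse_compare_jump(program)
```

In this sketch, the four instructions in `program` become three micro-ops, which is the point of fusion: fewer entries to track through the out-of-order machinery for the same work.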
The chip can then issue as many as seven micro-ops per cycle to its seven execution ports. Those ports include two ALU ports for integer math, what Centaur calls a store address port and a store data port, one load port, and two media ports. The first media port is 128 bits wide and handles floating-point addition, SIMD integer operations, and divides and square roots. The second media port handles both integer and floating-point multiplication. According to Henry, “single-precision multiplies are fully pipelined with a world-record latency of three clocks.” Another interesting touch: this unit has a combined multiply-add capability used by more complex x86 instructions like transcendentals that are handled via Isaiah’s microcode subsystem.
Micro-ops are executed out of program order, and they’re then retired in program order at a rate of up to three per clock, which equates to as many as three x86 instructions.
Isaiah also features new and distinctive multi-stage branch prediction logic and a “memory disambiguation” capability, similar to that of the Core microarchitecture, that can move loads ahead of stores if there are no dependencies.
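The dependency check behind moving a load ahead of stores can be sketched as follows. This is a deliberately simplified model, not Isaiah's mechanism: it assumes every address is already known, whereas real hardware has to predict the outcome and recover when the prediction is wrong. The `Op` format and the `can_hoist_load` name are hypothetical.

```python
from typing import List, Tuple

# Hypothetical op format: (kind, address), where kind is "load" or "store".
Op = Tuple[str, int]

def can_hoist_load(ops: List[Op], load_index: int) -> bool:
    """True if the load at load_index conflicts with no earlier store.

    Assumption: exact address match is the only dependency considered;
    partial overlaps and unknown store addresses are ignored in this toy.
    """
    kind, addr = ops[load_index]
    if kind != "load":
        raise ValueError("op at load_index is not a load")
    earlier = ops[:load_index]
    return all(not (k == "store" and a == addr) for k, a in earlier)

ops = [
    ("store", 0x100),
    ("store", 0x200),
    ("load", 0x300),   # touches no stored address: safe to run early
    ("load", 0x100),   # reads what the first store wrote: must wait
]
```

Here `can_hoist_load(ops, 2)` is true and `can_hoist_load(ops, 3)` is false, which mirrors the "if there are no dependencies" condition.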
Isaiah’s cache subsystem looks to make the most of its likely modest overall size with uncommon smarts. Isaiah’s L1 instruction and data caches are both 64KB in size and 16-way set associative. The L2 cache is similarly 16-way associative, and initial implementations will be 1MB in size, although Henry indicated that different L2 sizes are a distinct possibility. The L2 cache is exclusive, so it doesn’t duplicate the content of the L1 caches, effectively raising the total capacity of the cache hierarchy. Like other current CPUs, Isaiah uses predictive algorithms to examine data access patterns and prefetch some data directly into its L1 cache, but uniquely, it prefetches data less likely to be used into a smaller, dedicated buffer rather than the L2 cache.
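A quick back-of-the-envelope calculation shows what the exclusive arrangement buys. The cache sizes and associativity come from the figures above; the 64-byte line size is an assumption (VIA's brief is not quoted here on line size), and the function names are made up for this sketch.

```python
KB = 1024

def unique_capacity(l1i: int, l1d: int, l2: int, exclusive: bool) -> int:
    """Bytes of distinct data the cache hierarchy can hold at once."""
    # Exclusive L2: L1 contents are not duplicated in L2, so capacities add.
    # Inclusive L2: every L1 line also lives in L2, so L2 bounds the total.
    return l1i + l1d + l2 if exclusive else l2

def num_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)

L1I = L1D = 64 * KB      # from the article
L2 = 1024 * KB           # 1MB in initial implementations
LINE = 64                # ASSUMPTION: line size is not stated in the article

exclusive_kb = unique_capacity(L1I, L1D, L2, exclusive=True) // KB
inclusive_kb = unique_capacity(L1I, L1D, L2, exclusive=False) // KB
l1_sets = num_sets(L1D, ways=16, line_bytes=LINE)
l2_sets = num_sets(L2, ways=16, line_bytes=LINE)
```

Under these numbers, the exclusive hierarchy holds 1152KB of distinct data versus 1024KB for an inclusive one, a gain of 128KB, or 12.5 percent.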
Centaur sounds positive about the performance prospects of the first Isaiah-based chips, but Henry said they’re “infinitely smarter” as a result of having the first real chips back from the fab. The team is now able to watch what’s happening inside the chip as it operates in a way they could not during simulation, allowing them to make smarter decisions about things like queue depths and buffer sizing. Centaur is currently working on tuning the architecture and expects to achieve performance gains in future Isaiah-based products. New architectures do tend to afford opportunities for optimization; even Intel saw some nice performance gains when moving from Merom to Penryn.
Power, not performance, is the key
Even so, Isaiah isn’t likely to rival Core for outright or clock-for-clock performance. Henry proudly noted that Isaiah’s execution units have some strong points, such as low latency for an FP add or multiply. But Isaiah has fewer integer ALUs and fewer multipliers than Core, and Intel “can push more instructions through” its machine. Much of this difference is attributable to Centaur’s radically different design philosophy, centered around both more limited resources and very different goals. Henry noted that, at a high level, Isaiah looks like any other out-of-order machine. One level below that, however, he said Centaur has made thousands of choices that are quite different from those that an Intel or an AMD would make.
Many of those choices are aimed at conserving power. For instance, Henry said Centaur’s designers used a number of circuit design techniques to keep power consumption low, such as avoiding using the smallest possible feature sizes available to them on the 65nm process node in order to avoid leakage problems. Some areas of the chip are actually larger than they’d otherwise have to be if power efficiency were not such a priority.
Centaur has outfitted Isaiah with a range of power-saving technologies, including a dynamic clock scaling mechanism similar to Intel’s SpeedStep that alternates between a pair of PLLs to achieve very quick multiplier transitions. The chip’s adaptive thermal mechanism will modulate voltage based on die temperature, taking advantage of better conditions to keep voltage low or even overclocking the processor if the thermal and voltage headroom is available. Isaiah’s complement of power states includes a new C5 sleep state in which the chip flushes caches and powers them down, and Henry said a future stepping of the chip will introduce a new C6 sleep mode. In C6, the processor will save its state into internal memory that’s powered by I/O voltage, and then the system can literally turn off VCC to the part, which should make for an ultra-low-power sleep state.
Small, cool, and competent
Despite all of the changes, Isaiah processors remain bus- and pin-compatible with VIA’s current C7 processors, and they fit within the same thermal envelopes, so they should be easy upgrades for companies that manufacture C7-based products. The first Isaiah is a larger chip than the C7, as illustrated by the picture below, which shows several Isaiah chips next to a single C7.
As with the C7, Isaiahs will largely ship soldered to the motherboard, not in a socket. The chips’ BGA-style package measures only 11 mm by 11 mm.
C7 chips are manufactured on a 90nm process, while Isaiah will begin life at 65nm. VIA wasn’t especially keen on revealing the name of the foundry that produces the chips, but I couldn’t help noticing the “Fujitsu 65nm” label printed on a sign next to a diagram of the Isaiah processor in Centaur’s offices. Henry estimates Isaiah’s transistor count at 94 million, versus 25 million in the C7, but he sensibly insists the number is meaningless because it’s dominated by Isaiah’s larger 1MB L2 cache.
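Henry's point about cache dominating the transistor count is easy to sanity-check with rough arithmetic. The sketch below assumes a conventional six-transistor SRAM cell and counts only the data arrays, ignoring tags, any error-correction bits, and control logic. Neither the cell type nor those omissions come from Centaur, so treat the result as an order-of-magnitude estimate.

```python
KB = 1024
TRANSISTORS_PER_BIT = 6        # ASSUMPTION: standard 6T SRAM cell
TOTAL_TRANSISTORS = 94_000_000 # Henry's estimate for Isaiah

def sram_transistors(size_bytes: int) -> int:
    """Transistors in the data array of an SRAM of the given size."""
    return size_bytes * 8 * TRANSISTORS_PER_BIT

l2 = sram_transistors(1024 * KB)     # 1MB L2 cache
l1 = sram_transistors(2 * 64 * KB)   # 64KB instruction + 64KB data
cache_share = (l1 + l2) / TOTAL_TRANSISTORS
```

With these assumptions, the L2 data array alone comes to roughly 50 million transistors, and the caches together account for about 60 percent of the 94 million total, consistent with Henry's claim.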
VIA expects Isaiah-based processors to begin shipping in the middle of 2008. Even after that happens, the C7 will continue as a lower cost option in its product portfolio.
We weren’t able to run any benchmarks, but Centaur did have several demo systems set up in order to show off its new processor’s capabilities. One of them played back a compressed video at 720p resolution using the combination of a 1.2GHz ultra-low-voltage Isaiah chip and a VX800 chipset connected by an 800MHz front-side bus. This system used only passive cooling. Another demo rig played a Blu-ray disc fluidly using a 1.3GHz Isaiah assisted by a Radeon HD 3850. And the third system combined a 2GHz Isaiah with a GeForce 7950 GT to run a couple of games, including Viva Pinata and Crysis. I wasn’t able to play any games on this system in our limited time there, but they did look to run smoothly enough.
These demos of basic competency in common tasks jibe with the zen of Centaur’s approach to processor development, which is very much about just being good enough. Henry will patiently and persuasively lecture anyone who will listen about his philosophy. As you listen, you realize you’re hearing the voice of the radical commoditization of x86 processors. He points out that most people don’t and shouldn’t care what type of CPU they have in their PCs, so long as it gets the job done. When Centaur started, Henry says, they had to develop engineers with a different mindset, not “faster is better.” He set a series of targets involving die size limits and a ship date, and then directed his people to make the processor fast enough within those constraints that people would want to buy it.
This approach sounded quite foreign to many of us when Centaur first began, but with the advent of devices like the Eee PC, the iPhone, and Shuttle’s kPC, it no longer seems so strange. Indeed, once you’ve absorbed the Centaur mindset, Henry’s answers to questions become somewhat predictable. Will Isaiah go multi-core? It can; it’s built that way, and Henry thinks Intel’s approach of a shared L2 cache makes sense. But he scoffs at the notion that people need multiple cores in basic computing devices right now. Henry says Centaur will go to multiple cores if it needs that level of performance or if Intel convinces people they have to have it.
Some portion of the x86 processor market will be receptive to Centaur’s low-cost, low-power proposition, and I suspect that portion of the market will grow substantially in the coming years. Whatever happens, I must admit that the low-cost, low-power, make-it-adequate pitch sounds much better when served alongside a modern 64-bit superscalar, out-of-order CPU architecture like Isaiah.