The trend toward ever smaller and cheaper PC components is, of course, nothing new. Chips have shrunk and prices have fallen for over 30 years now. Yet that trend has accelerated dramatically in recent years, spurred onward by the rise of mobile computing and signified by the success of low-cost laptops like the Asus Eee PC and high-zoot mobile computers like the iPhone. Sensing this trend, the world’s largest chipmaker kicked off an effort four years ago to develop a CPU that could fit inside the power, heat, and size requirements of such devices while maintaining compatibility with its existing lineup of PC processors. Internally at Intel, this processor became known as Silverthorne, and the core logic associated with it was code-named Poulsbo. Together, they make up the so-called Menlow platform, whose development we’ve been tracking for some time now.
Today, Silverthorne and Menlow are taking their final shape with the introduction of the Intel Atom processor and the Centrino Atom mobile computing brand. Thanks to an all-new CPU microarchitecture and companion core-logic chip, Intel is pushing x86-compatible computing into new frontiers. To better understand how they did it, we recently visited the Austin, Texas offices of the Silverthorne design team and spoke with several of the chips’ architects. Read on for an extensive overview of this new CPU and its related technology.
Thinking small, but not too small
One of the keys to understanding Silverthorne is understanding its place in the world. Intel designed Silverthorne to fit into thermal and physical footprints that none of its current processors could. In its initial incarnation, Silverthorne will consume between half a watt and two-and-a-half watts, much less than most x86-compatible processors. (The closest x86 competitor, VIA’s Isaiah, will start at 3W and go up from there.) That power profile lets Silverthorne play in a number of different types of devices, including handheld GPS navigational devices and portable video players. Accordingly, Intel points to ARM processors as Silverthorne’s primary competition.
However, Silverthorne’s ability to fit into the smallest handheld devices, like smart phones, will be constrained in this first generation by the size and power consumption of its associated core-logic chip. Poulsbo was created to provide a fairly robust PC-style feature set for Ultra Mobile PCs and the like, with all of the necessary PC I/O interfaces and relatively powerful integrated graphics. The market failure of UMPCs and the triumph of the iPhone make Poulsbo’s targets look like something of a miscalculation now. But Intel paints the Menlow platform as part of a planned progression to deliver PC compatibility and power into ever-smaller form factors over time. The next step in the process will be the product code-named Moorestown, due in the 2009-2010 time frame, that will incorporate a Silverthorne CPU core into a system-on-a-chip design that’s better suited for smart phones.
In the interim, Intel will play to Silverthorne’s strengths by aiming it at slightly larger devices in established product categories and by continuing to push for development of what it now calls mobile Internet devices, or MIDs. Chief among those strengths is Silverthorne’s compatibility with Intel’s other x86 processors, which Intel equates not just with PC compatibility but Internet compatibility, as well. The firm sees substantial opportunity for Silverthorne to win a place in the tens of millions of handheld GPS receivers, video and DVD players, and game machines that ship each year by endowing them with a standard instruction set. When combined with Wi-Fi connectivity, any such device could double as a capable web client and extend its functionality by grabbing data from the Internet for mash-ups and the like.
Part of Intel’s strategy here is to make Silverthorne integration easy by fostering the development of a standard software stack that device makers can use to incorporate robust Internet client capabilities into their products with relative ease. To that end, Intel initiated the Mobile Linux Internet Project, or Moblin, online at moblin.org, and says it has about 100 people working on Moblin development. Their efforts focus on a range of components, from the kernel to middleware to application and media handling frameworks, including codecs. Moblin is based on desktop Linux, but Intel worked to reduce the OS’s footprint, to improve its power management features, and to standardize the APIs, so applications written for Moblin-based devices can work across different mobile Linux distros. Several OS vendors have signed on to package up Moblin into a complete product, among them Ubuntu and the Asianux consortium. More impressively, Intel cites a long list of application vendors and solution providers with Moblin-optimized products in the works, including Skype, Nero, AOL, MySpace, Adobe, Real, Dolby, and PopCap Games. With a slate of participants like that on deck, Moblin at least holds the promise of an even better web-client experience than Apple’s iPhone. Moblin’s project list includes a Mozilla-based web browser, as well, and its UI work includes provisions for touch-based navigation.
The choice of Linux over Windows may be a startling one coming from one of the members of the “Wintel” duo, but it makes sense to us. Linux has a smaller memory footprint, costs less (duh), and is more easily customizable than Windows. The user interfaces for MIDs and app-specific devices vary widely from one product to the next, so the familiarity of the Windows UI means little here. And, well, I’ve never been terribly impressed with Windows Mobile, either.
Intel expects to showcase over 25 designs at the Atom’s launch today. Presumably, some of those will be GPS receivers and the like, while others will be MID attempts. The Linux-based ones will be smaller, lighter, and cheaper, starting at about $499 and going up from there. The Windows-based devices will be higher-end products, starting at $599.
Obviously, with MIDs, Intel intends to help create a new category of products into which it can sell Silverthorne. Having seen the iPhone in action, however, I’m skeptical about the larger, bulkier MIDs’ prospects for success. Because Silverthorne is such a small chip, it’s also very cheap to produce, and Intel will be addressing other markets with this CPU, as well. The upcoming Diamondville platform will combine a Silverthorne processor with a different Intel chipset, and will target low-cost notebooks and desktops; it looks like a perfect fit for future generations of the Asus Eee PC and is already expected to be a part of the Eee PC desktop variant. Silverthorne will likely find its way into embedded applications, too. In both of these spaces, it will compete more directly with VIA’s C7 and Isaiah processors.
The Menlow basics
For those who are familiar with Intel’s desktop-class products, the fundamentals of the Menlow platform will be comfortable territory in many ways. Still, both the Silverthorne processor (now Atom) and the Poulsbo chipset (now the System Controller Hub) are new-from-the-ground-up designs. Or, as many folks seem to be saying these days, “grounds-up” designs, which sounds like an exotic method for brewing coffee. (Note to self: Investigate.)
Silverthorne is a much different beast than Intel’s current Core 2 desktop and mobile processors, with an entirely new microarchitecture designed to hit a much smaller power budget. Performance was an important, but secondary, consideration in this architecture’s genesis. The chip itself packs “only” 47 million transistors into a die that’s a diminutive 7.8 by 3.1 mm, or just over 24 mm². Intel has used its most advanced 45nm high-k chip fabrication process to produce Silverthorne, the same process it uses for Core 2 “Penryn” chips. (By contrast, Penryn’s die is 107 mm².) Even when mounted on a package for integration, Silverthorne will measure only 13 x 14 mm, with a height of 1.6mm.
The first Silverthorne processors will range in clock speed from 800MHz to 1.86GHz, and they will communicate with the rest of the system via Intel’s familiar front-side bus, with bus clocks of either 400 or 533MHz, depending on the product. (In fact, since Silverthorne uses this bus protocol, it’s theoretically compatible with Intel’s current desktop and mobile chipsets.) Silverthorne-based chips all pack 512K of onboard L2 cache, and the design team waded deep into Intel’s alphabet soup of technologies, building in compatibility for VT, XD, EM64T, SSE3, SSSE3, SpeedStep, and HT.
Phew, OK, let’s unpack that a bit.
The notable acronyms include EM64T, which is Intel’s name for x86-64 compatibility (so you could run Windows Vista x64 Edition on your next Garmin Nuvi, I suppose). SpeedStep is Intel’s name for its dynamic power management technologies, of which Silverthorne has a full complement. And the most intriguing of all may be HT, for Hyper-Threading, Intel’s rendition of simultaneous multithreading, a technology first (and last) implemented on the Pentium 4. Although Silverthorne is natively a single-core processor, certain models expose two threads to the operating system and execute them together in order to achieve higher performance.
We’ve already noted Silverthorne’s approximate power budget. More specifically, TDP ratings range from 0.65W for the lowest end models to 2.4W for the fastest ones. And I’ve already mentioned that Silverthorne is a low-cost affair. Exactly how low may surprise you, though. Here’s an overview of the various Silverthorne models and their pricing.
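[Table: Silverthorne models. Surviving column headers: Model, Clock speed, FSB speed, L2 cache, Hyper-Threading, and a column marked “(CPU + chipset)”. The table rows were not preserved.]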
Notice that the CPUs’ TDP ratings are very much maximum values. Both the idle and estimated average power numbers for the processors are much lower, in the milliwatt range. This is the sort of territory Intel’s design teams for Silverthorne and Poulsbo had to navigate when considering power consumption.
Silverthorne pricing includes a Poulsbo System Controller Hub. Poulsbo integrates several different chipset functions into a single chip: a north bridge with a FSB and memory controller, a south bridge with various sorts of I/O blocks, and a graphics processor. The north bridge’s bus capabilities mirror Silverthorne’s, with frequencies of either 400 or 533MHz, and the memory controller supports a single channel of DDR2 memory at the same 400/533MHz clock speeds. The south bridge is but a subset of a traditional PC’s, with two PCIe x1 links, eight USB 2.0 ports, HD Audio, and, oddly enough, an ATA/100 disk interface. This one is, I believe, simply the result of Intel guessing wrong about the penetration of SATA into mobile drives when it created the Poulsbo spec. More presciently, the SCH can also talk to flash RAM via three SDIO/MMC ports.
The vast majority of the SCH, of course, is dedicated to the integrated graphics processor. Although it’s a low-power component, this is truly a PC-class GPU, with OpenGL and DirectX 9 support (under Windows Vista, at least) and what Intel calls “full” HD video decode acceleration. This IGP comes with a big surprise and a mystery attached, though. The surprise is that Intel has gone to a third-party provider for this low-power GPU, and the big mystery is the identity of that supplier. Could it be Nvidia? ATI? Poulsbo has been in development for three years, so nearly anything seems possible. AMD hadn’t yet snatched up ATI when development started, and the impending Larrabee project hadn’t yet alienated Nvidia. Intel wouldn’t tell us who was providing the SCH’s graphics core, but I expect to find out in the next few days as the Menlow platform’s details become public. (Update: Turns out the GPU maker is Imagination Technologies, the PowerVR guys.)
The SCH is a somewhat larger chip than Silverthorne, partly because it’s manufactured using an older 130nm fab process. Intel says it chose 130nm because low-leakage circuits were readily available on that process. Also, the SCH’s physical size is limited by the large number of I/O pads it requires.
The Silverthorne pipeline
To better understand this new CPU, we met with a trio of project managers from the Silverthorne group: Jonathan Tyler, Gian Gerosa, and Haytham Samarchi. Tyler gave a detailed presentation on the guts of the processor, with Gerosa and Samarchi interjecting comments along the way. One of the most important things we heard from them was a characterization of the team’s overall approach to the Silverthorne design, which was focused on fitting into the chip’s targeted power budget and then adding as much performance as possible. Echoing sentiments we’ve heard expressed by Glenn Henry at Centaur, the trio said one of their biggest challenges was getting the engineers involved to adapt their mentality to this project’s goals.
The Silverthorne team imposed discipline on this front by beginning with a simple single-issue, in-order CPU pipeline. They then added new features bit by bit, iteratively, until their performance and power efficiency goals were met. Potential features were vetted in quantifiable ways for efficiency, and those that failed to make the cut were not included.
The result of this process was a very distinctive new design. Most modern CPUs use out-of-order execution to achieve best performance, but Silverthorne’s designers didn’t like the tradeoff involved. As a rule, Tyler said, they avoided aggressive control and data speculation for efficiency’s sake.
They found another efficiency win in optimizing this in-order pipeline to handle x86 instructions atomically. Virtually all of today’s x86-compatible processors decode x86’s CISC-style instructions into their own internal instructions, but the Silverthorne team tailored its pipeline to handle those translated instructions as single, atomic units. As you may know, Intel calls x86 instructions macro-ops and internal CPU core instructions micro-ops. Like other recent Intel chips, Silverthorne has the ability to fuse multiple macro-ops into a single micro-op. On Silverthorne, basic x86 instructions with memory operands translate as a single micro-op, which brings higher efficiencies for both decoding and scheduling.
Tyler presented the data above to illustrate how Silverthorne handles some typical workloads. Complex x86 instructions like cosine are still micro-coded, but otherwise, an average of 96% of macro-ops execute as directly translated (1:1) or fused single micro-ops. This behavior obviously gives the processor a higher IPC, and Tyler claimed issuing “big chunks” like this increases power efficiency, as well.
Once they had this foundation, Silverthorne’s designers were able to extract considerably more performance by adding another new feature: simultaneous multithreading, better known by Intel’s marketing name, Hyper-Threading. Silverthorne is both dual-issue and dual-threaded; the chip can issue two instructions per clock and manage the execution of two separate threads interleaved together. Tyler characterized the threading as very fine-grained throughout the pipeline, with dual instruction queues and cycle-by-cycle scheduling based on availability. Threads are controlled in hardware and treated as equals. The pipeline itself is non-blocking, and the two threads can be completely intermixed within it.
The team found that this form of thread-level parallelism worked well in conjunction with their in-order pipeline for improving performance in a power-efficient manner. Tyler estimated that Silverthorne’s SMT contributes a 36% to 47% increase in performance at the expense of a 17% to 19% increase in power consumption, a clear win.
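To put Tyler's SMT figures in performance-per-watt terms, here is a minimal sketch of the arithmetic. The function name is my own, and pairing the lowest performance gain with the highest power cost (and vice versa) is an assumption made only to bound the range; Intel did not say which figures go together.

```python
def perf_per_watt_gain(perf_increase, power_increase):
    """Relative change in performance per watt.

    Both arguments are fractional increases, e.g. 0.36 for +36%.
    """
    return (1 + perf_increase) / (1 + power_increase)


# Assumed pairings, chosen only to bracket the possible range.
worst = perf_per_watt_gain(0.36, 0.19)  # smallest speedup, largest power cost
best = perf_per_watt_gain(0.47, 0.17)   # largest speedup, smallest power cost

print(f"perf/W gain from SMT: {worst:.2f}x to {best:.2f}x")
```

Under those assumptions, enabling the second thread improves performance per watt by roughly 14% to 26%, which is the sense in which it counts as a win.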
Along with its focus on power efficiency, Silverthorne was intended to be a thoroughly modern CPU. That’s another reason Intel chose not to base this processor on an older design. Moving an older design to this fabrication process and adding all of the latest features, they claimed, would have been more work than producing this new architecture. Silverthorne does have almost all of the latest bells, whistles, and ISA extensions Intel has introduced over the years. As I’ve mentioned, it supports SSE3 and the newer Supplemental SSE3 instructions added with the original Core 2 Duo, though it lacks SSE4 support. It also has extensions for virtualization (VT) and is compatible with AMD’s x86-64 extensions for 64-bit addressing. These ISA extensions should grant Silverthorne better IPC and higher efficiency, in addition to new capabilities. The architecture is even multi-core capable, a fact that may prove handy when Diamondville ships.
Tyler said one of the key considerations in selecting clock frequency targets for Silverthorne was keeping the number of gates per clock cycle relatively small. Accordingly, they chose a 16-stage main pipeline, breaking the pipe into many relatively simple stages. This decision allowed the use of low-power circuits and power-efficient algorithms at each step along the way. Also, surprisingly enough, it enabled Intel’s designers to use extensive automation in the design process. Inside of Silverthorne’s execution core, the ROMs and the thermal sensor are the only fully custom blocks. The rest are cell-based designs, with half of those synthesized and the rest made up of structured data paths. (The core is only part of the story, of course. Only about 30% of the chip’s transistors are there.)
Here’s a look at the various stages of Silverthorne’s pipeline. Tyler said this pipeline is capable of very high frequencies, up to 2.5GHz with typical silicon at 1.2V. In fact, he offered this look at clock frequency scaling versus core voltage.
Intel has chosen to stay at the lower end of this curve for Menlow-based products, keeping under 2GHz and likely under 1.0V. Given these numbers, I wouldn’t be surprised to see higher clock frequencies from Silverthorne in Diamondville systems and in other low-cost applications that aren’t as power sensitive.
We don’t yet have a Silverthorne-based device we can benchmark ourselves, so we’ll have to rely on Intel’s numbers for the time being. This web page rendering benchmark offers our first sense of how the chip might perform compared to its competition. Tyler flashed a number of other scores in front of us during his presentation, but most of them were unofficial, preliminary, and had various caveats attached. The one comparison worth mentioning was a quick one they’d done pitting a 2W Atom processor (presumably at 1.6GHz) against a 3W “Dothan” Pentium M ULV at 800MHz. In an array of tests, the Atom’s performance ranged between 1X and 1.3X that of the 800MHz Dothan. That should give you a sense of this CPU’s performance, and I think it’s pretty impressive when you think about it.
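For a rough sense of what that comparison implies, the sketch below normalizes the quoted 1X to 1.3X range by clock speed and by rated power. The helper names are mine, the 1.6GHz Atom clock is the presumption stated above rather than a confirmed figure, and the 2W and 3W values are rated power, not measured consumption.

```python
def per_clock_ratio(perf_ratio, clock_a_mhz, clock_b_mhz):
    """Performance of chip A relative to chip B, per clock cycle."""
    return perf_ratio * clock_b_mhz / clock_a_mhz


def per_watt_ratio(perf_ratio, power_a_w, power_b_w):
    """Performance of chip A relative to chip B, per rated watt."""
    return perf_ratio * power_b_w / power_a_w


ATOM_MHZ, DOTHAN_MHZ = 1600, 800  # Atom clock is presumed, not confirmed
ATOM_W, DOTHAN_W = 2.0, 3.0       # rated power figures from the comparison

for perf in (1.0, 1.3):
    clock = per_clock_ratio(perf, ATOM_MHZ, DOTHAN_MHZ)
    watt = per_watt_ratio(perf, ATOM_W, DOTHAN_W)
    print(f"{perf:.1f}x overall -> {clock:.2f}x per clock, {watt:.2f}x per watt")
```

On those assumptions, the in-order Atom does roughly half to two-thirds of Dothan's work per clock cycle, while delivering about 1.5 to 1.95 times the performance per rated watt.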
Silverthorne mapped out
Below is a logical block diagram of Silverthorne that Tyler showed us. I’ll cover the highlights he pointed out section by section below.
Fetch and decode
Silverthorne’s instruction cache is 32KB.
The branch prediction unit employs a 128-entry branch target buffer and a 4K-entry Gshare predictor in order to maintain prediction accuracy.
The chip’s two instruction queues have 16 entries each, or these can be unified to a single 32-entry unit. The scheduler can pick two micro-ops from either instruction queue in each clock cycle.
Silverthorne has two full 128-bit-wide floating-point ALUs. These ALUs and the shuffle unit also comprise a 128-bit-wide data path for SIMD integer math. The floating-point adder supports 128-bit single-precision adds, but most of the rest of the hardware is 64 bits wide, including the floating-point and integer multipliers.
The chip has a 24KB write-back L1 data cache and a 512KB L2 data cache that’s 8-way associative. The interface to the L2 cache is 256 bits wide. Both of these caches have hardware data prefetcher logic associated with them. The architecture supports integer store-to-load forwarding, as well.
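For readers unfamiliar with the Gshare predictor mentioned in the fetch and decode notes, here is a toy model of the general technique: a table of 2-bit saturating counters indexed by the branch address XORed with a global history of recent branch outcomes. Only the 4K-entry table size comes from Intel; the 12-bit history length, the counter initialization, and the indexing details are textbook assumptions, not a description of Silverthorne's actual hardware.

```python
class GsharePredictor:
    """Toy gshare predictor: 2-bit counters indexed by PC XOR global history."""

    def __init__(self, entries=4096, history_bits=12):
        if entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.index_mask = entries - 1
        self.history_mask = (1 << history_bits) - 1
        self.history = 0
        # Counter values 0-1 predict not taken, 2-3 predict taken.
        # Start every counter at 1 ("weakly not taken"), an arbitrary choice.
        self.counters = [1] * entries

    def _index(self, pc):
        return (pc ^ self.history) & self.index_mask

    def predict(self, pc):
        """Return True if the branch at address pc is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train on the real outcome, then shift it into the global history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


# A branch that alternates taken / not taken defeats a plain 2-bit counter,
# but the history bits let gshare separate the two cases.
predictor = GsharePredictor()
hits = 0
for i in range(200):
    taken = (i % 2 == 0)
    if i >= 100:  # score only after a warm-up period
        hits += predictor.predict(0x80) == taken
    predictor.update(0x80, taken)
print(f"correct predictions after warm-up: {hits} of 100")
```

Mixing history into the index is what makes the scheme attractive for a small, power-constrained core: one modest table captures patterns that would otherwise need per-branch history storage.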
Keeping power use low
Silverthorne’s design team made all sorts of choices aimed at keeping the processor’s power use in check, and we’ve already discussed some of them. In addition to the basic architectural decisions, they employed a whole range of power-saving tricks, some familiar and some new.
One of those tricks is the elimination of special-function units like the integer multiplier and divider; these chores are handled by the corresponding FP units, instead. Also, as one would expect, Silverthorne has very fine-grained dynamic clock gating. Portions of the chip are deactivated dynamically in response to the data size and functional demands of specific threads. The power overhead required for 64-bit code is reduced when running 32-bit code; the integer register file and execution stack contain power optimizations for this purpose.
Silverthorne also includes the same C states, or sleep states, as Penryn, as summarized in the diagram above. Like Penryn, it can flush all or part of its caches in the various intermediate states. As for the C6 “Deep power down” state, the Silverthorne guys readily admitted “we stole it from Penryn” and “embellished it.” “It was a collaboration.” This is the sleep state in which virtually the entire chip is shut down and the machine state is stored in an onboard SRAM.
The C6 state is so efficient because Silverthorne has split power planes. Of the chip’s 203 IOs, only 21 need to be awake in the C6 state. By putting these on a separate power plane, the other 182 can be turned off, resulting in a large reduction in leakage power.
Interestingly, Silverthorne doesn’t warm its caches upon coming back from C6, but instead warms them on demand.
To give you some idea of the effectiveness of the various C states, Intel quoted some numbers from its own internal testing, with the C0 state as its base number. In C1, Silverthorne’s power consumption was 40% of the base. C4 was 12% of base, and C6 was just 1.6% of base. The firm estimates that a mobile Internet device might spend 80-90% of its time in C6, which is how Silverthorne achieves an average power consumption of around 220mW.
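To see how residency in the sleep states drives average power, here is a small sketch that weights each state's relative power by the share of time spent in it. The per-state percentages are Intel's quoted figures; the 2W active-power value and the 10%/90% split between C0 and C6 are hypothetical inputs of mine, not Intel's actual model.

```python
# Power in each C state as a fraction of C0 power (Intel's quoted figures).
C_STATE_FRACTION_OF_C0 = {"C0": 1.0, "C1": 0.40, "C4": 0.12, "C6": 0.016}


def average_power(c0_power_w, residency):
    """Time-weighted average power in watts.

    residency maps a C state name to the fraction of time spent in it;
    the fractions must sum to 1.
    """
    if abs(sum(residency.values()) - 1.0) > 1e-9:
        raise ValueError("residency fractions must sum to 1")
    return sum(
        c0_power_w * C_STATE_FRACTION_OF_C0[state] * share
        for state, share in residency.items()
    )


# Hypothetical device: 2 W when active, asleep in C6 for 90% of the time.
example = average_power(2.0, {"C0": 0.10, "C6": 0.90})
print(f"average power: {example * 1000:.0f} mW")
```

With those made-up inputs the average lands at about 229 mW, in the same neighborhood as Intel's estimate, and almost all of it comes from the 10% of time spent awake.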
Another optional power-saving measure in Silverthorne’s arsenal is the ability to run its front-side bus in CMOS mode rather than using the usual GTL signaling. Obviously, the SCH must support this mode, as well. Intel claims this measure can save between 200 and 500 mW of platform power.
If you care at all about the sort of low-cost and low-power computing devices that Silverthorne targets, it’s hard to not be excited about seeing Intel enter this market. The incumbent devices in these markets are too slow, too hot and power-hungry, too expensive, too proprietary, or some combination of those things. It doesn’t have to be that way, and Silverthorne’s interesting new design points the way to a better day, when mobile computing becomes ubiquitous and offers a truly satisfying user experience. Intel’s considerable resources in chip design, technology, and manufacturing should help make that day arrive sooner rather than later.
Yet I can’t help but think that day isn’t here yet. Despite a hint of cautious optimism about some of the Silverthorne-based MIDs we might see soon, I’m more or less convinced the Menlow platform is a little too large, too power-hungry, and too PC-centric to really deliver on Silverthorne’s potential. This is no great penetrating insight. The UMPC concept has flown like a rock, and even Intel admits the biggest opportunities here are in smart phones. Once the Silverthorne-derived Moorestown system-on-a-chip arrives, then we’ll see the really sweet devices based on this processor technology come out of the woodwork.
Of course, I’d love to be proven wrong by a flood of desirable Menlow-based devices post-haste, like a variant of the upcoming Eee PC with an 8.9″ display, an Atom processor, and Wi-Max for $399. (Note to Asus: you know how to reach me.) We’ll be watching to see how the Menlow-based devices shape up. Meanwhile, Silverthorne’s biggest opportunity in the near term may be in lower-cost systems based on the Diamondville platform. Intel could sell a boatload of these things in the developing world, and that wouldn’t be a bad first step.