The Atom processor was something of a surprise success for Intel. The company expected its product to succeed, of course, but perhaps not in the manner it did. When we first visited the Atom’s Austin, Texas-based design team, the spotlight was firmly fixed on the low-power Menlow platform. Intel expected Menlow to find its way into several different sorts of handheld mobile devices, including GPS receivers and portable game players. Most notably of all, it hoped to see a new category of Atom-based products, variously called ultra-mobile PCs (UMPCs) and mobile Internet devices (MIDs), become a consumer favorite. The other, larger Atom platform, code-named Diamondville and aimed at low-cost notebooks and desktops, was almost an afterthought.
Yet even at that time, the prospects for a UMPC revolution seemed dim, particularly in light of the iPhone’s rising popularity. What happened instead was an explosion of low-cost netbooks (scads of ’em) that grew into a robust new product category, almost all of them based on Diamondville. Suddenly the Atom was everywhere, just not in the form Intel had anticipated.
The reasons for Menlow’s relative unpopularity were no great mystery. Although the Menlow platform, consisting of the Silverthorne processor and the Poulsbo chipset, was smaller and required less power than any prior PC-compatible solution, it was still too big and power-hungry to fit into traditional smart phones and the like. With MIDs moribund, Menlow was stuck between classes, a product without a true home. Intel’s attempts to lure phone makers away from ARM processors predictably gained little traction.
Then again, Menlow was only a first step. Intel’s plans for the Atom have long called for reductions in the size and power draw of the Atom platform while keeping computing power relatively steady. Late last year, Diamondville gave way to Pine Trail, with one third of the physical footprint and 40% lower max power consumption, ushering in a wave of netbooks with up to 10 hours of battery life for under $300. Increased integration made that change possible even without a move to a newer chip fabrication process. Now, the second-generation low-power Atom platform, code-named Moorestown, is poised to take a much larger step into wholly new territory for an x86 processor: directly into the size and power requirements of smart phones and tablets.
To get there, the design team has conducted a sweeping revamp of the Atom platform, integrating more components into the main chip and making extensive modifications throughout to reduce power consumption. The result is a claimed 50X reduction in platform-wide idle power draw versus Menlow, in a package that takes up 40% less area, mounted on boards half the size of the previous generation. Intel believes Moorestown is competitive with existing smart phone platforms in terms of size and power efficiency, but with roughly double the performance. The Atom has the added benefit of x86 compatibility, for whatever that’s worth in the realm of handheld devices. We recently visited Intel’s Austin Design Center to learn more about Moorestown, and it was an education in what’s required to shrink a PC-compatible system into a pocket-sized form factor.
The Lincroft SoC
Technically, a trio of chips makes up the Moorestown platform, but don’t let that fact fool you. Moorestown is more tightly integrated than ever before and will require fewer auxiliary chips in order to function. The most noteworthy of the three chips is code-named Lincroft, and it has enough platform elements integrated into a single piece of silicon that Intel calls it a system on a chip, or SoC. Inside the Lincroft SoC resides an Atom CPU core, a front-side bus interconnect, a memory controller, a 3D graphics core, video playback and encoding units, and a display engine. The companion Langwell platform controller hub (PCH) chip handles traditional I/O duties, and the Briertown MSIC largely serves as a platform power manager. We’ll take a look at each one in turn, starting with Lincroft.
You might be surprised to learn that Lincroft isn’t being manufactured using Intel’s latest 32-nm fabrication process, even though the firm has been shipping 32-nm desktop and server processors for months. Instead, Lincroft is built using a variant of Intel’s 45-nm, high-k metal gate process tailored for low-power applications. One benefit of this process is the ability to use multiple transistor types, something the Atom team says they’ve done for Lincroft. They declined, however, to reveal any specifics about how the different transistor types are being used.
Generally, we’ve seen two different transistor types deployed in low-power processor designs: a slower variant with lower leakage for circuitry not in the critical performance path, plus a faster variant with higher leakage for gates that must switch quickly. We expect Intel is doing something along those lines with Lincroft, which would explain how they’re able to claim a 60% reduction in leakage power with negligible performance loss.
The custom SoC process also enables integration by supporting the higher voltages required by some types of I/O, including display standards like LVDS.
All told, Lincroft measures 7.3 mm by 8.9 mm, or 65.3 mm², with roughly 140 million transistors packed into that space. That’s quite a bit larger than the prior-gen Silverthorne Atom, whose 47 million transistors occupied just over 24 mm², but moving more components from the chipset into Lincroft’s 45-nm silicon is almost certainly an advantage. Crucially for its intended market, Lincroft’s package is still relatively small at 13.8 mm by 13.8 mm.
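A quick density check puts those figures in perspective. This Python snippet divides transistor count by die area for both chips. Silverthorne’s area is given only as “just over 24 mm²,” so 24.0 is used here, which slightly overstates its density.

```python
def density_m_per_mm2(transistors_millions: float, area_mm2: float) -> float:
    """Millions of transistors per square millimeter."""
    return transistors_millions / area_mm2


lincroft = density_m_per_mm2(140, 65.3)     # ~140M transistors in 65.3 mm²
silverthorne = density_m_per_mm2(47, 24.0)  # 47M transistors in ~24 mm²

print(f"Lincroft:     {lincroft:.2f} M/mm²")
print(f"Silverthorne: {silverthorne:.2f} M/mm²")
print(f"Area grew {65.3 / 24.0:.1f}x; transistor count grew {140 / 47:.1f}x")
```

The two densities land within about 10% of each other, which is consistent with both chips being built on 45-nm processes. Lincroft’s growth comes mostly from the extra logic it integrates, not from packing transistors more tightly.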
The Atom processor core in Lincroft isn’t much changed, architecturally, from the first generation: it still supports a wide range of alphabet-soup instructions, up through SSSE3 but no further, and is 64-bit capable. One core can still track and execute two threads simultaneously via Hyper-Threading, and cache sizes remain steady, with a 32KB L1 instruction cache, a 24KB L1 data cache, and a 512KB L2 cache. The team hasn’t made any noteworthy tweaks to improve per-clock instruction throughput or to extend the core’s capabilities.
Changes throughout the chip should improve the processor’s performance and power efficiency, regardless. Most notably, clock frequencies are both up and down. They’re down because the chip’s SpeedStep dynamic clocking mechanism can now drop the core’s frequency as low as 200MHz at idle, versus 600MHz for Silverthorne. They’re up thanks to the addition of a “burst mode” in this new Atom similar to the Turbo Boost feature familiar from Intel’s recent desktop and server processors.
Burst mode raises clock speeds opportunistically when the thermal headroom of the chip and the device will permit, as high as 1.5GHz in the smart-phone-oriented versions of Lincroft and up to 1.9GHz in the tablet-oriented variants. The Atom core can run in burst mode indefinitely so long as the thermal headroom is sufficient, so larger devices like tablets might spend quite a bit of time in burst mode, whereas smart phones may not.
Lincroft’s burst mode differs from Turbo Boost in a couple of notable ways. For one, the advertised clock frequency for a given Atom model will be its peak burst mode frequency, not a lower, guaranteed baseline speed. (For the Core i7 and friends, by contrast, that base clock is the advertised speed.) In fact, we came away from our visit to Austin without any record of the baseline clock speeds of the new Atoms, and when we later asked Intel what those base frequencies would be, the firm declined to answer! Another major difference, one that pervades nearly every component in Moorestown, is the fact that software has extensive control over the CPU’s clock frequencies. A handset maker could, for instance, choose to enable burst mode only in certain scenarios where it might be needed, such as playing back YouTube videos. Alternatively, it could leave burst mode enabled generally but turn it off in specific situations, like during voice calls when the phone must also power a modem.
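To make the burst-mode behavior concrete, here is a minimal Python sketch of a per-tick clock governor. The 200MHz idle floor and the 1.5GHz and 1.9GHz peaks come from the text. The base clock, the load categories, and the 5 °C headroom threshold are hypothetical placeholders, since Intel hasn’t disclosed base frequencies or how headroom is measured. This is an illustration, not Intel’s algorithm.

```python
from dataclasses import dataclass

PEAK_BURST_MHZ = {"phone": 1500, "tablet": 1900}  # reported peak burst clocks
IDLE_FLOOR_MHZ = 200                              # reported SpeedStep floor


@dataclass
class Thermal:
    die_temp_c: float
    limit_c: float

    @property
    def headroom_c(self) -> float:
        return self.limit_c - self.die_temp_c


def target_clock_mhz(variant: str, load: str, thermal: Thermal,
                     base_mhz: int, min_headroom_c: float = 5.0) -> int:
    """Choose a core clock for one governor tick.

    load is "idle", "normal", or "heavy". base_mhz and min_headroom_c are
    hypothetical: Intel hasn't published Lincroft's base clocks or thresholds.
    """
    if load == "idle":
        return IDLE_FLOOR_MHZ
    # No time limit on burst: it lasts as long as this check keeps passing.
    if load == "heavy" and thermal.headroom_c >= min_headroom_c:
        return PEAK_BURST_MHZ[variant]
    return base_mhz


cool_tablet = Thermal(die_temp_c=60.0, limit_c=90.0)
hot_phone = Thermal(die_temp_c=88.0, limit_c=90.0)
print(target_clock_mhz("tablet", "heavy", cool_tablet, base_mhz=1000))  # 1900
print(target_clock_mhz("phone", "heavy", hot_phone, base_mhz=1000))     # 1000
```

A roomy tablet chassis keeps passing the headroom check, so under sustained load it would sit at its peak clock indefinitely, while a phone that heats up falls back to its base clock.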
The Atom’s front-side bus uses the same protocol as past Atoms, even though it’s now communicating with other components on the same chip. The FSB has gained some of the same clock speed flexibility that the CPU has, though, with its own version of, erm, forced induction. The speed of Lincroft’s front-side bus scales dynamically with the speed of the CPU core, peaking at 200MHz or 800 MT/s, double the max of Menlow and Silverthorne. As I understand it, the FSB clock will remain at 200MHz any time the CPU rages above 1.2GHz. When the CPU drops to between 600 and 1200MHz, the bus will run at 100MHz, and when the CPU goes to 200MHz, the bus can range down to 50 or even 25MHz. The caveat here is that some workloads might use relatively little CPU power and still need a fair amount of bus bandwidth. Either Intel or the device maker will have to tune the platform to make sure this feature doesn’t harm performance in such scenarios.
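The FSB tiers described above map neatly onto a lookup. This Python sketch encodes them as reported. The range between 200 and 600MHz wasn’t covered, so the sketch assumes the lowest tier there, and the `deep_idle` flag that picks 25MHz over 50MHz is a hypothetical stand-in for whatever logic actually chooses between them. The MT/s figure uses four transfers per clock, which is how 200MHz yields 800 MT/s.

```python
TRANSFERS_PER_CLOCK = 4  # 200MHz x 4 = 800 MT/s


def fsb_clock_mhz(cpu_mhz: int, deep_idle: bool = False) -> int:
    """FSB clock for a given core clock, following the reported tiers."""
    if cpu_mhz > 1200:
        return 200
    if cpu_mhz >= 600:
        return 100
    # Below 600MHz (reported only for the 200MHz floor): 50MHz, or 25MHz
    # when the hypothetical deep_idle flag is set.
    return 25 if deep_idle else 50


def fsb_rate_mts(cpu_mhz: int, deep_idle: bool = False) -> int:
    return fsb_clock_mhz(cpu_mhz, deep_idle) * TRANSFERS_PER_CLOCK


for cpu in (1900, 1500, 1200, 800, 200):
    print(cpu, "MHz core ->", fsb_clock_mhz(cpu), "MHz bus,",
          fsb_rate_mts(cpu), "MT/s")
```

Note that `fsb_clock_mhz(800)` returns 100 no matter how much memory traffic the graphics or video blocks generate, which is exactly the tuning caveat raised above.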
To meet the power requirements of smart phones, Lincroft’s memory controller supports a single, 32-bit channel of low-power DDR memory at 400 MT/s. Larger devices like tablets can use DDR2 memory at 800 MT/s, but the lower bandwidth of LP DDR was a particular concern for Lincroft’s architects. Belli Kuttanna, Sr. Principal Architect in Intel’s Ultra Mobility Group, told us they tried to offset the memory bandwidth lost during the switch to LP DDR RAM by making memory scheduling improvements to increase DRAM utilization. In his assessment, they’ve more or less “broken even, clock for clock” by doing so.
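The bandwidth gap behind that concern is easy to quantify. This quick Python calculation computes theoretical peak bandwidth (bytes per transfer times transfers per second) for the single 32-bit channel. The text doesn’t state the width of the DDR2 configuration, so treating it as 32 bits as well is an assumption.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak in GB/s: bytes per transfer x transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


lpddr = peak_bandwidth_gbs(32, 400)  # LP DDR at 400 MT/s (smart phones)
ddr2 = peak_bandwidth_gbs(32, 800)   # DDR2 at 800 MT/s (tablets; width assumed)
print(f"LP DDR: {lpddr:.1f} GB/s, DDR2: {ddr2:.1f} GB/s")
```

Better scheduling can’t raise that ceiling. It can only raise the share of it that real traffic actually achieves, which is how the team went about offsetting the lower peak.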
Lincroft’s memory arbiter deals with memory access requests coming in from multiple devices in the system. The arbiter’s buffer size is doubled in Lincroft, and it has been tweaked to do a better job of coalescing requests where possible, especially those from the graphics and media subsystems. One reason for the changes, according to Kuttanna, was making sure the chip could deal well with concurrent workloads, such as playing music while displaying 3D graphics, in a bandwidth-constrained environment.
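Request coalescing is the kind of trick that stretches limited bandwidth. Intel hasn’t described how Lincroft’s arbiter does it, so the following is only a generic Python illustration. Requests from the same source that cover contiguous addresses are merged into larger bursts, up to a hypothetical 256-byte limit. A deeper buffer, like Lincroft’s doubled one, simply gives an arbiter more pending requests to scan for such neighbors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    source: str  # illustrative labels such as "cpu", "gfx", "video"
    addr: int    # start address in bytes
    size: int    # length in bytes


def coalesce(queue: List[Request], max_burst: int = 256) -> List[Request]:
    """Merge same-source requests that touch contiguous addresses."""
    merged: List[Request] = []
    for req in sorted(queue, key=lambda r: (r.source, r.addr)):
        last = merged[-1] if merged else None
        if (last is not None and last.source == req.source
                and last.addr + last.size == req.addr
                and last.size + req.size <= max_burst):
            last.size += req.size  # extend the existing burst
        else:
            # Copy so the caller's queue is left untouched.
            merged.append(Request(req.source, req.addr, req.size))
    return merged


pending = [Request("gfx", 0, 64), Request("gfx", 64, 64),
           Request("cpu", 4096, 64), Request("gfx", 128, 64)]
print(coalesce(pending))  # four requests become two memory transactions
```

A real arbiter also has to respect ordering and latency constraints, which this sketch ignores by sorting freely.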
Of course, the memory controller now works harder to keep power consumption low, too, with extensive DRAM power management and more aggressive policies for the closing of DRAM pages.
The graphics processor in Lincroft is the same Imagination Tech IGP used in Menlow, but it now runs at up to twice the clock speed. The Atom team says it made performance optimizations “around” the IGP, such as the memory coalescing we’ve just mentioned, and power use optimizations “in and around” it. The result should be better graphics performance than Menlow, and Intel claims an advantage of 2-4X over any competitors that use the same graphics core, thanks to the combination of a better implementation, integration, the benefits of Intel’s 45-nm process, and better drivers.
Lincroft carries over the Imagination Tech video decoding and playback logic from Menlow, as well, with support for a broad array of formats including MPEG2/4, WMV9, VC1, H.264 and DivX. Intel is promising better video decoding performance this time around, which would be nice given our experience with the Poulsbo-based Acer Aspire One 751. Any improvements are likely due to higher clock speeds, the memory arbiter changes, or both. In fact, the firm says Lincroft can decode H.264 video streams at bit rates of up to 20 Mbps in 1080p resolution with all profiles, putting its capabilities well above competing platforms that top out at 720p or below.
To this decoding prowess, Lincroft adds a new feature: a hardware video encoding engine, also from Imagination Tech, capable of squeezing 720p video into several formats, including MPEG4 and H.264 base profile, at up to 30 FPS. Obviously, the encoder should be useful both for capturing video and for doing video conferencing on a Moorestown-based device.
Two new power states to seal the deal
All of the tweaks and integration would be insufficient to reach the power levels required by smart phones if Lincroft didn’t include much more drastic power-saving measures. The real heavy hitters are the introduction of very fine-grained power delivery on the chip and a pair of new power states intended to make use of it.
The Lincroft team has divided the chip into a whopping total of 19 separate “power islands,” each with its own clock and power gating. Power gating is accomplished via on-die switches that cut off power entirely to idle “islands.” These various power islands are served by a total of 12 different voltage supply rails coming into the chip, quite an increase over Silverthorne’s dual power planes. The external Briertown MSIC makes the increased granularity possible by managing power delivery for Lincroft’s 12 rails and the rest of the platform.
The end result should be that Lincroft will consume only as much power as it needs for the portions of the chip that are in use. During a compute-intensive task, for example, the Atom CPU core might be fully powered on and active, along with the memory subsystem, but the graphics, video, and display logic could be consuming little to no power.
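The island-and-rail arrangement can be modeled with a couple of set operations. In this Python sketch the island names and their rail assignments are hypothetical (Intel hasn’t published Lincroft’s partitioning, and the real chip has 19 islands on 12 rails). The modeling assumption is that an idle island can be gated by its on-die switch, while a whole rail can be switched off by the MSIC only once every island it feeds is idle.

```python
from typing import List, Set, Tuple

# Hypothetical island -> supply-rail mapping (illustrative, not Lincroft's).
RAIL_OF = {
    "cpu": "vcc_cpu",
    "l2": "vcc_cpu",
    "memctl": "vcc_mem",
    "gfx": "vcc_gfx",
    "video_dec": "vcc_media",
    "video_enc": "vcc_media",
    "display": "vcc_disp",
}


def power_plan(active: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (islands to power-gate, rails the MSIC can switch off)."""
    unknown = active - set(RAIL_OF)
    if unknown:
        raise ValueError(f"unknown islands: {sorted(unknown)}")
    gated = sorted(set(RAIL_OF) - active)
    rails_in_use = {RAIL_OF[island] for island in active}
    rails_off = sorted(set(RAIL_OF.values()) - rails_in_use)
    return gated, rails_off


# Compute-bound task: CPU and memory busy, graphics/video/display idle.
gated, rails_off = power_plan({"cpu", "l2", "memctl"})
print("gated:", gated)
print("rails off:", rails_off)
```

During video playback, by contrast, the encoder island would be gated but its shared media rail would stay up to feed the decoder.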
Even more important for interactive devices is how power is managed across a typical day of use. Lincroft has added a couple of new power states aimed at the usage models for smart phones and tablets, with the catchy names S0i1 and S0i3. The basics of those power states are laid out in the table below.
S0 is the familiar state when the system is active and the CPU is transitioning between its various C-states, depending on the level of activity. Some power gating is possible when Lincroft is in S0, but there are opportunities to be more efficient. During everyday, interactive use of a mobile device, the system is waiting for user input for countless instants. Many of these are just fractions of a second, but to a computer, they’re relatively vast expanses of time. During these periods, Lincroft will drop into its S0i1 power state, engaging extensive power gating throughout the chip while the CPU core drops into a deep C6 sleep.
For S0i1 to be effective, the system must spend as much time as possible resident in that state, so the ability to make quick transitions is crucial. Thus, the onboard wake-up logic remains active, while the power manager and the CPU retain their state information on-chip. Intel claims the entry latency for S0i1 is only 600 microseconds; the exit latency is longer at 1.2 milliseconds, though surely quick enough to escape user perception. One can imagine a smart phone or tablet transitioning in and out of S0i1 constantly as a user checks his voice mail, browses through e-mail, and surfs the web.
When that’s over, the user generally locks the phone and slips it into a pocket, at which point Lincroft should drop into its new S0i3 power state, turning off nearly everything on the chip, including the wake-up logic, the CPU, and the onboard power manager. The entry latency for S0i3 is only 450 microseconds (less than S0i1 because much of the chip will already be turned off), and waking up from S0i3 will take 3.1 milliseconds, much longer than S0i1, but presumably short enough not to be a problem when the user hits a button or an incoming call is detected.
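Those latencies suggest how an idle governor might choose between states. The Python sketch below picks the deepest state whose entry-plus-exit latency fits comfortably inside the predicted idle period. The latencies are Intel’s figures, but the 2x safety margin and the selection rule itself are hypothetical, modeled on generic OS idle governors rather than anything Intel has described. Per the text, S0i3 is really tied to the phone being locked and pocketed, which a pure timing rule doesn’t capture.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PowerState:
    name: str
    entry_us: float
    exit_us: float


# Entry/exit latencies quoted by Intel, ordered shallow -> deep.
STATES: List[PowerState] = [
    PowerState("S0i1", entry_us=600, exit_us=1200),
    PowerState("S0i3", entry_us=450, exit_us=3100),
]


def pick_state(predicted_idle_us: float, margin: float = 2.0) -> str:
    """Deepest state whose round trip fits `margin` times into the idle gap."""
    choice = "S0"
    for state in STATES:
        if predicted_idle_us >= margin * (state.entry_us + state.exit_us):
            choice = state.name
    return choice


print(pick_state(1_000))      # S0: too short to bother
print(pick_state(5_000))      # S0i1: a pause while the user reads
print(pick_state(1_000_000))  # S0i3: phone idle for a second or more
```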
This class of power management goes well beyond standard practice in most PCs, and the team made a host of changes to ensure that it’s effective in Lincroft: adding power gating for the analog blocks like PLLs and thermal sensors, reducing the fuse array sensing time to quicken transitions into standby mode, optimizing SRAM power use when the CPU is in C6 sleep, and the like.
The Langwell PCH
The various types of I/O and connectivity that weren’t integrated directly into Lincroft were instead incorporated into the Langwell south bridge, or platform controller hub (PCH), in Intel-speak. Intel manufactures Langwell on a low-leakage 65-nm fabrication process and puts it into a BGA package that measures 14 mm on each side, or 196 mm². As one might expect, Langwell is very different from a desktop PC south bridge; it supports a variety of I/O types geared specifically for handheld devices. Even its USB support, which would seem familiar at a glance, is a low-power variant of the desktop standard.
As with most south bridges, to explain Langwell is to recite a laundry list of I/O types mashed together on one chip. You can read about most of them in the block diagram on the right. Several of the chip’s features and limitations deserve a little more attention, though.
Storage support is among them, because the Langwell PCH has no traditional disk controller; only solid-state NAND flash is supported. For phones, that limitation won’t likely matter, but for tablets and other devices, it may be something of a problem. The NAND interface should be fast enough; at 32 bits wide and 100MHz, it can sustain transfer rates up to 400 MB/s, beyond the transfer rates of current flash chips. Hard drives offer more and cheaper gigabytes, though.
An interesting note on that front: you won’t see it in the block diagram to the right, but the more detailed information Intel displayed during our visit showed a CE-ATA interface built into Langwell. The CE-ATA standard was intended to act as a disk interface for mobile devices, but it has since been abandoned, leaving Langwell dependent only on solid-state storage. I’d expect to see some alternative hard drive interface added to future derivatives of this south bridge.
The media support in Langwell looks to be quite robust. HDMI is supported at resolutions up to 1080p and beyond, with audio over the same connection. The audio block includes dedicated hardware for decoding MP3s (more on this feature in a minute), and the image processing block holds a signal processor with support for two cameras: a five-megapixel main camera and a VGA-class secondary one, presumably a front-facing camera to be used for video conferencing.
Like Lincroft, Langwell has fine-grained power management built in.
One man’s PMIC is another man’s MSIC
The final bit of silicon in the Moorestown trio is the Briertown MSIC. Most smart phones have a PMIC, or Power Management Integrated Circuit, and essentially, so does Moorestown. However, Intel prefers its own name for Briertown: MSIC, for mixed-signal IC. Briertown is the least complex of the Moorestown chips, and although it was designed by Intel, its manufacturing has been outsourced to NEC, Freescale, and Maxim.
Like any other PMIC, Briertown’s primary job is indeed power management. It controls power delivery to Lincroft and Langwell, switching on and off their multiple supply rails as needed. The quick voltage changes directed by Briertown facilitate several nice platform behaviors, such as quicker transitions in and out of low-power states (for longer residency in them) and faster ramps into burst mode for Lincroft’s Atom core. Briertown is also responsible for keeping the battery charged, a pretty critical chore in a mobile device.
The Briertown MSIC has other jobs, too: a mish-mash of different facilities that the Moorestown team calls “jelly beans.” The, uh, beans are often bits of logic that would have to be provided by a separate chip, were they not included in Moorestown’s main silicon. Briertown contains a number of these dedicated units, like a touch-screen controller and analog sensors. Sometimes it teams up with Langwell to provide a complete solution path, as it does for sound by providing a codec compatible with Langwell’s audio controller. Pulling these functions into the PCH and MSIC, Intel asserts, saves space and power compared to separate chips.
In order to keep Moorestown’s power-saving hardware operating as efficiently as possible, Intel has implemented platform-wide support for a power management protocol called OSPM, or Operating System Power Management. OSPM provides fine-grained, software-directed control of hardware power states. Moorestown Platform Architect Bruce Fleming described for us the difference between typical ACPI power management as used in Menlow, where the system is either on, off, or sleeping, and OSPM on Moorestown, where each subsystem has integrated power management. With OSPM, system devices are actively managed and are individually, asynchronously put into low-power states, to be brought out only when their services are needed.
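The contrast between system-wide sleep states and per-device management is easier to see in code. This Python sketch is hypothetical and greatly simplified; it is not the OSPM interface itself. Each device wakes on demand and drops into a low-power state on its own idle timer, independently of the others. Time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ManagedDevice:
    name: str
    idle_timeout_s: float  # hypothetical per-device policy
    powered: bool = False
    last_used_s: float = 0.0

    def request(self, now_s: float) -> None:
        """A client needs this device: make sure it is on, reset its timer."""
        self.powered = True
        self.last_used_s = now_s

    def tick(self, now_s: float) -> None:
        """Power the device down once it has sat idle long enough."""
        if self.powered and now_s - self.last_used_s >= self.idle_timeout_s:
            self.powered = False


devices: Dict[str, ManagedDevice] = {
    "audio": ManagedDevice("audio", idle_timeout_s=1.0),
    "camera": ManagedDevice("camera", idle_timeout_s=2.0),
    "usb": ManagedDevice("usb", idle_timeout_s=0.5),
}

devices["audio"].request(now_s=0.0)
devices["camera"].request(now_s=0.0)
for dev in devices.values():
    dev.tick(now_s=1.5)

print({name: dev.powered for name, dev in devices.items()})
```

At the 1.5-second mark the audio block has already powered down while the camera stays on, and the USB block was never woken at all. No system-wide sleep decision was needed for any of it.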
OSPM doesn’t just facilitate efficient operation generally, either. The control it exposes over hardware states allows for custom programming of specific operating modes, such as the example we described above where CPU burst mode is disabled during a voice call to reserve power for the modem. Another custom mode could ensure sufficient bus bandwidth when the CPU is largely idle but other logic is active by keeping the FSB clock from dropping too low.
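Custom modes like these amount to constraints layered on top of the normal clocking policy. In this hypothetical Python sketch the mode names, the `base_mhz` value, and the 100MHz FSB floor are illustrative choices, not Intel’s. The two non-default modes mirror the two examples just given.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Mode:
    burst_allowed: bool = True
    min_fsb_mhz: int = 25  # lowest FSB clock reported for Lincroft


# Hypothetical mode table mirroring the two examples in the text.
MODES = {
    "default": Mode(),
    "voice_call": Mode(burst_allowed=False),   # save power for the modem
    "media_streaming": Mode(min_fsb_mhz=100),  # keep bus bandwidth available
}


def apply_mode(mode_name: str, cpu_mhz: int, fsb_mhz: int,
               base_mhz: int) -> Tuple[int, int]:
    """Clamp the governor's requested clocks to the active mode's limits."""
    mode = MODES[mode_name]
    if not mode.burst_allowed:
        cpu_mhz = min(cpu_mhz, base_mhz)
    return cpu_mhz, max(fsb_mhz, mode.min_fsb_mhz)


print(apply_mode("voice_call", 1500, 200, base_mhz=1000))     # (1000, 200)
print(apply_mode("media_streaming", 200, 50, base_mhz=1000))  # (200, 100)
```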
All of Moorestown’s considerable capabilities come together in one particular use case the team described to us: the playback of MP3-encoded audio. While audio is playing, the audio engine and MP3 decode hardware are active, but the other components in the system are in a low-power state. Every so often, the path to memory is enabled, the audio engine fills its buffer, and the memory subsystem quickly returns to a low-power state. Because the audio engine is capable of direct memory access, the CPU doesn’t have to wake up during this refresh. Music keeps playing without interruption while most of the system is asleep.
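Some back-of-the-envelope arithmetic shows why that arrangement lets the rest of the system sleep for long stretches. Intel didn’t give buffer sizes, so this Python sketch assumes a hypothetical 64 KB buffer holding compressed audio and a 128 kbps MP3. For comparison, it also computes how long the same buffer would last if it held decoded 16-bit, 44.1 kHz stereo PCM instead.

```python
def refill_interval_s(buffer_bytes: int, stream_bytes_per_s: float) -> float:
    """Seconds of playback covered by one full buffer."""
    return buffer_bytes / stream_bytes_per_s


BUFFER_BYTES = 64 * 1024          # hypothetical buffer size
MP3_BYTES_PER_S = 128_000 / 8     # 128 kbps MP3 -> 16,000 bytes/s
PCM_BYTES_PER_S = 44_100 * 2 * 2  # 44.1 kHz x 2 channels x 2 bytes

mp3 = refill_interval_s(BUFFER_BYTES, MP3_BYTES_PER_S)
pcm = refill_interval_s(BUFFER_BYTES, PCM_BYTES_PER_S)
print(f"compressed: refill every {mp3:.2f} s; decoded PCM: every {pcm:.2f} s")
```

Under these assumptions, fetching compressed data stretches the interval between memory wake-ups by about 11x, one plausible reason to give the audio block its own MP3 decoder.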
Yeah, I’m gonna date myself by saying this, but to an old Amiga guy, that’s just frickin’ sweet.
Putting some numbers to it
What have all of these changes really netted in terms of power efficiency and performance? We haven’t tested Moorestown ourselves, but Intel’s Anjali Shastri, Strategic Marketing Manager in the Ultra Mobility Group, presented some results she compiled during competitive testing. The first set of data compares Moorestown to Menlow, the prior-generation low-power Atom platform.
|Task|Menlow platform power draw (mW)|Moorestown platform power draw (mW)|
|---|---|---|
|HD video playback|3200-3500|1110|
Moorestown consumes only 21 mW of power in standby, presumably in the S0i3 power state. That’s down massively from Menlow, hence the claims of reductions in idle power use of “over 50X”. Audio playback is that special use case we’ve just explored, and obviously, it’s a huge win for Moorestown, as well. The percentage reductions in power use during video playback and web browsing aren’t as spectacular, but they’re solid, nonetheless.
Shastri didn’t present a direct comparison of power draw numbers versus competing smart phone platforms, but she did offer some battery life results. The run times are based on a BlackBerry battery.
|Task|Battery life, competing smart phone platforms|Battery life, Moorestown|
|---|---|---|
|Standby|6-14 days|>10 days|
|Audio playback|0.5-1.2 days|~2 days|
|SD video playback|4-11 hours|~5 hours|
|Web browsing|3-7 hours|~5 hours|
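The standby figure can be sanity-checked with simple energy arithmetic. Intel didn’t say which BlackBerry battery it used, so this Python sketch assumes a hypothetical 1,500 mAh, 3.7 V pack and treats the 21 mW platform standby figure as the whole device’s draw. Anything outside the platform, such as the radio, would shorten the result, so the answer is an upper bound.

```python
def runtime_hours(capacity_mah: float, nominal_v: float, avg_mw: float) -> float:
    """Battery energy (mWh) divided by average draw (mW)."""
    energy_mwh = capacity_mah * nominal_v
    return energy_mwh / avg_mw


# Hypothetical pack; 21 mW is Intel's Moorestown standby figure.
hours = runtime_hours(capacity_mah=1500, nominal_v=3.7, avg_mw=21)
print(f"{hours:.0f} h, or about {hours / 24:.1f} days")
```

That lands close to the “>10 days” Intel quotes, which at least suggests the figures hang together.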
These results would seem to confirm that Intel has met its goal of producing an Atom platform capable of competing head to head with the ARM-based incumbents in the smart phone market. That is a breathtaking accomplishment for a platform with PC roots that is still binary compatible with today’s fastest processors.
These are Intel-supplied numbers, so you can make of them what you will, but they certainly sound promising.
These chips won’t be called by their code names forever (at least, not by most folks, though we reserve the right to keep using them). Already, with their announcement, Lincroft, Langwell, and Briertown have magically transformed into the Atom Z600 series, the MP20 Platform Controller Hub, and the MSIC, respectively. Intel expects consumer products based on the Moorestown platform to become available some time in the second half of this year, a large, vague window, but that’s all we have.
One of the more intriguing questions about Moorestown is what form those products will take. Intel says the platform will initially support two operating systems: Google’s Android and the MeeGo mobile Linux distro, the artist formerly known as Moblin. We’d expect Moorestown to be Windows-compatible, too, eventually, given its x86 compatibility and the fact that Menlow runs Windows reasonably well.
So far, we know of only two devices definitely slated to use Moorestown: the Aava Mobile smart phone pictured above and the OpenPeak tablet pictured below.
Intel says it’s only working with “a handful of smart phone makers” right now, but it hopes to see a larger variety of Moorestown-based tablets. I suspect we’ll see tablets from many of the big names in netbooks, such as Acer, Asus, and MSI.
We’re dubious about whether a flock of iPad wannabes will qualify the Atom Z600 series as a commercial success. Smart phones are an established market, while tablets could still potentially fall into the same void that UMPCs and MIDs did before them. With the best smart-phone operating systems already running on ARM, and with major players like Apple apparently committed to building their own integrated, ARM-based platforms, even the Atom’s x86 compatibility feels like a bit of a disadvantage in that market. Whether Intel will make substantial inroads into the smart phone world with this generation of Atom technology remains a wide-open question at this point.
Still, the Atom has come a long way in a few short years, and Intel’s roadmap indicates a Moorestown successor, Medfield, is already planned. Intel has a great many natural advantages on its side, including the world’s best chip manufacturing operation, which has been ramping the production of 32-nm processors for some time now. With Moorestown, the Austin team has already picked much of the low-hanging fruit for power and size optimizations; they’re unlikely to reduce idle power another 50X in the next generation. But they could well begin setting new standards for power use and performance ahead of the rest of the market going forward. Would it be much of a surprise if they did?
What’s more, as a PC guy, I just happen to think that seeing an x86 platform scaled down to the size of a smart phone is incredibly cool. Even if Moorestown doesn’t carve out a big portion of the phone market and the tablets based on it flop, the possibilities for PCs squeezing into new form factors are practically endless. Funny things happen when you make that possible. In the end, Intel may end up with another surprise success on its hands, one that comes in a form it didn’t anticipate.