The Broadwell SoC and module
In some ways, the Broadwell-Y chip follows the same basic outlines as the Haswell-Y processor before it. Both chips have dual CPU cores, 3MB of L3 cache, and integrated graphics.
Still, according to Stephan Jourdan, Intel Fellow and Director of SoC Architecture in the Platform Engineering Group, fitting a chip like this one into a fanless tablet form factor less than nine millimeters thick is a daunting challenge. The power that an SoC can consume in a device is determined by lots of factors, including display size, chassis thickness, the materials used, and even ambient temperatures. Jourdan says the system type Intel targeted, with a 10.1" display, requires an SoC that operates at three to five watts of sustained power. (That's not TDP, or peak power, but likely maps to Intel's newer SDP metric for mobile processors.) Given that the prior-gen Haswell Y-series processors operate at a 6W SDP, Broadwell would need to cut sustained operating power in half to meet this goal. Broadwell's physical size would have to shrink, too, in order to fit into the target devices.
The Broadwell team attacked these problems on all fronts. Thanks to the 14-nm process, the SoC shrank substantially from one generation to the next. The Haswell Y-series SoC measures 130 square millimeters, while Broadwell-Y occupies only 82 mm², or about 63% of its predecessor's footprint. That's not exactly half the area, but Intel's architects have added a number of features to Broadwell in order to improve its power efficiency and performance. The net result of everyone's efforts, claims Jourdan, is a chip that delivers more than twice the performance per watt of Haswell-Y before it.
Some of the advancement comes courtesy of an advantage unique to Intel, one the company is quick to emphasize these days. Intel's process tech engineers and chip designers have the ability to work together, within the same company, to "co-optimize" their products and fabrication processes. Jourdan credits a specially tuned flavor of the 14-nm process for a further 10% reduction in capacitance in Broadwell-Y silicon, a 10% lower minimum operating voltage, and a 10-15% switching speed improvement at low voltages. All told, the combination of general 14-nm improvements and process-specific tuning accounts for roughly two-thirds of the power efficiency gains from Haswell-Y to Broadwell-Y.
As you may know, Haswell-Y isn't entirely a "true" system on a chip. Many of the legacy I/O functions are hosted on a separate piece of silicon, known as the Platform Controller Hub or PCH, that mounts on a common module with the CPU. Broadwell-Y follows the same template, but the module had to shrink dramatically to fit into tablet-sized devices. The Broadwell module is 50% smaller in area than the Haswell version, as pictured on the right, and it's 30% shorter in the Z dimension, as well.
Yes, this is a dual-core x86 processor with two "big cores," integrated graphics, and a companion chipset. Kind of hard to believe, isn't it?
Reducing the SoC module's thickness required some ingenuity. The fully integrated voltage regulator (FIVR) in Haswell and Broadwell allows for fast, fine-grained power state transitions on the chip, but it also requires the presence of external inductors on the SoC package that add height. To overcome this obstacle, the Broadwell team developed a workaround it calls the 3DL module. The inductors are placed on a small external PCB that hangs beneath the SoC module. To make room for the 3DL PCB, each motherboard has a hole cut into it, directly beneath the Broadwell module. This arrangement effectively "hides" the additional Z-height of the inductors and allows Broadwell-Y's total height to be almost 30% lower than the Haswell equivalent.
Interestingly, Jourdan shared the details of another FIVR workaround the Broadwell team had to implement. Because FIVR isn't very efficient at low voltages, they added a mode called LVR where FIVR essentially gets bypassed under the right conditions. The need for 3DL and LVR makes one wonder whether the level of VR integration in Broadwell makes sense for future generations of Intel SoCs.
Managing power and extending dynamic range
Intel's chip- and system-level dynamic power management capabilities are incredibly sophisticated these days. One of the key mechanisms, Turbo Boost, has gained a new wrinkle so that Broadwell can fit into a new class of devices.
The smaller batteries in sub-nine-mm tablets can potentially be stressed into failure by short bursts of high power consumption from the CPU, so the Broadwell team had to design a mechanism to avoid such problems. The result is a new, more granular limit in this chip's dynamic voltage and frequency scaling algorithm known as PL3. The other limits will be familiar from past chips. PL1 is the long-term CPU power limit that the system can withstand without overheating. This limit is measured across minutes of operation. PL2 is the short-term burst limit used for temporary excursions to higher clocks—say, a quick trip to a faster clock frequency to improve responsiveness while loading a program. PL2 is measured in seconds. The new PL3 limit is monitored in milliseconds, to prevent instantaneous power use from damaging the device's battery.
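To make the three time scales concrete, here is a minimal Python sketch of a checker that averages power samples over a trailing window for each limit. It is an illustration only, not Intel's algorithm: the wattage figures, the exact window lengths, and the names `PowerLimit`, `LIMITS`, and `exceeded_limits` are all hypothetical, chosen just to match the "minutes, seconds, milliseconds" ordering described above.

```python
from dataclasses import dataclass


@dataclass
class PowerLimit:
    name: str
    watts: float     # hypothetical budget, not an Intel figure
    window_s: float  # length of the trailing averaging window, in seconds


# Illustrative values only: PL1 over a minute, PL2 over seconds, PL3 over milliseconds.
LIMITS = [
    PowerLimit("PL1", watts=4.5, window_s=60.0),
    PowerLimit("PL2", watts=9.0, window_s=2.0),
    PowerLimit("PL3", watts=15.0, window_s=0.002),
]


def exceeded_limits(samples, dt_s, limits):
    """Return the names of limits whose trailing-window average is exceeded.

    samples: power readings in watts, oldest first, taken every dt_s seconds.
    If fewer samples exist than a window needs, all available samples are averaged.
    """
    violated = []
    for limit in limits:
        n = max(1, round(limit.window_s / dt_s))
        window = samples[-n:]
        if sum(window) / len(window) > limit.watts:
            violated.append(limit.name)
    return violated


# One second at a steady 3 W, followed by a 2 ms spike to 20 W (1 ms sampling).
trace = [3.0] * 1000 + [20.0] * 2
print(exceeded_limits(trace, 0.001, LIMITS))  # ['PL3']
```

In this toy trace, the brief spike barely moves the PL1 and PL2 averages, so only the millisecond-scale check catches it. That is the kind of excursion the longer windows would miss.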
The additional intelligence in Broadwell's Turbo Boost control complements the rest of Intel's power management mojo, which allows power sharing across the SoC die and manages the thermal behavior of the entire system.
Even with all the goodness of the 14-nm process and Broadwell's dynamic power management, driving SoC power from 6W to 3W while maintaining performance was probably out of reach without some additional help. Intel was bumping up against some basic limits in the physics of chip operation.
For one, the firm could only reduce Broadwell's operating voltage so much before the transistors would cease to work properly. Any home overclocker knows how crucial voltage is to ensuring stable CPU operation. This lower limit on voltage is a significant barrier to driving down power consumption in a chip like Broadwell with over a billion transistors.
You see, a chip's power draw is determined by a fairly simple equation that involves the clock frequency, the number of bits actively flipping, and the square of voltage—and that squared term means voltage tends to dominate the conversation. The Broadwell team could push its chip's clock speeds lower, but doing so would only result in linear reductions in power draw. Any time a portion of the chip is operating at low clock speeds and at the chip's minimum voltage level, it's just not being terribly efficient.
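The relationship described above is commonly written in textbooks as P = α·C·V²·f, where α is the fraction of bits switching, C is the switched capacitance, V is the supply voltage, and f is the clock frequency. The short Python sketch below assumes that approximation and uses made-up component values (none of them are Broadwell figures) to show why a clock reduction alone buys only a linear saving.

```python
def dynamic_power(activity, capacitance_f, voltage_v, freq_hz):
    """Textbook approximation of switching power: alpha * C * V^2 * f (watts)."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


# Hypothetical operating point: 20% activity, 1 nF switched, 1.0 V, 1 GHz.
base = dynamic_power(0.2, 1e-9, 1.0, 1.0e9)

# Halve the clock but stay at the same voltage (already at the minimum).
half_clock = dynamic_power(0.2, 1e-9, 1.0, 0.5e9)

# Halve the clock AND drop the voltage by 20%, if the silicon allowed it.
half_clock_lower_v = dynamic_power(0.2, 1e-9, 0.8, 0.5e9)

print(f"clock only:       {half_clock / base:.2f}x of base power")
print(f"clock + voltage:  {half_clock_lower_v / base:.2f}x of base power")
```

With these assumed numbers, halving the clock at a fixed voltage leaves power at half the baseline, while the same clock cut paired with a 20% voltage drop would bring it to roughly a third. Once the voltage floor is reached, only the weaker, linear lever remains.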
The Broadwell team's solution to this dilemma was to adopt a method known as duty cycling. With duty cycling, some portions of the chip are turned off entirely during certain clock cycles. Intel has used duty cycle throttling (DCT) for years to rein in its CPUs to prevent failures in the event of overheating.
Broadwell introduces a new mechanism called duty cycling control (DCC) that has a different aim. Broadwell's integrated graphics component takes up roughly a third of its die area, perhaps a little more, and DCC targets those graphics units. Working together, the SoC hardware and Intel's graphics driver can shut the IGP's execution units off entirely during some clock cycles, eliminating even leakage power. DCC kicks in when those execution units would otherwise be operating under inefficient conditions: at a low clock frequency where further voltage reductions aren't practical.
With a light graphics workload that only requires half of the IGP's horsepower, DCC might ensure that the IGP spends half its cycles turned off and the other half doing its work. Jourdan tells us Broadwell's integrated GPU has very low latency for switching on and off, which makes this mechanism practical. In fact, Broadwell's IGP has a set of DCC operating points that extends as low as 12.5% of the regular clock speed. At that lowest level, the graphics EUs are active for only one out of every eight clock cycles. They're powered down for the rest, even though the IGP may be drawing an animation on screen.
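A rough way to see why duty cycling helps at the voltage floor is to compare two simple models in Python. This is a sketch under stated assumptions, not a description of Intel's implementation: the function names and wattages are hypothetical, switching overhead is ignored, and leakage is treated as a fixed cost whenever the execution units are powered.

```python
def continuous_power(work, dyn_w, leak_w):
    """Slow the clock to `work` of the floor frequency; voltage stays at minimum.

    Dynamic power scales linearly with the clock, but leakage is paid all the time.
    """
    if not 0.0 < work <= 1.0:
        raise ValueError("work must be in (0, 1]")
    return work * dyn_w + leak_w


def duty_cycled_power(work, dyn_w, leak_w):
    """Run at the floor frequency for `work` of the cycles, powered off otherwise.

    Both dynamic and leakage power are paid only while the units are on.
    """
    if not 0.0 < work <= 1.0:
        raise ValueError("work must be in (0, 1]")
    return work * (dyn_w + leak_w)


# Hypothetical figures at the voltage floor: 1.0 W dynamic, 0.4 W leakage.
for work in (0.5, 0.125):
    slow = continuous_power(work, 1.0, 0.4)
    gated = duty_cycled_power(work, 1.0, 0.4)
    print(f"work={work:.3f}  slow clock={slow:.3f} W  duty cycled={gated:.3f} W")
```

Under these assumptions the gap widens as the workload shrinks, because the always-on leakage term makes up a growing share of the slow-clock total. At the one-in-eight operating point, the duty-cycled model draws a third of what the slow-clock model does.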
So that's another way the Broadwell team managed to shoehorn this chip into a much smaller power envelope. One can imagine that this technique could see extensive use in the future, as graphics hardware takes up an ever larger portion of the die area. What's more, since the SoC can share power across its die, some of the power reductions realized on the graphics side of the house with DCC can be used to enable Broadwell's CPU cores to run at higher frequencies, as well. So DCC offers an effective increase in dynamic operating range on both ends of the spectrum.