Oh, right: architecture!
Intel's tick-tock cadence typically confines major CPU architecture changes to the second chip produced on a new process technology, but that rigid segmentation seems to be leaking a bit over time as Intel pursues its goal of credibility (er, dominance?) in the mobile space.

Broadwell's CPU cores have received a number of tweaks over Haswell's, with the net effect of increasing instruction throughput per clock by about five percent, generally speaking. In keeping with Broadwell's mobile focus, Intel's architects set a high standard for any added features in this revision of the architecture: a new feature must contribute 2% more performance for every 1% of added power use. In the past, any gain better than 1:1 might have been acceptable, but not so this time.
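To make that 2:1 bar concrete, here's a minimal sketch of the acceptance test as described above. The function name and the two candidate features are hypothetical, invented purely for illustration; Intel hasn't published how it actually scores features internally.

```python
def meets_efficiency_bar(perf_gain_pct: float,
                         power_cost_pct: float,
                         required_ratio: float = 2.0) -> bool:
    """Return True if a feature's performance gain justifies its power cost.

    required_ratio=2.0 models the Broadwell rule quoted in the article
    (2% performance per 1% power); 1.0 approximates the looser older rule.
    """
    if power_cost_pct <= 0:
        # Assumption: a feature that costs no extra power passes
        # as long as it doesn't reduce performance.
        return perf_gain_pct >= 0
    return perf_gain_pct / power_cost_pct >= required_ratio


# Hypothetical candidates: (performance gain %, added power %)
candidates = {
    "feature_a": (3.0, 1.0),
    "feature_b": (1.5, 1.0),
}

for name, (perf, power) in candidates.items():
    broadwell_ok = meets_efficiency_bar(perf, power)
    older_ok = meets_efficiency_bar(perf, power, required_ratio=1.0)
    print(name, broadwell_ok, older_ok)
```

In this made-up example, feature_b would have cleared the older 1:1 threshold but falls short of the stricter standard applied to Broadwell.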

That said, the list of performance-enhancing changes to Broadwell's core still has quite a few familiar-sounding items. The expanded transistor budget at 14-nm has allowed for larger structures in many cases: a bigger out-of-order scheduler, a 50% larger second-level (L2) TLB, and a new, dedicated L2 TLB for 1GB pages. Also, a second unit can now handle TLB page misses in parallel with the primary one. With all of the TLB enhancements, it should be no surprise that virtualization round trips are supposedly quicker.

Of course, the ubiquitous "improved branch prediction" line-item is present, but Intel hasn't disclosed any details of how it's achieved more accurate predictions.

Broadwell has a few beefier execution units, too. The floating-point multiplier's latency has dropped from five to three cycles. There's a new Radix-1024 divider, and vector gather operations are now faster. Certain cryptography-specific instructions now execute more quickly, as well.
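The multiplier latency change matters most for code where each multiply has to wait on the result of the one before it. The rough model below is only a sketch under that assumption: it ignores pipelining of independent operations, and the chain length is a hypothetical number chosen for illustration.

```python
def chain_cycles(n_ops: int, latency_cycles: int) -> int:
    """Cycles to finish n_ops multiplies when each one needs the previous result."""
    return n_ops * latency_cycles


HASWELL_FMUL_LATENCY = 5    # cycles, per the article
BROADWELL_FMUL_LATENCY = 3  # cycles, per the article

n = 1000  # hypothetical length of a fully dependent multiply chain

before = chain_cycles(n, HASWELL_FMUL_LATENCY)
after = chain_cycles(n, BROADWELL_FMUL_LATENCY)

print(before, after, round(before / after, 2))  # 5000 3000 1.67
```

Code with plenty of independent multiplies in flight is bound by throughput rather than latency, so it wouldn't see anything like that best-case ratio.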

The changes to Broadwell's graphics and media architecture are arguably even more sweeping. Here's a quick but still daunting overview of the new arrangement.

A logical block diagram of Broadwell's integrated graphics. Source: Intel.

The most notable change in Broadwell-Y's IGP is an increase in the number of modular "slices" of graphics resources included—three here, versus two in Haswell GT2. Each slice has its own L1 cache, texture cache, and texture sampling/filtering hardware, so Broadwell is up 50% on those fronts versus the prior generation.

Meanwhile, the number of graphics execution units per slice has dropped a bit, from 10 to eight. Broadwell therefore has a total of 24 graphics EUs and 192 stream processors. By contrast, Haswell has 20 EUs and 160 SPs. The overall trajectory in terms of graphics units is northward, but Broadwell tilts the balance toward more texturing and sampling hardware.
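The unit counts above are simple multiplication, and a short sketch makes the trade-off easy to see. The figure of eight stream processors per EU isn't stated outright; it's inferred from the totals quoted above (192/24 and 160/20). The class and field names are just illustrative.

```python
from dataclasses import dataclass

# Inferred from the article's totals: 192 / 24 == 160 / 20 == 8
SPS_PER_EU = 8


@dataclass(frozen=True)
class IgpConfig:
    name: str
    slices: int
    eus_per_slice: int

    @property
    def total_eus(self) -> int:
        return self.slices * self.eus_per_slice

    @property
    def stream_processors(self) -> int:
        return self.total_eus * SPS_PER_EU


haswell = IgpConfig("Haswell GT2", slices=2, eus_per_slice=10)
broadwell = IgpConfig("Broadwell-Y", slices=3, eus_per_slice=8)

for igp in (haswell, broadwell):
    print(igp.name, igp.total_eus, igp.stream_processors)

# Per-slice resources (samplers, caches) scale with the slice count,
# while shader resources scale with the EU count.
slice_gain = broadwell.slices / haswell.slices - 1
eu_gain = broadwell.total_eus / haswell.total_eus - 1
print(f"slices: +{slice_gain:.0%}, EUs: +{eu_gain:.0%}")
```

Sampling hardware grows by 50% while shader resources grow by only 20%, which is the tilt toward texturing described above.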

The graphics microarchitecture in Broadwell has changed, too, with tweaks to improve geometry throughput and Z- and pixel-fill rates. This hardware officially supports the latest APIs, including DirectX 11.2, OpenGL 4.3, and, at last, OpenCL 2.0 with shared virtual memory for GPU computing.

Without IGP clock speeds, which we don't yet know, we can't really make any assessments about how Broadwell compares to Haswell or to competitors like AMD's Kaveri. As with Haswell, we'd expect to see a beefier GT3 version of Broadwell graphics eventually, likely on a quad-core die and occasionally paired with an external eDRAM chip for much higher throughput.

The addition of more samplers and stream processors directly benefits the IGP's media processing capabilities. Intel claims Broadwell's video engine can achieve up to double the throughput of its predecessor, and it says the QuickSync video transcoding engine in the chip has improved in terms of performance and output quality.

Since Broadwell's display block can drive 4K displays, the chip's ability to handle 4K-class video processing is a live issue. Rather than decode the new 4K-oriented H.265 standard entirely in hardware, Broadwell will take what Intel calls a hybrid approach, using some fixed-function hardware in conjunction with the graphics EUs to process H.265 video. The firm claims H.265 decoding on Broadwell-Y is "fast enough for 4K" with no caveats, and it says H.265 encoding is sufficient for 4K resolutions at 30 Hz. That's not too bad, all things considered, although I wouldn't expect H.265 processing to be terribly power efficient given the involvement of the graphics EUs.
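For a sense of scale on that 4K-at-30-Hz encoding claim, here's a quick back-of-the-envelope calculation. It assumes "4K" means the 3840x2160 UHD resolution, which Intel's claim as relayed here doesn't specify, and it counts raw pixels only, not codec complexity.

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw pixel rate a video pipeline must sustain at a given resolution and frame rate."""
    return width * height * fps


# Assumption: "4K" is UHD (3840x2160), not the 4096-wide cinema format.
uhd_30 = pixels_per_second(3840, 2160, 30)
fhd_30 = pixels_per_second(1920, 1080, 30)

print(f"4K at 30 Hz:    {uhd_30:,} pixels/s")
print(f"1080p at 30 Hz: {fhd_30:,} pixels/s")
print(f"ratio: {uhd_30 // fhd_30}x")
```

Under that assumption, 4K at 30 Hz means pushing four times the pixels of 1080p at the same frame rate.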

A new chipset: Broadwell PCH-LP
Although it looks to be about the same size as the prior version and is manufactured on an older 32-nm process, Broadwell's platform controller hub is new silicon, too. The PCH-LP will accompany Broadwell-Y in low-power, fanless systems.

The most dramatic changes here versus last year's model have to do with power efficiency. Intel's designers have added more power gating around the PCH chip, resulting in a 25% reduction in idle power draw. Active power use is down by about 20% versus the Haswell PCH-LP, as well, and the firm has built a collection of firmware and software updates that enable the PCH to do fine-grained monitoring of power use.

Feature-wise, the PCH has gotten an upgrade in the audio DSP department, with more SRAM and MIPS than before. As with everything else, Intel expects the improved audio hardware to conserve power at the end of the day. The other feature of note is the welcome addition of support for PCIe-based storage.