
Re-thought integrated graphics and other improvements
The fact that the graphics processor is just another stop on the ring demonstrates how completely Sandy Bridge integrates its GPU. The graphics device shares not just main memory bandwidth but also the last-level cache with the CPU cores—and in some cases, it shares memory directly with those cores. Some memory is still dedicated solely to graphics, but the graphics driver can designate graphics streams to be cached and treated as coherent.

Inside the graphics engine, the big news isn't higher unit counts but more robust individual execution units. Recent Intel graphics solutions have claimed compatibility with the feature-rich DirectX 10 API, but they have relied on their programmable shaders to handle nearly every sort of math required in the graphics pipeline. Dedicated, custom hardware can generally be faster and more efficient at a given task, though, which is why most GPUs still contain considerable amounts of graphics-focused custom hardware—and why those Intel IGPs have generally underachieved.

For this IGP, Intel revised its approach, using dedicated graphics hardware throughout, wherever it made sense to do so. A new transcendental math capability, for instance, promises 4-20X higher performance than the older generation. Before, DirectX instructions would break down into two to four internal instructions in the IGP, but in Sandy Bridge, the relationship is generally one-to-one. A larger register file should facilitate the execution of more complex shaders, as well. Cumulatively, Intel estimates, the changes should add up to double the throughput per shader unit compared to the last generation. The first Sandy Bridge derivative will have 12 of those revised execution units, although I understand that number may scale up and down in other variants.
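Those per-unit claims lend themselves to a quick back-of-envelope estimate. The one-internal-op-per-cycle rate below is an assumption for illustration; the 2-4x instruction decomposition, the roughly one-to-one mapping, the 2x per-unit figure, and the 12-EU count are the numbers quoted above.

```python
# All figures are Intel's claims as quoted in the text, not measurements.

def effective_dx_rate(internal_ops_per_cycle, ops_per_dx_instr):
    """DirectX instructions retired per cycle at a given internal op rate."""
    return internal_ops_per_cycle / ops_per_dx_instr

# Prior gen: each DX instruction decomposed into 2-4 internal ops, so even
# the best case finishes a DX instruction only every other cycle.
old_rate = effective_dx_rate(1.0, 2)  # 0.5 DX instr/cycle, best case
new_rate = effective_dx_rate(1.0, 1)  # Sandy Bridge: roughly one-to-one

# Intel's aggregate claim of ~2x throughput per execution unit, times the
# 12 EUs in the first Sandy Bridge part, works out to the equivalent of
# roughly 24 prior-gen units.
relative_throughput = 2.0 * 12
```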

Like the prior gen, this IGP will be DirectX 10-compliant but won't support DX11's more advanced feature set with geometry tessellation and higher-precision datatypes.

Sandy Bridge's large last-level cache will be available to the graphics engine, and that fact purportedly will improve performance while saving power by limiting memory I/O transactions. We heard quite a bit of talk about the advantages of the cache for Sandy Bridge's IGP, but we're curious to see just how useful it proves to be. GPUs have generally stuck with relatively small caches since graphics memory access patterns tend to involve streaming through large amounts of data, making extensive caching impractical. Sandy Bridge's IGP may be able to use the cache well in some cases, but it could trip up when high degrees of antialiasing or anisotropic filtering cause the working data set to grow too large. We'll have to see about that.
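The working-set concern can be put into rough numbers. The 8 MB cache size and the bytes-per-pixel figures below are assumptions for illustration (a 32-bit color buffer plus a 32-bit depth buffer), not Intel's specifications.

```python
def render_target_bytes(width, height, msaa_samples=1,
                        color_bpp=4, depth_bpp=4):
    """Approximate color + depth footprint of one render target."""
    return width * height * msaa_samples * (color_bpp + depth_bpp)

LLC_BYTES = 8 * 1024 * 1024  # assumed shared last-level cache size

# A 1280x720 target without antialiasing is about 7 MB, so it can mostly
# live in the cache, trimming memory I/O as Intel suggests.
no_aa = render_target_bytes(1280, 720)

# The same target with 4x multisampling quadruples to about 28 MB, and
# the working set overflows the cache.
msaa_4x = render_target_bytes(1280, 720, msaa_samples=4)
```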

We also remain rather skeptical about the prospects for Intel to match the standards of quality and compatibility set by the graphics driver development teams at Nvidia and AMD any time soon.


One bit of dedicated hardware that's gotten quite a bit of attention on Sandy Bridge belongs to the IGP, and that's the video unit. This unit includes custom logic to accelerate H.264 video processing, much like past Intel IGPs and competing graphics solutions, with the notable addition of an encoding capability as well as decoding. Using the encoding and decoding capabilities together opens the possibility of very high speed (and potentially very power-efficient) video transcoding, and Intel briefly demoed just that during the opening keynote. We heard whispers of speeds up to 10X or 20X that of a software-only solution.
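For a sense of scale, here's what those multipliers imply. The clip length and the software baseline below are invented for illustration; only the 10X figure comes from the whispers mentioned above.

```python
def transcode_minutes(clip_minutes, speed_vs_realtime):
    """Wall-clock minutes to transcode a clip at a given speed.

    speed_vs_realtime = 1.0 means the clip transcodes in its own runtime.
    """
    return clip_minutes / speed_vs_realtime

clip = 60.0  # assumed one-hour video

# Assumed software baseline: half of realtime, i.e. two hours of crunching.
software_time = transcode_minutes(clip, 0.5)

# At 10x the software speed, the same job finishes in about 12 minutes.
hardware_time = transcode_minutes(clip, 0.5 * 10)
```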

Sandy Bridge's transcoding capabilities raise all sorts of funny questions. On one hand, using custom logic for video encoding as well as decoding makes perfect sense given current usage models, and it seems like a convenient way for Intel to poke a finger into the eye of competitors like AMD and Nvidia, whose GPGPU technologies have, to date, just one high-profile consumer application: video transcoding. On the other hand, this is Intel, bastion of CPUs and tailored instruction sets, embracing application-specific acceleration logic. I'm also a little taken aback by all of the excitement surrounding this feature, given that my mobile phone has the same sort of hardware.

Because the video codec acceleration is part of Sandy Bridge's IGP, it will be inaccessible to users of discrete video cards, including anyone using the performance enthusiast-oriented P-series chipsets. Several folks from Intel told us the firm is looking into possible options for making the transcoding hardware available to users of discrete graphics cards, but if that happens at all, it will likely happen some time after the initial Sandy Bridge products reach consumers.

One more piece of the Sandy Bridge picture worth noting is the expansion of thermal-sensor-based dynamic clock frequency scaling—better known as Turbo Boost—along several lines. Although the Westmere dual-core processors had a measure of dynamic speed adjustment for the graphics component, the integration of graphics onto the same die has allowed much faster, finer-grained participation in the Turbo Boost scheme. Intel's architects talked of "moving power around" between the graphics and CPU cores as needed, depending on the constraints of the workloads. If, say, a 3D game doesn't require a full measure of CPU time but needs all the graphics performance it can get, the chip should respond by raising the graphics core's voltage and clock speed while keeping the CPU's power draw lower.
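The "moving power around" idea can be sketched as a simple budget split. The 35 W package budget, the proportional policy, and the demand weights below are all assumptions for illustration; Intel hasn't disclosed its actual allocation algorithm.

```python
def split_budget(package_watts, cpu_demand, gpu_demand):
    """Divide a shared package power budget in proportion to domain demand."""
    total = cpu_demand + gpu_demand
    if total == 0:
        # Idle: split evenly (an arbitrary choice for this sketch).
        return package_watts / 2, package_watts / 2
    return (package_watts * cpu_demand / total,
            package_watts * gpu_demand / total)

# A GPU-bound game: the graphics domain claims most of the budget, so its
# voltage and clocks can rise while the CPU side stays modest.
cpu_w, gpu_w = split_budget(35.0, cpu_demand=0.3, gpu_demand=0.7)
```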

Furthermore, Intel claims Sandy Bridge should have substantially more headroom for peak Turbo Boost frequencies, although it remains coy about the exact numbers there. One indication of how expansive that headroom may be is a new twist on Turbo Boost aimed at improving system responsiveness during periods of high demand. The concept is that the CPU will recognize when an intensive workload begins and ramp up the clock speed so the user gets "a lot more performance" for a relatively long period—we heard the time frame of 20 seconds thrown around. With this feature, the workload doesn't have to use just one or two threads to qualify for the speed boost; the processor will actually operate above its maximum thermal rating, or TDP, for the duration of the period, so long as its on-die thermal sensors don't indicate a problem.
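A toy lumped-thermal model illustrates why a burst above TDP can last on the order of 20 seconds before the sensors call a halt. Every constant below (the 95 W TDP, the 45 W overage, the thermal capacitance, and the temperature limits) is invented for illustration; only the rough 20-second window comes from the article.

```python
def seconds_above_tdp(boost_watts, tdp_watts,
                      thermal_capacitance=60.0,  # J/K, assumed die + spreader
                      start_temp=60.0, limit_temp=75.0, dt=0.1):
    """Integrate a lumped thermal model until the sensor limit is hit.

    Assumes the package sits in steady state at start_temp under TDP power,
    so only the power *above* TDP heats the thermal mass further.
    """
    excess = boost_watts - tdp_watts  # watts beyond the rated TDP
    temp, elapsed = start_temp, 0.0
    while temp < limit_temp:
        temp += (excess / thermal_capacitance) * dt
        elapsed += dt
    return elapsed

# With these assumed constants, a 45 W excursion above a 95 W TDP lasts
# roughly 20 seconds before the on-die sensors would rein things in.
burst = seconds_above_tdp(boost_watts=140.0, tdp_watts=95.0)
```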

We worry that this feature may make computer performance even less deterministic than the first generation of Turbo Boost, and it will almost surely place a higher premium on good cooling. Still, the end result should be more responsive systems for users, and it's hard to argue with that outcome.
