Several new features
On this page, we intend to explain some of the important new features Nvidia has built into the GK104 or its software stack. However, in the interests of getting this review posted before our deadline, we've decided to put in a placeholder, a radically condensed version of the final product. Don't worry, we'll fix it later in software—like the R600's ROPs.
- GPU Boost — As evidenced by the various "turbo" schemes in desktop CPUs, dynamic voltage and frequency schemes are all the rage these days. The theory is straightforward enough. Not all games and other graphics workloads make use of the GPU in the same way, and even relatively "intensive" games may not cause all of the transistors to flip and thus heat up the GPU quite like the most extreme cases. As a result, there's often some headroom left in a graphics card's designated thermal envelope, or TDP (thermal design power), which is generally engineered to withstand a worst-case peak workload. Dynamic clocking schemes attempt to track this headroom and to take advantage of it by raising clock speeds opportunistically.
Although the theory is fairly simple, the various implementations of dynamic clocking vary widely in their specifics, which can make them hard to track. Intel's Turbo Boost is probably the gold standard at present; it uses a network of thermal sensors spread across the die in conjunction with a programmable, on-chip microcontroller that governs Turbo policy. Since it's a hardware solution with direct inputs from the die, Turbo Boost reacts very quickly to changes in thermal conditions, and its behavior may differ somewhat from chip to chip, since the thermal properties of the chips themselves can vary.
Although distinct from one another in certain ways, both AMD's Turbo Core (in its CPUs) and PowerTune (in its GPUs) combine on-chip activity counters with pre-production chip testing to establish a profile for each model. In use, power draw for the chip is then estimated based on the activity counters, and clocks are adjusted in response to the expected thermal situation. AMD argues the predictable, deterministic behavior of its DVFS (dynamic voltage and frequency scaling) schemes is an admirable trait. The price of that consistency is that AMD can't squeeze every last drop of performance out of each individual slab of silicon.
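To make the counter-based approach a bit more concrete, here is a minimal Python sketch of estimating power from activity counters and throttling to fit a budget. It is not AMD's algorithm: the block names, the per-block weights, the idle floor, the 200W budget, the 25MHz step, and the assumption that dynamic power scales linearly with clock speed are all invented for illustration.

```python
# Hypothetical per-block weights (watts at full activity), standing in for the
# profile a vendor would derive from pre-production testing. Not real data.
WEIGHTS_W = {"shaders": 120.0, "texture": 40.0, "rops": 25.0, "memory": 30.0}
IDLE_W = 15.0  # assumed static power floor


def estimate_power(activity):
    """Estimate power at base clock from counters normalized to 0.0..1.0."""
    return IDLE_W + sum(WEIGHTS_W[block] * activity.get(block, 0.0)
                        for block in WEIGHTS_W)


def pick_clock(activity, base_mhz, budget_w, step_mhz=25, floor_mhz=300):
    """Throttle down from the base clock until the estimate fits the budget.

    Assumes (for illustration only) that the dynamic part of the estimate
    scales linearly with clock speed.
    """
    dynamic_w = estimate_power(activity) - IDLE_W
    clock = base_mhz
    while clock > floor_mhz and IDLE_W + dynamic_w * clock / base_mhz > budget_w:
        clock -= step_mhz
    return clock


typical = {"shaders": 0.6, "texture": 0.5, "rops": 0.5, "memory": 0.5}
worst = {"shaders": 1.0, "texture": 1.0, "rops": 1.0, "memory": 1.0}
print(pick_clock(typical, base_mhz=1000, budget_w=200.0))
print(pick_clock(worst, base_mhz=1000, budget_w=200.0))
```

Because the decision depends only on the counters and a fixed profile, every chip of a given model makes the same choice for the same workload. That is the determinism AMD touts, and it is also why a cooler-running chip gets no extra credit.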
GPU Boost is essentially a first-generation crack at a dynamic clocking feature, and it combines some traits of each of the competing schemes. Fundamentally, the logic is more like the two Turbos than it is like AMD's PowerTune. With PowerTune, AMD runs its GPUs at a relatively high base frequency, but clock speeds are sometimes throttled back under atypically high GPU utilization. By contrast, GPU Boost starts with a more conservative base clock speed and ranges into higher frequencies when possible.
The inputs for Boost's decision-making algorithm include power draw, GPU and memory utilization, and GPU temperatures. Most of this information is collected from the GPU itself, but I believe the power use information comes from external circuitry on the GTX 680 board. In fact, Nvidia's Tom Petersen told us board makers will be required to include this circuitry in order to get the GPU maker's stamp of approval. The various inputs for Boost are then processed in software, in a portion of the GPU driver, not in an on-chip controller. The combination of software control and external power circuitry is likely responsible for Boost's relatively high clock-change latency. Stepping up or down in frequency takes about 100 milliseconds, according to Petersen. A tenth of a second is a very long time in the life of a gigahertz-class chip, and Petersen was frank in admitting that this first generation of GPU Boost isn't everything Nvidia hopes it will become in the future.
Graphics cards with Boost will be sold with a couple of clock speed numbers on the side. The base clock is the lower of the two—1006MHz on the GeForce GTX 680—and represents the lowest operating speed in thermally intensive workloads. Curiously enough, the "boost clock"—which is 1058MHz on the GTX 680—isn't the maximum speed possible. Instead, it's "sort of a promise," according to Petersen, the clock speed at which the GPU should run during typical operation. GPU Boost performance will vary slightly from card to card, based on factors like chip quality, ambient temperatures, and the effectiveness of the cooling solution. GTX 680 owners should expect to see their cards running at the Boost clock frequency as a matter of course, regardless of these factors. Beyond that, GPU Boost will make its best effort to reach even higher clock speeds when feasible, stepping up and down in increments of 13MHz.
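For readers who think better in code, the stepping behavior can be sketched roughly as follows in Python. Only the base clock, the boost clock, and the 13MHz step come from Nvidia's figures. The clock ceiling, the power target, the temperature limit, and the linear power model in `simulate` are our own placeholders, and the real driver logic is surely more involved than this.

```python
BASE_MHZ = 1006   # GTX 680 base clock (Nvidia's figure)
BOOST_MHZ = 1058  # advertised boost clock (Nvidia's figure)
STEP_MHZ = 13     # size of each clock step (Nvidia's figure)

# Everything below is assumed purely for illustration.
MAX_MHZ = BASE_MHZ + 8 * STEP_MHZ  # hypothetical ceiling of 1110MHz
POWER_TARGET_W = 170.0             # hypothetical board power target
TEMP_LIMIT_C = 90.0                # hypothetical temperature limit


def next_clock(clock_mhz, power_w, temp_c):
    """One decision tick: step up if there is headroom, otherwise step down."""
    if power_w < POWER_TARGET_W and temp_c < TEMP_LIMIT_C:
        return min(clock_mhz + STEP_MHZ, MAX_MHZ)
    return max(clock_mhz - STEP_MHZ, BASE_MHZ)


def simulate(power_at_base_w, temp_c=70.0, ticks=20):
    """Run the controller, assuming power scales linearly with clock speed."""
    clock = BASE_MHZ
    for _ in range(ticks):
        power_w = power_at_base_w * clock / BASE_MHZ
        clock = next_clock(clock, power_w, temp_c)
    return clock


print(simulate(120.0))  # light workload: climbs to the assumed ceiling
print(simulate(175.0))  # heavy workload: pinned at the base clock
```

One detail falls out of the arithmetic: the advertised boost clock sits exactly four 13MHz steps above the base clock (1006 + 4 × 13 = 1058).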
Petersen demoed several interesting scenarios to illustrate Boost behavior. In a very power-intensive scene, 3DMark11's first graphics test, the GTX 680 was forced to remain at its base clock throughout. When playing Battlefield 3, meanwhile, the chip spent most of its time at about 1.1GHz, above both the base and boost levels. In a third application, the classic DX9 graphics demo "rthdribl," the GTX 680 throttled back to under 1GHz, simply because additional GPU performance wasn't needed. One spot where Nvidia intends to make use of this throttling capability is in-game menu screens, and we're happy to see it. Some menu screens can cause power use and fan speeds to shoot skyward as frame rates reach quadruple digits.
Nvidia has taken pains to ensure GPU Boost is compatible with user-driven tweaking and overclocking. A new version of its NVAPI gives third-party software, like EVGA's slick Precision utility, control over key Boost parameters. With Precision, the user may raise the GPU's maximum power limit by as much as 32% above the default, in order to enable operation at higher clock speeds. Interestingly enough, Petersen said Nvidia doesn't consider cranking up this slider overclocking, since its GPUs are qualified to work properly at every voltage-and-frequency point along the curve. (Of course, you could exceed the bounds of the PCIe power connector specification by cranking this slider, so it's not exactly 100% kosher.) True overclocking happens by grabbing hold of a separate slider, the GPU clock offset, which raises the chip's frequency at a given voltage level. An offset of +200MHz, for instance, raised our GTX 680's clock speed while running Skyrim from 1110MHz (its usual Boost speed) to 1306MHz. EVGA's tool allows GPU clock offsets as high as +549MHz and memory clock offsets up to +1000MHz, so users are given quite a bit of leeway for experimentation.
Although GPU Boost is only in its first incarnation, Nvidia has some big ideas about how to take advantage of these dynamic clocking capabilities. For instance, Petersen openly telegraphed the firm's plans for future versions of Boost to include control over memory speeds, as well as GPU clocks.
More immediately, one feature exposed by EVGA's Precision utility is frame-rate targeting. Very simply, the user is able to specify his desired frame rate with a slider, and if the game's performance exceeds that limit, the GPU steps back down the voltage-and-frequency curve in order to conserve power. We were initially skeptical about the usefulness of this feature for one big reason: the very long latency of 100 ms for clock speed adjustments. If the GPU has dialed back its speed because the workload is light and then something changes in the game—say, an explosion that adds a bunch of smoke and particle effects to the mix—ramping the clock back up could take quite a while, causing a perceptible hitch in the action. We think that potential is there, and as a result, we doubt this feature will appeal to twitch gamers and the like. However, in our initial playtesting of this feature, we've not noticed any problems. We need to spend more time with it, but Kepler's frame rate targeting may prove to be useful, even in this generation, so long as its clock speed leeway isn't too wide. At some point in the future, when the GPU's DVFS logic is moved into hardware and frequency change delays are measured in much smaller numbers, we expect features like this one to become standard procedure, especially for mobile systems.
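A quick back-of-the-envelope calculation shows why the latency worried us. The Python sketch below uses the 100 ms step latency and 13MHz step size quoted by Nvidia. It also assumes, pessimistically, that the driver moves only one step per 100 ms interval; we don't know whether larger jumps are possible, so treat the ramp times as an upper-bound guess rather than measured behavior.

```python
STEP_LATENCY_MS = 100  # clock-change latency quoted by Petersen
STEP_MHZ = 13          # size of each clock step


def frames_per_step(fps):
    """Frames rendered while a single clock change is in flight."""
    return fps * STEP_LATENCY_MS / 1000


def ramp_time_ms(from_mhz, to_mhz):
    """Time to move between two clocks, ASSUMING one step per latency interval."""
    steps = -(-abs(to_mhz - from_mhz) // STEP_MHZ)  # ceiling division
    return steps * STEP_LATENCY_MS


print(frames_per_step(60))       # frames that pass during one clock change
print(ramp_time_ms(1006, 1110))  # ms to climb from base clock to 1110MHz
```

At a 60 FPS target, six frames go by before a single clock change lands, and under the one-step-at-a-time assumption, climbing from the 1006MHz base clock to 1110MHz would take eight steps, or 800 ms. That is why a narrow clock speed range matters for this feature.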
- Adaptive vsync — Better than dumb vsync.
- TXAA — Quincunx 2.0, or Nvidia erects a narrower tent.
- Bindless textures — Megatexturing in hardware, but not for DX11.
- NVENC — Hardware video encoding, or right back atcha, QuickSync.
- Display output improvement — Eye-nvidi-ty.