Recall Moore's Law, which in its popular form holds that the number of transistors that can be placed on a chip doubles roughly every 18 months. The corollaries of this law (rising clock speeds, rising performance, shrinking die sizes, and so on) are often discussed as if they were the law itself, but Moore's Law is primarily about transistor counts.
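To make the compounding concrete, here is a minimal Python sketch that projects a transistor count forward under a fixed doubling period. It assumes the popular 18-month figure holds exactly. The function name and the 50-million-transistor starting point are hypothetical and illustrative, not taken from any real product.

```python
def transistor_count(initial: float, months: float, doubling_months: float = 18.0) -> float:
    """Project a transistor count forward, assuming a constant doubling period."""
    # Each doubling period multiplies the count by 2, so growth is 2^(elapsed / period).
    return initial * 2 ** (months / doubling_months)


# Hypothetical starting point: a 50-million-transistor chip, projected 1 to 5 years out.
for years in range(1, 6):
    projected = transistor_count(50e6, years * 12)
    print(f"after {years} year(s): about {projected / 1e6:,.0f}M transistors")
```

Five years at that rate works out to roughly a tenfold increase (2^(60/18) is about 10), which is why the question of what to do with all those extra transistors never goes away.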
For years, the CPU guys have exploited Moore's Law in two main ways: adding transistors to build more complex processors capable of doing more work per clock cycle, and ratcheting up clock speeds to gain greater absolute performance. Intel has long been a profit-making machine built on the assumption that Moore's Law would continue to hold. For some time now, doomsayers have been predicting the death of Moore's Law, speculating that physical limitations would conspire to slow advances in chip fabrication technology. These discussions are intriguing in theory but snooze-inducing in practice, because so far, Moore continues to be right.
The more interesting question, perhaps, is what chip designers will do with their growing transistor budgets. Recently, a talk by Bob Colwell, one of the architects of Intel's P6 core (found in everything from the Pentium Pro to the Pentium III), became available online, and it makes for very enlightening reading. Colwell expresses concern about the direction in which CPU design is heading, noting that processor architects seem to be having trouble turning growing transistor counts into more performance.
Intel's new Prescott chip is emblematic of this problem. Prescott is a major overhaul of the Pentium 4 design, with a much deeper pipeline, many more transistors, larger caches, and hundreds of internal tweaks and additions. Yet Prescott is slower clock-for-clock than its Northwood predecessor, runs hotter, and draws more power, although Intel may yet ramp Prescott to higher clock speeds and bring its power and heat problems under control. The Prescott P4 may still become a very decent product, but its problems are far from trivial.
The trouble is that CPU designers have already exploited many of the obvious tricks for extracting more performance from general-purpose microprocessors by adding transistors, and we may be approaching the end of the linear increases in CPU performance to which we've become accustomed. The massive complexity of today's CPUs is increasingly difficult to manage, and CPU designers are already moving toward more parallelism. Intel's Hyper-Threading is a start, and dual-core designs may be next, with a pair of processor cores glued together on a single die. However, software must be multithreaded in order to take advantage of simultaneous multithreading and multiprocessing. Even then, the gains are of a different kind: extra threads of execution help when several tasks run at once, but they don't make any single thread run faster. I like the creamy-smooth computing experience that comes from multiple processors, but generally, a pair of 1GHz Athlons can't compare to a 2GHz Athlon 64.
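One standard way to see why two slower processors rarely match one faster one is Amdahl's law. The column doesn't invoke it, but it captures the same point: if a fraction p of a program's work can run in parallel on n processors, the overall speedup is 1 / ((1 - p) + p / n). The Python sketch below compares two 1GHz cores against one 2GHz core under two simplifying assumptions: performance scales linearly with clock speed, and both chips do the same work per clock (which ignores the real architectural differences between an Athlon and an Athlon 64).

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup over one processor when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Baseline is a single 1GHz core. Assumes performance scales linearly with clock,
# so one 2GHz core is always 2x, regardless of how parallel the workload is.
SINGLE_2GHZ_SPEEDUP = 2.0

for p in (0.0, 0.5, 0.9, 1.0):
    dual_1ghz = amdahl_speedup(p, 2)
    print(f"parallel fraction {p:.0%}: two 1GHz cores = {dual_1ghz:.2f}x, "
          f"one 2GHz core = {SINGLE_2GHZ_SPEEDUP:.2f}x")
```

Only a perfectly parallel workload (p = 1) lets the dual setup catch up; any serial portion at all leaves it behind the single faster chip.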
I have some faith in the ability of the smart folks at companies like AMD, IBM, and Intel to overcome some of these design challenges, but the fact is, it's a tough road ahead.
Contrast that dilemma with what's happening in graphics, an area where GPU performance and capabilities have been skyrocketing over the past ten years. GPUs have followed Moore's predicted progression in a much more straightforward way than CPUs have. The original Radeon had about 30M transistors, the Radeon 8500 about 60M, and the Radeon 9700 roughly 110M. GPU clock speeds are up, too, but not by all that much, because GPUs don't need clock speed increases in order to deliver better performance. The graphics guys share some concerns with the CPU guys related to power dissipation and heat density, but they have the incredible benefit of working on a computing problem that is inherently and thoroughly parallelizable.
At the most basic level, upping GPU performance is simple: add more pipelines, get more performance. The current market leader, the Radeon 9800XT, has eight pixel pipelines, and some rumors point to NVIDIA's upcoming NV40 having sixteen. GPUs can continue down this path for a good long time, as long as Moore's Law allows.
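The "more pipelines, more performance" arithmetic can be sketched as a theoretical peak fill rate: pipelines times core clock. This Python sketch assumes each pipeline emits one pixel per clock, an idealization that ignores memory bandwidth, texture stalls, and shader length. The 400MHz clock is a hypothetical round number, not a published spec for either chip.

```python
def peak_fill_rate(pipelines: int, core_clock_mhz: float) -> float:
    """Theoretical peak pixels per second, assuming one pixel per pipeline per clock."""
    return pipelines * core_clock_mhz * 1e6


HYPOTHETICAL_CLOCK_MHZ = 400  # illustrative only, held constant to isolate pipeline count

for pipes in (8, 16):
    rate = peak_fill_rate(pipes, HYPOTHETICAL_CLOCK_MHZ)
    print(f"{pipes} pipelines @ {HYPOTHETICAL_CLOCK_MHZ}MHz: {rate / 1e9:.1f} Gpixels/s peak")
```

Doubling the pipeline count doubles peak throughput with no change in clock speed, which is exactly the scaling path that serial CPU code doesn't get for free.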
However, GPUs are relatively young inventions, and they are still growing up. Graphics chips used to be very much fixed-function devices (collections of custom logic designed to perform specific tasks), so if an engineer wanted a feature like bump mapping, he'd add bump-mapping logic. Nowadays, GPUs have pixel shader units, and software types write shader programs to produce bump-mapping effects.
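To illustrate what a bump-mapping shader program computes, here is a simplified per-pixel diffuse lighting routine driven by a normal map (a technique often called Dot3 bump mapping), written in Python rather than a real shading language. It assumes the normal map stores unit normals as 8-bit RGB and that the light direction is already expressed in the same coordinate space as those normals, skipping the tangent-space transform a real shader would perform. All names here are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


def decode_normal(rgb: tuple[int, int, int]) -> Vec3:
    # Map each 8-bit channel from [0, 255] to a normal component in [-1, 1].
    r, g, b = (c / 255.0 * 2.0 - 1.0 for c in rgb)
    return normalize((r, g, b))


def diffuse_shader(normal_texel: tuple[int, int, int], light_dir: Vec3) -> float:
    """Per-pixel Lambertian term: max(dot(N, L), 0)."""
    n = decode_normal(normal_texel)
    l = normalize(light_dir)
    return max(0.0, n[0] * l[0] + n[1] * l[1] + n[2] * l[2])


def shade(normal_map: list[list[tuple[int, int, int]]], light_dir: Vec3) -> list[list[float]]:
    # Every pixel is computed independently of its neighbors, which is exactly
    # the kind of work a GPU can spread across many pipelines at once.
    return [[diffuse_shader(texel, light_dir) for texel in row] for row in normal_map]


tiny_map = [[(128, 128, 255), (255, 128, 128)],
            [(128, 255, 128), (128, 128, 0)]]
for row in shade(tiny_map, (0.0, 0.0, 1.0)):
    print(["%.2f" % v for v in row])
```

On real hardware the same math would live in a short pixel shader and run across the chip's parallel pipelines; the sketch only shows the shape of the computation.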
GPUs still contain a fair amount of function-specific logic for the sake of speed and efficiency, but the trend over time should be toward more generality, relative simplicity, and more parallel processing power. In the generation of GPUs after the one just on the horizon (the post-DirectX 9 era), pixel and vertex shaders will merge, or at least begin to merge, sharing a common instruction set and possibly on-chip resources as well. I've heard whispers that Microsoft's Longhorn OS, the next big spin of Windows, will give GPUs a real, unified memory hierarchy like the one CPUs have, complete with virtual memory. In short, GPUs are becoming highly parallel general-purpose processors, putting them on a collision course with CPUs. Already, researchers are looking into using DirectX 9-class GPUs as high-powered engines for scientific computing applications.
Similarly, as linear performance gains become more difficult to achieve, CPU guys will be irresistibly tempted to use their ballooning transistor budgets to incorporate more GPU-like capabilities into their processors. CPUs already have single-instruction multiple-data (SIMD) extensions like SSE2 and AltiVec, and Prescott's new SSE3 extensions include basic vertex shader instructions. CPUs are just a little pokey at graphics-style math because they don't have the internal parallelism GPUs do . . . yet.
When Intel and AMD run smack into ATI and NVIDIA, I predict big things . . . maybe a loud crash and twisted metal, or perhaps just mergers and acquisitions.