Feldstein said that ATI and Microsoft developed this chip together in the span of two years, and that they worked “from the ground up” to do a console product. He said that Microsoft was a very good partner with some good chip engineers who understood the problems of doing a non-PC system design. Also, because the part was custom created for a game console, it could be designed specifically for delivering a good gaming experience as part of the Xbox 360 system.
Feldstein cited several major areas of innovation where the Xbox 360 GPU breaks new ground. The first of those is the chip’s unified shader array, which does away with separate vertex and pixel shaders in favor of 48 parallel shaders capable of operating on data for both pixels and vertices. The GPU can dynamically allocate shader resources as necessary in order to best address a computational constraint, whether that constraint is vertex- or pixel-related.
This sort of graphics architecture has been rumored as a future possibility for some time, but ATI worried that using unified shaders might cause some efficiency loss. To keep all of the shader units utilized as fully as possible, the design team created a complex system of hardware threading inside the chip itself. In this case, each thread is a program associated with the shader arrays. The Xbox 360 GPU can manage and maintain state information on 64 separate threads in hardware. There’s a thread buffer inside the chip, and the GPU can switch between threads instantaneously in order to keep the shader arrays busy at all times.
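To see why keeping many threads resident helps hide stalls, here is a minimal toy simulation. It is a sketch, not ATI's scheduler: the source does not describe the real arbitration logic. The function name `utilization`, the "lowest-numbered ready thread" policy, and the work and stall cycle counts are all assumptions chosen for illustration. The only idea taken from the text is that switching threads costs nothing.

```python
def utilization(num_threads, work_cycles, stall_cycles, total_cycles=10_000):
    """Fraction of cycles in which a single shader array issues work.

    Toy model (all parameters are illustrative assumptions):
    each thread computes for `work_cycles`, then waits `stall_cycles`
    (for example on a memory fetch) before it can issue again.
    Switching between threads is free, as the article describes.
    """
    ready_at = [0] * num_threads              # first cycle each thread may issue
    remaining = [work_cycles] * num_threads   # compute cycles left before next stall
    busy = 0
    for cycle in range(total_cycles):
        # Pick the lowest-numbered thread that is ready this cycle.
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                busy += 1
                remaining[t] -= 1
                if remaining[t] == 0:
                    # Thread stalls; it becomes ready again after the wait.
                    ready_at[t] = cycle + 1 + stall_cycles
                    remaining[t] = work_cycles
                break
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 4, 64):
        print(n, "threads ->", utilization(n, work_cycles=4, stall_cycles=12))
```

In this model a single thread leaves the array idle during every stall, while enough resident threads cover each other's waits completely. Any number of threads beyond that point adds headroom for longer stalls rather than more throughput.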
This internal complexity allows for efficient use of the GPU’s computational resources, but it’s also completely hidden from software developers, who need only write their shader programs without worrying about the details of the chip’s internal thread scheduling.
On chip, the shaders are organized in three SIMD engines with 16 processors per engine, for a total of 48 shaders. Each of these shaders comprises four ALUs, each of which can execute a single operation per cycle, so that each shader unit can execute four floating-point ops per cycle.
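The arithmetic behind those figures can be written out as a quick sketch. The constant and function names are made up for illustration. Clock speed is not quoted here, so the sketch stops at per-cycle numbers.

```python
# Figures quoted in the article; the names are illustrative only.
SIMD_ENGINES = 3
SHADERS_PER_ENGINE = 16
ALUS_PER_SHADER = 4  # one operation per ALU per cycle


def shader_count(engines=SIMD_ENGINES, per_engine=SHADERS_PER_ENGINE):
    """Total shader processors across all SIMD engines."""
    return engines * per_engine


def peak_ops_per_cycle(alus_per_shader=ALUS_PER_SHADER):
    """Upper bound on ALU operations issued per cycle across the array."""
    return shader_count() * alus_per_shader


if __name__ == "__main__":
    print("shaders:", shader_count())
    print("peak ALU ops per cycle:", peak_ops_per_cycle())
```

Under these figures the array tops out at 192 ALU operations per cycle. That is a theoretical peak which assumes every ALU is kept busy.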
These shaders execute a new unified instruction set that incorporates instructions for both vertex and pixel operations. In fact, Feldstein called it a “very general purpose instruction set” with some of the same roots as the DirectX instruction set. Necessarily, the shader language that developers will use to program these shader units will be distinct from the shader models currently used in DirectX 9, including Shader Model 3.0. Feldstein described it as “beyond 3.0.” This new shader language allows for programs to contain an “infinite” number of instructions with features such as branching, looping, indirect branching, and predicated indirect. He said developers are already using shader programs with hundreds of instructions in them.
I asked Feldstein whether the shaders themselves are, at the hardware level, actually more general than those in current graphics chips, because I expected that they would still contain a similar amount of custom logic to speed up common graphics operations. To my surprise, he said that the shaders are more general in hardware. At the outset of the project, he said, ATI hired a number of compiler experts in order to make sure everything would work right, and he noted that Microsoft is no slouch when it comes to compilers, either. Feldstein said Microsoft “made a great compiler for it.”
At this point, Feldstein paused quickly to note that this GPU was not a VLIW machine, apparently reminded of all of the compiler talk surrounding a certain past competitor. (The GeForce FX was, infamously, a VLIW machine with some less-than-desirable performance characteristics, including an extreme sensitivity to compiler instruction tuning.) He was quite confident that the Xbox 360 GPU will not suffer from similar problems, and he claimed the relative abundance of vertex processing power in this GPU should allow objects like fur, feathers, hair, and cloth to look much better than past technology had allowed. Feldstein also said that character skin should look great, and he confirmed to me that real-time subsurface scattering effects should be possible on the Xbox 360.
The Xbox 360 GPU’s unified shader model pays dividends in other places, as well. In traditional pixel shaders, he noted, any shader output is generally treated as a pixel, and it’s fed through the rest of the graphics pipeline after being operated on by the shader. By contrast, the Xbox 360 GPU can take data output by the shaders, unaltered by the rest of the graphics pipeline, and reprocess it. This more efficient flow of data, combined with a unified instruction set for vertex and pixel manipulation, allows easier implementation of some important graphics algorithms in real time, including higher-order surfaces and global illumination. I would expect to see fluid animation of complex terrain and extensive use of displacement mapping in Xbox 360 games. Feldstein also pointed out that this GPU should have sufficient muscle to enable the real-time use of other complex shader algorithms as they’re invented.
Now that we’ve delved into the shaders a bit, we should take a step back and look at the bigger picture. The Xbox 360 GPU not only packs a lot of shader power, but it’s also the central hub in the Xbox 360 system, acting as the main memory controller as well as the GPU. The Xbox 360 has 512MB of GDDR3 memory onboard running at 700MHz, with a 128-bit interface to ATI’s memory controller. The ATI GPU, in turn, has a very low latency path to the Xbox 360’s three IBM CPU cores. This link has about 25GB/s of bandwidth. Feldstein said the graphics portion of the chip has something of a crossbar arrangement for getting to memory, but he didn’t know whether the CPU uses a similar scheme.
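For a rough sense of scale, the main memory figures can be turned into a peak bandwidth estimate. This is a back-of-the-envelope sketch that assumes the 700MHz figure is the base clock and that the memory transfers data twice per clock, as double-data-rate memory does. The function name and the `transfers_per_clock` parameter are illustrative.

```python
def peak_bandwidth_gb_per_s(clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes `transfers_per_clock` data transfers per clock cycle;
    2 models double-data-rate signaling (an assumption here, since the
    article only quotes the clock and the bus width).
    """
    bytes_per_transfer = bus_width_bits // 8
    transfers_per_second = clock_mhz * 1_000_000 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer / 1e9


if __name__ == "__main__":
    print(peak_bandwidth_gb_per_s(700, 128), "GB/s")
```

Under those assumptions the 128-bit interface works out to roughly 22.4GB/s, in the same range as the 25GB/s or so quoted for the CPU link.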
Embedded DRAM for “free” antialiasing
The GPU won’t be using system memory itself quite as much as one might expect, because it packs 10MB of embedded DRAM right on the package. In fact, the Xbox 360 GPU is really a two-die design, with two chips in a single package on a single substrate. The parent die contains the GPU and memory controller, while the daughter die consists of the 10MB of eDRAM and some additional logic. There’s a high-speed 2GHz link between the parent and daughter dies, and Feldstein noted that future revisions of the GPU might incorporate both dies on a single piece of silicon for cost savings.
The really fascinating thing here is the design of that daughter die. Feldstein called it a continuation of the traditional graphics pipeline into memory. Basically, there’s a 10MB pool of embedded DRAM, designed by NEC, in the center of the die. Around the outside is a ring of logic designed by ATI. This logic is made up of 192 component processors capable of doing the basic math necessary for multisampled antialiasing. If I have it right, the component processors should be able to process 32 pixels at once by operating on six components per pixel: red, green, blue, alpha, stencil, and depth. This logic can do the resolve pass for multisample antialiasing right there on the eDRAM die, giving the Xbox 360 the ability to do 4X antialiasing on a high-definition (1280×768) image essentially for “free,” i.e., with no appreciable performance penalty. The eDRAM holds the contents of all of the back buffers, does the resolve, and hands off the resulting image into main system memory for scan-out to the display.
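To make the resolve step concrete, here is a minimal sketch of what a 4X multisample resolve computes: each output pixel is the average of its four color samples. This is a generic box-filter resolve, not a description of ATI's logic. It handles only the four color channels (depth and stencil are tested per sample rather than averaged), and every name in it is illustrative.

```python
COMPONENT_PROCESSORS = 192   # figure quoted in the article
COMPONENTS_PER_PIXEL = 6     # red, green, blue, alpha, stencil, depth

# 192 processors / 6 components each = pixels handled at once
PIXELS_AT_ONCE = COMPONENT_PROCESSORS // COMPONENTS_PER_PIXEL


def resolve_pixel(samples):
    """Average a pixel's color samples channel by channel.

    `samples` is a list of (r, g, b, a) tuples, one per sample.
    """
    n = len(samples)
    return tuple(sum(s[c] for s in samples) / n for c in range(4))


def resolve(multisampled):
    """Resolve a 2D grid of per-pixel sample lists into final colors."""
    return [[resolve_pixel(px) for px in row] for row in multisampled]


if __name__ == "__main__":
    white = (1.0, 1.0, 1.0, 1.0)
    black = (0.0, 0.0, 0.0, 1.0)
    # An edge pixel where two of four samples hit a white triangle.
    edge = [white, white, black, black]
    print(PIXELS_AT_ONCE)
    print(resolve_pixel(edge))
```

A pixel that is half covered by a white triangle over a black background resolves to mid-gray, which is the smoothing effect antialiasing is after.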
Feldstein noted that this design is efficient from a power-savings standpoint, as well, because there’s much less memory I/O required when antialiasing can be handled on the chip. He said ATI was very power-conscious in the design of the chip, so that the Xbox 360 could be a decent citizen in the living room.
My conversation with Bob Feldstein about the Xbox 360 GPU was quick but, obviously, packed with information. I hope that I’ve gotten everything right, but I expect we will learn more and sharpen up some of these details in the future. Nonetheless, ATI was very forthcoming about the technology inside its Xbox 360 GPU, and I have to say that it all sounds very promising.
For those of you wondering how the Xbox 360 GPU relates to ATI’s upcoming PC graphics chips, I wish I could tell you, but I can’t. Feldstein said the Xbox 360 GPU “doesn’t relate” to a PC product. Some elements of the design seem impractical for PC use, like the 10MB of embedded DRAM for antialiasing; PCs don’t use one single, standard resolution like HDTVs do. Still, it’s hard to imagine ATI having some of this technology in its portfolio and not using it elsewhere at some point.