
Details of ATI's Xbox 360 GPU unveiled


A peek inside the monster new console GPU
— 12:46 PM on May 19, 2005

WITH MICROSOFT'S OFFICIAL announcement of the next-generation Xbox 360 console this week, ATI has decided to disclose some of the architectural details of the graphics processor that it created for the system. I had a brief but enlightening conversation with Bob Feldstein, Vice President of Engineering at ATI, who helped oversee the Xbox 360 GPU project. He spelled out some of the GPU's details for me, and they're definitely intriguing.

Feldstein said that ATI and Microsoft developed this chip together in the span of two years, and that they worked "from the ground up" to build a console product. He said that Microsoft was a very good partner with some good chip engineers who understood the problems of doing a non-PC system design. Also, because the part was custom-created for a game console, it could be designed specifically for delivering a good gaming experience as part of the Xbox 360 system.

Unified shaders
Feldstein cited several major areas of innovation where the Xbox 360 GPU breaks new ground. The first of those is the chip's unified shader array, which does away with separate vertex and pixel shaders in favor of 48 parallel shaders capable of operating on data for both pixels and vertices. The GPU can dynamically allocate shader resources as necessary in order to best address a computational constraint, whether that constraint is vertex- or pixel-related.
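To make the appeal of dynamic allocation concrete, here is a rough back-of-the-envelope sketch in Python. It is not ATI's scheduler. It assumes an idealized model in which every shader unit retires one work item per cycle, and the 8/40 split used for the fixed-function comparison is a hypothetical figure chosen purely for illustration. Only the total of 48 units comes from ATI.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def fixed_split_cycles(vertex_work, pixel_work, vertex_units, pixel_units):
    """Cycles needed when vertex and pixel work run on separate unit pools.

    Each pool can only process its own kind of work, so the slower pool
    determines when the frame is finished.
    """
    vertex_cycles = ceil_div(vertex_work, vertex_units)
    pixel_cycles = ceil_div(pixel_work, pixel_units)
    return max(vertex_cycles, pixel_cycles)


def unified_cycles(vertex_work, pixel_work, total_units):
    """Cycles needed when any unit can take either kind of work."""
    return ceil_div(vertex_work + pixel_work, total_units)


if __name__ == "__main__":
    # Hypothetical workloads, measured in abstract "work items".
    vertex_heavy = (3600, 1200)
    pixel_heavy = (480, 4320)
    for name, (v, p) in (("vertex-heavy", vertex_heavy), ("pixel-heavy", pixel_heavy)):
        fixed = fixed_split_cycles(v, p, vertex_units=8, pixel_units=40)
        unified = unified_cycles(v, p, total_units=48)
        print(f"{name}: fixed split {fixed} cycles, unified {unified} cycles")
```

In this toy model the unified pool finishes both workloads in the same number of cycles, while the fixed split bogs down badly whenever the workload leans toward whichever pool is smaller.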

This sort of graphics architecture has been rumored as a future possibility for some time, but ATI worried that using unified shaders might cause some efficiency loss. To keep all of the shader units utilized as fully as possible, the design team created a complex system of hardware threading inside the chip itself. In this case, each thread is a program associated with the shader arrays. The Xbox 360 GPU can manage and maintain state information on 64 separate threads in hardware. There's a thread buffer inside the chip, and the GPU can switch between threads instantaneously in order to keep the shader arrays busy at all times.

This internal complexity allows for efficient use of the GPU's computational resources, but it's also completely hidden from software developers, who need only write their shader programs without worrying about the details of the chip's internal thread scheduling.
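The reason a deep pool of hardware threads helps can be shown with a small simulation. What follows is only a sketch of the general latency-hiding idea, not a model of ATI's actual thread arbiter. The assumptions are mine: each thread does a few cycles of ALU work and then stalls on a data fetch, switching threads costs nothing, and the specific numbers (4 ALU cycles, 96 cycles of fetch latency) are hypothetical. Only the figure of 64 threads comes from Feldstein.

```python
def alu_utilization(num_threads, alu_cycles, fetch_latency, total_cycles=10_000):
    """Fraction of cycles in which a single shader array does ALU work.

    Each thread alternates between `alu_cycles` of computation and a fetch
    that stalls it for `fetch_latency` cycles. Every cycle, the scheduler
    issues from the first thread that is ready; switching is free.
    """
    ready_at = [0] * num_threads            # first cycle each thread may issue
    remaining = [alu_cycles] * num_threads  # ALU cycles left before next fetch
    busy = 0
    for cycle in range(total_cycles):
        for i in range(num_threads):
            if ready_at[i] <= cycle:
                remaining[i] -= 1
                busy += 1
                if remaining[i] == 0:
                    # The thread issues its fetch and stalls until data returns.
                    ready_at[i] = cycle + 1 + fetch_latency
                    remaining[i] = alu_cycles
                break  # only one thread issues per cycle
    return busy / total_cycles


if __name__ == "__main__":
    for threads in (1, 8, 64):
        util = alu_utilization(threads, alu_cycles=4, fetch_latency=96)
        print(f"{threads:2d} threads: {util:.0%} ALU utilization")
```

With a single thread, the array sits idle for most of each fetch. With enough threads in flight, there is always another one ready to run, which is the effect the thread buffer is meant to achieve.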


A block diagram of the Xbox 360 GPU. Source: ATI.

On chip, the shaders are organized in three SIMD engines with 16 processors per engine, for a total of 48 shaders. Each of these shaders comprises four ALUs, each of which can execute a single operation per cycle, so that each shader unit can execute four floating-point ops per cycle.
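Multiplying those figures out gives the array's peak throughput per clock. The short Python sketch below does the arithmetic using only the numbers Feldstein provided. The constant and function names are my own, and no clock speed is assumed, so the result is expressed per cycle rather than per second.

```python
SIMD_ENGINES = 3            # SIMD engines on the chip
SHADERS_PER_ENGINE = 16     # shader processors in each engine
ALUS_PER_SHADER = 4         # ALUs in each shader processor
OPS_PER_ALU_PER_CYCLE = 1   # each ALU executes one operation per cycle


def total_shaders():
    """Number of unified shader processors across all SIMD engines."""
    return SIMD_ENGINES * SHADERS_PER_ENGINE


def peak_ops_per_cycle():
    """Peak floating-point operations per cycle for the whole shader array."""
    return total_shaders() * ALUS_PER_SHADER * OPS_PER_ALU_PER_CYCLE


if __name__ == "__main__":
    print(f"shaders: {total_shaders()}")
    print(f"peak floating-point ops per cycle: {peak_ops_per_cycle()}")
```

That works out to 48 shaders and a peak of 192 floating-point operations per cycle across the array, assuming every ALU is kept busy.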

These shaders execute a new unified instruction set that incorporates instructions for both vertex and pixel operations. In fact, Feldstein called it a "very general purpose instruction set" with some of the same roots as the DirectX instruction set. Necessarily, the shader language that developers will use to program these shader units will be distinct from the shader models currently used in DirectX 9, including Shader Model 3.0. Feldstein described it as "beyond 3.0." This new shader language allows for programs to contain an "infinite" number of instructions with features such as branching, looping, indirect branching, and predicated indirect. He said developers are already using shader programs with hundreds of instructions in them.

I asked Feldstein whether the shaders themselves are, at the hardware level, actually more general than those in current graphics chips, because I expected that they would still contain a similar amount of custom logic to speed up common graphics operations. To my surprise, he said that the shaders are more general in hardware. At the outset of the project, he said, ATI hired a number of compiler experts in order to make sure everything would work right, and he noted that Microsoft is no slouch when it comes to compilers, either. Feldstein said Microsoft "made a great compiler for it."

At this point, Feldstein paused briefly to note that this GPU is not a VLIW machine, apparently reminded of all of the compiler talk surrounding a certain past competitor. (The GeForce FX was, infamously, a VLIW machine with some less-than-desirable performance characteristics, including an extreme sensitivity to compiler instruction tuning.) He was quite confident that the Xbox 360 GPU will not suffer from similar problems, and he claimed the relative abundance of vertex processing power in this GPU should allow objects like fur, feathers, hair, and cloth to look much better than past technology has allowed. Feldstein also said that character skin should look great, and he confirmed to me that real-time subsurface scattering effects should be possible on the Xbox 360.

The Xbox 360 GPU's unified shader model pays dividends in other places, as well. In traditional pixel shaders, he noted, any shader output is generally treated as a pixel, and it's fed through the rest of the graphics pipeline after being operated on by the shader. By contrast, the Xbox 360 GPU can take data output by the shaders, unaltered by the rest of the graphics pipeline, and reprocess it. This more efficient flow of data, combined with a unified instruction set for vertex and pixel manipulation, allows easier implementation of some important graphics algorithms in real time, including higher-order surfaces and global illumination. I would expect to see fluid animation of complex terrain and extensive use of displacement mapping in Xbox 360 games. Feldstein also pointed out that this GPU should have sufficient muscle to enable the real-time use of other complex shader algorithms as they're invented.