Details of ATI’s Xbox 360 GPU unveiled

WITH MICROSOFT’S OFFICIAL announcement of the next-generation Xbox 360 console this week, ATI has decided to disclose some of the architectural details of the graphics processor that it created for the system. I had a brief but enlightening conversation with Bob Feldstein, Vice President of Engineering at ATI, who helped oversee the Xbox 360 GPU project. He spelled out some of the GPU’s details for me, and they’re definitely intriguing.

Feldstein said that ATI and Microsoft developed this chip together in the span of two years, and that they worked “from the ground up” to do a console product. He said that Microsoft was a very good partner with some good chip engineers who understood the problems of doing a non-PC system design. Also, because the part was custom created for a game console, it could be designed specifically for delivering a good gaming experience as part of the Xbox 360 system.

Unified shaders
Feldstein cited several major areas of innovation where the Xbox 360 GPU breaks new ground. The first of those is the chip’s unified shader array, which does away with separate vertex and pixel shaders in favor of 48 parallel shaders capable of operating on data for both pixels and vertices. The GPU can dynamically allocate shader resources as necessary in order to best address a computational constraint, whether that constraint is vertex- or pixel-related.

This sort of graphics architecture has been rumored as a future possibility for some time, but ATI worried that using unified shaders might cause some efficiency loss. To keep all of the shader units utilized as fully as possible, the design team created a complex system of hardware threading inside the chip itself. In this case, each thread is a program associated with the shader arrays. The Xbox 360 GPU can manage and maintain state information on 64 separate threads in hardware. There’s a thread buffer inside the chip, and the GPU can switch between threads instantaneously in order to keep the shader arrays busy at all times.
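The latency-hiding idea behind that threading scheme can be illustrated with a toy model. To be clear, everything below is my own illustrative sketch, not ATI's actual scheduler: the premise is simply that with 64 resident threads, the chip can always find one that isn't waiting on memory.

```python
# Toy model of latency hiding via hardware threads (illustrative only;
# not ATI's actual scheduling logic). 64 threads are resident on chip;
# each cycle, the shader array issues work from any thread that is not
# stalled waiting on a memory fetch.
import random

def simulate(cycles, n_threads=64, miss_latency=10, miss_rate=0.25):
    random.seed(0)
    stalled_until = [0] * n_threads          # cycle at which each thread is ready again
    busy_cycles = 0
    for cycle in range(cycles):
        ready = [t for t in range(n_threads) if stalled_until[t] <= cycle]
        if ready:
            busy_cycles += 1                 # shader array did useful work this cycle
            t = ready[0]                     # zero-cost switch to a ready thread
            if random.random() < miss_rate:  # this thread now waits on memory
                stalled_until[t] = cycle + miss_latency
    return busy_cycles / cycles              # utilization of the shader array

print(simulate(10_000))                 # with 64 threads, utilization stays near 1.0
print(simulate(10_000, n_threads=1))    # a lone thread stalls constantly
```

The point of the model is only that many cheap-to-switch threads keep the execution units fed even when any individual thread spends most of its time waiting on memory.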

This internal complexity allows for efficient use of the GPU’s computational resources, but it’s also completely hidden from software developers, who need only to write their shader programs without worrying about the details of the chip’s internal thread scheduling.


A block diagram of the Xbox 360 GPU. Source: ATI.

On chip, the shaders are organized into three SIMD engines with 16 processors per unit, for a total of 48 shaders. Each of these shaders comprises four ALUs, each capable of executing a single operation per cycle, so each shader unit can execute four floating-point ops per cycle.
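For a rough sense of scale, that arithmetic can be tallied up. Note that the clock speed below is purely an assumed figure for illustration; the article does not state what the GPU is clocked at.

```python
# Back-of-envelope peak shader throughput. The SIMD organization comes
# from the article; the 500MHz clock is an assumed number, not a
# confirmed spec.
simd_engines = 3
shaders_per_engine = 16
alus_per_shader = 4          # one FP op per ALU per cycle

shaders = simd_engines * shaders_per_engine    # 48 unified shaders
ops_per_cycle = shaders * alus_per_shader      # 192 FP ops per cycle

clock_hz = 500e6                               # assumed clock for illustration
gflops = ops_per_cycle * clock_hz / 1e9
print(shaders, ops_per_cycle, gflops)          # 48 192 96.0
```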

These shaders execute a new unified instruction set that incorporates instructions for both vertex and pixel operations. In fact, Feldstein called it a “very general purpose instruction set” with some of the same roots as the DirectX instruction set. Necessarily, the shader language that developers will use to program these shader units will be distinct from the shader models currently used in DirectX 9, including Shader Model 3.0. Feldstein described it as “beyond 3.0.” This new shader language allows for programs to contain an “infinite” number of instructions with features such as branching, looping, indirect branching, and predicated indirect. He said developers are already using shader programs with hundreds of instructions in them.

I asked Feldstein whether the shaders themselves are, at the hardware level, actually more general than those in current graphics chips, because I expected that they would still contain a similar amount of custom logic to speed up common graphics operations. To my surprise, he said that the shaders are more general in hardware. At the outset of the project, he said, ATI hired a number of compiler experts in order to make sure everything would work right, and he noted that Microsoft is no slouch when it comes to compilers, either. Feldstein said Microsoft “made a great compiler for it.”

At this point, Feldstein paused quickly to note that this GPU was not a VLIW machine, apparently reminded of all of the compiler talk surrounding a certain past competitor. (The GeForce FX was, infamously, a VLIW machine with some less-than-desirable performance characteristics, including an extreme sensitivity to compiler instruction tuning.) He was quite confident that the Xbox 360 GPU will not suffer from similar problems, and he claimed the relative abundance of vertex processing power in this GPU should allow objects like fur, feathers, hair, and cloth to look much better than past technology had allowed. Feldstein also said that character skin should look great, and he confirmed to me that real-time subsurface scattering effects should be possible on the Xbox 360.

The Xbox 360 GPU’s unified shader model pays dividends in other places, as well. In traditional pixel shaders, he noted, any shader output is generally treated as a pixel, and it’s fed through the rest of the graphics pipeline after being operated on by the shader. By contrast, the Xbox 360 GPU can take data output by the shaders, unaltered by the rest of the graphics pipeline, and reprocess it. This more efficient flow of data, combined with a unified instruction set for vertex and pixel manipulation, allows easier implementation of some important graphics algorithms in real time, including higher-order surfaces and global illumination. I would expect to see fluid animation of complex terrain and extensive use of displacement mapping in Xbox 360 games. Feldstein also pointed out that this GPU should have sufficient muscle to enable the real-time use of other complex shader algorithms as they’re invented.

System architecture
Now that we’ve delved into the shaders a bit, we should take a step back and look at the bigger picture. The Xbox 360 GPU not only packs a lot of shader power, but it’s also the central hub in the Xbox 360 system, acting as the main memory controller as well as the GPU. The Xbox 360 has 512MB of GDDR3 memory onboard running at 700MHz, with a 128-bit interface to ATI’s memory controller. The ATI GPU, in turn, has a very low latency path to the Xbox 360’s three IBM CPU cores. This link has about 25GB/s of bandwidth. Feldstein said the graphics portion of the chip has something of a crossbar arrangement for getting to memory, but he didn’t know whether the CPU uses a similar scheme.
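Those memory figures can be sanity-checked with a little arithmetic. GDDR3 transfers data on both clock edges, a detail not spelled out above, so treat this as my own back-of-envelope calculation rather than an ATI-supplied number.

```python
# Theoretical peak GDDR3 bandwidth from the figures in the article.
# GDDR3 is double data rate: two transfers per clock cycle.
clock_mhz = 700
bus_bits = 128

transfers_per_sec = clock_mhz * 1e6 * 2    # 1.4 billion transfers/s
bytes_per_transfer = bus_bits // 8         # 16 bytes per transfer
bandwidth_gbs = transfers_per_sec * bytes_per_transfer / 1e9
print(bandwidth_gbs)                       # 22.4
```

That 22.4GB/s figure for main memory sits close to the roughly 25GB/s quoted for the CPU link, which fits the picture of the GPU as the system's central hub.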

Embedded DRAM for “free” antialiasing
The GPU won’t be using system memory itself quite as much as one might expect, because it packs 10MB of embedded DRAM right on the package. In fact, the Xbox 360 GPU is really a two-die design, with two chips in a single package on a single substrate. The parent die contains the GPU and memory controller, while the daughter die consists of the 10MB of eDRAM and some additional logic. There’s a high-speed 2GHz link between the parent and daughter dies, and Feldstein noted that future revisions of the GPU might incorporate both dies on a single piece of silicon for cost savings.

The really fascinating thing here is the design of that daughter die. Feldstein called it a continuation of the traditional graphics pipeline into memory. Basically, there’s a 10MB pool of embedded DRAM, designed by NEC, in the center of the die. Around the outside is a ring of logic designed by ATI. This logic is made up of 192 component processors capable of doing the basic math necessary for multisampled antialiasing. If I have it right, the component processors should be able to process 32 pixels at once by operating on six components per pixel: red, green, blue, alpha, stencil, and depth. This logic can do the resolve pass for multisample antialiasing right there on the eDRAM die, giving the Xbox 360 the ability to do 4X antialiasing on a high-definition (1280×720) image essentially for “free,” that is, with no appreciable performance penalty. The eDRAM holds the contents of all of the back buffers, does the resolve, and hands off the resulting image into main system memory for scan-out to the display.
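The arithmetic behind that 192-processor figure, as I understand it, works out as follows. This is my own reconstruction from Feldstein's description, not an official ATI breakdown.

```python
# Reconstructing the daughter-die math from the description above
# (my interpretation, not an official ATI breakdown).
components_per_pixel = 6      # red, green, blue, alpha, stencil, depth
pixels_per_clock = 32

component_processors = components_per_pixel * pixels_per_clock
print(component_processors)   # 192

# For scale: one 1280x720 back buffer at 32 bits per pixel, before
# any multisampling, occupies about 3.5MB of that 10MB eDRAM pool.
buffer_mb = 1280 * 720 * 4 / (1024 * 1024)
print(round(buffer_mb, 2))    # 3.52
```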

Feldstein noted that this design is efficient from a power-savings standpoint, as well, because there’s much less memory I/O required when antialiasing can be handled on the chip. He said ATI was very power-conscious in the design of the chip, so that the Xbox 360 could be a decent citizen in the living room.

Conclusions
My conversation with Bob Feldstein about the Xbox 360 GPU was brief but dense with information. I hope that I’ve gotten everything right, but I expect we will learn more and sharpen up some of these details in the future. Nonetheless, ATI was very forthcoming about the technology inside its Xbox 360 GPU, and I have to say that it all sounds very promising.

For those of you wondering how the Xbox 360 GPU relates to ATI’s upcoming PC graphics chips, I wish I could tell you, but I can’t. Feldstein said the Xbox 360 GPU “doesn’t relate” to a PC product. Some elements of the design seem impractical for PC use, like the 10MB of embedded DRAM for antialiasing; PCs don’t use one single, standard resolution like HDTVs do. Still, it’s hard to imagine ATI having some of this technology in its portfolio and not using it elsewhere at some point.

Comments closed
    • mas
    • 15 years ago

    I just keep getting more excited about X360 and PS3.. though X360 seems to be the more thought out solution. We’ll see 🙂

    lets hope we can afford both + all the games…

    • Sjoerd
    • 15 years ago

    “Unified shaders […] doesn’t relate to the PC”…

    Hogwash. This is EXACTLY what DirectX Next (i.e., the Longhorn version) is all about!

    • Dauntless
    • 15 years ago

    embedded DRAM?

    Wasn’t that what Rendition promised waaay back in the dark ages of 3d video cards (ca 1996 or maybe 1997) that was going to be in the Rendition 3 chipset? I can’t remember who bought out Rendition, but whoever did shelved that technology. Was it perchance NEC who bought out Rendition?

      • Wulvor
      • 15 years ago

      I think Bitboys were also big on embedded DRAM for their chip designs, but at the time the cost was enormous, or something.

      • UberGerbil
      • 15 years ago

      IIRC they were going to put a cache/framebuffer in the chipset because they had access to a smaller process node and they couldn’t think of anything better to do with the extra transistors that gave them. They got killed by companies who could (such as, oh, adding 3D functionality).

      • Trident Troll
      • 15 years ago

      Micron bought Rendition. They used the technology for their Yukon chips. But no one showed any interest in them.

    • droopy1592
    • 15 years ago

    So what’s all this smack about developers saying the PS3 is mo’ powerful. 48 shaders is nothing to sneeze at. It’s got to be a bit more than twice as fast as anything out now.

    • sbarash
    • 15 years ago

    Fantastic Scott!!! Way to go…

    -Stephen

      • spuppy
      • 15 years ago

      It’s not like this story was exclusive…

        • eitje
        • 15 years ago

        you’re right – we should save our praise for really important things that are TR exclusive.

          • spuppy
          • 15 years ago

          I only praise TR when they rip on Apple 😉

    • blitzy
    • 15 years ago

    how the tits you can get your head around all that enough to have a conversation is beyond me

    i think ill wait and let the images speak for themselves

    • spuppy
    • 15 years ago

    At 720p (which all Xenon games are specified to support), the backbuffer at 4XAA still won’t fit within 10MB… It would be over 14 MB. So I don’t think FSAA will be ‘free’ on the Xenon

      • tEd
      • 15 years ago

      that’s because you don’t know how it works 😉

        • spuppy
        • 15 years ago

        Oh… Please explain it to me!

          • daniel4
          • 15 years ago

          Not an explanation, but according to ATI it is “almost” free.

          http://firingsquad.com/features/xbox_360_interview/page4.asp

          • tEd
          • 15 years ago

          The key is that most AA pixel samples can be 100% compressed and only the edge pixels need an additional memory footprint, and some of those can still be compressed to like 50 or 25%.

            • spuppy
            • 15 years ago

            Thanks for clearing that up 🙂

            me = noob

      • DukenukemX
      • 15 years ago

      I thought the 10 MB of memory was used like the Hyper Memory technology.

      Except done better.

        • lethal
        • 15 years ago

        it’s more like the caches of normal CPUs, but bigger.

          • Sjoerd
          • 15 years ago

          Not a cache, more like a very, very fast scratchpad.

    • JavaDog
    • 15 years ago

    Great article! Lots of interesting info.

    These consoles (360 and P3) are really shaping up to have a good war. It is actually pretty exciting…

    • PerfectCr
    • 15 years ago

    NICE! Can’t wait to see it in action. Benchies?

    fp 😛

      • eitje
      • 15 years ago

      oh, and can it fold – don’t forget that valuable question 🙂
