
Sweet sixteen—times two
At the heart of the A8R32-MVP Deluxe is brand new core logic from ATI in the form of the CrossFire Xpress 3200 north bridge. Code-named RD580, this new north bridge chip brings to CrossFire what NVIDIA's nForce4 SLI X16 has brought to SLI: dual 16-lane PCI Express graphics slots on a single motherboard. Unlike NVIDIA's solution, though, ATI's new north bridge is a single chip with a whopping total of 40 lanes of PCI Express connectivity built into it. Sixteen of those lanes are dedicated to connectivity for the primary graphics slot, sixteen for the secondary slot, four for other types of peripherals, and four for the interconnect with the chipset's south bridge I/O chip. As you can see on the diagram below, each of the four PCI-E lanes for peripherals has a port associated with it, so those lanes can link as many as four separate PCI-E x1 devices to the system. Alternatively, those lanes could be aggregated into two connections of two lanes each, one single PCI-E x4 connection, or a few other possible configurations.
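The lane budget described above can be tallied up in a quick sketch. This is purely illustrative bookkeeping based on the figures in this article, not anything from ATI's documentation, and the configuration list covers only the groupings mentioned here:

```python
# Back-of-the-envelope tally of the RD580's 40 PCI Express lanes,
# per the breakdown in the text (illustrative only).
lanes = {
    "primary graphics slot": 16,
    "secondary graphics slot": 16,
    "general-purpose peripherals": 4,
    "south bridge interconnect": 4,
}
assert sum(lanes.values()) == 40

# The four peripheral lanes can be grouped into ports a few ways;
# each tuple below is one possible split.
peripheral_configs = [
    (1, 1, 1, 1),  # four x1 devices
    (2, 2),        # two x2 connections
    (4,),          # a single x4 connection
]
for cfg in peripheral_configs:
    assert sum(cfg) == 4  # every grouping must consume exactly 4 lanes

print("total lanes:", sum(lanes.values()))
```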

A block diagram of the CrossFire Xpress 3200. Source: ATI.

ATI likes to point out the superiority of its single-chip approach to dual 16-lane PCI Express graphics logic. The CrossFire Xpress 3200's only real competitor, the nForce4 SLI X16, employs a pair of chips with a single PCI-E x16 connection hanging off of each one. Any data traveling from one graphics card to the next in this scheme must travel over a chip-to-chip interconnect, introducing a little bit of additional latency and possible bandwidth penalties.

NVIDIA has complicated this issue somewhat by being cagey about the exact amount of bandwidth offered by its chip-to-chip interconnect on the nForce4 SLI X16, initially referring to it in slippery language that avoided exact bandwidth figures. NVIDIA also admits that it made a mistake in its nForce4 SLI X16 reference kit by including a BIOS that inadvertently set the default speed of the chip-to-chip interconnect lower than its full potential. As a result, some of the first nForce4 SLI X16 motherboards, such as the Asus A8N32-SLI Deluxe, shipped with less than optimal default configurations. (When we tested the A8N32-SLI Deluxe in our review of the board, we manually set the BIOS to the correct values.)

When configured optimally, however, the nForce4 SLI X16's HyperTransport interconnect is 16 bits wide, runs at 1GHz, uses HyperTransport's twice-per-clock data transfer capabilities, and is bidirectional. That works out to a total of 8GB/s of bandwidth, or 4GB/s in each direction—exactly equal to the theoretical peak bandwidth of a PCI Express x16 slot, excluding the overhead for data transfer protocols on each type of link. At least in theory, the nForce4 SLI X16's interconnect ought to be adequate to the task of carrying data between two PCI-E x16 graphics cards. As for the latency penalty associated with using the interconnect, well, latencies on HyperTransport links are probably best measured in nanoseconds, while frame render times on GPUs are typically measured in milliseconds. So, in theory, latency really shouldn't be an issue, either.
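The arithmetic behind those bandwidth figures is straightforward enough to show directly. This sketch works from the link parameters given above, plus the standard first-generation PCI Express per-lane rate (2.5GT/s with 8b/10b encoding, i.e. 250MB/s per lane per direction):

```python
# HyperTransport bandwidth for the nForce4 SLI X16's chip-to-chip
# interconnect, per the figures in the text: 16 bits wide, 1GHz clock,
# two transfers per clock, bidirectional.
width_bytes = 16 / 8          # a 16-bit link moves 2 bytes per transfer
clock_hz = 1e9                # 1GHz link clock
transfers_per_clock = 2       # double-pumped signaling

per_direction = width_bytes * clock_hz * transfers_per_clock  # bytes/s
total = per_direction * 2     # bidirectional: count both directions

print(f"{per_direction / 1e9:.0f}GB/s each way, {total / 1e9:.0f}GB/s total")
# → 4GB/s each way, 8GB/s total

# For comparison, a first-gen PCI Express x16 slot: 250MB/s per lane
# per direction, times 16 lanes, before protocol overhead.
pcie_x16_per_direction = 250e6 * 16
assert pcie_x16_per_direction == per_direction  # the match the article notes
```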

Graphics data traveling over the nForce4 SLI X16's chipset interconnect will have to contend with other types of data for bandwidth, including audio streams, disk I/O, and network traffic. In worst-case scenarios, congestion could become a problem. HyperTransport includes measures to manage traffic, reserve bandwidth, and reduce congestion, though. I'm not convinced that NVIDIA's dual-chip approach is necessarily a significant performance drag, given the basics involved.

That said, ATI's single-chip approach undoubtedly has its theoretical advantages. The CrossFire Xpress 3200 chip includes a dedicated, high-speed PCI Express x16 link between the two graphics slots that runs asynchronously to the rest of the chip. ATI has even conjured up a marketing name for this data pathway. Behold the Xpress Route:

This dedicated logic is capable of DMA data transfers, and it leaves little ambiguity about its suitability to the task at hand.

Even more difficult than sorting out the significance of differences between ATI and NVIDIA interconnect bandwidth may be the task of finding a use for it all. In high-end dual graphics setups, most information is shared between GPUs by means of a dedicated connector—the SLI bridge in the case of GeForces and the CrossFire dongle for Radeons. That leaves PCI Express to handle the traditional communication between host system and GPU, as well as any inter-GPU communication that doesn't involve the SLI or CrossFire compositing engines.

We found little performance difference between the nForce4 SLI and the nForce4 SLI X16 in our initial look at the A8N32-SLI Deluxe, except for the special case of NVIDIA's SLI antialiasing mode. The SLI compositing engine on current GeForce GPUs doesn't pass sample data for this AA mode over the SLI link, and the additional PCI-E bandwidth can boost frame rates as a result. I'm still not sold on the value of SLI AA given the limitations of current GeForce GPUs' antialiasing hardware, but at least this is one case where the nForce4 SLI X16's additional PCI-E connectivity can be of help.

The compositing engine for ATI's CrossFire, on the other hand, was built with CrossFire's SuperAA mode in mind and does the blend between the two cards' antialiased images right on the compositing chip. In order to find a use for dual PCI-E x16's extra bandwidth, ATI instead points to its lower-end graphics cards, which eschew a separate compositing engine, a CrossFire dongle, and the need for a CrossFire-edition "master" card in favor of a totally PCI Express-based scheme. Even ATI's mid-range Radeon X1600 XT goes commando on the compositing engine. Here, the secondary card transfers all of its rendered frames to the primary graphics card over—you guessed it—PCI Express. In this not-too-special case, the CrossFire Xpress 3200's sixteen lanes per card should definitely lead to higher frame rates.

There are other reasons why even high-end dual-GPU schemes need to swap data between graphics cards, including texture synchronization when developers use techniques such as render-to-texture. These needs will almost assuredly grow over time as tricks like render-to-texture become more popular and GPUs become more powerful. For now, though, dual PCI Express x16 chipsets mostly bring marginal performance gains outside of particular scenarios where PCI-E bandwidth is at a premium.