ATI’s CrossFire dual-graphics solution

NVIDIA DID A VERY clever thing when it designed its GeForce 6 series graphics processors, building into them a compositing engine and provisions for a digital interconnect between graphics cards that could allow two (or more) graphics cards to work together in tandem. Once PCI Express became fairly common, NVIDIA pulled the curtains back on its SLI technology, scoring a coup on several fronts. The ability to team two graphics cards gave NVIDIA an indisputable edge in the otherwise tight struggle with ATI for graphics performance supremacy. The possibility of adding a second card gave enthusiasts an upgrade path they couldn’t get elsewhere, and the exclusivity of SLI allowed NVIDIA to muscle would-be chipset competitors out of the high end of the motherboard market, selling a bundle of nForce4 SLI chipsets. The number of actual, fully configured SLI systems in the wild may not be huge, but NVIDIA has gotten tremendous mileage out of dangling the tantalizing prospect of dual-graphics goodness in front of PC enthusiasts.

Naturally, ATI wanted to match NVIDIA’s dual-graphics capabilities stride for stride, but doing so wouldn’t be easy. Yes, at its heart, the technology is fundamentally simple: the output from two graphics cards is combined to offer nearly double the pixel-pushing power of a single card. Yes, ATI graphics processors have had the ability to run in parallel configurations for some time (since the debut of the Radeon 9700), and high-end visualization systems like the Evans & Sutherland Renderbeast have put multiple ATI GPUs to work in teams as large as 64 chips. But the Radeon X800 family of GPUs wasn’t built with provisions for GPU-to-GPU communications and image compositing, and doing this work over a PCI Express link wouldn’t be fast enough for real-time, high-resolution gaming.

In order to make a credible rival for SLI, ATI had to go a different route. ATI’s consumer dual-graphics platform, known as CrossFire, would require a fair amount of ingenuity and some new, custom hardware. That hardware comes in the form of so-called “master” cards that incorporate additional chips to handle the communications and compositing needed to make CrossFire work. One of these master cards can hook up with a regular X800-series graphics card for multi-GPU bliss.


The Radeon X850 XT CrossFire Edition master card


Without its cooler, the master card’s extra chips are easy to see


A close-up look at the five chips that make a master card special

CrossFire is a pretty slick scheme, really, given the limitations imposed by the original X800 hardware. ATI equipped its master cards with five new chips, pictured above. The second largest of the five chips there is a TMDS receiver made by Texas Instruments. To make up for the lack of a dedicated digital interconnect between GPUs, the master card can intercept and decode the DVI output of the slave card using this receiver. Next to it, the largest of the chips is a Spartan-series FPGA chip from Xilinx. FPGA stands for Field Programmable Gate Array, which is a fancy way of saying that this is a programmable logic chip. In this case, ATI has programmed the Xilinx FPGA to act as CrossFire’s compositing engine, tasked with combining the images generated by the two Radeon GPUs into a single stream of video frames. The smaller chip just below the FPGA in the picture is a flash ROM; presumably, it holds the programming for the FPGA.

Once the images from the slave card have been decoded by the TMDS receiver and composited with the images from the master card’s GPU by the FPGA, they have to be output to a display. That’s where the last two chips come into the picture. The little, square chip above the FPGA is a RAMDAC chip made by Analog Devices. The RAMDAC converts digital video information for output to an analog display, such as a VGA monitor. Just above the TMDS receiver is a smaller, rectangular chip from Silicon Image. That’s a TMDS transmitter capable of encoding images for output to a digital display via a DVI output.

All together, these five chips add the necessary functionality to ATI’s master cards to allow a pair of graphics cards to run together in an SLI-like configuration with very little performance penalty for inter-chip communication or image compositing.

 

Caught in the CrossFire?
Of course, this scheme does impose some limitations on CrossFire configurations, not least of which is the need for a master card in order for the scheme to work. The master card has a high-density DMS-59 port onboard. An external, three-headed Y cable plugs into this high-density port on the master card and into the DVI output on the CrossFire slave card. The cable’s third connector is a DVI output, for connection to a monitor (or to a DVI-to-VGA converter).


The slave card (left) and master card (right). Note the master card’s high-density connector (top).


The CrossFire dongle cable links the cards and sends output to the monitor

All of this works rather transparently once everything is connected properly, but it is a bit of a hassle to plug together. Also, when CrossFire is enabled, the slave card’s secondary video output cannot be used to drive a display. Fortunately, CrossFire can be enabled and disabled via the Catalyst Control Center without rebooting, unlike SLI.

Our pre-production CrossFire master card had another annoying limitation. When connected to our massive Mitsubishi Diamond Plus 200 monitor, it would not display POST messages, BIOS menus, or the WinXP boot screen. ATI says this is an incompatibility between pre-production master cards and monitors with missing or incomplete EDID data, and they claim it will be resolved in production master cards. I hope that’s true, because it was a mighty annoying problem that rendered almost useless a slightly older, but still very nice, monitor. (Ok, it’s a hunk o’ junk, but I still wish it worked.)

More onerous is a problem ATI can’t easily resolve: CrossFire is limited to a peak resolution of 1600×1200 at a 60Hz refresh rate. CrossFire relies on the single-link DVI output of existing Radeon X800-family graphics cards, and that connection tops out at 1600×1200 at 60Hz. Now, most folks don’t tend to play games at resolutions above 1600×1200, but for an uber-high-end dual-graphics platform, this limitation isn’t easily ignored. We’ve already demonstrated in our past efforts at SLI benchmarking that current games often don’t benefit from a dual-graphics performance boost at mere mortal resolutions. More importantly, owners of nice, big CRT monitors probably won’t appreciate being confined to the flickery domain of 60Hz refresh rates at that peak 1600×1200 resolution—especially since NVIDIA’s SLI doesn’t share this limitation.
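The arithmetic behind that ceiling can be sketched roughly as follows. The 165MHz single-link pixel clock limit and the 2160×1250 total frame size (1600×1200 plus standard VESA blanking) are figures I’m supplying from the DVI and VESA timing specs, not numbers from ATI, and the function names are made up for illustration.

```python
# Rough pixel-clock arithmetic for a single-link DVI connection.
# Assumptions (not from ATI): single-link DVI tops out at a 165MHz pixel
# clock, and a 1600x1200 mode with standard VESA blanking occupies
# 2160x1250 total pixels per frame.

SINGLE_LINK_DVI_MAX_MHZ = 165.0
H_TOTAL = 2160
V_TOTAL = 1250


def pixel_clock_mhz(refresh_hz, h_total=H_TOTAL, v_total=V_TOTAL):
    """Pixel clock needed to scan out one full frame (with blanking) per refresh."""
    return h_total * v_total * refresh_hz / 1e6


def fits_single_link(refresh_hz):
    """True if the mode's pixel clock stays within the single-link limit."""
    return pixel_clock_mhz(refresh_hz) <= SINGLE_LINK_DVI_MAX_MHZ


for hz in (60, 75, 85):
    print(f"1600x1200 at {hz}Hz needs {pixel_clock_mhz(hz):.1f}MHz, "
          f"fits single link: {fits_single_link(hz)}")
```

Under those assumptions, 60Hz squeaks in just under the limit, while the 75Hz and 85Hz refresh rates that CRT owners tend to prefer do not.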

Some folks have speculated about the possibility that ATI might circumvent the refresh rate limitations of the existing Radeon X800 cards’ DVI ports through a clever implementation that would interleave, say, 60Hz output from the slave card with 60Hz output from the master card, resulting in 120Hz effective output. This scheme could conceivably work with certain 3D graphics load-balancing schemes, like alternate frame rendering. However, such an implementation would require the FPGA compositing engine to have a large amount of embedded memory onboard (or some external memory) in order to hold a full frame of image data at 1600×1200, and it still wouldn’t work in most rendering modes. ATI decided that an exotic scheme like this wasn’t wholly workable or cost effective. It would have to live with the 1600×1200 at 60Hz limit, and its choice of components for the master cards, including the FPGA and other chips, was informed by this decision.
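For a sense of scale, here is a back-of-the-envelope estimate of what buffering one full frame would take. The 24-bit color depth is my assumption about what travels over the DVI link; ATI did not give us a figure.

```python
# Size of one buffered frame at CrossFire's peak resolution.
# Assumption: 24 bits (3 bytes) of color per pixel, as carried over DVI.

WIDTH, HEIGHT = 1600, 1200
BYTES_PER_PIXEL = 3


def frame_bytes(width=WIDTH, height=HEIGHT, bytes_per_pixel=BYTES_PER_PIXEL):
    """Bytes needed to hold one complete frame."""
    return width * height * bytes_per_pixel


size = frame_bytes()
print(f"{size} bytes, or about {size / 2**20:.2f} MiB per frame")
```

That works out to several megabytes per frame, which is a lot to ask of the memory embedded in a low-cost FPGA.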

When pressed about this limitation, ATI argues that higher resolutions simply won’t matter to most gamers, but also says forthrightly that “future generations” of CrossFire will be capable of higher resolutions and refresh rates. With ATI’s next-gen R520 GPU looming close on the horizon, one could infer that we may not have to wait long for these future versions of CrossFire.

 

Load-balancing methods in CrossFire
If you’re familiar with NVIDIA’s SLI, the methods used for balancing the load between two graphics cards in CrossFire will be largely familiar. These modes are:

  • SuperTiling — This method is the default for Direct3D applications, and it’s also the only mode unique to CrossFire. The screen is subdivided into a checkerboard-like pattern of 32×32-pixel squares, with one card rendering what would be the red squares on the checkerboard, and the other rendering what would be the black squares on the board. ATI says this method distributes the load between the cards neatly and efficiently, but SuperTiling offers benefits only in terms of pixel-pushing power, not geometry computation. Both cards must compute the underlying geometry for each frame individually. SuperTiling is not supported in OpenGL.
  • Scissor mode — NVIDIA calls this mode “split-frame rendering,” but scissor mode is the same basic thing. The screen is split horizontally, with one card rendering the top half of the frame and the other card rendering the bottom half. In OpenGL applications, the split between the frames is static at 50% per card. Direct3D applications get dynamic load balancing, with the split between the cards varying on a frame-by-frame basis. As with SuperTiling, scissor mode doesn’t split up the work of geometry computation. Scissor mode is the default load-balancing method for OpenGL apps.
  • Alternate-frame rendering — The king of all multi-GPU load balancing modes is alternate-frame rendering, or AFR for short. AFR interleaves full frames rendered by the two cards, so that, say, the master card renders odd frames and the slave card renders even ones. This is the preferred load-balancing mode whenever possible, because AFR shows markedly better performance scaling than other modes. Part of the reason for AFR’s good performance is the fact that it splits the geometry processing load between the two cards evenly, something no other mode does.
  • SuperAA — CrossFire can also be used to improve image quality instead of raw performance, thanks to its SuperAA mode. ATI announced SuperAA along with the rest of the CrossFire concept back in June, and NVIDIA has since delivered its own SLI antialiasing mode to match. In SuperAA, each card renders a frame with some degree of antialiasing, and then the two images are blended to yield twice the effective antialiasing. The sample pattern used by each card is different, so that the resulting image gets twice as many unique samples. We will discuss SuperAA and its sample patterns in more detail later, but we should note that ATI offers four SuperAA modes that it has named 8X, 10X, 12X, and 14X AA. In the case of CrossFire, SuperAA images are blended by the Xilinx FPGA-based compositing engine, and SuperAA performance may be gated by the computational power of the FPGA chip.

    Although SLI antialiasing works in both OpenGL and Direct3D applications, SuperAA is restricted to Direct3D only.
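The three performance-oriented modes above boil down to different rules for deciding which card owns which pixels or frames. Here is a toy sketch of those rules. It is not ATI’s driver logic; the card numbering (0 for the master, 1 for the slave) and all of the function names are made up for illustration.

```python
# Toy model of how CrossFire's three performance modes divide work between
# two cards. Card 0 stands in for the master and card 1 for the slave; that
# numbering, and every function name here, is hypothetical.

TILE = 32  # SuperTiling uses 32x32-pixel squares


def supertiling_owner(x, y, tile=TILE):
    """Checkerboard: adjacent tiles alternate between the two cards."""
    return ((x // tile) + (y // tile)) % 2


def scissor_owner(y, height, top_share=0.5):
    """Horizontal split: card 0 takes the top portion of the frame.

    top_share stays at 0.5 for OpenGL; under Direct3D the split would be
    adjusted from frame to frame to balance the load.
    """
    return 0 if y < int(height * top_share) else 1


def afr_owner(frame_number):
    """Alternate-frame rendering: card 0 takes odd frames, card 1 even ones."""
    return 0 if frame_number % 2 == 1 else 1


def pixel_share(owner_fn, width, height):
    """Fraction of a frame's pixels that land on card 0."""
    mine = sum(1 for y in range(height) for x in range(width)
               if owner_fn(x, y) == 0)
    return mine / (width * height)


# Both screen-splitting modes hand each card half the pixels of a 128x128 frame.
print(pixel_share(supertiling_owner, 128, 128))
print(pixel_share(lambda x, y: scissor_owner(y, 128), 128, 128))
```

Note that in the first two functions both cards still see every frame, which is why neither mode divides the geometry work. Only the AFR rule hands a card an entire frame to itself.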

When ATI first announced CrossFire, it proudly proclaimed that CrossFire would be able to accelerate all 3D applications without the need for application-specific profiles. That turns out to be true in that CrossFire defaults to SuperTiling for all Direct3D apps and scissor mode for OpenGL. However, in order to get the larger performance payoff of alternate-frame rendering or other non-default load-balancing techniques, ATI does in fact rely on application profiles through the Catalyst A.I. function of its graphics drivers—very much like NVIDIA uses profiles with SLI.

Perhaps because CrossFire will accelerate most applications without outside help, ATI offers users very little control over CrossFire rendering modes. It’s possible to disable Catalyst A.I. and thus force the use of the default load-balancing modes for Direct3D and OpenGL, and users may choose the SuperAA mode they wish to use. Otherwise, the user has little ability to tweak CrossFire, and outside of the occasional checkerboard pattern flickering on the screen when exiting an app in SuperTiling mode, there’s no visual indicator to tell you which load-balancing method is active. I asked ATI for a list of CrossFire accelerated games, and they refused to provide one.

Contrast this approach to SLI, where NVIDIA offers the option of a visual load-balancing graphic, extensive control over application profiles and SLI rendering modes via its “Coolbits” registry key, and a long list of SLI-accelerated games. ATI says it has no plans to expose this degree of user control in CrossFire.

NVIDIA has also said repeatedly that it will give developers access to SLI modes via software, and the upcoming game F.E.A.R. will be among the first to include native SLI support out of the box. ATI says it doesn’t have plans to expose access to CrossFire rendering modes to developers, although it is working with developers on making sure they write CrossFire-compatible software.

Speaking of that, one potential fly in the ointment is the ever-growing use of techniques like render-to-texture that don’t play well with distributed rendering schemes like SLI and CrossFire. ATI says that problematic data is passed back and forth between cards via the PCI Express connection as needed, and that most of the time it’s not a performance problem. I think it could well become a significant drag on performance as games use more advanced rendering methods in the future, and both ATI and NVIDIA will have to deal with it.

Master cards: take your pick of two
For the Radeon X800 family, ATI will initially supply two different master cards that will match up to a wide range of slave cards. At a list price that ATI claims will be $349, the Radeon X850 XT CrossFire Edition will be clocked like a Radeon X850 XT, feature 256MB of RAM, and will be capable of running in tandem with any PCI Express-based Radeon X850 card, including the Pro and XT Platinum Edition.

ATI also plans to offer a Radeon X800 CrossFire Edition card at a purported $299 list price for the 256MB version. This card will be clocked at the same speed as a Radeon X800 XL and should offer compatibility with any Radeon X800-class PCI Express graphics card—that is, anything but the X850 series. The list of compatible cards even includes the older Radeon X800 XT and X800 Pro cards.

ATI initially announced plans for a Radeon X800 CrossFire Edition card with 128MB of RAM and a $249 price tag, but those plans were apparently scrapped. ATI says its board partners are free to manufacture such cards if they wish. Board partners can also make the higher-end X800 and X850 CrossFire Edition cards, or they can buy them from ATI. I’d expect the first wave of master cards to come from ATI, whether they bear ATI’s name or not. Frankly, I wouldn’t expect board makers to focus much attention on X800-series master cards of any flavor with the next generation of ATI GPUs coming soon.

How a CrossFire Edition card will handle running with a mismatched slave card depends on the situation. For example, in the case of the Radeon X850 XT Platinum Edition, the master card will run at its regular, stock speeds and the Platinum Edition card will run at its native, slightly faster clock speeds. In the case of the Radeon X850 Pro, which has only 12 pipes, the 16-pipe master card will scale itself back to only 12 pipes by disabling four. The same principles apply for the X800 series.
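The pairing rules can be sketched in a few lines. This is only my reading of ATI’s description, not driver code. The class, the function, and especially the clock figures are my own placeholders (the clocks are stock speeds as I recall them) rather than anything ATI supplied.

```python
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    pipes: int
    core_mhz: int  # placeholder figures, see note above


def pair_up(master, slave):
    """Return the (master, slave) settings in effect once the cards are paired.

    Each card keeps its own clock speed; the master disables pipes to match
    a narrower slave.
    """
    effective_master = Card(master.name,
                            min(master.pipes, slave.pipes),
                            master.core_mhz)
    return effective_master, slave


master = Card("Radeon X850 XT CrossFire Edition", pipes=16, core_mhz=520)
platinum = Card("Radeon X850 XT Platinum Edition", pipes=16, core_mhz=540)
pro = Card("Radeon X850 Pro", pipes=12, core_mhz=507)

for slave in (platinum, pro):
    m, s = pair_up(master, slave)
    print(f"{m.name}: {m.pipes} pipes at {m.core_mhz}MHz, "
          f"{s.name}: {s.pipes} pipes at {s.core_mhz}MHz")
```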

I suppose it’s nifty that CrossFire configurations can include mismatched cards, but it’s mostly just a necessity given the two flavors of CrossFire Edition master cards. The performance ramifications of these mismatched configurations will probably depend somewhat on the load-balancing method that’s in use, but I’d expect a mismatched CrossFire rig to perform more or less like a pair of the slower of the two cards. Radeon X800 XT owners may not appreciate having the performance of two Radeon X800 XLs, but those are the breaks.

 

The chipset piece of the equation
CrossFire is more than just a dual-graphics technology, though, according to ATI. It’s a platform, and that platform is anchored by the CrossFire Editions of ATI’s Radeon Xpress 200 chipset. We first reviewed the AMD version of the Radeon Xpress 200 nearly a year ago, and found it to be a decent solution, despite a few warts.

The CrossFire Edition’s south bridge is a newer revision than the one we first tested. Dubbed the SB450, this new south bridge includes support for Intel’s High Definition Audio specification, bringing better resolutions and sample rates to stock PC audio. Unfortunately, the SB450 is missing a number of other enthusiast-class checkmarks, including newer Serial ATA features like 3Gbps transfer rates and Native Command Queuing. The SB450’s four SATA ports do support RAID, but only levels 0 and 1.


A block diagram of the Radeon Xpress 200 CrossFire Edition chipset for AMD. (Source: ATI.)

Rather than fret over the Radeon Xpress 200’s shortcomings, ATI has attempted to bolster its chipset’s enthusiast credentials by designing a virtual showcase of a motherboard reference design and pushing mobo makers to manufacture boards based on it. We recently reviewed a motherboard from Sapphire that’s very similar to the design of the ATI CrossFire reference board, with the obvious exception of the second PCI-E slot. That board overclocked exceptionally well for us and generally performed on par with an nForce4-based competitor. We did find some performance pitfalls, though. We’ll revisit those in our evaluation of the CrossFire platform.


ATI’s Radeon Xpress 200 CrossFire Edition reference board with dual X850 XT cards

In its reference design, ATI has augmented the SB450 cyborg-style by using PCI Express auxiliary I/O chips to support better SATA with 3Gbps transfer rates, NCQ, and Gigabit Ethernet. We’d prefer that all of a board’s SATA ports offer the latest capabilities, and we’d like to have RAID levels 5 and 10. Still, the reference design’s additions do put the Radeon Xpress 200 at the center of a credible enthusiast-class mobo.


A block diagram of the Radeon Xpress 200 CrossFire Edition chipset for Intel. (Source: ATI.)

Speaking of credible enthusiast-class hardware, Intel has been struggling to meet that standard with its Pentium 4 processors of late, but that didn’t stop ATI from churning out an Intel-centric version of the Radeon Xpress 200 CrossFire Edition. Unlike the AMD version, this chipset needs to have a very solid memory controller on the north bridge in order to keep up with the nForce SLI Intel Edition and the Intel 955X. ATI says its Intel CrossFire chipset will support front-side bus speeds up to 1066MHz and DDR2 memory up to 866MHz, with higher RAM speeds in validation now. We’ll have to test this chipset to see whether it passes muster compared to the competition, but we have reservations about the performance benefits of dual-graphics motherboards with current Intel Pentium 4/D processors.

Oddly enough, ATI will be supplying CrossFire Edition chipsets that also have an integrated graphics processor, as the block diagram for the Intel-oriented chipset above shows. These chipsets will be capable of driving an additional display even when both PCI-E graphics slots are occupied, thanks to ATI’s SurroundView capability.

Another possible option for CrossFire chipsets comes from Intel in the form of the 955X chipset. ATI has announced that it will validate the 955X for use with CrossFire, but unfortunately, we were not able to obtain drivers from ATI that would allow us to test CrossFire performance with the 955X.

A new wrinkle: the transposer card
Although ATI initially claimed that no redirector card would be needed in order to switch CrossFire from single-slot to dual-slot configurations, that turns out not to be the case. Our CrossFire review kit came with a so-called transposer card that plugs into the secondary PCI Express graphics slot in order to enable sixteen-lane operation of the primary slot with only one graphics card installed. This card works in concert with a BIOS setting that shifts the board from single- to dual-slot operation.

Alternatively, with two graphics cards installed and the proper BIOS settings, eight of the sixteen lanes connected to the primary graphics slot can be redirected to the secondary slot.


The transposer card rests in the secondary PCI Express slot

Don’t forget to remove the transposer card from the secondary PCI-E slot when the mobo’s BIOS is set to dual-graphics mode, or the board won’t POST.

This card turns out to be even more of an inconvenience than the paddle card on an older or lower-priced nForce4 SLI motherboard, if for no other reason than that there’s nowhere on the motherboard to store it when it’s not in use. The thing looks easy to lose to me.

CrossFire’s competition?
ATI has been very careful not to position CrossFire head to head against NVIDIA’s new GeForce 7800 GTX graphics chips. Instead, they gingerly point out that a CrossFire rig with a pair of Radeon X850 XT cards is best matched up, in terms of price and performance, against a GeForce 6800 Ultra SLI system. ATI says its competition for NVIDIA’s $499 graphics cards is coming soon, and like the GeForce 7 series, it, too, will be a next-generation product.

Of course, we couldn’t resist throwing a pair of GeForce 7-series cards into the mix, but we chose NVIDIA’s GeForce 7800 GT, not the GTX. The 7800 GT isn’t priced too far above the Radeon X850 XT right now, all told. Still, the 6800 Ultra may be the more appropriate comparison, because our 7800 GT cards are “overclocked in the box” variants from XFX. They are real consumer products, but they do run at slightly higher clock speeds (450/525MHz) than a stock 7800 GT (400/500MHz).

 

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

Our test systems were configured like so:

Processor: Athlon 64 X2 4800+ 2.4GHz
System bus: 1GHz HyperTransport
Motherboard: Asus A8N-SLI Deluxe (SLI system); ATI CrossFire reference board (CrossFire system)
BIOS revision: 1013 (SLI system); 080012 (CrossFire system)
North bridge: nForce4 SLI (SLI system); Radeon Xpress 200P CrossFire Edition (CrossFire system)
South bridge: RS450 (CrossFire system)
Chipset drivers: SMBus driver 4.45 and SATA IDE driver 5.34 (SLI system); SMBus driver 5.10.1000.5 and SATA IDE driver 5.0.0.2 (CrossFire system)
Memory size: 1GB (2 DIMMs)
Memory type: OCZ EL PC3200 DDR SDRAM at 400MHz
CAS latency (CL): 2
RAS to CAS delay (tRCD): 2
RAS precharge (tRP): 2
Cycle time (tRAS): 8
Hard drive: Maxtor DiamondMax 10 250GB SATA 150
Audio: Integrated nForce4/ALC850 with Realtek 5.10.0.5900 drivers (SLI system); Integrated RS450/ALC880 with Realtek 5.10.00.5152 drivers (CrossFire system)
Networking: NVIDIA Ethernet driver 4.82 (SLI system); Marvell Yukon 8.39.3.3 drivers and VIA Velocity v24 drivers (both systems)
Graphics:
  • GeForce 6800 Ultra 256MB PCI-E with ForceWare 78.03 drivers
  • Dual GeForce 6800 Ultra 256MB PCI-E with ForceWare 78.03 drivers
  • XFX GeForce 7800 GT 256MB PCI-E with ForceWare 78.03 drivers
  • Dual XFX GeForce 7800 GT 256MB PCI-E with ForceWare 78.03 drivers
  • Radeon X850 XT PCI-E with Catalyst 8.162.1-050811a-026057E drivers
  • Dual Radeon X850 XT PCI-E with Catalyst 8.162.1-050811a-026057E drivers
OS: Windows XP Professional (32-bit)
OS updates: Service Pack 2

Thanks to OCZ for providing us with memory for our testing. If you’re looking to tweak out your system to the max and maybe overclock it a little, OCZ’s RAM is definitely worth considering.

All of our test systems were powered by OCZ PowerStream 520W power supply units. The PowerStream was one of our Editor’s Choice winners in our last PSU round-up.

Unless otherwise specified, the image quality settings for both ATI and NVIDIA graphics cards were left at the control panel defaults.

The test systems’ Windows desktops were set at 1280×1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.

We used the following versions of our test applications:

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

 

Doom 3
We’ve conducted our testing almost exclusively with 4X antialiasing and a high degree of anisotropic filtering. We generally used in-game controls when possible in order to invoke AA and aniso. In the case of Doom 3, we used the game’s “High Quality” mode in combination with 4X AA.

Our Delta Labs demo is typical of most of this game: running around in the Mars base, shooting baddies. The imaginatively named “trdemo2” takes place in the game’s Hell level, where the environment is a little more varied and shader effects seem to be more abundant.

The Radeon X850 XT still trails the GeForce 6800 Ultra in Doom 3—an ancient blood feud, this one—but the CrossFire system scales just as well from one card to two as the SLI rig.

 

Far Cry
Next up is Far Cry, which takes advantage of Shader Model 3.0 to improve performance. The game also has a path for ATI’s Shader Model 2.0b. Our first demo takes place in the jungle with lots of dense vegetation and even denser mercenaries. All of the quality settings in the game’s setup menu were cranked to the max.

The performance scaling picture is again similar between SLI and CrossFire, although the Radeon X850 XT looks relatively stronger here than in Doom 3.

 

The Chronicles of Riddick: Escape from Butcher Bay
This game has a Shader Model 3.0-type mode, but to keep things even for comparison, I ran all cards with the SM2.0 path.

Riddick is another OpenGL game where the Radeon X850 XT struggles to keep pace. Fortunately, the addition of a second card yields substantial performance gains.

 

Splinter Cell: Chaos Theory
We’re using the 1.04 version of Splinter Cell: Chaos Theory for testing, and that gives us some useful tools for comparison. This new revision of the game includes support for Shader Model 2.0, the DirectX feature set used by Radeon X850 XT cards. The game also includes a Shader Model 3.0 code path that works optimally with GeForce 6 and 7-series GPUs. Because SM2.0 and SM3.0 can produce essentially the same output, we’ve tested the ATI cards with SM2.0 and the NVIDIA cards with SM3.0. (The game won’t let NVIDIA cards run in SM2.0 mode, although they are capable of doing so.)

In our first test, we enabled the game’s parallax mapping and soft shadowing effects. In the second, we’ve also turned on high-dynamic-range lighting and tone mapping, for some additional eye candy. Due to limitations in the game engine (and in NVIDIA’s hardware), we can’t use HDR lighting in combination with antialiasing, so the second test was run without edge AA.

Here, the CrossFire rig outperforms the GeForce 6800 Ultra SLI system, at last. Once again, performance scales nicely from one card to two.

 

Battlefield 2
We tested the next few games using FRAPS and playing through a level of the game manually. For these games, we played through five 90-second gaming sessions per config and captured average and low frame rates for each. The average frames per second number is the mean of the average frame rates from all five sessions. We also chose to report the median of the low frame rates from all five sessions, in order to rule out outliers. We found that these methods gave us reasonably consistent results, but they are not quite as solid and repeatable as other benchmarks.
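For the curious, the way we boil five sessions down to two numbers looks something like this. The session figures in the example are invented purely to show the method and are not results from any of our test rigs.

```python
from statistics import mean, median


def summarize_sessions(sessions):
    """Reduce per-session results to (mean of averages, median of lows).

    sessions: list of (average_fps, low_fps) pairs, one per play-through.
    """
    averages = [avg for avg, _ in sessions]
    lows = [low for _, low in sessions]
    return mean(averages), median(lows)


# Hypothetical numbers for five 90-second sessions, not measured results.
example = [(61.0, 40), (59.5, 38), (60.5, 15), (58.0, 39), (61.0, 41)]
avg_fps, low_fps = summarize_sessions(example)
print(f"average: {avg_fps:.1f} FPS, low: {low_fps} FPS")
```

Taking the median of the lows means a single hitch in one session (the 15 FPS dip in the made-up data) doesn’t drag the reported low frame rate down with it.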

The Radeon X850 XT CrossFire system turns in a higher average frame rate than a single-card setup, but its low frame rate is about the same as that of the lone X850 XT. Overall, though, a jump from 40 FPS to 53 FPS is a noteworthy improvement in a twitchy first-person shooter like this one.

F.E.A.R. demo
The F.E.A.R. demo looks purty, but that comes at the cost of frame rates. We actually had to drop back to 1024×768 resolution in order to hit playable frame rates, although we did have all of the image quality settings in the game cranked.

The F.E.A.R. demo probably doesn’t have a Catalyst A.I. profile yet, and it shows. The CrossFire setup turns in nearly identical frame rates to the single Radeon X850 XT. The SLI systems, by contrast, scale up appropriately from one card to two.

Guild Wars

Here’s a game that’s unfortunately really CPU limited on the high-end NVIDIA setups, as the top three scores would seem to suggest. However, CrossFire actually delivers a performance hit here. I was concerned that these numbers might not be right, so I played through another five sessions on the CrossFire rig, just to be sure.

And, hey, level up!

The additional sessions only confirmed the earlier numbers. CrossFire slows down Guild Wars with ATI’s current drivers.

 

3DMark05

CrossFire again scales well in 3DMark05’s three game tests that determine its overall score.

CrossFire also handles itself competently in 3DMark’s synthetic feature tests. Notably, it scales better than the GeForce 7800 GT does in 3DMark’s vertex shader tests.

 

CrossFire SuperAA
CrossFire’s SuperAA modes benefit greatly from the programmability of the Radeon X800 series’ antialiasing hardware. Like the non-CrossFire modes, SuperAA uses gamma-correct blending to achieve smoother color gradients on real-world displays, and it uses non-grid aligned sampling patterns to better fool the eye. I’ve used a handy AA sample pattern tester to show us the sampling patterns used by both CrossFire SuperAA and SLI antialiasing, and you can see the results below. The green dots represent texture samples, and the red dots represent geometry samples used to determine the coverage mask for the color blend to be performed by the multisample antialiasing algorithm.

AA sample patterns by mode (2X, 4X, 6X, 8X, 10X, 12X, 14X, 16X) for the Radeon X850 XT/CrossFire, GeForce 6800/7800, and GeForce 6800/7800 SLI AA

That’s a lotta dots. Let me note several things while your eyes attempt to recover. First, there is some disconnect between the naming schemes used by ATI and NVIDIA, and it’s largely caused by some alternate modes present in SuperAA. SuperAA 8X appears to include a single texture sample plus eight coverage samples, but that’s not the case. In truth, SuperAA 8X includes two texture samples situated directly on top of one another. Each card must grab a texture sample in order for the CrossFire scheme to work; the resulting complete images are then blended by the FPGA compositing engine. ATI has simply chosen to include a mode where the texture samples are in the same place in order to avoid the blurring of fine text and details that multiple, offset texture samples can cause. If you prefer an element of full-scene supersampled antialiasing, you can choose SuperAA 10X, which collects the same number of coverage samples and offsets the two texture samples. Performance at 8X and 10X SuperAA should be the same. In truth, SuperAA 10X is closer to NVIDIA’s two “8X” modes, the “8xS” mode offered on single cards and the 8X SLI AA mode, both of which collect eight coverage samples and two texture samples.

SuperAA 12X and 14X are much the same story. The 12X mode grabs two texture samples from the same location, while 14X pulls texture samples from different places. The actual number of texture and coverage samples is the same: 12 coverage and two texture. Neither of these modes corresponds directly, in terms of the number of samples, to an SLI AA mode.
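Since the naming gets confusing, here is the sample accounting for the four SuperAA modes as described above, written out as a small lookup table. The layout and the helper function are my own shorthand, not anything exposed by ATI’s drivers, and the even split of coverage samples between the cards is my inference from the way the two images are blended.

```python
# Sample counts for the four SuperAA modes as described in the text.
# The table layout and helper are hypothetical shorthand.

SUPERAA_MODES = {
    # name: (coverage samples, texture samples, texture samples offset?)
    "8X":  (8, 2, False),
    "10X": (8, 2, True),
    "12X": (12, 2, False),
    "14X": (12, 2, True),
}


def per_card_coverage(mode):
    """Coverage samples each card contributes, assuming an even split."""
    coverage, _, _ = SUPERAA_MODES[mode]
    return coverage // 2


for name, (coverage, texture, offset) in SUPERAA_MODES.items():
    placement = "offset" if offset else "coincident"
    print(f"SuperAA {name}: {coverage} coverage + {texture} {placement} "
          f"texture samples ({per_card_coverage(name)}X per card)")
```

Seen this way, the 10X and 14X names appear to be the coverage and texture sample counts added together, which would explain why each performs like its lower-numbered sibling.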

The real advantage of SuperAA comes in the form of its excellent sampling patterns. SuperAA uses two very distinct patterns on the two cards, and the end results are very nice, quasi-random sampling locations. NVIDIA’s SLI antialiasing simply jitters the GeForce cards’ existing 4X and 8xS sample patterns slightly. Unfortunately, in the case of 16X SLI AA especially, the resulting sample points cluster in pairs very close to one another, diluting the likely effectiveness of the AA mode. Also, although the GeForce 7800 series offers gamma-correct blends for antialiasing, this feature isn’t available with SLI AA.

I’ve tested SuperAA performance using Half-Life 2, which is very much a CPU-bound game. SuperAA is probably best used in games such as this one, where there’s graphics power to spare, because SuperAA has quite an impact on performance, as does SLI AA.

This occasion has also given me the opportunity to create what I believe is the most incredibly Byzantine results graph in the history of TR. I couldn’t be prouder. Gaze upon it, ye who seek enlightenment, and behold its arcane glory.

You can, uh, also look at the results in the data table below the graph, just in case. Note that I have not attempted to equalize things in any way by lining up ATI’s 10X mode with NVIDIA’s 8X modes, even though they are essentially equal in terms of sample size.

Personally, I’m too confused by the graph above to draw any meaningful conclusions, but I will attempt a few comments. You can see that, as predicted, ATI’s 8X and 10X SuperAA modes perform identically; so do 12X and 14X. Also, the performance drop-off on the SuperAA modes is steep, but not so steep as that of the GeForce 6800 Ultra in SLI AA. The 7800 GT handles the SLI AA modes with less trouble, as one might expect from a next-generation GPU.

 

Radeon X850 XT CrossFire antialiasing quality
Now let’s peer at some pictures. The images below were taken from Half-Life 2 and resized to two times their original size. They have not been otherwise altered. You can see how antialiasing improves the fine detail of the ever-so-thin antennas on top of the buildings and of the electrical wires running through the scene. Also notice the near-horizontal edges on the tops of the buildings; AA should remove the jaggies there. Finally, the leaves on the tree in the picture are not touched by multisample AA, but they do get a degree of smoothing from AA modes with multiple, offset texture samples, as you will see.


Dual Radeon X850 XT – No antialiasing


Dual Radeon X850 XT – 2X antialiasing


Dual Radeon X850 XT – 4X antialiasing


Dual Radeon X850 XT – 6X antialiasing


Dual Radeon X850 XT – 8X CrossFire antialiasing


Dual Radeon X850 XT – 10X CrossFire antialiasing


Dual Radeon X850 XT – 12X CrossFire antialiasing


Dual Radeon X850 XT – 14X CrossFire antialiasing

The fine details in this scene are clarified by the SuperAA modes, and 14X mode is the ultimate expression of that improvement. The antennas and wires are delicately and accurately etched against the sky, while the gradient on the rightmost rooftop’s near-horizontal edge is buttery smooth. Meanwhile, the leaves of the tree are smoothed out somewhat, as well.

 

SLI antialiasing quality


Dual GeForce 7800 GT – No antialiasing


Dual GeForce 7800 GT – 2X antialiasing


Dual GeForce 7800 GT – 4X antialiasing


Dual GeForce 7800 GT – SLI 8X antialiasing


Dual GeForce 7800 GT – SLI 16X antialiasing

NVIDIA’s SLI AA modes also look pretty nice, feathering out the edges of the tree’s leaves and helping the pathological case of the antennas.

 

Dual-graphics antialiasing cage match
Now, let’s pull the SuperAA and SLI AA screenshots onto a single page to see how they really compare.


Dual Radeon X850 XT – 8X CrossFire antialiasing


Dual Radeon X850 XT – 10X CrossFire antialiasing


Dual GeForce 7800 GT – SLI 8X antialiasing


Dual Radeon X850 XT – 12X CrossFire antialiasing


Dual Radeon X850 XT – 14X CrossFire antialiasing


Dual GeForce 7800 GT – SLI 16X antialiasing

Make of these images what you will, but I’d say this is a clear win for ATI. To my eye, the SuperAA 10X mode produces smoother gradients, better preservation of fine details in the antennas, and smoother handling of the tree leaves than SLI 8X. Even NVIDIA’s 16X mode, which grabs more samples than SuperAA 14X, doesn’t look quite as good as ATI’s 14X mode for the same reasons.

 

Mismatched cards in CrossFire
So how does running a pair of mismatched cards affect CrossFire performance? I paired up a Radeon X850 XT Platinum Edition with our Radeon X850 XT master card to find out. Here are the results from 3DMark05.

That’s right, the yellow line is on top of the reddish line throughout the results, because the mismatched cards performed almost exactly like a pair of Radeon X850 XTs. I don’t know what else one should expect from them, but I thought it would be interesting to test.

 

SuperTiling performance
Here’s a completely loony test that I just had to try. I turned off Catalyst A.I., effectively forcing the CrossFire cards into SuperTiling mode, to see how they would perform.

The SuperTiling CrossFire rig is really pokey in 3DMark05’s three game tests, even slower than a single card. This performance is pretty well consistent with what we saw in the F.E.A.R. demo and Guild Wars, where turning on CrossFire was a net performance loss. These results can’t be encouraging for those pinning their hopes on CrossFire “just working” in any 3D game.

The three synthetic pixel-pushing tests above confirm that SuperTiling is indeed working correctly; the basic fill rate and pixel shading power of the two Radeon X850 XT cards is evident.

SuperTiling doesn’t help vertex processing performance, just as expected. The SuperTiling setup performs like a single card in the vertex shader tests. The CrossFire rig must normally run in AFR mode in 3DMark05.
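Why would a tile-based split leave vertex performance flat? The rough Python sketch below contrasts the two load-balancing schemes. It assumes SuperTiling hands out square screen tiles in a checkerboard and that each GPU must still process the whole scene's geometry to find out what lands in its tiles; the 32-pixel tile size, the function names, and the workload numbers are illustrative assumptions, not ATI's implementation.

```python
def supertiling_owner(x, y, tile=32):
    """GPU index (0 or 1) that renders pixel (x, y) under a checkerboard split.

    Assumption: square tiles of `tile` pixels, alternating between the two GPUs.
    """
    return ((x // tile) + (y // tile)) % 2


def afr_owner(frame_index):
    """GPU index (0 or 1) that renders a whole frame under alternate frame rendering."""
    return frame_index % 2


def per_gpu_work(vertices, pixels, mode):
    """Rough per-GPU workload per displayed frame, for a two-GPU setup."""
    if mode == "supertiling":
        # Both GPUs transform every vertex; only the pixel work is divided.
        return {"vertices": vertices, "pixels": pixels / 2}
    if mode == "afr":
        # Each GPU renders every other frame in full, so averaged over
        # two frames both the vertex and the pixel work are halved.
        return {"vertices": vertices / 2, "pixels": pixels / 2}
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("supertiling", "afr"):
        print(mode, per_gpu_work(vertices=1_000_000, pixels=2_000_000, mode=mode))
```

Under these assumptions, the tile split halves pixel work but not vertex work, which fits the pattern in the results: strong fill rate and pixel shading numbers, single-card vertex shader numbers.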

 

Chipset I/O performance
Now that we’ve flogged CrossFire’s graphics performance sufficiently, let’s take a quick look at the chipset’s I/O performance to see whether we find a reprise of the USB and PCI Express Ethernet problems we found on Sapphire’s implementation of the ATI Radeon Xpress 200 reference design.

Ethernet throughput
We evaluated Ethernet performance using the NTttcp tool from Microsoft’s Windows DDK. We used the following command line options on the server machine:

ntttcps -m 4,0,192.168.1.23 -a

...and the same basic thing on each of our test systems acting as clients:

ntttcpr -m 4,0,192.168.1.25 -a

We used our Abit IC7-G-based system as the server for these tests. It has an Intel NIC in it that’s attached to the north bridge via Intel’s CSA connection, and it’s proven very fast in prior testing. The server and client talked to each other over a Cat 6 crossover cable.

The Ethernet tests are what they are, but they also serve as something of a proxy for general PCI Express x1 performance. In order to keep things even, we tested the nForce4 SLI’s Ethernet throughput using a PCI-E x1 card bearing the exact same Marvell 88E8052 PCI-E GigE chip found on ATI’s reference motherboard—with the exact same drivers. We also tested NVIDIA’s built-in Gigabit Ethernet with ActiveArmor acceleration to see if it provides any benefits. Although we didn’t test it in the review of the Sapphire board, I also decided to stress PCI performance using a PCI-based VIA Velocity GigE NIC.

It’s redemption and disgrace all at once for the Radeon Xpress 200. This board shows none of the PCI Express throughput problems that we saw with the Sapphire board. In fact, the Radeon Xpress 200 CrossFire Edition turns in lower CPU utilization with the PCI-E NIC than the nForce4 SLI does. However, our decision to test with a PCI NIC has exposed another weakness: abysmal PCI throughput. Not everyone is going to care about ye olde PCI bus in this day and age, but some folks will. I wouldn’t want to rely on the Radeon Xpress 200’s PCI implementation to host a GigE NIC, a RAID card, or an array of TV tuners in a home theater PC system, for instance.

NVIDIA’s ActiveArmor TCP acceleration finally pays off in the form of lower CPU utilization. Too bad it took NVIDIA months and months into the life of the product to make it work as advertised.

USB performance
We used HD Tach to measure USB transfer rates to a Maxtor DiamondMax D740X hard drive in a USB 2.0 drive enclosure.

The Radeon Xpress 200’s low USB transfer rates and relatively high CPU utilization persist. Like the slow PCI performance, this isn’t a deal-killer unless you have some specific plans for high-throughput USB peripherals. Still, I’d prefer to see better performance here. It’s possible that some of ATI’s motherboard partners may opt to use ULi’s Radeon Xpress 200-compatible south bridge in place of the SB450 in order to sidestep these problems.

 

Power consumption
We measured total system power consumption at the wall socket using a watt meter. The monitor was plugged into a separate outlet, so its power draw was not part of our measurement. The idle measurements were taken at the Windows desktop, and cards were tested under load running a loop of 3DMark05’s “Firefly Forest” test at 1280×1024 resolution.

The CrossFire rig slots right in between the 6800 Ultra SLI and 7800 GT SLI systems, with NVIDIA’s new generation of GPUs pulling less power than the older ones.

Noise levels
We used an Extech model 407727 digital sound level meter to measure the noise created (primarily) by the cooling fans on our two test systems. The meter’s weightings were set to comply with OSHA standards. I held the meter approximately two inches above the tops of the graphics cards, right between the systems’ two PCI Express graphics slots.

Honestly, I didn’t expect this result at all. The CrossFire rig comes out looking quieter than the GeForce 6800 Ultra SLI system. Subjectively, the whine coming off of the Radeon X850 XT cards seemed much louder and more annoying than the hiss of the fans on the GeForce 6800 Ultra cards. At least my impression that the 7800 GTs were the quietest overall was confirmed.

 
Conclusions
At the end of the day, ATI has achieved most of its basic goals with CrossFire. The whole scheme works reasonably well within its limitations. CrossFire performance scales nicely from one card to two, so long as you’re talking about matched cards, and so long as the application you’re using has a profile waiting for it in Catalyst A.I. Also, CrossFire’s SuperAA modes offer the best antialiasing quality around, bar none. Unfortunately, the default SuperTiling mode doesn’t scale well at all, in our experience, so the prospect of CrossFire “just working” with any application whatsoever seems a little bit far-fetched.

In many respects, the first generation of CrossFire isn’t as refined as NVIDIA’s current implementation of SLI. Some of CrossFire’s problems could be fixed in software given time, including the dearth of user control over load-balancing methods and the lack of any indicator of the load-distribution technique in use. ATI simply needs to decide to do the right thing and open up CrossFire for tweaking. Other CrossFire shortcomings will likely be addressed with the release of new ATI graphics hardware, including the resolution/refresh rate ceiling of 1600×1200 at 60Hz. However, some CrossFire idiosyncrasies probably won’t be going away any time soon, including the need for a separate CrossFire Edition master card, those pesky external cables, and the relatively pokey PCI and USB performance of the Radeon Xpress 200 south bridge.

Ultimately, these things shake down to a few essential truths. As a whole platform or solution, CrossFire isn’t as good as SLI, but it’s probably good enough. CrossFire’s true fate and desirability will be more than likely determined by ATI’s next generation of GPUs and by the master cards that will go with them. If those products are good, CrossFire should succeed. If they’re not, folks will probably decide that CrossFire isn’t worth the hassle.

For those of you who currently own Radeon X800 or X850 cards and are pondering an upgrade to CrossFire, my advice to you is this: wait for ATI’s new graphics cards before making the leap. You will quite likely decide you’d rather upgrade to one nifty, new graphics card than plunk down the cash for a Radeon X800 or X850 master card and motherboard. The first generation of CrossFire is probably too little, too late for most would-be upgraders.

If you just can’t bring yourself to heed my advice and wait, I believe Radeon X850 XT CrossFire Edition cards should be available starting today at online vendors, as should Radeon Xpress 200 CrossFire Edition motherboards. I suppose the master cards will come with the appropriate video drivers for CrossFire configs, and ATI says to expect the first public release of CrossFire-ready Catalyst drivers during the first week of October. Not long after that, on October 10, Radeon X800 master cards are slated to hit store shelves. If ATI delivers on everything it has promised, October will be a very interesting month indeed in the world of PC graphics. 
