CrossFire X explored

GPUs, it seems, are everywhere, breeding like rabbits. We see the introduction of a new GPU seemingly every month, and multi-GPU schemes like SLI and CrossFire are omnipresent. We now have multiple GPUs on a single graphics card, hybrid multi-GPU implementations involving integrated graphics, and more-than-two-way incarnations of both SLI and CrossFire.

The most intriguing bit of multi-GPU madness we’ve seen recently may be AMD’s CrossFire X, simply because in this generation, AMD opted to chain together three or four mid-range GPUs in place of creating a separate high-end graphics processor. That’s a bold move, fraught with peril, because multi-GPU schemes can be rather fragile, with iffy compatibility and less-than-ideal performance scaling. Then again, AMD’s decision to rely on CrossFire X to round out the high end of its product lineup has surely helped to concentrate its attention on making the scheme work well. So who knows?

We’ve taken a quick look at AMD’s first drivers for CrossFire X, and we have some interesting things to report. Read on to see what we learned.

Extending CrossFire to X

CrossFire X is, quite simply, an extension of the CrossFire dual-GPU feature to three and four GPUs. The hardware to make such a thing possible has been on the market for some time now, and last week’s release of the Catalyst 8.3 driver revision finally enabled this feature in software, as well. The basic building block of CrossFire X is AMD’s RV670 GPU, which is present in all of the various incarnations of the Radeon HD 3800 series of graphics cards. Getting to three or four GPUs can be achieved using a dizzying number of potential card combinations, which AMD has summarized in this helpful matrix:

Possible card combos for CrossFire X. Source: AMD.

The options are many. You could harness four GPUs together by using a pair of dual-GPU Radeon HD 3870 X2 cards, or given enough PCIe x16 slots, you could achieve a similar result using four Radeon HD 3850s. Cross-breeding is an option, as well, so a Radeon HD 3870 X2 could pair up with a single Radeon HD 3850 in a three-way config. Kinky.

A Radeon HD 3870 matched up with a Radeon HD 3870 X2 on an Intel X38 chipset

The caveat here is that CrossFire X settles on the lowest core GPU clock, memory clock, and video RAM size among the installed cards to determine the group’s operative clock speeds and effective memory size. As a result, a Radeon HD 3870 X2 paired with a Radeon HD 3850 256MB would perform like a trio of Radeon HD 3850 256MB cards. And, of course, that means the effective memory size for the entire GPU phalanx would be 256MB, not 768MB, because memory isn’t shared between GPUs in CrossFire (or in SLI, for that matter).
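To make that lowest-common-denominator rule concrete, here’s a minimal sketch of how one might compute the operative config for a mixed setup. The struct and function names are ours, purely for illustration—this isn’t AMD’s driver logic:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-card description; the field names are ours, not AMD's.
struct GpuConfig {
    int coreMHz;   // GPU core clock
    int memMHz;    // memory clock
    int vramMB;    // per-GPU frame buffer
};

// CrossFire X runs the whole group at the slowest clocks and smallest frame
// buffer present, so a mixed setup behaves like N copies of its weakest card.
// Memory isn't shared between GPUs, so per-card sizes never add up.
GpuConfig effectiveConfig(const std::vector<GpuConfig>& cards) {
    GpuConfig eff = cards.front();
    for (const GpuConfig& c : cards) {
        eff.coreMHz = std::min(eff.coreMHz, c.coreMHz);
        eff.memMHz  = std::min(eff.memMHz,  c.memMHz);
        eff.vramMB  = std::min(eff.vramMB,  c.vramMB);
    }
    return eff;
}
```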

Like its dual-GPU predecessor, CrossFire X works on a fairly broad range of motherboards, including those based on AMD 480, 580, and 7-series chipsets, as well as boards based on many of Intel’s more recent chipsets—among them: the 955, 965, 975, P35, G35, X38, and X48.

CrossFire X’s performance and feature set will be more or less optimal depending on the chipset’s topology and the motherboard’s allocation of PCIe lanes. AMD cites its own 790FX chipset as the ideal config, since a 790FX motherboard can dedicate eight lanes of PCIe 2.0 bandwidth to each of four PCIe x16 slots. On the other hand, Intel’s P35 chipset is less than ideal, since it has 16 lanes of PCIe 1.1 connectivity feeding a single PCIe x16 slot off of the north bridge chip, while the second PCIe x16 slot hangs off of the south bridge with only four lanes connected. The P35’s lower bandwidth imposes some limitations on CrossFire X: image compositing must be done in hardware (so you’ll definitely need those CrossFire bridge connectors attached), and OpenGL support won’t be possible.

CrossFire X is either on or off—the user can’t specify three- or four-way operation

Of course, CrossFire X will impose its own set of limitations, simply due to its nature. As I’ve said, multi-GPU schemes are fragile, and more-than-two-way schemes are even more fragile than dual-GPU arrangements. Most of what I said about this subject in my three-way SLI review applies to CrossFire X, as well. CrossFire X will require game-specific profiles in the video driver to work best, and as a new technology, it has a limited stable of game profiles available. Even with a proper profile available, four GPUs will rarely be anything approaching four times as fast as a single GPU. Performance scaling just isn’t that easy. In many cases, you’d be lucky to see three times the performance of one GPU. Furthermore, AMD hasn’t yet implemented CrossFire X support for OpenGL games in its drivers, and for reasons we’ll discuss in a moment, most DirectX 10 games don’t yet benefit from the presence of a fourth GPU.

On the plus side, thanks to the power-efficient nature of AMD’s RV670 GPU, CrossFire X doesn’t impose the almost unreasonable power supply requirements of Nvidia’s three-way SLI. Our test system’s PC Power & Cooling Silencer 750W PSU, which is both quiet and fairly reasonably priced, had no trouble supplying power to both three- and four-way CrossFire X configs. Heck, a couple of Radeon HD 3870 X2s require four PCIe aux power connectors, just like a pair of Radeon HD 2900 XTs.

CrossFire X has a few other nice attributes that SLI doesn’t share. One of those is the ability to work seamlessly with multiple monitors—no more enabling and disabling multi-GPU mode in order to switch between single-screen gaming and multi-display productivity sessions. We extolled the virtues of this feature in our Radeon HD 3870 X2 review—it’s a feature Nvidia’s SLI can’t match—and AMD now says this capability has been extended to more than two GPUs. Even four GPUs and eight displays ought to work effortlessly, as I understand it, though I’ve not had the chance to try it out myself. The one drawback here is that 3D apps running in a window are only accelerated by a single GPU. AMD says multi-GPU support in windowed mode is on its roadmap, but not yet ready.

Another new perk AMD has added for CrossFire X is the ability to use the Radeon HD series’ custom antialiasing filters in conjunction with the CrossFire “Super AA” mode. Super AA, for the uninitiated, is a GPU load-balancing method in which each GPU renders a different set of sub-pixel samples; those samples are then composited into a highly antialiased final image. The Super AA mode available on the Radeon HD series is a 16X mode. When combined with a wide-tent filter, Super AA can deliver what ATI classifies as “32X” AA. AMD also has an edge-detect custom filter that it claims can achieve up to “42X” AA in combination with Super AA, but that filter isn’t available in Catalyst 8.3.

Super AA plus wide-tent filter equals 32X AA

Incidentally, Catalyst 8.3 includes a number of other new features and enhancements for both single- and multi-GPU use. We’ve already covered those elsewhere, so I won’t repeat the laundry list of changes here.

The multi-GPU scaling challenge

AMD claims development on CrossFire X drivers has taken a year, and that the total effort amounts to twice that of its initial dual-GPU CrossFire development effort. In order to understand why that is, I spoke briefly with Dave Gotwalt, a 3D Architect at AMD responsible for CrossFire X driver development. Gotwalt identified several specific challenges that complicated CrossFire X development.

One of the biggest challenges, of course, is avoiding CPU bottlenecks, long the bane of multi-GPU solutions. Gotwalt offered a basic reminder that it’s easier to run into CPU limitations with a multi-GPU setup simply because multi-GPU solutions are faster overall. On top of that, he noted, multi-GPU schemes impose some CPU overhead. As a result, removing CPU bottlenecks sometimes helps more with multi-GPU performance than with one GPU.

In this context, I asked about the opportunities for multithreading the driver in order to take advantage of multiple CPU cores. Surprisingly, Gotwalt said that although AMD’s DirectX 9 driver is multithreaded, its DX10 driver is not—neither for a single GPU nor for multiples. Gotwalt explained that multithreading the driver isn’t possible in DX10 because the driver must make callbacks through the DX10 runtime to the OS kernel, and those calls must be made through the main thread. Microsoft, he said, apparently felt most DX10 applications would be multithreaded and didn’t want the driver creating yet another thread. (What we’re finding now, however, noted Gotwalt, is that applications aren’t as multithreaded as Microsoft had anticipated.)

With that avenue unavailable to them, AMD had to focus on other areas of potential improvement for mitigating CPU bottlenecks. One of the keys Gotwalt identified is having the driver queue up several command buffers and several frames of data, in order to determine ahead of time what needs to be rendered for the next frame.

Even with such provisions in place, Windows Vista puts limitations on video drivers that sometimes prevent CrossFire X from scaling well. The OS, Gotwalt explained, controls the “flip queue” that holds upcoming frames to be displayed, and by default, the driver can only render as far as three frames ahead of the frame being displayed. Under Vista, both DX9 and DX10 allow the application to adjust this value, so that the driver could get as many as ten frames ahead if the application allowed it. The driver itself, however, has no control over this value. (Gotwalt said Microsoft built this limitation into the OS, interestingly enough, because “a certain graphics vendor—not us” was queuing up many more frames than the apps were accounting for, leading to serious mouse lag. Game developers were complaining, so Microsoft built in a limit.)
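The application-side knob Gotwalt mentions does exist in Vista’s D3D9Ex interface. Here’s a sketch of how a title could use it—the API call is real, but the function wrapper and the value of 5 are our own illustration:

```cpp
#include <d3d9.h>

// Under Vista, the application--not the driver--may deepen the flip queue.
// A game aware of four-GPU AFR could raise the default of three like so:
void DeepenFlipQueue(IDirect3DDevice9Ex* device) {
    device->SetMaximumFrameLatency(5);  // frames the driver may run ahead
}
```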

For CrossFire X, AMD currently relies solely on a method of GPU load balancing known as alternate frame rendering (AFR), in which each GPU is responsible for rendering a whole frame and frames are distributed to GPUs sequentially. Frame 0 will go to GPU 0, frame 1 to GPU 1, frame 2 to GPU 2, and so on. Because of the three-frame limit on rendering ahead, explained Gotwalt, the fourth GPU in a CrossFire X setup will have no effect in some applications. Gotwalt confirmed that AMD is working on combining split-frame rendering with AFR in order to improve scaling in such applications. He even alluded to another possible technique, but he wasn’t willing to talk about it just yet. Those methods will have to wait for a future Catalyst release.
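In pseudocode terms, AFR’s distribution rule is just a modulo, and the flip-queue cap explains the idle fourth GPU. A toy model, not driver code:

```cpp
#include <cstdio>

// Toy AFR dispatcher: frame i goes to GPU (i % gpuCount). With Vista's
// default cap of three frames queued ahead of the displayed frame, at most
// three GPUs can hold undisplayed work at once--so in some applications a
// fourth GPU simply never gets a frame to chew on.
int main() {
    const int gpuCount = 4;
    for (int frame = 0; frame < 8; ++frame)
        std::printf("frame %d -> GPU %d\n", frame, frame % gpuCount);
    return 0;
}
```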

Another performance challenge Gotwalt pointed to is one of Vista’s resource management practices. In order for an application to access a resource (such as a buffer), the application must “lock” this resource. The fastest type of lock, he said, is a lock-discard, which is useful when one doesn’t care about modifying the current contents of the resource, since a lock-discard simply allocates a new chunk of memory. This sort of lock makes sense for certain types of resources, like vertex buffers. The problem, according to Gotwalt, is that the OS’s implementation of lock-discard is expensive for small buffers. A kernel transition is involved, and the memory manager will only allow a given buffer to be renamed 64 times. After that, the DirectX runtime will require the driver to flush its command buffer, invoking a severe performance penalty. As Gotwalt put it, “We have now just serialized the whole system.” This limitation exists for both DX9 and DX10, but Gotwalt said it isn’t as evident in DX9. DirectX 10 presents more of a problem because its constant buffers are different in nature; they are smaller and can have a higher update frequency than vertex buffers.
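For reference, here is the lock-discard pattern as a D3D9 application exercises it. The API calls are real; the helper function is our sketch and assumes a buffer created with D3DUSAGE_DYNAMIC:

```cpp
#include <cstring>
#include <d3d9.h>

// Discarding on lock tells the runtime the old contents are garbage, so it
// can hand back a freshly renamed allocation instead of stalling until the
// GPU finishes with the old one. Per Gotwalt, Vista's memory manager allows
// only 64 renames per buffer before the runtime forces a command-buffer
// flush--the "serialized the whole system" case.
void UpdateDynamicBuffer(IDirect3DVertexBuffer9* vb,
                         const void* src, UINT bytes) {
    void* dst = nullptr;
    if (SUCCEEDED(vb->Lock(0, bytes, &dst, D3DLOCK_DISCARD))) {
        std::memcpy(dst, src, bytes);
        vb->Unlock();
    }
}
```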

As a result, AMD has taken over management of renaming in its drivers. Doing so isn’t a trivial task, Gotwalt pointed out, because one must avoid over-allocating memory. At present, AMD has a constant buffer renaming mechanism in place in Catalyst 8.3, but it involves some amount of manual tweaking, and new applications could potentially cause problems by exhibiting unexpected behavior. However, Gotwalt said AMD has a new, more robust solution coming soon that won’t involve so much tweaking, won’t easily be broken by new applications, and will apply to any resource that is renamed—not just constant buffers, but vertex buffers, textures, and the like.
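A toy version of driver-side renaming might look like the following. The pool size and round-robin policy are our assumptions—AMD hasn’t published its mechanism:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Keep several backing copies per logical buffer and rotate through them on
// update, so new contents never overwrite a copy a GPU may still be reading.
// The hard part, as Gotwalt notes, is bounding this so memory isn't
// over-allocated.
class RenamedBuffer {
public:
    explicit RenamedBuffer(size_t bytes, size_t copies = 4)
        : pool_(copies, std::vector<uint8_t>(bytes)) {}

    void update(const void* src, size_t bytes) {
        current_ = (current_ + 1) % pool_.size();        // the "rename"
        std::memcpy(pool_[current_].data(), src, bytes);
    }

    const uint8_t* data() const { return pool_[current_].data(); }

private:
    std::vector<std::vector<uint8_t>> pool_;
    size_t current_ = 0;
};
```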

The final issue Gotwalt described may be the thorniest one for multi-GPU rendering: the problem of persistent resources. In some cases, an application may produce a result that remains valid across several succeeding frames. Gotwalt’s example of such a resource was a shadow map. The GPU renders this map and then uses it as a reference in rendering the final frame. This sort of resource presents a problem because multiple GPUs in CrossFire X don’t share memory. As a result, he said, the driver will have to track when the map was rendered and synchronize its contents between different GPUs. Dependencies must be tracked, as well, and the driver may have to replicate both a resource and anything used to create it from one GPU to the next (and the next). This, Gotwalt said, is one reason why profiled AFR ends up being superior to non-profiled AFR: the driver can turn off some of its resource tracking once the application has been profiled.
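The bookkeeping Gotwalt describes amounts to tracking, per resource, which GPUs hold a current copy. A simplified sketch—all names and structures here are ours:

```cpp
#include <cstdint>

// For a persistent resource such as a shadow map: record which GPU rendered
// it last, and before another GPU samples it, check whether that GPU holds a
// current copy. If not, the driver must transfer (or re-render) it, along
// with anything the resource itself was built from.
struct ResourceState {
    int ownerGpu = -1;       // GPU with the authoritative copy
    uint32_t validMask = 0;  // bit i set => GPU i's copy is current
};

void onRendered(ResourceState& r, int gpu) {
    r.ownerGpu = gpu;
    r.validMask = 1u << gpu;  // all other GPUs' copies are now stale
}

bool needsSync(const ResourceState& r, int gpu) {
    return (r.validMask & (1u << gpu)) == 0;
}

void onTransferred(ResourceState& r, int gpu) {
    r.validMask |= 1u << gpu;  // this GPU now holds a current copy too
}
```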

Gotwalt pointed out that “AFR-friendly” applications will simply re-render the necessary data multiple frames in a row. However, he said, the drivers must then be careful not to sync data unnecessarily when the contents of a texture have been re-rendered but haven’t changed.

Curious, I asked Gotwalt whether re-rendering was typically faster than transferring a texture from one GPU to the next. He said yes, in some applications it is, but one must be careful about it. If you’re re-rendering too many resources, you’re not really sharing the workload, and performance won’t scale. In those cases, it’s faster to copy the data from GPU to GPU. Gotwalt claimed they’d found this to be the case in DirectX 10 games, whereas DX9 games were generally better off re-rendering.

Gotwalt attributed this difference more to changes in the usage model in newer games than to the API itself. (Think about the recent proliferation of post-processing effects and motion blur.) DX10 games make more passes on the data and render to textures more, creating a “cascading of resources.” DX10’s ability to render to a buffer via stream out also allows more room for the creation of persistent resources. Obviously, this is a big problem to manage case by case, and Gotwalt admitted as much. He qualified that admission, though, by noting that AMD learns from every game it profiles and tries to incorporate what it learns into its general “compatible AFR” implementation when possible.

Clearly, AMD has put a tremendous amount of sweat and smarts into making CrossFire X work properly and into achieving reasonably good performance scaling with multiple GPUs. The obstacles Gotwalt outlined are by no means trivial, and the AMD driver team’s ability to navigate those obstacles with some success is impressive. Still, some of the challenges they face aren’t going to go away. In fact, the persistent resources problem is only growing thornier and more complex with time. This is one of the major reasons multi-GPU solutions—based on today’s GPU architectures, at least—will probably always be somewhat fragile and very much reliant on driver updates in order to deliver strong performance scaling. There’s reason for optimism here based on the good work that folks at AMD and elsewhere are putting into these problems, but also reason for caution.

Test notes

CrossFire X presents a wealth of possible test configs, but we chose a couple that we thought would be representative of common configurations. For our quad-GPU tests, we used a pair of Radeon HD 3870 X2 cards, and for three GPUs, we used a single Radeon HD 3870 X2 paired with a Radeon HD 3870. Our test motherboard, a Gigabyte GA-X38-DQ6, has two PCIe x16 slots with a full 16 lanes of PCIe 2.0 connectivity routed to each. These hardware combinations should be more or less optimal for CrossFire X in terms of interconnect bandwidth and the like, giving it plenty of opportunity for performance scaling.

Please note that we tested the single and dual-GPU Radeon configs with the Catalyst 8.2 drivers, simply because we didn’t have enough time to re-test everything with Cat 8.3. The one exception is Crysis, where we tested single- and dual-GPU Radeons with AMD’s 8.451-2-080123a drivers, which include many of the same application-specific tweaks that the final Catalyst 8.3 drivers do.

Our testing methods

As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

Our test systems were configured like so:

Processor: Core 2 Extreme X6800 2.93GHz (both systems)
System bus: 1066MHz (266MHz quad-pumped)
Memory size: 4GB (4 DIMMs)
Memory type: 2 x Corsair TWIN2X20488500C5D DDR2 SDRAM at 800MHz
CAS latency (CL): 4
RAS to CAS delay (tRCD): 4
RAS precharge (tRP): 4
Cycle time (tRAS): 18
Command rate: 2T
Hard drive: WD Caviar SE16 320GB SATA
OS: Windows Vista Ultimate x86 Edition
OS updates: KB936710, KB938194, KB938979, KB940105, KB945149, DirectX November 2007 Update

Intel-based system:
Motherboard: Gigabyte GA-X38-DQ6 (BIOS revision F7)
North bridge: X38 MCH
South bridge: ICH9R
Chipset drivers: INF update 8.3.1.1009, Matrix Storage Manager 7.8
Audio: Integrated ICH9R/ALC889A with RealTek 6.0.1.5497 drivers

Nvidia-based system:
Motherboard: XFX nForce 680i SLI (BIOS revision P31)
North bridge: nForce 680i SLI SPP
South bridge: nForce 680i SLI MCP
Chipset drivers: ForceWare 15.08
Audio: Integrated nForce 680i SLI/ALC850 with RealTek 6.0.1.5497 drivers

Graphics (the dual-card SLI configs ran on the nForce 680i SLI system; all other configs ran on the X38 system):
Diamond Radeon HD 3850 512MB PCIe with Catalyst 8.2 drivers
Dual Radeon HD 3850 512MB PCIe with Catalyst 8.2 drivers
Radeon HD 3870 512MB PCIe with Catalyst 8.2 drivers
Dual Radeon HD 3870 512MB PCIe with Catalyst 8.2 drivers
Radeon HD 3870 X2 1GB PCIe with Catalyst 8.2 drivers
Dual Radeon HD 3870 X2 1GB PCIe with Catalyst 8.3 drivers
Radeon HD 3870 X2 1GB PCIe + Radeon HD 3870 512MB PCIe with Catalyst 8.3 drivers
Palit GeForce 9600 GT 512MB PCIe with ForceWare 174.12 drivers
Dual Palit GeForce 9600 GT 512MB PCIe with ForceWare 174.12 drivers
GeForce 8800 GT 512MB PCIe with ForceWare 169.28 drivers
Dual GeForce 8800 GT 512MB PCIe with ForceWare 169.28 drivers
EVGA GeForce 8800 GTS 512MB PCIe with ForceWare 169.28 drivers
GeForce 8800 Ultra 768MB PCIe with ForceWare 169.28 drivers

Thanks to Corsair for providing us with memory for our testing. Their quality, service, and support are easily superior to those of no-name DIMM vendors.

Our test systems were powered by PC Power & Cooling Silencer 750W power supply units. The Silencer 750W was a runaway Editor’s Choice winner in our epic 11-way power supply roundup, so it seemed like a fitting choice for our test rigs. Thanks to OCZ for providing these units for our use in testing.

Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.

We used the following versions of our test applications:

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Solving for X

As you may know, the RV670 GPU that powers the Radeon HD 3800 series of graphics cards is a pretty solid mid-range graphics processor, but it’s no match for Nvidia’s current higher-end GPUs. The contest is close enough, though, that stacking up three or four RV670s makes for a very potent graphics solution. Here’s how, in theory, our three- and four-way CrossFire X setups compare to most of today’s single-GPU setups—and to two- and three-way SLI setups involving Nvidia’s fastest card, the GeForce 8800 Ultra.

                                  Peak pixel   Peak bilinear    Peak bilinear    Peak memory  Peak shader
                                  fill rate    texel filtering  FP16 filtering   bandwidth    arithmetic
                                  (Gpixels/s)  (Gtexels/s)      (Gtexels/s)      (GB/s)       (GFLOPS)
GeForce 8800 GT                   9.6          33.6             16.8             57.6         504
GeForce 8800 GTS                  10.0         12.0             12.0             64.0         346
GeForce 8800 GTS 512              10.4         41.6             20.8             62.1         624
GeForce 8800 GTX                  13.8         18.4             18.4             86.4         518
GeForce 8800 Ultra                14.7         19.6             19.6             103.7        576
GeForce 8800 Ultra SLI (x2)       29.4         39.2             39.2             207.4        1152
GeForce 8800 Ultra SLI (x3)       44.1         58.8             58.8             311.0        1728
Radeon HD 2900 XT                 11.9         11.9             11.9             105.6        475
Radeon HD 3850                    10.7         10.7             10.7             53.1         429
Radeon HD 3870                    12.4         12.4             12.4             72.0         496
Radeon HD 3870 X2                 26.4         26.4             26.4             115.2        1056
Radeon HD 3870 X2 + 3870 (x3)     37.2         37.2             37.2             172.8        1488
Radeon HD 3870 X2 CrossFire (x4)  52.8         52.8             52.8             230.4        2112

Of course, simply adding together the peak theoretical capabilities of multiple GPUs, as we’ve done here, doesn’t account for any of the multi-GPU performance scaling issues we’ve discussed. But it does give us a sense of where things stand. On this basis, our four-way CrossFire X rig leads all contenders in terms of pixel fill rate and shader arithmetic capacity. Staggeringly, the four-GPU config peaks at over 2.1 teraflops of shader power. Not bad for under a grand! These shader arithmetic numbers are all the more impressive because there’s an argument to be made that the GeForce FLOPS numbers you see above may be inflated by a third, depending on how the GPU is being used.

Overall, the three-way CrossFire X solution matches up well against two GeForce 8800 Ultras in SLI and against (if you do the math) a pair of GeForce 8800 GT cards in SLI, as well.
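Spelling out that math: the peaks above follow from per-GPU shader counts and clocks, counting a Radeon stream processor’s multiply-add as two FLOPs and a GeForce SP’s co-issued MUL as a third FLOP (the disputed extra third mentioned above). For instance:

    Radeon HD 3870: 320 SPs × 2 FLOPs × 0.775 GHz = 496 GFLOPS
    Radeon HD 3870 X2: 2 × (320 × 2 × 0.825 GHz) = 1056 GFLOPS
    GeForce 8800 GT: 112 SPs × 3 FLOPs × 1.5 GHz = 504 GFLOPS

A pair of 8800 GTs in SLI thus totals 1008 GFLOPS, a bit under the 1488 GFLOPS of our three-way CrossFire X config.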

We can test these theoretical capacities with some precision using synthetic benchmarks. These aren’t a measure of real-world performance, but they do test something close to the actual peak throughput the hardware can achieve.

Things work out about as expected in terms of finishing order for pixel and texel fill rates. We know from history that this pixel fill rate test tends to be limited more by memory bandwidth than by raw GPU pixel output capacity, but the four-way CrossFire X setup manages to outdo the three-way SLI system despite having less peak memory bandwidth. In the multitextured fill rate test, the GPUs reach closer to their theoretical peaks, which is good news for CrossFire X. One surprise of sorts, if you weren’t watching for it, is the multitexturing performance of dual GeForce 8800 GTs in SLI. The G92 GPU has incredible texture filtering prowess with the most commonly used texture formats, although it’s only half as fast with FP16 textures.

CrossFire X largely dominates 3DMark’s simple pixel and vertex shader tests. Let’s see whether it can do the same in real games.

Call of Duty 4: Modern Warfare

We tested Call of Duty 4 by recording a custom demo of a multiplayer gaming session and playing it back using the game’s timedemo capability. Since these are high-end graphics configs we’re testing, we enabled 4X antialiasing and 16X anisotropic filtering and turned up the game’s texture and image quality settings to their limits.

We’ve chosen to test at 1680×1050, 1920×1200, and 2560×1600—resolutions of roughly two, three, and four megapixels—to see how performance scales. I’ve also tested at 1280×1024 with the lower-end graphics cards, since some of them struggled to deliver completely fluid frame rates at 1680×1050.

CrossFire X performance scales quite nicely to three GPUs in CoD4—almost linearly from one GPU to two to three, in fact. That’s sufficient for three 3870s to outperform two GeForce 8800 GTs. With four GPUs, scaling isn’t as impressive, and it’s not quite enough to allow CrossFire X to overcome a pair of GeForce 8800 Ultras.

Half-Life 2: Episode Two

We used a custom-recorded timedemo for this game, as well. We tested Episode Two with the in-game image quality options cranked, with 4X AA and 16X anisotropic filtering. HDR lighting and motion blur were both enabled.

Episode Two is a very different scaling story for CrossFire X and three-way SLI. Adding a third GeForce 8800 Ultra is a performance detriment, while going to three and even four GPUs benefits the Radeon HD 3870s. The tough question is why. Two GeForce 8800 Ultras are still faster than four Radeon HD 3870s, and the fact that each 3870 is the slower GPU means there’s more potential headroom for CrossFire X to scale. That said, the results at lower resolutions seem to indicate AMD is doing a better job than Nvidia of managing three (and four) GPUs without running into CPU bottlenecks or the like.

In this game, we should note, CrossFire X is only needed at 2560×1600 resolution. Below that, at 1920×1200, quite a few single- and dual-GPU solutions are plenty fast. Also, notice that three-way CrossFire X is only a hair faster than two GeForce 9600 GTs in SLI—a much less expensive option.

Crysis

I was a little dubious about the GPU benchmark Crytek supplies with Crysis after our experiences with it when testing three-way SLI. The scripted benchmark does a flyover that covers a lot of ground quickly and appears to stream in lots of data in a short period, possibly making it I/O bound—so I decided to see what I could learn by testing with FRAPS instead. I chose to test in the “Recovery” level, early in the game, using our standard FRAPS testing procedure (five sessions of 60 seconds each). The area where I tested included some forest, a village, a roadside, and some water—a good mix of the game’s usual environments.

Because FRAPS testing is a time-intensive endeavor, I’ve tested the lower-end graphics cards at 1680×1050 and the higher-end cards at 1920×1200, with CrossFire X included in each group.

This is one of those applications where CrossFire X can only make use of three GPUs, due to the limits on how many frames the driver can render ahead. As a result, four-way CrossFire X performs the same as three-way in Crysis, or apparently even slightly slower. Of course, since we’re playing through the game manually, some variance in the scores is likely. I’d say four-way CrossFire X performs essentially the same as three-way.

Also, CrossFire X is of no benefit in Crysis at 1680×1050 resolution with these quality settings. At 1920×1200, adding a third Radeon HD 3870 GPU does raise average frame rates slightly, but the median low frame rate doesn’t budge. My seat-of-the-pants impression is similar: the game doesn’t play any better with a third GPU.

In order to better tease out the differences between two, three, and four GPUs, I cranked up Crysis to its “very high” quality settings and turned on 4X antialiasing.

None of the graphics solutions produce truly playable performance, but we do see a clear difference between two and three Radeon HD 3870s. Note that, although CrossFire X manages higher average frame rates than two 8800 Ultras, its frame rate minimums are lower. The reality here is that, for practical purposes, having more than two GPUs is no help in Crysis right now.

Unreal Tournament 3

We tested UT3 by playing a deathmatch against some bots and recording frame rates during 60-second gameplay sessions using FRAPS. This method has the advantage of duplicating real gameplay, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent and trustworthy results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
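As a sketch of that reduction—our own helper, simply mirroring the procedure just described:

```cpp
#include <algorithm>
#include <vector>

// Reduce five FRAPS sessions to one "low" number: take each session's
// minimum frame rate, then report the median of those minimums, so a single
// outlier run can't skew the result.
double medianLow(std::vector<double> sessionMinimums) {
    std::sort(sessionMinimums.begin(), sessionMinimums.end());
    return sessionMinimums[sessionMinimums.size() / 2];  // middle of five
}
```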

Because UT3 doesn’t natively support multisampled antialiasing, we tested without AA. Instead, we just cranked up the resolution to 2560×1600 and turned up the game’s quality sliders to the max. I also disabled the game’s frame rate cap before testing.

Here’s another case where CrossFire X scales up to three GPUs better than Nvidia does, and this time, it’s enough to put the Radeons over the top. Adding a fourth GPU is no help, and it even seems to hurt performance. In fact, this may be one case where rendering too far ahead causes problems. Four-way CrossFire X didn’t seem to play UT3 particularly smoothly, and I had trouble (more than usual) with placing shock rifle shots, too. Whatever the cause, I was able to be more accurate with three or fewer GPUs.

Power consumption

We measured total system power consumption at the wall socket using an Extech power analyzer model 380803. The monitor was plugged into a separate outlet, so its power draw was not part of our measurement. The cards were plugged into a motherboard on an open test bench.

The idle measurements were taken at the Windows Vista desktop with the Aero theme enabled. The cards were tested under load running UT3 at 2560×1600 resolution, using the same settings we did for performance testing.

Note that the SLI configs were, by necessity, tested on a different motherboard, as noted in our testing methods section.

Amazingly, our three-way CrossFire X system draws only a few more watts than the same system equipped with a single GeForce 8800 Ultra. Very nice. The RV670 GPU is very easy on the watt meter at idle, so even putting four of them in a system isn’t too bad.

When running a game, there’s no escaping the fact that having three or four GPUs onboard will make a PC pull quite a bit of juice—over 500W in the case of our four-way CrossFire X rig—but it still draws less power than our GeForce 8800 Ultra SLI system.

Noise levels

We measured noise levels on our test systems, sitting on an open test bench, using an Extech model 407727 digital sound level meter. The meter was mounted on a tripod approximately 12″ from the test system at a height even with the top of the video card. We used the OSHA-standard weighting and speed for these measurements.

You can think of these noise level measurements much like our system power consumption tests, because the entire systems’ noise levels were measured, including the stock Intel cooler we used to cool the CPU. Of course, noise levels will vary greatly in the real world along with the acoustic properties of the PC enclosure used, whether the enclosure provides adequate cooling to avoid a card’s highest fan speeds, placement of the enclosure in the room, and a whole range of other variables. These results should give a reasonably good picture of comparative fan noise, though.

Unfortunately—or, rather, quite fortunately—I wasn’t able to reliably measure noise levels for most of these systems at idle. Our test systems keep getting quieter with the addition of new power supply units and new motherboards with passive cooling and the like, as do the video cards themselves. I decided this time around that our test rigs at idle are too close to the sensitivity floor for our sound level meter, so I only measured noise levels under load.

I should mention, though, that our CrossFire X systems were much quieter at idle than the three-way SLI system, whose 1200W power supply generated quite a bit of fan noise. The CrossFire X rigs remained under the ~40 dB level at idle.

The Radeon HD 3870 X2 isn’t horribly loud while gaming, but it is noticeably louder than most other cards these days. When you add another video card to the mix, especially a second 3870 X2, the system becomes even noisier, since two closely situated video cards are both working to expel heat.

Personally, I’m not particularly bothered by the noise levels of the CrossFire X cards’ coolers while gaming. They’re not noisy enough to become a nuisance. Since CrossFire X is fairly quiet at idle, I could probably live with it.

GPU temperatures

Per your requests, I’ve added GPU temperature readings to our results. I captured these using AMD’s Catalyst Control Center and Nvidia’s nTune Monitor, so we’re basically relying on the cards to report their temperatures properly. In the case of multi-GPU configs, well, I only got one number out of CCC. I used the highest of the numbers from the Nvidia monitoring app. These temperatures were recorded while running UT3 in a window.

I’m a little worried about the temperatures we saw out of our single and dual Radeon HD 3870 cards, but the X2 seems to keep GPU temps more in check. As a result, the temperatures we saw reported with our CrossFire X systems were par for the course for today’s GPUs.

A look at 32X SuperAA antialiasing

Given the opportunity to play with 32X antialiasing, I had to take it for a spin. Here’s a quick look at CrossFire X’s 32X AA mode, which combines 16X Super AA load balancing with a wide-tent filter.

Incidentally, I decided to use Half-Life 2: Episode Two for this excursion because AMD’s custom AA filters don’t yet work with DirectX 10 games, even with a single GPU. That’s a pretty major limitation, so keep it in mind.

As you may be aware, the Radeon HD series’ narrow- and wide-tent filters grab subpixel samples from adjacent pixels and then use a weighted average to blend all available samples into a final color. Samples taken farther from the pixel center are weighted more lightly, but you’ll still see a slight blurring effect with these tent filters. We’ve found them to be pretty effective in the past without causing excessive blurriness, but, well, have a look at this:

4X MSAA
6X (4X MSAA + narrow tent)
8X (4X MSAA + wide tent)
32X (16X Super AA + wide tent)

Super AA plus the wide-tent filter banishes the jaggies on the tree trunk, branches, vegetation, everything, but it does so at the expense of sharpness on object silhouettes and texture clarity. I don’t mind so much the effect of the narrow-tent filter with 6X AA, but even that produces softer edges than I’d like. Everything on screen looks oddly cartoonish with the blurring produced by the 32X mode—not good.
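For the curious, the tent-filter resolve AMD describes boils down to a distance-weighted average over a pixel’s own samples plus samples borrowed from its neighbors. Here’s a toy sketch of the idea—the weighting function is our stand-in, since AMD doesn’t publish the exact kernel:

```cpp
#include <cstddef>

struct Sample { float r, g, b; float dist; };  // dist from pixel center

// Blend all available samples, weighting those farther from the pixel
// center more lightly. Borrowing neighbors' samples is precisely what
// softens silhouettes and textures alike--the blur visible above.
void resolveTent(const Sample* s, std::size_t n, float out[3]) {
    float acc[3] = {0.0f, 0.0f, 0.0f}, wsum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float w = 1.0f / (1.0f + s[i].dist);  // stand-in tent weight
        acc[0] += w * s[i].r;
        acc[1] += w * s[i].g;
        acc[2] += w * s[i].b;
        wsum += w;
    }
    for (int c = 0; c < 3; ++c) out[c] = acc[c] / wsum;
}
```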

On top of that, divvying up the antialiasing work remains a poor method of multi-GPU load balancing. With the 32X mode enabled, I saw frame rates of about 14 FPS in Episode Two using our four-way CrossFire X rig. 4X multisampling plus the wide-tent filter ran at 58 FPS, and just 4X AA alone ran at over 70 FPS.

Conclusions

I’m not entirely sure what to make of CrossFire X. On the one hand, AMD has managed to coax some pretty solid performance scaling out of three GPUs—and, in some cases, even four. There’s still opportunity for additional performance enhancements, too, once AMD releases drivers that incorporate the upcoming changes Dave Gotwalt described to us. Not only that, but CrossFire X doesn’t really fall into the “mega-extreme” category reserved for energy drinks, Intel’s Skulltrail, and Nvidia’s three-way SLI. Even with four GPUs, CrossFire X doesn’t have outsized power supply requirements, and it doesn’t draw as much power, produce as much heat, or create as much noise as three-way SLI.

Nor does it cost as much money, because the raw ingredients are cheaper. Right now, you can buy a Radeon HD 3870 X2 online for about $430. Better yet, board makers are now selling higher clocked Radeon HD 3850 512MB cards that perform similarly to the 3870 for as little as $169. Collecting the right mix of video cards to enable three-way or four-way CrossFire X won’t be cheap, but it should be quite a bit more affordable than two or three GeForce 8800 Ultras.

All of these things are good, and CrossFire X’s seamless multi-monitor support is a wonderful thing to have, too.

On the other hand, CrossFire X has the same disadvantages as any current multi-GPU scheme, and those are often multiplied by the presence of three or four GPUs. We’ve talked about the performance scaling challenges involved. The games we tested tended to scale well to at least three GPUs, but not every game will have been profiled by AMD. Too often, brand-new games aren’t properly supported for a period of time after their release—the period during which I’d like to be playing them.

On top of that, we can’t ignore practical questions about the utility of CrossFire X. If you’re really looking to build a three- or four-GPU system, you’d better be planning to connect it to a very high resolution display, like a 30″ widescreen LCD with 2560×1600 resolution, in order to get the most out of it. At lower resolutions, two GPUs are probably more than sufficient for the majority of today’s games. The most prominent game where they’re not, Crysis, doesn’t appear to benefit from CrossFire X at playable frame rates. We even found that two GeForce 8800 GTs in SLI outperform a three-way CrossFire X config in Half-Life 2: Episode Two. According to the cold logic of price-performance ratios, I’d have a hard time passing up a pair of GeForce 9600 GTs or 8800 GTs for a three-way CrossFire X setup, even if CrossFire X could sometimes deliver higher frame rates.

And yet, CrossFire X remains impressive in its way, as a plausible alternative to Nvidia’s pricier solutions involving two or more high-end GPUs. Over the long run, I’m not sold on the concept of lashing together multiple mid-range GPUs as a replacement for a true high-end GPU. We’ll have to see how committed AMD is to this direction. But this isn’t a bad start with more than two GPUs, all things considered.

Comments closed
    • michael_d
    • 12 years ago

I would like to see Crysis benchmarks on “Very High,” no AA, and 1280×1024 resolution

    • Bensam123
    • 12 years ago

Does it look like pictures get blurrier and blurrier the more AA is added to them, to the point where it looks like they’re actually blurring it, as is the recent trend in games?

    *cough* switch back to XP Scott, Vista is a sinking ship even with DX10 on board.

    • Saber Cherry
    • 12 years ago

    I always assumed DX10 would be an upgrade from DX9, as it seemed in the past that DXN+1>DXN, universally. However, now it sounds much worse from the perspective of driver and video card developers, and Vista just exacerbates the situation (ATI’s claim, not mine).

    Considering, as well, that neither I nor any gamer I know in real life would ever install Vista, I’d really like to see an updated comparison that includes XP/DX9 benches… with CrossFire-3, at least, no need to rerun everything.

    And furthermore, I would very much like to know how much subjective _[

      • flip-mode
      • 12 years ago

      Nothing is a downgrade from a 6200. /half-sarcasm

        • Meadows
        • 12 years ago

        Intel’s integrated solutions might be.

      • MrJP
      • 12 years ago

      Here’s a site that does good monitor reviews including input lag:
http://www.tftcentral.co.uk

        • Saber Cherry
        • 12 years ago

        Thanks! Quite useful… especially for avoiding Samsung’s 50ms+ 245T.

      • Usacomp2k3
      • 12 years ago

      *sigh*

    • d0g_p00p
    • 12 years ago

Please stop calling the 3870 X2 a dual-core card. It’s got TWO GPUs onboard, not a dual-core GPU.

      • Thresher
      • 12 years ago

      It has two cores, doesn’t it? The nomenclature doesn’t really distinguish between two cores on the same substrate or two cores in different packages.

    • Thresher
    • 12 years ago

    BTW, did you leave out the P35 by accident? I realize that the P35 and G35 are basically the same thing, except for integrated graphics, but more people will be putting a graphics card on a P35 than a G35.

    “955, 965, 975, G35, X38, and X48.”

      • Damage
      • 12 years ago

      Yeah, P35 should work fine.

    • Thresher
    • 12 years ago

    I have a Mac Pro with an ATI X1900XT and a PC with an 8800GTS (640).

    I have to say that the IQ on the ATI card is without a doubt better. On games that don’t need the horsepower of the 8800GT, I prefer playing them on the Mac (under Windows, of course).

    To me, the best deal available is the 3870X2. You get the IQ of ATI and speed that beats the 8800GTS (512) for only a bit more money. I doubt I would ever put two of these puppies in my computer, but it’s nice to know that I could go quad core GPU with only two cards.

    I’m pretty sure this is the way I am going to go when the gubmint sends me my free money.

    • Peldor
    • 12 years ago

    Great article, especially the first two pages. Lots more detail than I’ve seen in any other CrossfireX discussion.

    • zqw
    • 12 years ago

    Is there any product that can use 3 screens for 1 high-perf 3D viewport? (Like NVidia’s span mode.) …other than the Matrox2go stuff which is limited to 3x 1024×1280.

    • echo_seven
    • 12 years ago

Wow, to think I’d live to see the day when Crossfire outperforms (or at least is more efficient than) SLI.

OTOH, AMD Marketing, you think there might have been a more efficient way to show the information on the page 1 chart? 🙂

    And are those GPU temperatures really correct??? 92 C? Didn’t P4’s auto-shutdown at 72 C?

      • Nitrodist
      • 12 years ago

GPU temperatures are usually rated up to 100C.

      Even with the P4 example, it varies a lot. Throttling for the Core 2 architecture is usually in the 80’s, iirc.

      Don’t forget that it varies with every component you look at. To compare a CPU to a GPU is ridiculous.

        • echo_seven
        • 12 years ago

oh… okay, I see. I had always thought the temperature limit was a function of silicon (that is, a material property).

          • Meadows
          • 12 years ago

If that were the only thing that mattered, then both of those pieces of hardware could withstand much more.

      • Mithent
      • 12 years ago

I thought that Crossfire had scaled as well as SLI, if not better, for a while now? It’s just that Radeons have generally been slower than GeForce cards.

        • echo_seven
        • 12 years ago

        I suppose, I mean, this is the first time the difference was unambiguous in AMD/ATI’s favor.

    • redpriest
    • 12 years ago

    Multimonitor = ATI ftw. Nvidia ftl here. I gave up my 8800 gtx sli setup for a single 3870×2. Haven’t regretted it since.

    • fellix
    • 12 years ago

Now, with so many GPUs in the box, in such a wide mix of flavors, it’s the right time to unleash a decent accelerated F@H client alongside the pure gaming. 😉

    • scpulp
    • 12 years ago

    I’d like to point out that CrossfireX’s vaunted multi-monitor support is, at least on my system, the same multi-monitor support we got with Catalyst 8.1: up to two monitors connected to your main card.

    I was really hoping it’d changed in Catalyst 8.3, because I have a third screen and when I game I wind up having to manually enable Crossfire and lose that screen.

    • Meadows
    • 12 years ago

    I swear, the first time I skimmed the title I thought it’s “Crossfire X *[

    • Jive
    • 12 years ago

    I would take the look of the 4xMSAA over the cartoonish feel of the rest of the AA modes *[

    • BobbinThreadbare
    • 12 years ago

    It’s a shame that the one game really needing more performance (Crysis), didn’t get much help.

      • eitje
      • 12 years ago

      to me, that indicates more and more that the crysis team did something really, really wrong… to have missed even their high-end audience by so much.

      • Jeffery
      • 12 years ago

      Wonderful article!

      Just a quick comment: It seems that the games that scale the best under crossfire X are the games that don’t really need that 3rd or 4th GPU in the first place (i.e. COD, HL2, etc… where 1 or 2 cards are already pumping out frame rates in excess of 60 FPS, entering the realm of overkill).

    • charged3800z24
    • 12 years ago

That review was up fast. I was just reading on here looking for new info.. I went to the rest room and BOOM… Phenom B3 info and CrossFire X. I was waiting for this review to determine whether or not to buy that Diamond 3870 X2 on Newegg to go with my Diamond 3870. Looks good. Might wait tho.. I hear prices are dropping soon.

    • flip-mode
    • 12 years ago

Applause for page 2, Mr. Wasson. Unadulterated detail. Whew. Sounds like Vista shut a few doors and DX10 shut a few more. Pity, damn pity. Honestly, what the hell was MS thinking by not multithreading DX10? It seems like a common sense thing.

      • ssidbroadcast
      • 12 years ago

      Yeah now that you mention it, I’d like to see a few samples of the test suite compared against WinXP 32, DX9.

      But of course, there are only 24 hours in a day.

        • Nitrodist
        • 12 years ago

        DX9 only supports up to 3 GPUs, iirc.

      • eitje
      • 12 years ago

agreed – i’d like to see more conversations with engineers. 🙂

      • BabelHuber
      • 12 years ago

You know, in software development things often aren’t so easy. You have a fixed amount of resources and time with which you have to work.

      If you do one thing, you can’t do another. Most of the time, you have to drop features from your wish list.

      Hence I don’t think we can estimate the side-effects of implementing multi-threading to DX10 – at least not without insight into the MS development process.

    • pikaporeon
    • 12 years ago

    Yknow, this actually makes for some slim justification to the over-mem’d video cards.

    • JustAnEngineer
    • 12 years ago

    The fact that Crossfire X works well with Intel’s X38 chipset while SLI does not makes it easier to recommend a Radeon than a GeForce for multi-GPU configurations.

    • mattthemuppet
    • 12 years ago

I thought the 4xMSAA looked the best of the bunch too – everything else looked like Vaseline had been smeared over my monitor (ahem).

    I can’t help but think multi GPU set ups are a bit of an irrelevancy – at lower resolutions they’re generally not needed and at higher resolutions they rarely provide playable framerates where they are needed. All that for more money, more noise and higher electricity bills. Hmm.

    • Xenolith
    • 12 years ago

    The enthusiast market must die! Thank you.

      • Jigar
      • 12 years ago

I am impressed by your guts to type this on an enthusiast site 😉

        • Xenolith
        • 12 years ago

        Takes no courage at all… especially since it was half-hearted. Although I hope even most enthusiasts can see the absurdity of 3 and 4 video card configs. Multi-core on one card I can understand, but this is getting to be a bit ridiculous.

          • Anomymous Gerbil
          • 12 years ago

          If you want performance over 3 monitors…

            • Xenolith
            • 12 years ago

            … make a 3 core card with 3 hdmi-outs.

            • Mithent
            • 12 years ago

            A 3-core card would undoubtedly just be SLI- or Crossfire-onna-stick, like all the dual-core cards.

            • willyolio
            • 12 years ago

            why, it’s so simple! i’ll just nip off to my garage, where i assemble all my own video cards and solder one together with 3 GPUs and 3 outputs… without using SLI or Crossfire to run it, of course.

            • Xenolith
            • 12 years ago

            Wow! You make your own video cards?

            • Meadows
            • 12 years ago

He makes video cards as much as you make valid points. 😉

            • Xenolith
            • 12 years ago

            I didn’t really make a point. If I were to make a point, it would be this: We need enthusiasts to go toward finding new efficiencies, both in power usage, and physical space. Crossfire and Sli are klugey solutions. Look at the power consumption compared to the benchmarks. I am more impressed with someone using a single 9600 GT or a 3870.

            Multiple cores on one die are coming, and they won’t be using crossfire or SLI. The energy usage of this future multi-core tech will be comparable to current single core tech. SLI and Crossfire technology will be dated and out to pasture in one or two years.

    • ssidbroadcast
    • 12 years ago

    First post. Gotta go to class! Read later!

    Edit: Right outside class. Crap, people are entering class! Wish me luck on that math test!

    Also, I liked how the Narrow Tent AA mode looked best. The Wide and Super 32 are too washed out.

      • mortifiedPenguin
      • 12 years ago

      Gotta love the dedication to reading up on new tech.

        • ssidbroadcast
        • 12 years ago

        Okay. I just got out of the test. I think I did okay because one of the questions was,

        “Scott puts 4 ATi 3870’s in his machine and achieves 65.1 frames per second on Call of Duty 4. When he took one of the 3870 gpu cards out, he got only 72% of his original performance. How many frames per second did he get on just 3 3870 gpu cards?”

        I think I got the right answer.

          • Mourmain
          • 12 years ago

          lol
