When too much is not enough, now there's quad SLI
NVIDIA first demonstrated a four-GPU SLI system at this year's CES with GeForce 7800 GPUs, but the first quad-SLI rigs available to the public will be based on the 7900 series.
We were left with a number of questions after seeing quad SLI in action at CES, most of which concern exactly how it all worked. Now we have more details.
As is obvious from the pictures, quad SLI relies on a pair of PCI Express graphics cards, each of which comprises two circuit boards. On each circuit board is a GPU and its associated frame buffer memory. On the card's main circuit board is a custom 48-lane PCI Express switch chip that connects to each of the two GPUs (16 lanes each) and to the PCI-E slot (the final 16 lanes). The card also has an SLI link that connects its two GPUs to one another. Each GPU has a total of two SLI interconnect interfaces on it, though, so that's not the whole story. A quad SLI subsystem will also make use of two card-to-card SLI bridge connections, one linking PCB 1 of card A to PCB 1 of card B, and the other connecting PCB 2 of card A to PCB 2 of card B. Once all of the connections are made, each GPU will talk to two other GPUs in a quad-SLI configuration, resulting in a ring topology.
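The wiring described above can be checked with a small sketch. The labels ("A1" for PCB 1 of card A, and so on) and the helper names are made up for illustration; this only models which GPUs are linked, not how the links actually behave.

```python
from collections import defaultdict

# Hypothetical labels: "A1" means the GPU on PCB 1 of card A, and so on.
LINKS = [
    ("A1", "A2"),  # on-card SLI link, card A
    ("B1", "B2"),  # on-card SLI link, card B
    ("A1", "B1"),  # card-to-card bridge, PCB 1 to PCB 1
    ("A2", "B2"),  # card-to-card bridge, PCB 2 to PCB 2
]


def build_neighbors(links):
    """Map each GPU to the set of GPUs it shares an SLI link with."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)


def walk_ring(neighbors, start):
    """Follow links from start without doubling back; return the visit order.

    Assumes every GPU has exactly two neighbors, as in a ring.
    """
    order = [start]
    prev, cur = None, start
    while True:
        nxt = sorted(n for n in neighbors[cur] if n != prev)[0]
        if nxt == start:
            return order
        order.append(nxt)
        prev, cur = cur, nxt


neighbors = build_neighbors(LINKS)
ring = walk_ring(neighbors, "A1")
print(ring)
```

Every GPU ends up with exactly two neighbors, and walking the links from any GPU passes through all four before returning to the start, which is what makes this a ring.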
I wasn't aware of the presence of dual SLI links on NVIDIA's GPUs before now. Turns out those links share pins with the chip's interfaces for external TMDS transmitters. That's pretty much a non-issue for this application, particularly given the G71's incorporation of twin dual-link TMDS transmitters.
If you're familiar with dual-GPU SLI, you already know about split-frame rendering, where the screen is subdivided into two sections, each of which is rendered by one GPU. You also know about alternate-frame rendering, where even frames are rendered by one GPU and odd frames by the other. (Alternate-frame rendering is the preferred method where possible, because it offers the best performance scaling and raises geometry throughput as well as pixel throughput.) The third and final general method of SLI load balancing is SLI antialiasing mode, where the two GPUs render the scene using different sample points at a sub-pixel level and the two sets of results are combined.
The quad SLI physical topology goes along with a clutch of new load-balancing modes for splitting up the rendering work. Those are: four-way split-frame rendering, four-way alternate-frame rendering, a combination of alternate-frame and split-frame rendering, and SLI antialiasing. They work about as one might expect them to work, for the most part.
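As a rough illustration of how the first three modes divide the work, here is a minimal sketch. The function name, the mode strings, and the equal-sized screen bands are all assumptions made for the example; a real driver would presumably balance the split dynamically, and the exact pairing used in the combined mode is a guess.

```python
def assign_work(mode, frame, height, gpus=4):
    """Return {gpu_index: (first_row, last_row_exclusive)} for one frame.

    Hypothetical model: equal bands, fixed GPU order.
    """
    if mode == "sfr":
        # Four-way split-frame: every GPU renders one band of every frame.
        band = height // gpus
        return {
            g: (g * band, height if g == gpus - 1 else (g + 1) * band)
            for g in range(gpus)
        }
    if mode == "afr":
        # Four-way alternate-frame: one GPU renders the whole frame,
        # and the GPUs take turns frame by frame.
        return {frame % gpus: (0, height)}
    if mode == "afr_sfr":
        # Combined: two pairs of GPUs alternate frames, and each pair
        # splits its frame into a top and bottom half.
        pair = frame % 2
        half = height // 2
        return {2 * pair: (0, half), 2 * pair + 1: (half, height)}
    raise ValueError(f"unknown mode: {mode}")


for mode in ("sfr", "afr", "afr_sfr"):
    print(mode, [assign_work(mode, f, 1200) for f in range(2)])
```

The trade-off is visible in the output: split-frame keeps every GPU busy on the current frame, while alternate-frame hands whole frames to GPUs that are working several frames apart.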
If you can unravel that SLI 16X AA diagram, let me know. As I understand it, what's basically happening is that each GPU is rendering the scene with 4X multisampling, with each GPU grabbing samples at an offset from the others. Two pairs of GPUs combine their images after transmitting sample data over the SLI links between them, and then the two resulting images are combined after that. Presto: you have 16X AA. There's also a 32X SLI AA mode for quad SLI where each GPU renders the scene using NVIDIA's 8xS antialiasing method.
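The two-stage combine is easier to follow with numbers. The sketch below assumes each stage is a plain equal-weight average, which is a simplification for illustration and not a description of what the hardware does; the function names are made up.

```python
from fractions import Fraction


def resolve(samples):
    """Average one GPU's samples for a pixel (simple box filter, assumed)."""
    return sum(samples, Fraction(0)) / len(samples)


def quad_sli_aa(per_gpu_samples):
    """Combine four GPUs' results in two stages, pair by pair."""
    g0, g1, g2, g3 = per_gpu_samples
    pair_a = (resolve(g0) + resolve(g1)) / 2  # first pair combines
    pair_b = (resolve(g2) + resolve(g3)) / 2  # second pair combines
    return (pair_a + pair_b) / 2              # the two images are merged


# Four GPUs, four samples each: 16 distinct sample values for one pixel.
samples = [[Fraction(i + 4 * g) for i in range(4)] for g in range(4)]
flat = [s for gpu in samples for s in gpu]

print(len(flat), quad_sli_aa(samples), resolve(flat))
```

Under that assumption, merging in pairs gives the same result as averaging all 16 samples at once, so the staged combine costs nothing in the final pixel value. With eight samples per GPU the same arithmetic yields the 32 samples of the 32X mode.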
I have some reservations about the image quality likely to result from all of this effort. NVIDIA's antialiasing hardware has built-in limitations on sample positions that may blunt the impact of grabbing that many samples. Current dual-GPU SLI AA modes suffer from this problem. We'll have to see whether the quad SLI modes fare any better.
Four-way alternate-frame rendering has the potential to introduce some additional latency into the graphics subsystem as four frames are buffered before being sent to the screen, which would be a Very Bad Thing for an extreme gaming rig. However, NVIDIA argues latency won't be an issue so long as frame rates are high enough, because frames will be pushed out to screen before too many milliseconds have passed. If that doesn't work, there's always the option of split-frame or partial split-frame modes to keep potential delays in check.
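NVIDIA's argument is easy to put into numbers with a very crude model: assume the delay is simply the number of frames in flight multiplied by the time per frame. That model and the function name are assumptions for illustration, not measurements.

```python
def buffered_latency_ms(frames_in_flight, fps):
    """Rough delay from buffering, assuming frames leave at a steady rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames_in_flight * 1000.0 / fps


for fps in (25, 50, 100, 200):
    print(fps, "fps ->", buffered_latency_ms(4, fps), "ms")
```

By this estimate, four buffered frames cost 40 ms at 100 frames per second but 160 ms at 25, which is why the argument only holds as long as frame rates stay high.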
So if latency doesn't kill it, how well will quad SLI scale? We hope to get our hands on a quad-SLI system soon in order to test it, but NVIDIA claims the graphics subsystem isn't the bottleneck: it's the CPU. Quad SLI will require more PCI Express traffic in order to maintain texture coherency when techniques like render to texture are in use, but dual 16-lane PCI-E chipsets have practically been looking for an app like this to justify their existence. Otherwise, the presence of two SLI links per GPU means that the SLI scheme ought to have sufficient bandwidth to work well enough. The main problem may be that driving four GPUs will incur additional overhead on CPUs that are already having trouble keeping up with dual-GPU graphics subsystems.
The first quad SLI cards will have their GPUs clocked at 500MHz, and each GPU will get 512MB of 600MHz memory. Clock speeds for these puppies are lower than for the 7900 GTX because we're looking at two GPUs and their memory chips sandwiched close together on two circuit boards, creating power and heat concerns. Dialing the clock speed back a little bit helps greatly on that front.
A fully configured quad-SLI rig will be a powerful thing indeed. If you like to count it this way, a quad-SLI graphics subsystem will have 96 pixel shader pipes, 32 vertex pipes, 64 ROPs, and 2GB of RAM onboard. More notably, the setup will have a total of 48 gigatexels/s of fill rate (96 pixel pipes at 500MHz) and 153.6GB/s of memory bandwidth. Such a system will probably require a power supply between 800 and 1000W.
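For those who want to check the counting, here is a small sketch. The per-GPU unit counts are just the totals above divided by four; the 256-bit memory bus and double-data-rate transfers are assumptions not stated in this article, chosen because they reproduce the quoted bandwidth figure.

```python
GPUS = 4

# Per-GPU figures: the article's totals divided by four.
PER_GPU = {
    "pixel_pipes": 24,
    "vertex_pipes": 8,
    "rops": 16,
    "memory_mb": 512,
}

MEMORY_CLOCK_MHZ = 600    # from the article
BUS_WIDTH_BITS = 256      # assumption
TRANSFERS_PER_CLOCK = 2   # assumption: double-data-rate memory


def totals(gpus=GPUS):
    """Multiply each per-GPU count by the number of GPUs."""
    return {name: count * gpus for name, count in PER_GPU.items()}


def memory_bandwidth_mb_s(gpus=GPUS):
    """Aggregate memory bandwidth in MB/s (decimal megabytes)."""
    per_gpu = MEMORY_CLOCK_MHZ * TRANSFERS_PER_CLOCK * BUS_WIDTH_BITS // 8
    return per_gpu * gpus


print(totals())
print(memory_bandwidth_mb_s() / 1000, "GB/s")
```

Keep in mind that these are sums across four separate GPUs, each with its own memory, so the totals are a way of counting and not a pool that any one GPU can draw on.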
Yep, a kilowatt of power. Not a typo.
How much will the privilege of spinning your power meter at this rate cost you? Probably a whole truckload of cash, because quad SLI will initially be available only through PC system builders like Voodoo and Alienware, who like to charge about the price of a Mazda for a well-equipped PC. If you do happen to have a Mazda burning a hole in your pocket, though, you should be able to order one of these things today. Quad-SLI graphics cards for us PC DIY types will be sold as separate products at a later date.