THE RESPONSE TO MY RECENT article on next-generation graphics chips was interesting. A number of graphics professionals wrote me, either excitedly or with great skepticism, about the prospect of GPUs (or VPUs or whatever they're calling 'em this month) rendering "production quality" graphics. Most of the skepticism seemed ill-informed to me, but a few of my correspondents pointed out a very practical problem with rendering high-quality graphics in real time (or nearly so) on a graphics chip: getting those rendered frames back from the video card and into main memory or stored on disk.
You see, today's graphics subsystems have gobs of bandwidth, from main memory (2.1GB/s or greater) to the AGP bus (1GB/s for AGP 4X) to the graphics card's own internal memory (20GB/s in some cases). Graphics cards can render hundreds of frames per second for display. But once the frames have been sent out the RAMDAC to a monitor, standard operating procedure is simply to discard them.
That's all well and good when all you want to do is play video games, but for other uses, it's a real problem. Now that graphics chips are getting the internal precision to produce some truly stunning images, we'll want to capture that high-quality output and store it. Unfortunately, even the very newest cards and drivers don't seem to be up to the job. Despite over 1GB per second of potential bandwidth on the AGP bus, current cards transfer data back into main memory much, much slower than needed. At least, that was the claim of more than one person who wrote in response to my article.
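To put "slower than needed" in rough numbers, here is a back-of-the-envelope sketch of the readback bandwidth that uncompressed capture would require. The two resolutions, the 4 bytes per pixel, and the 30 frames per second are illustrative assumptions, not figures from the benchmark; the ceiling is the AGP 4X number quoted above, taken here as 1000MB/s.

```python
# Rough readback budget for capturing uncompressed frames.
# Assumptions (not from the article): 32-bit pixels, 30 frames per second,
# and the two example resolutions below.
AGP_4X_MB_PER_S = 1000.0  # AGP 4X peak quoted above, taken as 1000 MB/s


def required_mb_per_s(width, height, bytes_per_pixel=4, fps=30):
    """Bandwidth needed to move every rendered frame back to main memory."""
    return width * height * bytes_per_pixel * fps / 1e6


for name, (w, h) in {"720x480": (720, 480), "1024x768": (1024, 768)}.items():
    need = required_mb_per_s(w, h)
    share = 100.0 * need / AGP_4X_MB_PER_S
    print(f"{name}: {need:.1f} MB/s, about {share:.1f}% of the AGP 4X peak")
```

Under these assumptions, even the larger frame needs less than a tenth of the bus's peak rate, which is why the bus itself looks like an unlikely culprit.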
One of the folks who wrote me about this problem was Mark Randall of Serious Magic, developers of PC video editing software. Mark mentioned that the folks at Serious Magic had developed benchmarks internally to test transfer rates from AGP cards to main memory. Mark and others at Serious Magic, including CTO Stephan Schaem, were kind enough to supply us with a copy of their benchmark so we could test for this problem ourselves.
In fact, they've made the test available for public download right here, so you can run it on your own system.
Here is their explanation of the benchmark, the AGP texture download problem, and the complications this problem causes. Notice that they're quick to point out the problem isn't likely a hardware issue. There should be plenty of bandwidth on the AGP bus, but graphics chip makers don't seem to have written their drivers to handle transfers from AGP cards to main memory properly.
THE SERIOUS MAGIC TEXTURE DOWNLOAD BENCHMARK
This benchmark exposes a significant issue in PC Graphics Card performance. While today's graphics cards can render images very quickly, the software drivers are painfully slow at getting rendered output back to the PC where it could be saved and put to work by users. Current generation drivers achieve only 1/100th of the theoretical download transfer speed that the hardware you've already paid for is capable of. It's remarkable that a graphics card with a video input and video recorder software can record TV-quality images to the PC HD in real-time, yet the same card can't even record its own renderings at 1/10th this speed.
The problem isn't the hardware; it's the software drivers. In fact, the speed could be dramatically increased with revised software drivers. However, no manufacturer has yet made this aspect of driver performance a priority. The first card manufacturer to address this issue would deliver the following benefits to their users:
- Their graphics cards would become invaluable for rendering production output for TV, film and video. For typical video resolution images only about 5% of the time is spent on rendering. Why? Because now 95% of the time must be used just to transfer the rendered output back to the PC. This effectively nullifies the advantages gained from the amazing speed of the graphics card hardware.
- Users could actually record game output in real-time with little impact on game performance. After playing there could be a compressed movie of your game play saved on your HD. On a reasonably fast machine you could actually record your game play digitally to your DV camcorder as you play or even compress and burn it to a Video CD or DVD at the same time you are actually playing.
- Screen capture software that grabs motion images of user interfaces for the purposes of tutorials and training is a vital business application. Yet these useful tools are currently limited by the graphics card software driver to transferring only a few frames a second. It should be simple to record 30-frame-per-second output to your hard drive, but this is unfortunately not possible due to the texture download issue. The first card manufacturer to revise their software drivers to address the issue will likely garner significant appeal among business, training, and graphics professionals.
- Despite the popularity of Internet streaming, it is not currently possible to stream live output from graphics cards over the Internet. The connections, processors and codecs are all fast enough today. Sadly, all of this horsepower is being held back by one remaining weak link: the texture download speed of today's graphics card drivers.
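As a rough illustration of the 5%/95% split in the first point above, the sketch below estimates how much faster a whole render-and-save job would finish if only the transfer step sped up. The speedup factors are hypothetical inputs chosen for illustration, not measurements.

```python
# Estimate overall job speedup when only the transfer step gets faster.
# The 5% / 95% split comes from the text above; the transfer speedup
# factors are hypothetical.
RENDER_SHARE = 0.05
TRANSFER_SHARE = 0.95


def overall_speedup(transfer_speedup):
    """Total-time speedup if transfer alone becomes `transfer_speedup` times faster."""
    new_time = RENDER_SHARE + TRANSFER_SHARE / transfer_speedup
    return 1.0 / new_time


for factor in (2, 10, 100):
    print(f"transfer {factor}x faster -> job {overall_speedup(factor):.1f}x faster")
```

Even with a hypothetical 100x faster transfer, the job tops out near 17x, because the rendering share then dominates; the upper bound with this split is 1/0.05 = 20x.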
In the first mode the benchmark renders and displays a simple image. In the second mode it renders, displays and downloads the same image to the PC. Downloading is required to use / save / stream any 3D output of the card (although this benchmark neither uses nor saves the images). The "T" key switches modes. The transfer rate and frame rate are displayed. We've intentionally kept the benchmark and the image rendered very simple (two planes sliding, one black, one white) to eliminate any variability in rendering speed (no slo-mo gunfights here, but the same results would apply). This serves to isolate the performance of the critical capability being measured by this benchmark, namely the incredibly long amount of time it now takes to actually lay your hands on the images your hyper-fast card rendered at such blinding speeds.
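The two modes can be mimicked with a small timing harness. The sketch below is not Serious Magic's code: `render_frame` and `download_frame` are hypothetical stand-ins (a buffer fill and a memory copy on the CPU), and the frame size is an assumption. It only shows the measurement logic, which is to time the same loop with and without the download step and report frame rate and transfer rate.

```python
import time

WIDTH, HEIGHT, BYTES_PER_PIXEL = 640, 480, 4  # assumed frame format
FRAME_BYTES = WIDTH * HEIGHT * BYTES_PER_PIXEL


def render_frame(frame_index):
    """Hypothetical stand-in for rendering: a black/white split that slides."""
    split = (frame_index * 4096) % FRAME_BYTES
    return bytes(split) + b"\xff" * (FRAME_BYTES - split)


def download_frame(frame):
    """Hypothetical stand-in for the card-to-main-memory copy."""
    return bytearray(frame)


def run_mode(download, frames=50):
    """Run one mode; return (frames per second, MB/s downloaded)."""
    moved = 0
    start = time.perf_counter()
    for i in range(frames):
        frame = render_frame(i)
        if download:
            moved += len(download_frame(frame))
    elapsed = time.perf_counter() - start
    return frames / elapsed, moved / elapsed / 1e6


for label, download in (("render only", False), ("render + download", True)):
    fps, mb_per_s = run_mode(download)
    print(f"{label}: {fps:.0f} frames/s, {mb_per_s:.1f} MB/s")
```

On a real card, the download step would be the driver's readback call, and the gap between the two modes is the cost the benchmark is designed to expose.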
As you might imagine, we rounded up a number of the latest video cards from various manufacturers, slapped them on various motherboards with different chipsets, and tested away. The results were... well, read on.