The PhysX card itself looks an awful lot like a video card, with its centrally located cooler and four-pin Molex aux-power connector, but don’t be fooled. This card’s metal slot cover is devoid of outputs, and the golden fingers extending from the board are intended to slip into a humble 32-bit PCI slot. This card is made for crunching numbers, not driving a display.
A pair of Ageia partners, Asus and BFG Tech, have brought PhysX cards to market. The board you see above is the BFG Tech version, and it comes with 128MB of Samsung GDDR3 memory chips attached. These chips run at an effective data rate of 733MHz on a 128-bit interface, which works out to nearly 12 GB/s of memory bandwidth dedicated solely to physics processing.
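That bandwidth figure is easy to sanity-check with back-of-the-envelope arithmetic; the constants below are simply the numbers quoted above:

```python
# Back-of-the-envelope check of the PhysX card's memory bandwidth,
# using the figures quoted above: 733MHz effective data rate on a
# 128-bit interface.
effective_rate = 733e6         # transfers per second (DDR effective rate)
bus_width_bits = 128

bandwidth_gbps = effective_rate * bus_width_bits / 8 / 1e9
print(f"{bandwidth_gbps:.1f} GB/s")   # ~11.7 GB/s, i.e. roughly 12 GB/s
```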
Pop the cooler off of the card, and you’ll find the star of the show, the PhysX chip, residing below.
This custom-designed physics processor measures roughly 14 mm by 14 mm, or 196 mm². TSMC packs about 125 million transistors into this space when it fabricates the chip using its 130 nm manufacturing process.
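Those die figures imply a transistor density we can check quickly; a sketch using only the numbers from the text (14 mm per side, 125 million transistors):

```python
# Die area and transistor density from the figures quoted above.
die_side_mm = 14
transistors = 125e6

area_mm2 = die_side_mm ** 2
density = transistors / area_mm2
print(area_mm2)                                   # 196
print(f"{density / 1e6:.2f}M transistors/mm^2")   # ~0.64M per mm^2
```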
The itty little rectangular chip you see situated below the PhysX PPU, by the way, is not a bridge chip like you might see on some PCI cards these days. This chip comes from Texas Instruments and is used to step down the voltage coming in from the PCI bus. As a low-voltage 130 nm device, the PPU probably needs its assistance in talking to the relatively high-voltage PCI bus.
Ageia aims for physics coprocessing
Since Ageia first started making noise about accelerating gaming physics calculations in hardware, folks have been raising questions about whether such a thing makes sense. The acceleration of specific, particularly intensive computing problems using custom hardware can be a very potent thing, as the rise of custom graphics hardware in the past decade has demonstrated. CPUs are good at many things, but offloading certain jobs to custom hardware is often faster, more energy efficient, and enables more consistent performance. We now have custom chips or logic units to handle a host of specific tasks inside of a PC, including audio processing, video scaling and playback, and various types of I/O.
At first blush, physics seems like a pretty good candidate for hardware acceleration. Physics involves lots of floating-point math calculations that can potentially be processed in parallel. In games, physics processing must be fast in order to be useful; all calculations need to happen in real time, with updates coming each time the screen is redrawn. Much like graphics, physics is the sort of computing problem at which coprocessors can excel. Not only does that fact open the door for a custom physics processor, but the parallel nature of the task could help turn physics acceleration into a solid long-term chip business. The number of transistors possible on a chip should double every couple of years for as long as Moore’s Law holds out, and Ageia should be able to build ever more powerful physics processors as a result.
Sounds easy, right? Not exactly, but Ageia has taken on the twin challenges of building a custom physics processor and cultivating a healthy market for such a chip.
On the hardware end of things, the PhysX PPU is a bit of a mystery. Aside from transistor counts and the like, we know surprisingly little about it. The PPU has “many copies” of Ageia’s physics processing core onboard, but we don’t know exactly how many. Ageia is also coy about the clock speed of the PPU. Apparently they don’t want to give away the ingredients for the special sauce this early into the game. Without this information, we can’t come up with theoretical peak gigaFLOPs numbers or the like, for whatever they’re worth.
We do know a few things, though, and can speculate on some others. We know the SIMD execution cores must be geared to handle floating-point datatypes, for instance, and SIMD execution is probably narrower than on a graphics chip. Processing physics involves lots of interactions between different objects, and managing those interactions would require robust communication between the different execution cores on the chip, probably via an internal switched fabric. Ageia claims the PhysX PPU has “two terabits per second,” or about 250 GB/s, of bidirectional memory bandwidth internally.
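Ageia's internal-bandwidth claim is a straightforward unit conversion, sketched here for reference:

```python
# Converting Ageia's claimed "two terabits per second" of internal
# bandwidth into the ~250 GB/s figure cited above.
terabits_per_s = 2
gigabytes_per_s = terabits_per_s * 1e12 / 8 / 1e9
print(f"{gigabytes_per_s:.0f} GB/s")   # 250 GB/s
```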
On the software front, Ageia has its PhysX API that game developers can use to access a range of physics capabilities. When we think of gaming physics, the first thing that comes to mind is typically rigid-body interactions (a grenade bouncing around a corner, a race car running into a wall, and the like), but rigid bodies are just part of the picture. Ageia’s software can also handle particles (for smoke, dust, etc.), fluids, hair, cloth, and clothing. Virtual joints and springs give “rag doll” characters and vehicles the appropriate flexibility and resistance, and materials can be programmed with specific properties, so that ice is slick, for instance. The PhysX software also supports collision detection for each type of object, so that different types of objects can interact realistically.
Most of these effects are familiar to gamers by now, but the distinctive thing about PhysX is that the PPU can accelerate these things in hardware. The PPU is programmable, and Ageia has chosen a subset of commonly used, performance-critical functions in the PhysX API to hand off to hardware. Over time, Ageia plans to accelerate more and more of the API’s capabilities via the PPU. In the absence of PPU hardware, the PhysX API will fall back to software processing. In fact, the PhysX software is multithreaded in order to take advantage of multi-core CPUs.
This software fallback is key to Ageia’s world domination plans. The company is licensing its entire PhysX API and software development kit, complete with tools, to PC game developers free of charge. The only catch: those games must take advantage of a PhysX PPU if present. Ageia has also shepherded the PhysX API’s migration onto next-gen game consoles. On the Xbox 360, game development houses can license the SDK for about $50,000, and it will use all three of the cores on the Xbox 360 CPU. Sony simply bought out the rights to the PhysX SDK for the PlayStation 3 so all developers can use it for free, and Sony engineers have ported the physics processing routines to the Cell processor. These efforts have made the PhysX API a reasonably complete, low-cost, cross-platform physics engine, and Ageia has had some success persuading game developers, game engine companies, and tool makers to use it.
The list of upcoming PhysX-fortified titles is long, and easily most prominent among them are games built on Unreal Engine 3, including Unreal Tournament 2007. Unfortunately, the list of current titles with PhysX support is depressingly short. We’ll test the card with a couple of the most prominent titles shortly.
Ageia’s many challenges
Since their first press release hit the wire, naysayers have been predicting Ageia’s failure, and with good reason. You may have gathered by now that Ageia is attempting to do something fundamentally difficult. They face a number of challenges, including the chicken-and-egg problem involving a dearth of PhysX support in games and the lack of an installed base of PPU hardware. I must admit that I don’t have much of a taste for all of the triumphalist doomsaying we’ve been hearing. As a PC enthusiast, I love the idea of realistic physics simulations in games, and I’m generally favorably inclined toward new types of custom chips to make it happen. One would hope the PC gaming market would attract efforts like this one and reward them if they succeed.
That said, Ageia’s prospects are undeniably cloudy. We’ve already talked about how Ageia is addressing the software development question, but we should probably consider some other dark clouds on the horizon, as well. Among them:
Physics is not graphics. Everyone loves to use the analogy of GPUs when thinking about the development of physics acceleration. It’s almost inescapable. That analogy is sometimes helpful but fundamentally flawed, kind of like the car analogies that have plagued CPU performance discussions since the dawn of time (or the 1970s, whichever came first). There are many ways in which physics and graphics are different, but the one that matters most, I think, has to do with the way physics support can be incorporated into games.
Old-timers like me remember when the first 3D graphics cards arrived. We were able to pop in a 3dfx Voodoo card or the like and get better graphics almost instantly thanks to modified versions of existing games, like GLQuake. The image quality was higher than with software rendering, and we could run games at higher display resolutions, too. This instant gratification sparked a wave of upgrades and helped 3dfx become a household name in a matter of months. Physics, however, has no easy analog to higher display resolutions and better quality texture filtering. Gamers can’t grab a PhysX card and expect an instant payoff. We’ll have to wait for games to catch up, and that could literally take years.
Eye candy isn’t interactivity. When PhysX support does arrive in games, it will likely take the form of improved visual effects, as it does today in Ghost Recon Advanced Warfighter. When you blow stuff up, the smithereens are legion. Bits and pieces of things are flying everywhere. But none of it affects gameplay in any meaningful way, because the game’s physical world wasn’t designed with hardware-accelerated physics in mind. Nifty visual effects present lots of opportunities to game developers, but if that PhysX card is going to be worth my money, I want to feel the impact of physics acceleration. Getting game developers to change their assumptions and really take advantage of physics hardware in ways that alter gameplay will probably be extremely tough, especially since, one would presume, the software-based fallback will be much slower. Playing the same game on a non-PPU-equipped system would have to be a different experience, with fewer physical objects onscreen and fewer possibilities for interaction.
There’s no killer app. This one flows from the last two and is very simple. I’m not the first one to say it, either. The PhysX PPU needs at least one really good game to demonstrate its power and really sell people on the concept. So far, it’s not here, and I’m not even sure a strong contender for this role is imminent.
The GPU/MPU challenge. ATI and NVIDIA have already teamed up with Ageia’s rival in the physics middleware market, Havok, to get preliminary demos of a GPU-accelerated physics API up and running. The graphics guys are talking big about the number of FLOPS they can bring to physics processing and are hungry to prove GPUs can do more than push pixels. The GPU-accelerated physics API, Havok FX, is currently an eye-candy-only affair, so game-critical physics simulations must still happen on the CPU. Still, if dedicating a second or third (or fifth) graphics card to physics can achieve results similar to Ageia’s in the short term, Ageia’s life could get complicated.
On top of that, execution cores in microprocessors are multiplying like rabbits these days, with two cores set to begin giving way to four by this time next year. I’m convinced that a custom chip designed for physics could theoretically outdo a multi-core CPU and probably a GPU in terms of peak physics processing capabilities, and Ageia talks a lot about the joys of custom chips when this topic comes up. Best to leave the graphics to the GPU and the game AI to the CPU, they say. What they haven’t convinced me of, however, is that a multi-core CPU or a GPU (not to mention the combination of the two) isn’t sufficient to deliver in-game physics that are incredibly realistic and compelling.
Of course, just above we were fretting that the gap between the PPU haves and have-nots might be too large, so who the heck knows?
Is PCI the short bus to physics? I mention this one because we have a few folks in our news comments who persistently mention it as a show-stopping problem. Right now, PhysX cards will only plug into a PCI slot, and ye olde PCI is known for being relatively slow. No doubt Ageia chose PCI for cogent reasons, like the fact that they started development long ago, when PCI-E was but a gleam in Intel’s eye, or that they want to sell lots of cards as upgrades for existing PCs. Still, Ageia does have plans for a PCI Express version of the PhysX card at some point in the future.
Some folks seem to think PCI is impossibly slow for a really solid PPU implementation. I’ve asked Ageia about this issue repeatedly, and they insist that using PCI really isn’t a problem. Given the fact that the cards have 128MB of fast local memory, I’m mostly inclined to believe them. Of course, we won’t really know until we have games that really stress a PhysX card’s capabilities.
Is there room in the market? So you drink Bawls soda by the gallon and split your free time between Counter-Strike tournament play, trying out the latest game demos, and mastering the nuances of Oblivion. On the side, you’ve developed an entire alternate personality in WoW. You want to prove your bona fides by building yourself the ultimate gaming rig, but you’re on a generous-but-strict $1200 budget. How do you choose between paying more for a dual-core CPU, ponying up for a discrete sound card, abusing the plastic for a second video card, adding a drive for RAID 0, or going for that gorgeous new 20″ LCD? Wait, now you’re supposed to buy a separate card for physics, too? For almost $300?!
Even if PhysX is worthy, Ageia may find it difficult to prosper in a PC market crowded with other pricey, enthusiast-oriented goodies.
I suppose I could dream up some more potential problems for Ageia, but that about covers the major ones. I think one really good killer app that illustrates compelling potential for PhysX could cut through most of these clouds like a bolt of sunlight, but it’s not here yet.
Since we’re a PC hardware review site, I’m bound by law and social contract to test the PhysX card and make some graphs. That portion of the review follows.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.
Our test system was configured like so:
|Processor||Athlon 64 X2 4800+ 2.4GHz|
|System bus||1GHz HyperTransport|
|Motherboard||Asus A8R32-MVP Deluxe|
|North bridge||Radeon Xpress 3200|
|South bridge||ULi M1575|
|Chipset drivers||ULi Integrated 2.20|
|Memory size||2GB (2 DIMMs)|
|Memory type||Corsair CMX1024-4400 Pro DDR SDRAM at 400 MHz|
|CAS latency (CL)||2.5|
|RAS to CAS delay (tRCD)||3|
|RAS precharge (tRP)||3|
|Cycle time (tRAS)||8|
|Hard drive||Maxtor DiamondMax 10 250GB SATA 150|
|Audio||Integrated M1575/ALC880 with Realtek 5.10.00.5247 drivers|
|Graphics||ATI Radeon X1900 XTX 512MB with Catalyst 6.5 drivers|
|Physics||BFG Tech PhysX 128MB PCI with 2.4.3 FC1 drivers|
|OS||Windows XP Professional (32-bit)|
|OS updates||Service Pack 2, DirectX 9.0c update (April 2006)|
Thanks to Corsair for providing us with memory for our testing. Although these particular modules are rated for CAS 3 at 400MHz, they ran perfectly for us with 2.5-3-3-8 timings at 2.85V.
Our test systems were powered by OCZ GameXStream 700W power supply units. Thanks to OCZ for providing these units for our use in testing.
Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults.
The test systems’ Windows desktops were set at 1280×1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Ghost Recon Advanced Warfighter
GRAW is the first big game title with PhysX support, and Ageia chose to delay the launch of the first retail PhysX cards until its release. We’ve already covered the brouhaha that erupted when Havok announced that most of GRAW’s physics are done with Havok software, so we won’t rehash that one. The developers of the PC version of GRAW added extra physics-based eye candy using Ageia’s API and hardware. Here’s how it looks.
Explosions like these are one of the few places in GRAW where PhysX effects are apparent. In the first frame, an almost supernatural amount of debris comes flying out of the exploded bus terminals (or phone booths or whatever). The volume of junk flying through the air seems a little over the top to me, personally. The effects are more believable in frames two and three, and you can see how much more debris is scattered around on the ground in the third frame with PhysX.
Notice that the volume of smoke in frame three doesn’t seem to change when a PhysX card is added. These particle effects may be accelerated by the PhysX card, but it’s not apparent.
…aaand, that’s about it, really. GRAW isn’t much of a showcase for PhysX, and I don’t think I’ve seen much of anything in it that wouldn’t be possible in software alone.
PhysX effects may be rare in GRAW, but we can measure performance when they’re present. I used FRAPS to capture frame rates during a 20-second sequence where I blew up some bus terminals and a jeep, the same scenes depicted in the GRAW screenshots above. I played through the sequence the same way each time, and we averaged the results from five runs. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
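That aggregation scheme, the average of the per-run averages and the median of the per-run lows to damp outliers, can be sketched like so; the frame-rate numbers here are hypothetical placeholders, not our measured results:

```python
from statistics import mean, median

# Hypothetical per-run FRAPS results from five playthroughs.
avg_fps = [44.1, 43.8, 45.0, 44.5, 43.9]   # average FPS per run
low_fps = [22, 31, 29, 30, 28]             # minimum FPS per run; 22 is an outlier

print(round(mean(avg_fps), 1))   # reported average: 44.3
print(median(low_fps))           # reported low: 29 (the outlier run is damped)
```

Reporting the median low rather than the minimum of minimums keeps one fluke run from dominating the "low frame rate" result.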
In spite of the fact that we used newer versions of the PhysX drivers and GRAW than most sites did during the initial wave of PhysX reviews, we still see performance degraded with PhysX enabled. This is an extreme example because we’re only testing a worst-case scenario where lots of physics effects are present, but this sort of scene is where PhysX is supposed to shine. Sadly, that’s not what happens.
GRAW may not be much of a showcase for PhysX, but the CellFactor demo is much more exciting. This first-person shooter-style demo was commissioned by Ageia to show off the PPU’s abilities, and it has more objects bouncing around onscreen and interacting with one another than any other game I can recall. We’re talking more rigid bodies than a three-story morgue. Not only can you blow up an unprecedented number of standard-issue FPS pipes, crates, and barrels, but the players also have Star Wars “force”-style telekinetic powers. Part of the fun is sending a pile of junk careening off of a platform and raining death on the unsuspecting players below. PhysX may not have a killer app yet, but playing this demo, you’ll definitely catch a glimpse of Ageia’s vision.
Being the nosy sort, I wanted to find a way to benchmark this app with and without a PhysX card, but Ageia said it would only run on PhysX hardware. I thought I was out of luck until I saw a post in our forums describing how to get CellFactor running without a PhysX card. I tried it, and lo and behold, it worked! CellFactor would start and run, and for the most part, it was intact. The huge numbers of objects onscreen remained, and performance was pretty snappy on our Athlon 64 X2-based test system. The only things missing were the occasional cloth and fluid effects, used sparingly in the demo level to simulate a giant banner, some radioactive goo, and (of course) blood.
Here’s a quick look at what was missing and what wasn’t.
The pipes and other objects bounce around in the software mode the same as ever, but Ageia’s very realistic tearable cloth is completely missing.
Fluids are also missing in CellFactor’s software mode, but particle volumes look to be the same as with the PhysX card.
Obviously, disabling some of the effects means the software mode is less physics-intensive than CellFactor is with PhysX hardware. Still, the basic CellFactor gameplay is intact in software mode, complete with telekinetic powers and flying crates galore. The software-only version ran so well, I couldn’t help but try some benchmarking to see how it compared to the PhysX card.
I tested frame rates using FRAPS over the course of a 20-second sequence in which I blew stuff up with grenades. This test was about rigid bodies, and large numbers of ’em were onscreen during this sequence. No liquids were present onscreen during this test, and no cloth except for a few strips way in the background, and those were typically not affected by the action. As with GRAW, I tried to play through the scene the same way each time, and the results you see below are the average and median scores from five runs.
CellFactor appears to be capped at 45 FPS, by the way. The cap will no doubt keep average frame rates down.
The CPUs used to obtain the results below were all Athlon 64s with 1MB of L2 cache per core running at the speeds specified. (The 2.4GHz dual-core chip was an Athlon 64 X2 4800+, and the 2.8GHz one was an overclocked FX-62. The single-core 2.4GHz chip was an underclocked FX-57.) CellFactor was running with its default low-quality graphics settings.
The performance delta between the single-core CPU with and without PhysX is substantial, but that difference shrinks pretty dramatically when we add a second CPU core to the mix. Shockingly, the system with a 2.8GHz dual-core processor performs almost identically to the dual-core 2.4GHz system with a PhysX card.
Testing PhysX performance in this way may be mostly bogus, but these aren’t the sort of results Ageia wants to see, no doubt. In fact, Ageia seems to have been tracking various forum discussions about CellFactor software mode performance, and they’ve taken several steps to address the question.
First, they released a new version of the CellFactor demo with some changes. In the new version, cloth is enabled in software mode, despite the fact that it causes major visual artifacts and brings frame rates to a near standstill. Ageia also claims rigid-body performance and explosions are faster with this new version of CellFactor. For what it’s worth, I tried the new R36 release of CellFactor with a PhysX card in our benchmarking sequence, and performance was essentially unchanged.
Second, Ageia’s release notes for CellFactor R36 now have a section titled “Framerate Numbers” that says the following:
It should be noted that the use of software frame grabber applications like FRAPS, even just to display FPS numbers, can slow down the game by up to 10 FPS or more, depending on the system and its settings. Due to this significant performance impact, software like this is not an accurate representation of overall game performance.
Hmm. Denigrating a particular testing method when you don’t like the results is a time-honored tradition in high-tech PR, but generally FRAPS has been one of the good guys in such debates, with 3DMark and timedemo functions getting the brunt of it. Now, the tables have turned. My, how times change!
I tried to test the impact of FRAPS on CellFactor frame rates in a very straightforward way. The CellFactor demo itself includes a metrics overlay, which you can see in my screenshots. I ran the demo with and without FRAPS and watched the demo’s built-in frame rate counter to see if things ran slower. From what I could see, FRAPS has no noticeable impact on frame rates.
Ageia has also tackled the CellFactor software mode issue by publishing an interview with the demo’s developer. The developer talks down rigid-body objects and collisions as “simple” and talks up cloth and fluids as the truly unique features in CellFactor. Here’s one key question and answer:
AGEIA: So you mentioned a “software mode” – is it possible to run CellFactor without hardware? If so, how does it run?
JS: Only if you want to miss out on a lot of what makes the game fun! I read on a forum somewhere that a player had done a command line switch and disabled support for the PhysX card. Of course he benchmarked it and it came back with a decent result. The reason for that is pretty simple – we never really intended for players to actually play the game like that, so we stripped the more advanced features out of the software mode (such as fluid and cloth); let’s not forget that AGEIA makes a very powerful physics software engine as well, so doing rigid body collisions (where boxes get tossed around) isn’t too much for the highest-end CPUs to handle. With that said, in software mode, you’ll still notice a significant slow-down at moments of peak physics interaction on even the latest and greatest multi-core machines. That’s why we have the PhysX card listed as a requirement.
It’s good to see that Ageia’s PR folks are on the ball when it comes to this sensitive issue, but embedded in that answer, and in our CellFactor software benchmark results, is a noteworthy and inescapable truth: even the very large numbers of rigid-body collisions we see in CellFactor can be handled pretty well on a high-end, dual-core processor. The same would appear to be true for the particle effects we’ve seen in CellFactor and GRAW.
We measured total system power consumption at the wall socket using a watt meter. We plugged the monitor into a separate outlet, so its power draw was not part of our measurement. Also, we turned on the Athlon 64 X2’s Cool’n’Quiet clock throttling function for the idle tests at the Windows desktop, but disabled CnQ for the “load” tests. In order to make sure the PPU got a good workout, we took the power draw measurements under load with the system running the CellFactor demo in its “low” graphics setting.
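The card's contribution then falls out as a simple subtraction of wall-socket readings with and without the card installed; here's a sketch of that delta method with hypothetical wattage values, not our actual meter readings:

```python
# Hypothetical wall-socket readings in watts, with and without the
# PhysX card installed, at idle and under load.
readings_w = {
    ("idle", "no card"): 188, ("idle", "PhysX"): 208,
    ("load", "no card"): 327, ("load", "PhysX"): 347,
}

for state in ("idle", "load"):
    delta = readings_w[(state, "PhysX")] - readings_w[(state, "no card")]
    print(f"{state}: card adds ~{delta} W")
```

Because we measure total system draw, this delta also folds in any PSU-efficiency losses on the card's behalf, which is the figure that actually matters at the wall.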
Well, the PhysX card is certainly consistent. Whether it’s sitting idle at the desktop or cranking away in CellFactor, it looks to add about 20 watts to the system’s total power draw. That’s not much in the grand scheme, and I can’t help but wonder whether the PhysX card wouldn’t draw more power if more fully utilized by software. The PhysX card does have an active cooling fan with variable fan speeds, and when it kicks into high gear, the fan definitely adds to overall system noise.
Clearly, Ageia faces some major hurdles in its quest to make physics processors a new must-have component of a PC gaming rig, but they’re making the logical and necessary steps to lay a foundation for success.
As for whether or not they’ll succeed, I have no idea.
Right now, it seems to me they have two very big problems. One, few games take advantage of the PhysX hardware at all, and the few that do don’t use the PPU’s processing power in a compelling way. Two, the most common physics elements in today’s games, rigid-body objects and particle effects, can be handled fairly well by high-end, dual-core CPUs, based on our CellFactor tests. That fact may narrow the PhysX cards’ primary selling points to the acceleration of things like fluids, cloth, and complex interactive particles. Ageia argues that offloading all physics to the PPU is good because it frees up CPU time, but it ain’t easy to sell folks on offloading a CPU core that would otherwise be largely idle.
Ageia may have a third big problem if it turns out that this first-gen PhysX PPU is simply a dog of a performer. We don’t have enough evidence to draw that conclusion yet, but nothing I’ve seen so far convinces me this chip offers the sort of major leap in physics performance that Ageia claims for it. I remain hopeful, though, and we’ll be watching new PhysX-enabled games as they become available to see whether any of them deliver on PhysX’s promise.