Nvidia has long promoted its PhysX game physics middleware as an example of a computing problem that benefits greatly from GPU acceleration, and a number of games over the past couple of years have featured PhysX with GPU acceleration. Those games have often included extra physics effects that, when enabled without the benefit of GPU acceleration, slow frame rates to a crawl. With the help of an Nvidia GPU, though, those effects can usually be produced at fluid frame rates.
We have noted in the past that some games implement PhysX using only a single thread, leaving additional cores and hardware threads on today's fastest CPUs sitting idle. That's true despite the fact that physics solvers are inherently parallel and are highly multithreaded by nature when executing on a GPU.
Now, David Kanter at RealWorld Technologies has added a new twist to the story by analyzing the execution of several PhysX games using Intel's VTune profiling tool. Kanter discovered that when GPU acceleration is disabled and PhysX calculations are being handled by the CPU, the vast majority of the code being executed uses x87 floating-point math instructions rather than SSE. Here's Kanter's summation of the problem with that fact:
x87 has been deprecated for many years now, with Intel and AMD recommending the much faster SSE instructions for the last 5 years. On modern CPUs, code using SSE instructions can easily run 1.5-2X faster than similar code using x87. By using x87, PhysX diminishes the performance of CPUs, calling into question the real benefits of PhysX on a GPU.
Kanter notes that there's no technical reason not to use SSE on the PC—no need for additional mathematical precision, no justifiable requirement for x87 backward compatibility among remotely modern CPUs, no apparent technical barrier whatsoever. In fact, as he points out, Nvidia has PhysX layers that run on game consoles using the PowerPC's AltiVec instructions, which are very similar to SSE. Kanter even expects using SSE would ease development: "In the case of PhysX on the CPU, there are no significant extra costs (and frankly supporting SSE is easier than x87 anyway)."
So even single-threaded PhysX code could be roughly twice as fast as it is with very little extra effort.
Between the lack of multithreading and the predominance of x87 instructions, the PC version of Nvidia's PhysX middleware would seem to be, at best, extremely poorly optimized, and at worst, made slow through willful neglect. Nvidia, of course, is free to engage in such neglect, but there are consequences to be paid for doing so. Here's how Kanter sums it up:
The bottom line is that Nvidia is free to hobble PhysX on the CPU by using single threaded x87 code if they wish. That choice, however, does not benefit developers or consumers though, and casts substantial doubts on the purported performance advantages of running PhysX on a GPU, rather than a CPU.
Indeed. The PhysX logo is intended as a selling point for games taking full advantage of Nvidia hardware, but it now may take on a stronger meaning: intentionally slow on everything else.
|1. BIF - $340||2. chasp_0 - $251||3. mbutrovich - $250|
|4. Ryu Connor - $250||5. YetAnotherGeek2 - $200||6. aeassa - $175|
|7. dashbarron - $150||8. Lucky Jack Aubrey - $100||9. Captain Ned - $100|
|10. Anonymous Gerbil - $100|
|Here are the winners of our Macrium Data Disasters contest||9|
|PC Perspective pokes and prods the Radeon Pro Duo||36|
|Microsoft finalizes closing of Lionhead Studios||15|
|AMD completes spin-off of its assembly and test operations||21|
|Deals of the week: Asus' MG278Q display for $400 and more||23|
|Phanteks wraps its Enthoo Evolv ATX case in sheets of glass||15|
|AOC Agon AG271QX is the first in a new line of gaming displays||27|
|We take a seat on Turris' VR Chair||23|
|HP's Chromebook 13 is dressed for success at $499||26|
|LOVE THIS ARTICLE. MORE OF THIS PLEASE.||+37|