PhysX hobbled on the CPU by x87 code

Nvidia has long promoted its PhysX game physics middleware as an example of a computing problem that benefits greatly from GPU acceleration, and a number of games over the past couple of years have featured PhysX with GPU acceleration.  Those games have often included extra physics effects that, when enabled without the benefit of GPU acceleration, slow frame rates to a crawl.  With the help of an Nvidia GPU, though, those effects can usually be produced at fluid frame rates.

We have noted in the past that some games implement PhysX using only a single thread, leaving additional cores and hardware threads on today’s fastest CPUs sitting idle. That’s true despite the fact that physics solvers are inherently parallel and are highly multithreaded by nature when executing on a GPU.

Now, David Kanter at RealWorld Technologies has added a new twist to the story by analyzing the execution of several PhysX games using Intel’s VTune profiling tool.  Kanter discovered that when GPU acceleration is disabled and PhysX calculations are being handled by the CPU, the vast majority of the code being executed uses x87 floating-point math instructions rather than SSE. Here’s Kanter’s summation of the problem with that fact:

x87 has been deprecated for many years now, with Intel and AMD recommending the much faster SSE instructions for the last 5 years. On modern CPUs, code using SSE instructions can easily run 1.5-2X faster than similar code using x87.  By using x87, PhysX diminishes the performance of CPUs, calling into question the real benefits of PhysX on a GPU.

Kanter notes that there’s no technical reason not to use SSE on the PC—no need for additional mathematical precision, no justifiable requirement for x87 backward compatibility among remotely modern CPUs, no apparent technical barrier whatsoever.  In fact, as he points out, Nvidia has PhysX layers that run on game consoles using the PowerPC’s AltiVec instructions, which are very similar to SSE.  Kanter even expects using SSE would ease development: "In the case of PhysX on the CPU, there are no significant extra costs (and frankly supporting SSE is easier than x87 anyway)."

So even single-threaded PhysX code could be roughly twice as fast as it is with very little extra effort.
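
To make the difference concrete, here is a minimal sketch, not anything from PhysX, just a toy particle-position update, of the same loop written two ways: as plain scalar math, which a compiler targeting x87 (for example, GCC with -mfpmath=387) executes one value at a time through the x87 register stack, and with SSE intrinsics, which process four single-precision values per instruction.

// Illustrative only -- not PhysX code. A toy integrator showing the kind of
// loop where SSE's packed math pays off over scalar x87.
// Assumes positions/velocities in struct-of-arrays form and n divisible by 4.
#include <xmmintrin.h>  // SSE intrinsics
#include <cstddef>

// Scalar version: compiled for x87, this runs one floating-point operation
// at a time through the x87 register stack.
void integrate_scalar(float* pos, const float* vel, float dt, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        pos[i] += vel[i] * dt;
}

// SSE version: the same math, four single-precision lanes per instruction.
void integrate_sse(float* pos, const float* vel, float dt, std::size_t n) {
    const __m128 vdt = _mm_set1_ps(dt);            // broadcast dt to all 4 lanes
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 p = _mm_loadu_ps(pos + i);          // load 4 positions
        __m128 v = _mm_loadu_ps(vel + i);          // load 4 velocities
        p = _mm_add_ps(p, _mm_mul_ps(v, vdt));     // p += v * dt, 4 at a time
        _mm_storeu_ps(pos + i, p);                 // store 4 results
    }
}

In practice, a compiler told to target SSE (GCC's -mfpmath=sse, for instance) will emit SSE instructions even for plain scalar source, which is part of Kanter's point that supporting it costs essentially nothing.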

Between the lack of multithreading and the predominance of x87 instructions, the PC version of Nvidia’s PhysX middleware would seem to be, at best, extremely poorly optimized, and at worst, made slow through willful neglect.  Nvidia, of course, is free to engage in such neglect, but there are consequences to be paid for doing so.  Here’s how Kanter sums it up:

The bottom line is that Nvidia is free to hobble PhysX on the CPU by using single threaded x87 code if they wish. That choice, however, does not benefit developers or consumers though, and casts substantial doubts on the purported performance advantages of running PhysX on a GPU, rather than a CPU.

Indeed.  The PhysX logo is intended as a selling point for games taking full advantage of Nvidia hardware, but it now may take on a stronger meaning: intentionally slow on everything else.

Comments closed
      • Shining Arcanine
      • 9 years ago

      The follow-up article suggests that such optimizations would change the view that CPUs are deficient, and that this view is the direct result of a lack of such “optimizations”, but honestly, a factor of 2 speedup is not very impressive. Even if they could get a factor of 3, that would be it, and there is not much more that can be done. There is no conspiracy here, but it seems like people are anxious to say that there is one. :/

    • PopcornMachine
    • 9 years ago

    I guess they need this kind of edge since they can’t make decent hardware.

    • RealPjotr
    • 9 years ago

    I’ll try again since I only got ignorant answers the first time: does anyone know how many times faster Nvidia’s PhysX is on a GPU compared to the x87 code?

    We can estimate SSE plus multiple cores to be roughly 5x-10x faster than the x87 code. Does that beat GPU PhysX, or is it still light-years behind!?

    • Wintermane
    • 9 years ago

    Why in flaming donkey dingos would anyone sane have allowed Hector Ruiz near a CEO seat at their company?

    And again, it’s just there to show what it looks like; it’s not an option meant to actually PLAY. Why would they waste time optimizing that code path when it would STILL result in something you can only watch and not play?

    • ronch
    • 9 years ago

    Obviously, the use of ancient x87 instructions on modern, multi-core, 128-bit SIMD CPU architectures is there to force PCs without an Nvidia GPU onto x87 and make them look bad.

    • blubje
    • 9 years ago

    Wow, 2 pages of the exact same comment. I don’t disagree, but I think “the CPU might be as fast as the GPU” is a rather uneducated conclusion. It definitely depends on the application, but most physics calculations, since they mimic the real world (where interactions are local), can be parallelized very well.

      • Shining Arcanine
      • 9 years ago

      Actually, real-world physics calculations cannot be parallelized well. A professor of mine who attended an eight-hour conference on physics in games summarized his experience by saying that it opened with how physics is supposed to be done, noted that doing it that way is too computationally expensive, and then spent the remaining hours on ways of cheating so that things look real without actually being real.

      That is why physics calculations in games can be parallelized: game developers cheat by ignoring how the world actually works and replacing the math used to calculate it with math that approximates it but is much easier to compute. It is all smoke and mirrors.

        • ronch
        • 9 years ago

        Shining Arcanine, I agree. How do you parallelize the physics of a stack of dominoes falling? How do you parallelize the movements and orbits of the planets around the solar system, where tiny variations in the axial and orbital movements of the planets are caused by other planets in the system? This is somewhat akin to the butterfly effect, which states that all things, one way or another, affect everything else. Now, not all scenarios have such dependencies or call for such high levels of precision, but it’s wrong to say that physics calculations, in general, can be highly parallelized.

          • XaiaX
          • 9 years ago

          The “three body problem” you’re sort of describing there can’t be effectively/[

            • Shining Arcanine
            • 9 years ago

            Solutions to the general three-body problem cannot be calculated analytically, so numerical methods must be used to obtain approximate solutions. Those numerical methods are inherently serial, so they cannot be spread across multiple cores. You could try, but all cores would effectively be computing the same thing, which is pointless.

            • designerfx
            • 9 years ago

            You just broke your own statement. If everything is serial individually, then each part can be made parallel. Fail.

            • Shining Arcanine
            • 9 years ago

            You misread what I said. Go back and read it again. I hope that you will realize that I said “inherently” and not “individually”, contrary to what your statement suggests you think I said.

            By the way, for anyone who does not understand what it means when I say that this calculation is inherently serial, here is a description of the calculation:

            You have 3 objects, with masses, starting positions and starting velocities. You calculate the center of gravity in the system and assume that it is constant. You then calculate the new positions and velocities of each object in the system over the interval of something like a picosecond. You then have moved one picosecond in the future and can repeat the calculation with the new starting positions and velocities. If you want to move a year into the future, you repeat this computation approximately 3.1556926e19 times.

            This is an inherently serial operation. If you can find a way to parallelize it, you will likely receive the equivalent of a Nobel Prize.
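
            As a rough sketch (hypothetical names, a crude forward-Euler integrator, nobody’s shipping code), the loop just described looks something like this; the outer loop over time steps is the serial part, since each step consumes the previous step’s output:

            // Toy three-body stepper -- illustration only, not production code.
            #include <cmath>

            struct Body { double m, x, y, z, vx, vy, vz; };

            void step(Body b[3], double dt) {
                const double G = 6.674e-11;                 // gravitational constant
                double ax[3] = {}, ay[3] = {}, az[3] = {};
                for (int i = 0; i < 3; ++i)                 // accelerations from current positions
                    for (int j = 0; j < 3; ++j) {
                        if (i == j) continue;
                        double dx = b[j].x - b[i].x, dy = b[j].y - b[i].y, dz = b[j].z - b[i].z;
                        double r = std::sqrt(dx*dx + dy*dy + dz*dz);
                        double a = G * b[j].m / (r * r * r);
                        ax[i] += a * dx; ay[i] += a * dy; az[i] += a * dz;
                    }
                for (int i = 0; i < 3; ++i) {               // advance one time step
                    b[i].vx += ax[i] * dt; b[i].vy += ay[i] * dt; b[i].vz += az[i] * dt;
                    b[i].x += b[i].vx * dt; b[i].y += b[i].vy * dt; b[i].z += b[i].vz * dt;
                }
            }

            void simulate(Body b[3], double dt, long long steps) {
                for (long long s = 0; s < steps; ++s)       // step s+1 needs the output of step s,
                    step(b, dt);                            // so this loop cannot be parallelized
            }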

            • Stargazer
            • 9 years ago

            q[

            • Shining Arcanine
            • 9 years ago

            In order to do that, you need to ignore distant objects in your system and break it into several subgroups, under the assumption that their positions are not significantly affected by distant objects. It is a fair assumption to make, but the result of such a calculation is always wrong unless you carefully account for the margins of error.

            That is likely possible, but at the same time, it is very difficult to prove that your error bounds are correct, and unless they are correct, the result is useless.

            This same issue exists in molecular dynamics simulations, although in their case the results of the computation can explain experimental observations of how actual molecules behave. That is because it is impossible to directly observe the molecules interacting with one another, unlike the n-body problem, which is directly observable via telescopes.

        • derFunkenstein
        • 9 years ago

        I think at this point all that really matters is the appearance of the end result. Will this approximation eventually land us in an “uncanny valley” of sorts? Sure. But for now, a “best approximation” that can use multithreaded tasks is for sure the better way to go.

        • Laykun
        • 9 years ago

        Not that you’re saying it’s a bad thing, but computer graphics on the whole is a system of cheating to emulate reality. Raster graphics ignore how light works in the real world (ray tracing is much closer) and are effectively cheating in the same way to achieve better performance.

        We don’t exactly need the most realistic, true physics in games. At least not until we get realistic graphics that are true to how light behaves.

        • Meadows
        • 9 years ago

        Graphics works the same way.

          • Shining Arcanine
          • 9 years ago

          Computer graphics do not attempt to model the real world, and with some types of shaders that is more obvious than with others. One example is the kind of shader that produces a 3D cartoon look, as in Battlefield Heroes. Another is the kind used for 2D cartoons like anime, which is commonly used in Japan.

            • Laykun
            • 9 years ago

            Cartoon images are composed of reflected light, meaning they are still real-world things we are attempting to emulate. We cheat by using shaders to emulate penciled character outlines. You need to reconsider your definition of realism; it does not mean flesh-and-bone characters.

            Also, pointing out one end of the spectrum does not negate the other end, ultra-realistic graphics and physics. Often cartoony games are coupled with cartoonish physics.

    • zimpdagreene
    • 9 years ago

    Well, we have always known that they were screwing with the software to destroy it and only make it run on their GPUs. But to fuck the code up so it runs x87? That’s a company that’s fucking up software on purpose. But I know ATI is doing something too, with their CrossFire configs being locked down the way they are. Well, I hope someone or a group will hack the hell out of it, come up with a one-click solution to enable it on ATI GPUs, and release it into the wild freely. Have it like the old days. That would send the best statement! Stop fucking people over and sell real hardware that’s not fucked over with shitty software. Work together instead of money, money, money. Work that way and the money will flow.

    • provoko
    • 9 years ago

    Those bastards!

    • Chrispy_
    • 9 years ago

    I have ALWAYS interpreted Nvidia’s “TWIMTBP” branding as “We Bribed This Developer To Prioritise Nvidia’s Exclusive Optimisations Instead Of Letting Them Use Generic Code Which Would Make Us Look Bad Against The Competition”

    You’d be naive for thinking that it was ever anything else, but WBTDTPNEOIOLTUGCWWMULBATC isn’t as catchy as TWIMTBP 😉

    • swaaye
    • 9 years ago

    Not surprising. What would the motivation be for them to put resources into running their proprietary Physx API fast on x86 chips? There’s no benefit there. They clearly want it to only be usable on their GPUs because it sells their product and puts the CPU and GPU competition at a disadvantage.

    On the consoles the GPUs are useless for Physx so if NV wants to push their proprietary Physx API on them they need to optimize for the CPU.

    It’s a textbook case of why having an API under the control of a sole hardware vendor is bad. I wonder if Ageia originally wrote CPU Physx for x87. They had no reason to optimize it either.

    Aren’t we at something like 4x x87 performance with SIMD on modern x86 CPUs? P4 was particularly terrible with x87, and Core 2 and Phenom doubled the per clock SIMD performance of their predecessors.

    • anotherengineer
    • 9 years ago

    No surprise coming from Nvidia, same old classy tricks. I formatted my mom’s PC yesterday, went to download the ULi driver (Nvidia bought them out and shut them down), and lo and behold, it’s the same driver version that was out 5 years ago.

    Nfail lol

      • d0g_p00p
      • 9 years ago

      Oh no, you mean nVidia does not have updated drivers for a chipset that is 5 years old and no longer in production.

        • anotherengineer
        • 9 years ago

        Well they have newer drivers for the older nforce4 chipset lol

          • Meadows
          • 9 years ago

          Because they still break.

        • yuhong
        • 9 years ago

        Yeah, in any case I would not consider it that much of a trick. But does anybody remember when Nvidia’s purchase of ULi caused uncertainty for makers of ATI-chipset motherboards who were depending on ULi’s southbridge because the SB450 sucked? It was right before AMD bought ATI, and ATI was promising the SB600 to fix the SB450’s problems, which they did release in time for the move to AM2, right around when AMD bought ATI.

      • derFunkenstein
      • 9 years ago

      Yeah, I can’t really blame them in the case of your mom’s PC.

    • lethal
    • 9 years ago

    I wonder what kind of spin Brian_S can pull out of this =p.

    • bcronce
    • 9 years ago

    Just looking at raw GFLOPs obtained via SIMD/etc, I always claimed NVIDIA had to be making their code purposefully bad. Well, here it is.

    Really, threaded and SIMD-optimized physics would be crazy easy to code compared to the type of work those engineers are used to. Hell, their graphics drivers are SIMD- and thread-optimized, and they’re a hell of a lot harder to code.

    • bcronce
    • 9 years ago

    With the graphics race so close, why does NVIDIA want to push away the enthusiast market?

    • Sahrin
    • 9 years ago

    Using x87 *has* to be intentional on the part of nVidia. There is no rational reason for x87 to be used over SSE. At best, nVidia is incompetent – something that, despite the lackluster execution on drivers and GF100 lately, I think we all know is unlikely.

    The company has, however, shown absolutely no shame about intentionally hobbling performance and features to force consumers to use their products.

    PhysX could be a great product – if nVidia wanted it to be. Instead, they want it to be a marketing bullet-point. It’s not that I think nVidia is breaking some law – just violating the laws of common sense; and hurting their consumers.

    As far as I can tell, all of this goes back to nVidia’s CEO – the one arrogant enough to demand of the AMD board that he be the CEO of a merged AMD-nVidia entity.

    My dearest hope is that nVidia goes bankrupt, to serve as an object lesson to all technology companies that arrogance and hubris will lose in the end.

      • September
      • 9 years ago

      Didn’t nVidia say they were becoming more than a graphics card company, that it was their software that was going to change everything?

        • Shining Arcanine
        • 9 years ago

        I think that they were talking about how their hardware would be designed for doing GPGPU calculations.

      • Shining Arcanine
      • 9 years ago

      Don’t you think that you are being a little irrational? As far as I know, Nvidia and AMD never discussed a possible merger, so Nvidia’s CEO never could make such a demand.

      Nvidia’s engineers are trying to optimize GPU performance. I doubt they even realized that there was something they could enable to improve CPU performance. Not to mention that x86 was not a good ISA to begin with. If it were a good ISA, x87 would never have been a part of it.

        • Sahrin
        • 9 years ago

        AMD contacted several companies when they were seeking merger partners in 2005-2006. nVidia was among them.

        Regarding x86: I let the results speak for themselves. 90%+ of the world’s CPUs use it. If it were so terrible, don’t you think it would have been abandoned by now? Compare that to Intel’s vastly superior IA-64, or POWER5; x86 is really struggling against those in the marketplace…wait, both of those have been effectively abandoned.

        The argument that “nVidia is focused on GPU optimization” only makes sense if PhysX were GPU-only. Developing a CPU application at all (and then expecting end users to use it) requires attention to detail. “Focus on another product” is not an argument in favor of ridiculously shoddy workmanship.

        Your version of events would be like Ford selling a gas-powered bicycle because they are focused on Diesel and electric products. nVidia isn’t just not investing resources into it, they are actively choosing the worst (performance-wise) and most difficult (in terms of development – both for nVidia and software developers) path possible. That’s just stupid, regardless of how you slice it.

          • Shining Arcanine
          • 9 years ago

          Less than 10% of the world’s CPUs use x86. Most are ARM or MIPS processors.

        • flip-mode
        • 9 years ago

        Do you ever google anything? Really dude. Google. It’s a search engine. Searches the internet. Type something in, get something back. Not hard.

        I wonder what would happen if you typed “amd nvidia merger”. Dunno. Perhaps you’d be tempted to click the first link. Perhaps you’d read the article that first link delivered to you. Perhaps that article would say something like this:

        q[

          • MadManOriginal
          • 9 years ago

          If that quote is accurate at least it gives AMD fans one good thing to say about Mr Ruiz.

    • bill94el
    • 9 years ago

    Entitlement…that’s the word you’re looking for

    • RealPjotr
    • 9 years ago

    So with 2x performance from SSE and 4x on a regular quad-core if multithreaded… that’s 8x over the existing x87 implementation. What is Nvidia’s GPU PhysX performance compared to their x87 code? The article only says “x87 crawls”.

    • dpaus
    • 9 years ago

    I suspect this has more to do with Intel’s on-going pot-shots at nVidia (in response to nVidia’s nagging, yapping-dog heel-nipping on Intel) than anything else. Having said that, who in nVidia thought they could get away with such a crude hack not being eventually discovered – and coming home to roost?

    • mczak
    • 9 years ago

    Only 2x improvement for SSE?
    I actually wonder if that isn’t an underestimate (if the code were well optimized), even in practice. David Kanter knows his stuff, but this still looks a little low to me (if performance is indeed dominated by the x87 instructions).
    On older CPUs (prior to Core 2 or Phenom), the SSE/SSE2 units only ran at 64-bit internal width; hence with single precision the theoretical max throughput was only twice that of x87, and with double precision it was actually the same (it was still faster than x87 even in that case, but that gets a bit complicated to explain).
    But today’s CPUs are indeed 4x faster in theory with SP and 2x faster with DP (and the article mentions that). It is quite possible real-world gains are smaller, but with a problem that can be made to run in parallel so easily, I’m left wondering whether real-world gains couldn’t be close to the theoretical max. Of course performance could be limited by other things (like memory bandwidth), but I doubt it.
    Oh, and by the way, there’s IMHO absolutely no reason to believe the x87 code is optimized either.
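
    Those theoretical factors come straight from the register widths. As a toy illustration (not PhysX code): one packed SSE multiply touches four single-precision values, or two double-precision values, where x87 handles one value per instruction.

    #include <emmintrin.h>  // SSE2 intrinsics (adds the double-precision __m128d type)

    // Four float multiplies per instruction -> the 4x single-precision ceiling.
    __m128 scale4_ps(__m128 v, float s) {
        return _mm_mul_ps(v, _mm_set1_ps(s));
    }

    // Two double multiplies per instruction -> the 2x double-precision ceiling.
    __m128d scale2_pd(__m128d v, double s) {
        return _mm_mul_pd(v, _mm_set1_pd(s));
    }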

      • Shining Arcanine
      • 9 years ago

      As long as Nvidia is not shipping debug binaries, the x87 code should be “optimized”.

        • mczak
        • 9 years ago

        Well, I’m thinking more about algorithmic optimization here. There’s simply no reason to believe Nvidia paid attention to writing efficient code.

    • Bensam123
    • 9 years ago

    People had to have seen this coming a mile away. Nvidia wouldn’t just buy up PhysX without destroying the one thing that would actually make people want to adopt it.

    Ageia was all about proving how much better their hardware was at what it does best than a CPU; they made PhysX run well regardless, but that much better with their hardware. One can speculate that Nvidia acquired PhysX merely to trump AMD and had little interest in developing physics. I’m sure Nvidia doesn’t give a crap if physics dies off as long as they can plaster the banner over whatever they want.

    This is really quite sad as physics have sooooo much to offer games in terms of game mechanics, fidelity, immersion, and just all around fun. The possibilities are endless and it opens up sooo many different corridors. Game developers are partly at fault for not picking it up, but people keep buying their shit and they keep getting paid so there really is no reason to improve when they can just do ‘good enough’, which has been relegated to console level crap.

    It’s too bad that games are so expensive to make and so impossible to plow into with an entrepreneurial spirit. The industry saps everything from you before you get to the point where you can actually make what you want, unless you start with a fortune to fund your own work.

    Honestly, someone should take this a bit further. I would wager that Nvidia is purposely crippling PhysX. This could easily be determined by taking an older version of the driver, i.e. one from before Nvidia acquired PhysX, and comparing it against the driver today.

    As it is right now, it reminds me of the specialty-branded ‘X-Fi’ sound cards with memory on them vs. the ones without, which of course killed off any hope that the memory would actually be used.

    It would be nice to see what PPUs do in terms of acceleration now too compared to back then. A good longitudinal study is in order here.

    • Hattig
    • 9 years ago

    More nVidia cheating. Surprise! Not.

    • Wintermane
    • 9 years ago

    The function of the CPU mode is simply to let non-Nvidia card owners see what it looks like so they can decide if they give a rat’s arse.

    On the consoles, however, it is entirely different.

    • flip-mode
    • 9 years ago

    I’ll simply stick with ATI until Nvidia gets out of the gutter. Period. I’m not going to support that schit. Nvidia wil /[

    • DrDillyBar
    • 9 years ago

    Title = … sure.

    • DeadOfKnight
    • 9 years ago

    What if I bought one of those AGEIA cards to go with my ATI? Would that work?

      • Goty
      • 9 years ago

      No, NVIDIA has dropped all support for the original dedicated PhysX cards.

    • Richie_G
    • 9 years ago

    It seems Nvidia are very Creative. Can’t say I’m surprised though.

      • ssidbroadcast
      • 9 years ago

      Ha, good one. That was very… Creative of you.

        • Meadows
        • 9 years ago

        And /[

          • ssidbroadcast
          • 9 years ago

          You’re just in AWE32 of our word play.

            • 5150
            • 9 years ago

            I’m going to cause a Fatal1ty.

      • adisor19
      • 9 years ago

      You forgot the *puts on sunglasses* part 😀

      Adi

        • jackaroon
        • 9 years ago

        I wish we’d all forgotten the *puts on sunglasses* part.

      • Krogoth
      • 9 years ago

      YEEEEAHHHHHHHHHHHH!

    • Gnerma
    • 9 years ago

    Would SSE code run on the (still supported as far as I know) dedicated PhysX cards?

      • balzi
      • 9 years ago

      Good point, but I would’ve thought that the engine would have a separate code path for dedicated PhysX cards, much like it should have a separate code path for graphics cards.
      Actually, “code paths” is probably the wrong term. Maybe a library, DLL, something like that. What I mean is that the x87 code is unlikely to be the exact binary code that is expected to push the data through a PhysX card!

      • grantmeaname
      • 9 years ago

      nope, because the dedicated physx cards aren’t CPUs.

      • JustAnEngineer
      • 9 years ago

      NVidia killed support for the Ageia PhysX cards this year.

        • BobbinThreadbare
        • 9 years ago

        That’s really hitting your own customers where it hurts.

    • can-a-tuna
    • 9 years ago

    Why won’t PhysX just die already?

      • Krogoth
      • 9 years ago

      It was stillborn.

      At least Glide had some life in it before OGL and D3D caught up and overtook it.

        • Goty
        • 9 years ago

        Oh man, I remember playing Tribes in Glide… Good times.

          • juampa_valve_rde
          • 9 years ago

          Glide felt a lot more efficient than Direct3D and OpenGL; games with Glide ran smooth and nice. That was 10 years or more ago, haha.

      • shank15217
      • 9 years ago

      Seriously, there is nothing wrong with the standard…

      • l33t-g4m3r
      • 9 years ago

      Want the truth? It’s because they pay off the devs and force them to use PhysX in the contracts.
      That’s how the TWIMTBP program works, and this shady behavior needs to be stopped.

      Not that we even need PhysX. Seriously!
      Look at some of the older games from before PhysX came around:
      Half-Life 2, Far Cry, Painkiller, etc.
      We’re being forced to buy something that we previously had and took for granted.

      • pogsnet
      • 9 years ago
    • Krogoth
    • 9 years ago

    PhysX is a sinking ship.

      • Meadows
      • 9 years ago

      You made me giggle. I guess it’s because I can’t know if you made that typo on purpose or not.

      g{

        • balzi
        • 9 years ago

        and I got a free giggle from it, because I didn’t notice the typo until you pointed it out… so thanks! 🙂

    • Anonymous Coward
    • 9 years ago

    That company just can’t be trusted.

    • FireGryphon
    • 9 years ago

    I’m kind of surprised that some investigative reporter didn’t realize this when PhysX rendering came about.

      • BobbinThreadbare
      • 9 years ago

      Based on those numbers, if you had an 8-core processor and got linear scaling, CPU physics could be faster.

        • Meadows
        • 9 years ago

        I was basing my estimates on an average quad-core CPU – considering that SSE “could be twice as fast” and then considering “times four” for the cores, we should see an eight-fold decrease in burden.

        Since the framerate _[

          • Anonymous Coward
          • 9 years ago

          It would be awesome if, when the GPU is enabled for PhysX, nVidia switched to lots of threads with full SSE optimization and didn’t have the GPU do anything. In some cases, that is probably the best way.

          • BobbinThreadbare
          • 9 years ago

          They didn’t make it LOOK worse, they made it actually worse.

    • YeuEmMaiMai
    • 9 years ago

    Nvidia trying to make themselves look good by purposely holding back PhysX on non-Nvidia hardware? Who woulda thunk it?

      • d0g_p00p
      • 9 years ago

      To be fair, it’s an nVidia feature for their video cards. ATI does not support it anyway, and there’s no way nVidia will port it over to ATI’s architecture. Crippling it so it doesn’t use Intel’s SSE instructions is a different matter, though.
