NVIDIA graphics drivers to go multithreaded

I spoke recently with Ben de Waal, NVIDIA’s Vice President of GPU software, and he revealed that NVIDIA plans to produce multithreaded ForceWare graphics drivers for its GeForce graphics products. Multithreading in the video driver should allow performance increases when running 3D games and applications on dual-core CPUs and multiprocessor PCs. De Waal estimated that dual-core processors could see performance boosts of somewhere between 5% and 30% with these drivers.

Most imminent is ForceWare release 75, which will bring a number of improvements for SLI performance and 64-bit Windows, among other things, but release 75 will not be multithreaded. The next major iteration of the driver, release 80, is slated to bring support for multiple threads. We may not see this version for a few months; NVIDIA hasn’t given an exact timetable for the completion of release 80.


Out of curiosity, I asked de Waal why NVIDIA’s drivers don’t already take advantage of a second CPU. After all, the driver is a separate task from the application calling it, and Hyper-Threaded and SMP systems are rather common. He explained that drivers in Windows normally run synchronously with the applications making API calls, meaning the driver must return an answer before the API call completes. On top of that, Windows drivers run in kernel mode, so the OS isn’t particularly amenable to multithreaded drivers. NVIDIA has apparently been working on multithreaded drivers for some time now, and they’ve found a way to fudge around the OS limitations.

De Waal cited several opportunities for driver performance gains with multithreading. Among them: vertex processing. He noted that NVIDIA’s drivers currently do load balancing for vertex processing, offloading some work to the CPU when the GPU is busy. This sort of vertex processing load could be spun off into a separate thread and processed in parallel.

Some of the driver’s other functions don’t lend themselves so readily to parallel threading, so NVIDIA will use a combination of fully parallel threads and linear pipelining. We’ve seen the benefits of linear pipelining in our LAME audio encoding tests; this technique uses a simple buffering scheme to split work between two threads without creating the synchronization headaches of more parallel threading techniques.
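For the curious, here is a minimal sketch of what that buffering scheme implies: the generic producer-consumer pattern, with one thread standing in for the stage that prepares work (think of the vertex batches mentioned above) feeding a small ring buffer, and a second thread standing in for the stage that consumes it. To be clear, every name here is illustrative; this is not NVIDIA’s actual driver code, just the textbook technique the description points at.

```c
/* Minimal sketch of linear pipelining: two threads joined by a small
 * ring buffer. Illustrative only -- not NVIDIA's driver code. */
#include <pthread.h>
#include <stdio.h>

#define RING_SIZE 8

static struct {
    int items[RING_SIZE];            /* stand-ins for real work items */
    int head, tail;                  /* consumer / producer positions  */
    int done;                        /* producer has finished          */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} ring = {
    .lock      = PTHREAD_MUTEX_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER,
    .not_full  = PTHREAD_COND_INITIALIZER,
};

/* Front stage: e.g. the thread that prepares vertex batches. */
static void ring_push(int item)
{
    pthread_mutex_lock(&ring.lock);
    while ((ring.tail + 1) % RING_SIZE == ring.head)
        pthread_cond_wait(&ring.not_full, &ring.lock);      /* buffer full */
    ring.items[ring.tail] = item;
    ring.tail = (ring.tail + 1) % RING_SIZE;
    pthread_cond_signal(&ring.not_empty);
    pthread_mutex_unlock(&ring.lock);
}

/* Back stage: e.g. the thread that submits batches to the GPU. */
static void *consumer(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&ring.lock);
        while (ring.head == ring.tail && !ring.done)
            pthread_cond_wait(&ring.not_empty, &ring.lock); /* buffer empty */
        if (ring.head == ring.tail && ring.done) {
            pthread_mutex_unlock(&ring.lock);
            return NULL;                        /* pipeline drained */
        }
        int item = ring.items[ring.head];
        ring.head = (ring.head + 1) % RING_SIZE;
        pthread_cond_signal(&ring.not_full);
        pthread_mutex_unlock(&ring.lock);

        printf("processed batch %d\n", item);   /* stand-in for real work */
    }
}

int main(void)
{
    pthread_t back_stage;
    pthread_create(&back_stage, NULL, consumer, NULL);

    for (int i = 0; i < 32; i++)
        ring_push(i);                           /* front stage feeds work */

    pthread_mutex_lock(&ring.lock);
    ring.done = 1;                              /* no more work coming */
    pthread_cond_signal(&ring.not_empty);
    pthread_mutex_unlock(&ring.lock);

    pthread_join(back_stage, NULL);
    return 0;
}
```

The appeal of a pipeline like this is that the two stages synchronize only over the buffer itself, so they can run concurrently on separate cores without the locking headaches that finer-grained parallelism invites.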


Despite the apparent gains offered by multithreading, de Waal expressed some skepticism about the prospects for thread-level parallelism on CPUs. Among other things, he was concerned that multithreaded games could blunt the impact of multithreaded graphics drivers.

Comments closed
    • makfu
    • 16 years ago

    Okay, he clearly doesn’t understand how drivers work in NT-kernel OSes. There is NO context switch for low-level miniport (hardware control) drivers when executing driver routines, ISRs, or DPCs (the non-time-critical portion of the driver’s interrupt processing code).

    Each thread has a user and a kernel stack. On the interrupt handling side, when an interrupt is trapped, the IRQL of the processor is raised and the ISR is pushed onto the kernel stack of the currently executing thread on the processor for which the interrupt has affinity (the affinity can be hard-strapped if so desired). The immediate time-critical request is serviced at the hardware IRQL, above dispatch (such as acknowledging the device request), and then additional work is handled by creating a deferred procedure call, which is placed into a per-processor queue. The DPC queue is drained at dispatch level. There are 32 IRQLs, including passive, APC, and dispatch; everything above dispatch is reserved for hardware interrupts, and the ISR is registered at one of those levels depending on the importance of the device.

    As each ISR services requests in order of IRQL priority, the IRQL of the processor falls until it reaches dispatch, at which point the DPC queue is drained. Once there are no pending DPCs, scheduling events, etc., the processor falls back to passive, at which point “normal” thread execution can continue. If this sounds complicated, fear not, as this all happens bazillions of times a second.
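    To make the ISR-to-DPC split concrete, here is a bare-bones WDM-style sketch using documented DDK calls. The device, its extension, and the “register read” are hypothetical stand-ins; a real driver does considerably more.

    ```c
    /* Bare-bones sketch of the ISR -> DPC split described above. */
    #include <ntddk.h>

    typedef struct _DEVICE_EXTENSION {
        KDPC Dpc;
        ULONG LatchedStatus;        /* state the ISR saves for the DPC */
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

    VOID MyDpcRoutine(PKDPC Dpc, PVOID Context, PVOID Arg1, PVOID Arg2);

    /* Called once at device setup to tie the DPC to its routine. */
    VOID InitDeferredWork(PDEVICE_EXTENSION ext)
    {
        KeInitializeDpc(&ext->Dpc, MyDpcRoutine, ext);
    }

    /* ISR: runs above DISPATCH_LEVEL on whichever CPU took the interrupt,
     * on the kernel stack of whatever thread happened to be running. */
    BOOLEAN MyIsr(PKINTERRUPT Interrupt, PVOID Context)
    {
        PDEVICE_EXTENSION ext = (PDEVICE_EXTENSION)Context;
        UNREFERENCED_PARAMETER(Interrupt);

        /* Time-critical part only: acknowledge the device so it deasserts
         * the line, and latch whatever the deferred work will need. */
        ext->LatchedStatus = 1;     /* placeholder for a real register read */

        /* Defer everything else; the DPC queue drains at DISPATCH_LEVEL. */
        KeInsertQueueDpc(&ext->Dpc, NULL, NULL);
        return TRUE;                /* the interrupt was ours */
    }

    /* DPC: the non-time-critical half, run once the CPU's IRQL falls
     * back to DISPATCH_LEVEL. */
    VOID MyDpcRoutine(PKDPC Dpc, PVOID Context, PVOID Arg1, PVOID Arg2)
    {
        PDEVICE_EXTENSION ext = (PDEVICE_EXTENSION)Context;
        UNREFERENCED_PARAMETER(Dpc);
        UNREFERENCED_PARAMETER(Arg1);
        UNREFERENCED_PARAMETER(Arg2);

        /* Completion work (e.g. finishing I/O requests) goes here. */
        ext->LatchedStatus = 0;
    }
    ```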

    In NT (of which XP and 2k3 are members), you have a whole lot of options with regard to what the driver’s interrupt handling code can do: you can target the DPC at another processor’s DPC queue (never needed to do that myself, but M$-provided network code does this), set the priority of the DPC to high so that it sits at the front of the queue, or queue a system worker thread so that interrupt processing can be done at passive IRQL (the level at which normal thread scheduling occurs).
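    Continuing the sketch above, those three options map onto documented calls roughly like this. DeviceObject, the DEVICE_EXTENSION, and MyWorkItemRoutine are assumed from the hypothetical driver above, not from any real codebase.

    ```c
    IO_WORKITEM_ROUTINE MyWorkItemRoutine;   /* hypothetical worker routine */

    VOID ConfigureDeferredWork(PDEVICE_OBJECT DeviceObject,
                               PDEVICE_EXTENSION ext)
    {
        /* Option 1: aim the DPC at another processor's DPC queue (CPU 1). */
        KeSetTargetProcessorDpc(&ext->Dpc, 1);

        /* Option 2: put the DPC at the front of the queue when it fires. */
        KeSetImportanceDpc(&ext->Dpc, HighImportance);

        /* Option 3: queue a system worker thread so follow-up processing
         * runs at PASSIVE_LEVEL, where normal thread scheduling occurs. */
        PIO_WORKITEM item = IoAllocateWorkItem(DeviceObject);
        if (item != NULL)
            IoQueueWorkItem(item, MyWorkItemRoutine, DelayedWorkQueue, ext);
    }
    ```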

    The I/O request side is much the same: the currently executing thread’s kernel stack is “reused” by having the routine pushed onto the stack of that thread (there is more to it than that, but this is getting to be a long post).

    So what does all this mean? One, NT’s driver model is IMPLICITLY multithreaded, as drivers execute in the context of an already-scheduled thread. Two, it is possible to queue DPCs in a targeted fashion to leverage additional CPUs in the system, or you can queue a system worker thread (which can be scheduled on any CPU), thus allowing normal thread scheduling to continue alongside driver operations. I might add, it is also possible to set interrupt affinities, though preventing certain CPUs from servicing interrupts should only be done in certain circumstances (you can read more on TechNet and MSDN).

    Let me just state for the record: for all the cool-sounding functionality, it is generally considered BAD practice to schedule system worker threads to perform driver functions, as context switches are expensive operations. As a rule, you should ONLY do this if you need to continually poll a device or something similar. Additionally, targeting DPCs is also not something that should be done lightly and, while well documented by Mark Russinovich (and prototyped in DDK headers), is not officially supported or documented in the DDK.

    In the end, this is all just silliness. I like Nvidia, and I think they make great hardware (I have a GF6800GT myself), but this is GF4 MX-style marketing nonsense. I am no M$ fanboy, but it is generally acknowledged that NT’s contextless interrupt handling is among its more innovative and well-executed features. Driver code execution (unless broken) accounts for fractional amounts of CPU time; use Sysinternals’ Process Explorer to view interrupt and DPC time to see what I mean. Drivers should be written to do their work as quickly and inexpensively as possible. Why Nvidia feels the need to explicitly thread their drivers (if they are even doing that) is beyond me. Frankly, it all sounds like a crock to me.

      • WaltC
      • 16 years ago

      I agree. Prepare for the butchering of the latest buzzword: “multithreaded.” Pretty soon the word will be so abused by marketing that only those with attention spans longer than six months will be able to tell you what it actually means. It’s clear that, as usual, the PR types at nVidia aren’t concerned with reaching those who understand what the buzzwords actually mean, but are intent on reaching those who don’t…;) (Much easier to pull the wool over the eyes of blind sheep than the sheep who can see, if you catch my drift.)

    • Krogoth
    • 16 years ago

    WTF? This is a pure PR move, multi-threaded drivers make no freaking sense for gaming cards. It’s the games that are supposed to be multi-threaded in order to reap the benefits of SMP environments.

      • albundy
      • 16 years ago

      Holy shiat! You’re right! It’s the software that should be multithreaded, not just the drivers.

        • Krogoth
        • 16 years ago

        Thanks for the sarcasm, but in reality multi-threaded drivers for games are useless to the 95% of gamers out there who don’t have SMP systems, or an SLI setup (where it would greatly help) for that matter.

        Wake me when SMP systems and multi-threaded apps are commonplace enough for me, or for mainstream driver developers, to have any real interest in multi-threaded drivers.

    • BigDukeSix
    • 16 years ago

    This is so late-’90s 🙂 Multi-threaded drivers have been available for WinNT since the DP Oxygen family.

    • Ruiner
    • 16 years ago

    Well, IF it works, dual-core support at the driver level would be more useful than waiting for developers to program for it at the game level.

      • Shintai
      • 16 years ago

      It won’t do anything for games, actually. And 5-30%? We got more from some nVidia drivers in the past (50%). I doubt there will be anything especially “multithreaded” about this. More like cheap marketing riding the current hype.

      This is no different from the drivers for Quadro cards and the like on SMP systems, which you have been using for many years.

    • MadCatz
    • 16 years ago

    So will this work on hyper-threaded machines or just dual-core machines?

      • Shintai
      • 16 years ago

      Any machine, actually. But it should already work today, and should have for many years, if they did their job even half right.

      Hyper-threaded as well as dual-core machines will benefit most, though. Single-core without hyper-threading will gain performance only in some rare cases.

    • Shintai
    • 16 years ago

    Heh, I’ve written drivers myself. This looks more like a marketing stunt than anything real. Well... both ATi and nVidia “fix” drivers and such for benchmarks, so why not? They really hate one another, LOL.

      • Ma10n3!
      • 16 years ago

      What hardware did you write drivers for?

        • Shintai
        • 16 years ago

        NICs, soundcards and HD controllers.

    • UberGerbil
    • 16 years ago

    Yeah, his explanation doesn’t entirely make sense.

    I will predict, however, that you’ll be seeing a lot more blue screens, lock ups, and other crashing driver bugs for several months as they try to do this right (while releasing driver revisions where they don’t).

    • Dauntless
    • 16 years ago

    Let me see if I’ve got this right, since I’m shaky on process scheduling and threading.

    Windows drivers are synchronous. That means a driver’s process blocks (it waits) until it receives an answer from the hardware (an I/O request). What does this have to do with the drivers being in kernel mode? All process scheduling has to be done via kernel threads anyway (which is inherently kernel mode), though it’s possible to have the threads themselves running in user space via a library. Since the scheduling is done in kernel mode, and the process is synchronous, it seems to me that you want to remove as much I/O contention as possible, since otherwise the process halts and sits in a waiting state until the I/O is completed in the hardware (the GPU).

    Wouldn’t that be what multithreading is good for? To split the tasks up so that it effectively reduces the I/O wait time? Is de Waal saying that there aren’t tasks which can be made to run in parallel? It seems to me to be a natural fit with SLI: have each card get its own access to each CPU. It’s an even better fit with the HyperTransport system. That way I/O requests and interrupts are cut down by virtually half.

    • Shining Arcanine
    • 16 years ago


      • hmmm
      • 16 years ago

      That struck me as an odd thing for him to say. The goal is higher game performance. If the game is single-threaded, then making the driver multithreaded will help, to some small degree, even out the load between all the available processor cores. If the game is multithreaded, then all the cores are hopefully being utilized, so multithreading the driver may make very little difference, but that is hardly a bad thing. Maybe they won’t be able to say their whizbang new drivers provide eighty bajillion more frames per second compared to the old ones, but we’ll still be getting more FPS overall because the game is using all the cores. If you want high game performance, then anything that makes better use of all the available processor cores is a good thing. Who cares if it makes the benefits of multithreaded drivers less significant? Our overall performance will go up; that’s all that matters.

    • Ma10n3!
    • 16 years ago

    Well, release your next generation GPU already nVidia!

    Will ATI follow suit?
