makfu |
Okay, he clearly doesn't understand how drivers work in NT kernel OSes. There is NO context switch for low-level miniport (hardware control) drivers when executing driver routines, an ISR, or DPCs (the non-time-critical portion of the driver's interrupt processing code).
Each thread has a user stack and a kernel stack. On the interrupt handling side, when an interrupt is trapped, the IRQL of the processor is raised and the ISR runs on the kernel stack of the thread currently executing on the processor for which the interrupt has affinity (the affinity can be hard strapped if so desired). The immediate, time-critical part of the request (such as acknowledging the device) is serviced at the hardware IRQL, above dispatch, and additional work is then handled by creating a deferred procedure call (DPC), which is placed into a per-processor queue. The DPC queue is drained at dispatch IRQL.
There are 32 IRQLs, including passive, APC and dispatch; everything above dispatch is reserved for hardware interrupts, and each device's ISR is registered at one of those levels depending on the importance of the device. As the ISRs are serviced in order of IRQL priority, the IRQL of the processor falls until it reaches dispatch, at which point the DPC queue is drained. Once there are no pending DPCs, scheduling events, etc., the processor falls back to passive, at which point "normal" thread execution can continue. If this sounds complicated, fear not, as this all happens bazillions of times a second.
In NT (of which XP and 2k3 are members), you have a whole lot of options for what the driver's interrupt handling code can do: you can target the DPC at another processor's DPC queue (never needed to do that myself, but M$-provided network code does this), set the priority of the DPC to high so that it goes to the front of the queue, or queue a system worker thread so that interrupt processing can be done at passive IRQL (the level at which normal thread scheduling occurs). The IO request side is much the same: the currently executing thread's kernel stack is "reused" by having the routine pushed onto the stack of that thread (there is more to it than that, but this is getting to be a long post). So what does all this mean?
One, NT's driver model is IMPLICITLY multithreaded, as drivers execute in the context of an already scheduled thread. Two, it is possible to queue DPCs in a targeted fashion to leverage additional CPUs in the system, or you can queue a system worker thread (which can be scheduled on any CPU), thus allowing normal thread scheduling to continue alongside driver operations. I might add that it is also possible to set interrupt affinities, though preventing certain CPUs from servicing interrupts should only be done in certain circumstances (you can read more on technet and msdn).
Let me just state for the record: for all the cool-sounding functionality, it is generally considered a BAD practice to schedule system worker threads to perform driver functions, as context switches are expensive operations. As a rule, you should ONLY do this if you need to continually poll a device or something similar. Additionally, targeting DPCs is not something that should be done lightly and, while well documented by Mark Russinovich (and prototyped in DDK headers), it is not officially supported or documented in the DDK.
In the end, this is all just silliness. I like Nvidia, and think they make great hardware (I have a GF6800GT myself), but this is GF4 MX style marketing nonsense. I am no M$ fanboy, but it is generally acknowledged that NT's contextless interrupt handling is one of its more innovative and well-executed features. Driver code execution (unless broken) accounts for fractional amounts of CPU time. Use sysinternals.com Process Explorer to view interrupt and DPC time to see what I mean. Drivers should be written to do their work as quickly and inexpensively as possible. Why Nvidia feels the need to explicitly thread their drivers (if they are even doing that) is beyond me. Frankly, it all sounds like a crock to me. |
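For anyone who finds the ISR/DPC flow in makfu's post easier to follow as code, here is a rough toy model in Python. It is only a sketch of the ordering he describes (ISRs serviced from the highest IRQL down, DPCs drained at dispatch, then back to passive), not real driver code. The `ToyProcessor` class, the device names, and the device IRQL numbers 5, 7 and 9 are all made up for illustration, and the model simply sorts pending interrupts instead of simulating real preemption.

```python
from collections import deque

# Software IRQLs named in the post; device IRQLs sit above DISPATCH_LEVEL.
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2


class ToyProcessor:
    """Hypothetical toy model of one CPU: pending interrupts plus a DPC queue."""

    def __init__(self):
        self.irql = PASSIVE_LEVEL
        self.pending = []         # (irql, device, high_importance) tuples
        self.dpc_queue = deque()  # deferred work, drained at DISPATCH_LEVEL
        self.log = []

    def raise_interrupt(self, irql, device, high_importance=False):
        if irql <= DISPATCH_LEVEL:
            raise ValueError("device interrupts must be above dispatch level")
        self.pending.append((irql, device, high_importance))

    def queue_dpc(self, device, high_importance=False):
        # A high-importance DPC goes to the front of the per-processor queue.
        if high_importance:
            self.dpc_queue.appendleft(device)
        else:
            self.dpc_queue.append(device)

    def run(self):
        # 1. Service ISRs from the highest IRQL down; each one only does the
        #    time-critical part and defers the rest by queueing a DPC.
        for irql, device, high in sorted(self.pending, key=lambda p: p[0], reverse=True):
            self.irql = irql
            self.log.append(f"ISR {device} @ IRQL {irql}")
            self.queue_dpc(device, high)
        self.pending.clear()
        # 2. The IRQL falls to dispatch, where the DPC queue is drained.
        self.irql = DISPATCH_LEVEL
        while self.dpc_queue:
            self.log.append(f"DPC {self.dpc_queue.popleft()} @ IRQL {self.irql}")
        # 3. Nothing left to do: fall back to passive, normal threads resume.
        self.irql = PASSIVE_LEVEL
        return self.log


if __name__ == "__main__":
    cpu = ToyProcessor()
    cpu.raise_interrupt(5, "disk")
    cpu.raise_interrupt(9, "gpu")
    cpu.raise_interrupt(7, "nic", high_importance=True)
    for line in cpu.run():
        print(line)
```

Note that nothing in the model switches threads: all the work happens on whatever the processor was already running, which is the point of the post.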
Krogoth |
WTF? This is a pure PR move. Multi-threaded drivers make no freaking sense for gaming cards. It's the games that are supposed to be multi-threaded in order to reap the benefits of SMP environments.
BigDukeSix |
This is so late '90s :) Multi-threaded drivers have been available for WinNT since the DP Oxygen family.
Shintai |
Heh, I've written drivers myself. This looks more like a marketing stunt than anything real. Well, both ATi and nVidia "fix" their drivers for benchmarks and the like, so why not. They really hate one another LOL.
Ruiner |
Well, IF it works, dual core support at the driver level would be more useful than waiting for the developers to program for it at the game level.
UberGerbil |
Yeah, his explanation doesn't entirely make sense.
I will predict, however, that you'll be seeing a lot more blue screens, lockups, and other driver crash bugs for several months as they try to do this right (while releasing driver revisions where they don't). |
Dauntless |
Let me see if I get this right since I'm shaky on process scheduling and threading.
Windows drivers are synchronous. That means the driver's process blocks (it waits) until it receives an answer from the hardware (an I/O request). What does this have to do with the drivers being in kernel mode? All process scheduling has to be done via kernel threads anyway (which are inherently in kernel mode), though it's possible to have the threads themselves running in user space via a library. Since the scheduling is done in kernel mode, and the process is synchronous, it seems to me that you want to remove as much I/O contention as possible, since otherwise the process halts and sits in a waiting state until the I/O is completed in the hardware (the GPU). Wouldn't that be what multithreading is good for? To split the tasks up so that it effectively reduces the I/O wait time? Is de Waal saying that there aren't tasks which can be made to run in parallel? It seems to me to be a natural fit with SLI: have each card get its own access to each CPU. This is an even better fit with the HyperTransport system. That way I/O requests and interrupts are cut virtually in half. |
Flowboy |
He explained that drivers in Windows normally run synchronously with the applications making API calls, so that they must return an answer before the API call is complete. On top of that, Windows drivers run in kernel mode, so the OS isn't particularly amenable to multithreaded drivers. NVIDIA has apparently been working on multithreaded drivers for some time now, and they've found a way to fudge around the OS limitations.
Er, this is BS, or he's not explained it well. You've been able to create worker threads within NT drivers since, well, the first ever version of NT. Some parts of the graphics stack might need to be synchronous, but there's nothing to stop you handing off some tasks (e.g. vertex processing) to several worker threads and then just waiting for the threads to complete that task. |
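The pattern Flowboy describes (the call stays synchronous for the caller, but the work inside it is split across worker threads that are then waited on) can be sketched roughly like this. It is a user-mode Python illustration of the shape only, not kernel code: `transform_vertices` and `process_synchronously` are made-up names, the "vertex processing" is just a scale, and CPython threads will not actually speed up CPU-bound work like this.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_vertices(chunk, scale):
    """Stand-in for a slice of vertex processing: scale each (x, y, z)."""
    return [(x * scale, y * scale, z * scale) for x, y, z in chunk]


def process_synchronously(vertices, scale, workers=4):
    """Split the work, hand it to worker threads, block until all finish.

    From the caller's point of view this is still one synchronous call:
    it does not return until every worker has completed its chunk.
    """
    size = max(1, -(-len(vertices) // workers))  # ceiling division
    chunks = [vertices[i:i + size] for i in range(0, len(vertices), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input chunks,
        # and building the list waits for every chunk to complete.
        results = pool.map(transform_vertices, chunks, [scale] * len(chunks))
        return [vertex for chunk in results for vertex in chunk]


if __name__ == "__main__":
    verts = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
    print(process_synchronously(verts, 2, workers=2))
```

The caller never sees the threads, so the API contract ("return an answer before the call completes") is untouched.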
That is nonsense. Multithreaded drivers won't see the same boost over single-threaded drivers with multithreaded games as they will with single-threaded games, but they'll still perform better than single-threaded drivers, whether the games are single-threaded or not.