Crayon Shin Chan wrote: Surely it'd be better to keep the single threaded program (ffmpeg2theora) on the same core all the time so you don't have to waste intercore bandwidth copying/syncing L2/L1 caches. I understand video encoding doesn't benefit too much from caches because the data just flows in and isn't used again, but still, why does Windows pass the single thread around to every CPU in my Phenom II X6? Instead, I have to set the process's affinity to run on one CPU... surely I shouldn't have to do this all the time?
Because CPU scheduling on most OSes, Windows included, is simple? It is a balancing act between the general case and the special case. For the best average throughput when many processes are running, you want each process to be free to switch execution units as necessary, so it can run on whichever one is available. As you note, for a single CPU-bound process on an otherwise idle system, staying on the same execution unit provides the best throughput.
So, how do you figure out on the fly which case you are in? How do you tell which processes will suffer the most from waiting to run on the same core they last ran on? Windows is a desktop OS, so to ensure perceived responsiveness of the OS, it is best that a process run again as soon as possible, even if that is less efficient overall. Turning on processor affinity lets you override this behavior when you know best, and allows the OS scheduler to remain simple and effective for the average case. Linux allows you, at least now, to use entirely different scheduling algorithms to fit the workload you are giving the machine. I don't know whether Windows has some funky registry tuning parameters that do similar things, though I tend to doubt it.
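If you'd rather not set the affinity by hand every time, the override can be scripted. Here is a minimal sketch in Python, assuming a Linux box, since `os.sched_setaffinity` only exists on some Unix platforms. The helper name `pin_to_one_cpu` and the choice of the lowest-numbered CPU are just for illustration. On Windows the same idea goes through the Task Manager affinity setting you are already using, or the Win32 `SetProcessAffinityMask` call, which this sketch does not cover.

```python
import os

def pin_to_one_cpu(pid=0):
    """Pin process `pid` (0 = the calling process) to a single CPU.

    Returns the CPU number chosen, or None if the platform does not
    provide os.sched_setaffinity (for example Windows or macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(pid)  # CPUs the process may currently use
    cpu = min(allowed)                   # illustrative choice: lowest-numbered CPU
    os.sched_setaffinity(pid, {cpu})     # restrict the process to that one CPU
    return cpu

if __name__ == "__main__":
    print("pinned to CPU:", pin_to_one_cpu())
```

Child processes inherit the affinity mask on Linux, so a wrapper that calls this and then launches the encoder should keep the encoder on that one core too.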