A CPU only does work when it has something to do. Generally speaking, to have something to do it must have both a stream of instructions to run and the data those instructions need... and that's the rub. Sometimes the data is immediately available -- it's already sitting in a register, or is an immediate (constant) value in the instruction stream; more commonly, the data needs to come from memory. If the programmer is lucky or good, the data is already cached and the CPU doesn't have to go all the way out to main memory. If the data is in the L1 cache, it is available almost instantly; if it's in the L2 cache, there's a delay of a few cycles; in the L3 cache, a few more. And if it has to go all the way out to main memory, it's quite a few more cycles than that. (That's what the "Sandra Cache and Memory Latency" graph in TR's CPU reviews is showing you.) The more cycles the CPU has to wait, the less "utilized" it can be. If it is spending half the available cycles waiting on data, it's going to be "50%" busy (at best). And that's assuming the data is already in memory. What if it's sitting out on an SSD, or worse, a hard drive? Worse yet, what if that data is sitting somewhere out on the internet? And for web browsers, that's pretty much always the case: a web page (and the scripts and everything else associated with it) is a big wad of data, and it all needs to get pulled down from a server somewhere. For a CPU that measures its cycles in nanoseconds and its bandwidth in GB/s, the multi-millisecond latency and megabits-per-second speed of your typical net connection means a lot of time spent twiddling its thumbs (i.e. either idle or doing something else; either way, the browser's utilization of the CPU is going to be far from total).
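To make that latency ladder a bit more concrete, here's a rough, untuned sketch (my own, not anything from the review) of the pointer-chasing idea behind graphs like that: walk a buffer in a random order so every load depends on the previous one and the prefetcher can't hide the latency, then divide total time by the number of loads. The 64 MiB buffer size and iteration count are arbitrary placeholder assumptions.

    /* Rough sketch of a pointer-chasing latency probe: walk a big buffer in a
     * random cycle so each load depends on the one before it and the prefetcher
     * can't hide the latency.  Buffer size and iteration count are arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N      (64UL * 1024 * 1024 / sizeof(size_t))  /* 64 MiB: past L3 on most CPUs */
    #define ITERS  (50UL * 1000 * 1000)

    int main(void)
    {
        size_t *buf = malloc(N * sizeof(size_t));
        for (size_t i = 0; i < N; i++)
            buf[i] = i;
        for (size_t i = N - 1; i > 0; i--) {    /* Sattolo shuffle: one big random cycle */
            size_t j = rand() % i;
            size_t t = buf[i]; buf[i] = buf[j]; buf[j] = t;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t idx = 0;
        for (unsigned long i = 0; i < ITERS; i++)
            idx = buf[idx];                     /* every load waits on the previous one */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("~%.1f ns per load (ended at %zu)\n", ns / ITERS, idx);
        free(buf);
        return 0;
    }

Shrink the buffer to something that fits in L1 and the per-load time collapses to almost nothing; grow it past L3, as above, and you're looking at main-memory latency. Sweeping the buffer size is exactly how those latency curves get traced out.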
This is why the programs used to "burn" CPUs (like prime number generators, etc.) generally run entirely out of cache: the only way to really max out the CPU execution resources is to never have to wait on memory (let alone anything slower than memory). Even browser benchmarks run entirely locally, using data already cached in memory.
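For illustration (my sketch, not any particular burn-in tool), a minimal cache-resident burner can be as dumb as counting primes by trial division over a small fixed range: the working set is essentially just registers and the instruction stream, so one core sits pegged for as long as you let it run.

    /* Minimal sketch of a cache-resident "CPU burner": trial-division prime
     * counting over a small fixed range.  Nothing here ever has to wait on
     * main memory, so one core stays fully busy until you kill it. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long pass = 0;
        for (;;) {                                   /* burn until Ctrl-C */
            unsigned long count = 0;
            for (unsigned n = 3; n < 100000; n += 2) {
                int is_prime = 1;
                for (unsigned d = 3; d * d <= n; d += 2)
                    if (n % d == 0) { is_prime = 0; break; }
                count += is_prime;
            }
            printf("\rpass %lu: %lu odd primes below 100000", ++pass, count);
            fflush(stdout);
        }
    }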
And of course a lot of programs aren't
trying to get high utilization out of the CPU. They aren't doing anything computationally intensive; in many cases, they're twiddling their thumbs waiting on input from the user. And once they've got it, they don't necessarily do a whole lot with it. Your typical IM app, for example, is just grabbing text from the keyboard and tossing it down the network pipe, then retrieving a response and displaying it; most of the time it isn't doing anything but waiting for a human at one end of the conversation or the other to do something (and for a CPU that operates in nanoseconds, the seconds or minutes of human timescales make for a lot of waiting). There may be some fancy GUI stuff going on, but it's nothing that will tax a modern CPU; even your typical word processor or email client, with all its spell-checking and whatnot, isn't really doing much by modern CPU standards.
Now, separate from all that is the scheduling the OS uses to make sure every active program gets a timeslice of the CPU. Some programs have higher priority than others, so they get relatively more timeslices (and will see higher CPU utilization), but all program threads get interrupted regularly. This was more of an issue in the era of single-core CPUs, but even now, no matter how many cores you have, on a typical desktop/laptop you probably have more than that many processes (or, more precisely, active threads). Meanwhile the OS itself also has to use the CPU to do its own housekeeping. So in ordinary use no program can fully occupy all the cores all the time, and in practice most aren't trying to: that background spellchecking in your word processor is running on a separate thread, but it's trying to stay out of the way (there's a sketch of that pattern right after this paragraph). Being a good citizen in a multitasking environment means leaving as much CPU (and every other resource, like memory and battery life) available for whatever else the user might be running at the same time. Of course, some programs genuinely need to crank a thread hard, at least temporarily. When that Photoshop filter or video encoder is churning, the OS scheduler will generally try to keep it on the same core (or cores, in the case of multiple worker threads) to maximize cache utilization, and you will see the CPU usage spike.
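Here's that good-citizen sketch (mine, not lifted from any real spellchecker): a background thread does a short burst of make-believe work, then naps so the core is free for whatever the user is actually doing. The work size and the 100 ms nap are arbitrary assumptions; compile with -pthread.

    /* Sketch of a "good citizen" background worker (think: the word processor's
     * spellchecker): a short burst of work, then a nap, so the thread never
     * hogs a core.  Work size and sleep interval are arbitrary placeholders. */
    #include <pthread.h>
    #include <time.h>
    #include <unistd.h>

    static void *background_check(void *arg)
    {
        (void)arg;
        for (;;) {
            /* pretend to spellcheck a paragraph: a brief burst of CPU work */
            volatile unsigned long sum = 0;
            for (unsigned long i = 0; i < 5000000UL; i++)
                sum += i;

            /* ...then get out of the way for 100 ms */
            struct timespec nap = { 0, 100 * 1000 * 1000 };
            nanosleep(&nap, NULL);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, background_check, NULL);
        pause();   /* stands in for the foreground work; Ctrl-C to quit */
        return 0;
    }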
What you're actually seeing in that spike, though, is an average spread out over time: certain hardware counters get updated periodically, and those are averaged and reported in Task Manager. It would be too much overhead to update those continuously, and we poor humans couldn't make sense of information coming in every nanosecond anyway, so what you see as some percentage utilization is the average utilization over a more reasonable timespan.
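For a concrete (Linux-flavored, and deliberately simplified) illustration of that averaging, here's a sketch that samples the kernel's aggregate CPU-time counters from /proc/stat twice, one second apart, and reports busy time as a fraction of that window. A real monitor reads every column and does this per core; the one-second window is just an assumption.

    /* Sketch of utilization as an average over a sampling window: read the
     * aggregate CPU counters from /proc/stat, wait a second, read them again,
     * and report busy time as a fraction of the delta.  Only the first seven
     * columns are read here; a real tool would count them all, per core. */
    #include <stdio.h>
    #include <unistd.h>

    static int read_cpu(unsigned long long *busy, unsigned long long *total)
    {
        unsigned long long user, nice, sys, idle, iowait, irq, softirq;
        FILE *f = fopen("/proc/stat", "r");
        if (!f) return -1;
        if (fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu",
                   &user, &nice, &sys, &idle, &iowait, &irq, &softirq) != 7) {
            fclose(f);
            return -1;
        }
        fclose(f);
        *total = user + nice + sys + idle + iowait + irq + softirq;
        *busy  = *total - idle - iowait;
        return 0;
    }

    int main(void)
    {
        unsigned long long busy0, total0, busy1, total1;
        if (read_cpu(&busy0, &total0) != 0) return 1;
        sleep(1);                               /* the sampling window */
        if (read_cpu(&busy1, &total1) != 0) return 1;
        printf("CPU busy over the last second: %.1f%%\n",
               100.0 * (busy1 - busy0) / (double)(total1 - total0));
        return 0;
    }

Change that sleep to ten seconds and a brief burst of work gets diluted into a much smaller percentage, which is exactly why the number in a task monitor depends as much on the sampling window as on what the program is doing.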