flip-mode wrote: Someone will likely strongly disagree with this, but increasing thread count has sharply diminishing returns for desktop users once you hit 4 threads. Beyond 8 threads, it's a near vertical drop. I'm glad Intel (and hopefully AMD has reached this point too) isn't focusing on adding additional thread width that's going to be essentially useless.
I strongly disagree, because your claim is false. Why would you say something like that?
There are always scenarios where more stuff does not yield more quality. For example, a 400 page book is worse than a 100 page book, if the author is terrible; a 500 horsepower engine is worse than a 100 horsepower engine, if you have a tiny gas tank or a light car with slick tires. Throwing 4x as much sugar into a good cookie recipe will not magically yield better cookies. Similarly, a crappy programmer will be unable to gain any advantage from multiple threads. That does not mean there's some magical wall, such that 400 page books, 500 horsepower engines, 50% sugar recipes (like fudge), or 8-core computers fundamentally cannot surpass their smaller siblings.
I've written multithreaded code (for graph smoothing) that showed a ~7.5x speedup on a 4-core hyperthreaded computer, going from 1 thread to 8. That was anomalous (it was floating-point, and I mainly write integer code) and I'm just mentioning it to point out that hyperthreading actually can be useful; otherwise, I've never seen more than a 40% speedup from hyperthreading, and <20% is more typical for my programs. But setting hyperthreading aside, most of my programs speed up by about 0.8x the number of cores, up to 80 cores (which is our biggest single node, so the largest I've been able to test). The scaling curves are pretty much straight lines; there's no "vertical drop" after 4. And these programs don't deal with 64k datasets that fit in a cache (which is the perfect setting for scalability) - they deal with hundreds of gigs of data being randomly accessed, which is pretty much a worst-case scenario, since adding cores does not improve the memory bottleneck.
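To put rough numbers on those figures, here's a minimal Python sketch of the arithmetic. The function names are made up for illustration, and the last helper assumes Amdahl's law (a fixed serial fraction), which is a simplification I'm bringing in for the sake of the example, not something measured. It only uses the figures quoted above: ~7.5x on 8 threads, and about 0.8x the core count at 80 cores.

```python
def speedup(t_serial, t_parallel):
    """Speedup = single-thread runtime divided by multi-thread runtime."""
    return t_serial / t_parallel


def efficiency(speedup_value, workers):
    """Parallel efficiency: the fraction of ideal linear scaling achieved."""
    return speedup_value / workers


def implied_serial_fraction(speedup_value, cores):
    """Serial fraction s that Amdahl's law would need to give this speedup.

    Amdahl (an assumed model here): S = 1 / (s + (1 - s) / n)
    Solved for s:                   s = (n / S - 1) / (n - 1)
    """
    return (cores / speedup_value - 1.0) / (cores - 1.0)


if __name__ == "__main__":
    # Hyperthreaded case: ~7.5x going from 1 thread to 8 threads.
    print("efficiency per thread:", efficiency(7.5, 8))

    # Typical case: about 0.8x the number of cores, at 80 cores.
    s80 = 0.8 * 80
    print("speedup at 80 cores:", s80)
    print("implied serial fraction:", implied_serial_fraction(s80, 80))
```

Under that assumed model, a 64x speedup on 80 cores works out to a serial fraction of roughly 0.3%. In other words, code that scales like this has almost nothing left that runs on a single thread, which is the opposite of a wall at 4.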
I use other people's programs, too. And the multithreaded ones usually scale like mine do - pretty much linearly, though some start to drop off a bit around 16 cores. If programmers are forced to write scalable multithreaded code (which they are in my field), basically, the incompetent programmers drop out and the remaining people write scalable multithreaded code that is not limited by some imaginary 4-core wall.