Captain Ned wrote:
I'm pretty sure the nature of compiling, lots of usually unpredictable jumps, would mean that AMD chips generally perform better.
Benchies to prove/disprove? I admit I have no idea if your viewpoint is valid, but a pseudo-blanket statement such as yours begs for some empirical data, n'est-ce pas?
From a theoretical standpoint, his viewpoint is completely valid. Compilers (and especially optimizing compilers) use algorithms that are dominated by data-dependent branches rather than tight, regular loops. In the simplest case, a compiler has a single loop over the program input, with the body of the loop containing several switch statements. When optimizations come into play (common subexpression elimination, hoisting, loop unrolling, etc.) you have more switches. Often a lot more. A compiler thus tends towards the "more jumps/branches than loops" end of the spectrum, whereas a signal processing application like an MP3 encoder tends towards the other end. The P4, with its long pipeline and high memory bandwidth, is optimized for loops, not branches (every time it mispredicts a branch, you have a potential stall and pipeline flush, and the longer the pipeline, the more work gets thrown away). This is why it does so well on encoding tasks, and why you would expect it not to do as well in compilation.
And if you follow the link I posted above, you will see that's exactly what happened. At least on that one bench, though it's a pretty good one and probably indicative of relative performance in this type of task in general. I haven't bothered to look for more tests of this type, though I'd be interested to see them if anyone has links.
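To make the "branches vs. loops" distinction concrete, here's a rough sketch of the two code shapes I mean. The function names and the toy input are made up for illustration, and this is not a benchmark: in Python the interpreter overhead would swamp any branch prediction effect. It only shows why one kind of code gives the predictor a hard time and the other doesn't.

```python
def count_token_kinds(source):
    """Compiler-like shape: one loop over the input, with a dispatch in the body.

    Which arm runs depends on the data, so the hardware has to guess
    the outcome of several branches per character.
    """
    counts = {"digit": 0, "alpha": 0, "space": 0, "punct": 0}
    for ch in source:
        if ch.isdigit():
            counts["digit"] += 1
        elif ch.isalpha() or ch == "_":
            counts["alpha"] += 1
        elif ch.isspace():
            counts["space"] += 1
        else:
            counts["punct"] += 1
    return counts


def multiply_accumulate(samples, coefficients):
    """Encoder-like shape: the same arithmetic on every iteration.

    The only branch is the loop condition itself, which is taken
    every time except the last, so it is easy to predict.
    """
    total = 0.0
    for s, c in zip(samples, coefficients):
        total += s * c
    return total
```

Calling count_token_kinds("x1 = y_2 + 42;") walks through all four arms in an order that depends entirely on the input text, while multiply_accumulate does identical work on every pass no matter what the samples are. A real compiler has thousands of dispatches like the first function; a real encoder spends most of its time in loops like the second.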