Our sweet new suite
Many of the benchmark tests you'll see below are revised versions of what we've used in the past here at TR. In some cases, they're new versions of familiar programs like 3DMark. In others, like the POV-Ray rendering program, we've added recompiled binaries made from the same source code as the binaries we've used in the past. Either way, the fact that these tests are new is important, because they have been compiled using newer compilers. The tests like POV-Ray, where we compare older binaries to new ones, ought to demonstrate the potential benefits of recompiling older code to accommodate newer processor designs. Generally, there are two broad reasons why recompiling can help performance.
Newer is better
First, newer compilers better optimize code for newer processors. With sophisticated out-of-order instruction execution capabilities and the like, the latest x86 CPU designs stand to benefit greatly from friendlier code.
The Pentium 4, for instance, has a 20-stage branch prediction/recovery pipeline, twice the depth of the Athlon's or Pentium III's. This pipeline executes instructions speculatively by attempting to anticipate what the program will request next. Get a prediction right, and the results are available almost instantly. Get it wrong, and the results have to be discarded, which takes time. A deeper pipeline carries with it a heavier penalty for a branch misprediction. Better code can help improve a processor's efficiency by reducing branch mispredictions.
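To make the branch business a little more concrete, here's a minimal C sketch of our own; it isn't taken from any of the benchmarks in this review. The function names, the `threshold` parameter, and the idea of summing an array are all made up for illustration. The first function asks the processor to guess the outcome of an `if` on every pass through the loop. The second computes the same sum with a bit mask and no data-dependent branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum every element that is >= threshold.
 * Branchy form: the CPU has to predict the `if` for each element,
 * and random data makes that prediction hard. */
long sum_above_branchy(const int *data, size_t n, int threshold) {
    assert(n == 0 || data != NULL);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= threshold)
            sum += data[i];
    }
    return sum;
}

/* Branch-free form: the comparison yields 0 or 1, so negating it gives
 * a mask of all zero bits or all one bits. ANDing with the mask keeps
 * or drops the element without a data-dependent branch. */
long sum_above_branchless(const int *data, size_t n, int threshold) {
    assert(n == 0 || data != NULL);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int mask = -(data[i] >= threshold);  /* -1 if true, 0 if false */
        sum += data[i] & mask;
    }
    return sum;
}
```

Both functions return the same result for the same input. Whether the second is actually faster depends on the data, the processor, and the compiler, and a good optimizing compiler may well turn the first form into something like the second on its own. That's rather the point of recompiling.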
Well, that's one of many reasons newer compilers help. We'll leave the rest to the hard-core processor geeks; suffice it to say that better code runs faster. Surprisingly, a lot of the executable programs out there are really better suited to a 386, 486, or Pentium processor than they are to an Athlon or a Pentium 4.
The power of alphabet soup
Second, newer x86 processors can execute new instructions designed to improve efficiency even further. Intel and AMD marketing types have given these new sets of instructions names like MMX, SSE, 3DNow!, and SSE2. Most of these new instructions employ a technique called SIMD, for "single instruction multiple data," to perform a single mathematical operation on multiple chunks of data at once. Use these instructions in the right situation (not every situation is right for SIMD), and even an old K6-2 becomes a number-crunching monster.
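Here's a rough idea of what SIMD looks like from the programmer's chair. This is our own illustrative sketch in C, not code from any application we tested, and the function names and the `HAVE_SSE` macro are invented for the example. The scalar function adds one pair of floats per loop pass. The SIMD function uses the SSE intrinsics from `<xmmintrin.h>` to add four pairs with a single instruction, and falls back to plain scalar code when the compiler isn't targeting an SSE-capable processor.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#define HAVE_SSE 1       /* hypothetical macro, local to this example */
#endif

/* Scalar: one addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n) {
    assert(n == 0 || (a && b && out));
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SIMD: four additions per instruction where SSE is available. */
void add_simd(const float *a, const float *b, float *out, size_t n) {
    assert(n == 0 || (a && b && out));
    size_t i = 0;
#ifdef HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* load 4 floats (unaligned ok) */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  /* 4 adds at once */
    }
#endif
    /* Leftover elements, or the whole array when SSE is not available. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Notice that the SIMD version had to be written by hand, with a cleanup loop for arrays whose length isn't a multiple of four. That's a fair picture of the extra work SIMD tends to ask of developers.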
To review, both the Athlon and Pentium 4 can execute MMX instructions, which are oriented toward integer math and thus not terribly thrilling. The Athlon uses 3DNow! to handle floating-point SIMD math, and the Pentium 4 uses both SSE and SSE2. SSE2 is the newest set of SIMD extensions on the block, and it's one of the Pentium 4's biggest potential advantages. SSE2 can handle double-precision floating-point calculations, where 3DNow! and SSE are limited to single precision, so it's quite a bit more useful. For certain types of tasks, such as streaming video encoding or real-time 3D rendering, SSE2 could allow the P4 to whup up on the competition. Maybe.
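For a sense of why the extra precision matters, consider this small C sketch. It's our own illustration with made-up function names, and it uses ordinary scalar `float` and `double` arithmetic rather than SIMD instructions. The connection is the data format: 3DNow! and SSE work on 32-bit single-precision values like `float`, while SSE2 adds support for 64-bit double-precision values like `double`. The two functions do the same repeated addition at the two widths.

```c
#include <assert.h>

/* Add `step` to a running total `count` times in single precision. */
float accumulate_float(float step, long count) {
    assert(count >= 0);
    float sum = 0.0f;
    for (long i = 0; i < count; i++)
        sum += step;
    return sum;
}

/* The same loop in double precision. */
double accumulate_double(double step, long count) {
    assert(count >= 0);
    double sum = 0.0;
    for (long i = 0; i < count; i++)
        sum += step;
    return sum;
}
```

Call both with a step of 0.1 and a count of one million, and the ideal answer is 100,000. Rounding error piles up with each addition, and on typical hardware the single-precision total drifts much further from that ideal than the double-precision one does.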
Of course, all of these newer instructions require recompiling applications to take advantage of them. And, in many cases, a recompile alone doesn't help much; programs often need to be heavily tweaked or rewritten to take advantage of SIMD instructions.
The truth about optimizations
Since the Pentium 4's launch, the Athlon has been beating out the new Intel chip regularly in most benchmark tests. Almost as regularly, Intel has claimed that recompiled binaries, newer versions of applications, and SSE2 optimizations would help the P4's performance considerably. They're right, but it's not that simple. The Athlon stands to benefit from newer compilers and SIMD optimizations, too.
Also, the usefulness of such optimizations is limited. In reality, an awful lot of applications will make use of older, less efficient code for years to come, just because no one will bother to optimize or recompile them. Intel has its own compilers that are pretty good at making things run faster on its processors, but tools from Microsoft and other companies are much more widely used. There are reasons why this is the case, and we'll touch on a few of them below. Finally, as we've noted, SIMD extensions are of limited use, and they require extra work to implement.
Then there's the issue of a processor's performance profile, as I will call it. It may well be true that the P4 will gain more from recompiled code than an Athlon will, but the sword cuts both ways. If that's true, one could argue that the Pentium 4 simply does a poorer job executing legacy code. This is an especially tricky subject when it comes to benchmarks, since both Intel and AMD take an active interest in seeing their processors do well on commonly used performance tests. However, not every optimized piece of code you see spitting out numbers in a benchmark test accurately reflects the sort of code your processor may encounter in daily use.
That's a lot of considerations to keep in mind, and we're just scratching the surface of a very complex issue. Hold tight, and we'll consider these things as we go.