x264 HD video encoding
This benchmark tests performance with one of the most popular H.264 video encoders, the open-source x264. The results come in two parts, for the two passes the encoder makes through the video file. I've chosen to report them separately, since that's typically how the results are reported in the public database of results for this benchmark. These scores come from the newer, faster version 0.59.819 of the x264 executable.
One can surmise by looking at these results that x264's second pass is more widely multithreaded than the first. True to form, the Shanghai Opterons are faster in pass one, while Istanbul is faster the second time around. Due to the flexibility of its "Turbo mode" mechanism, the Xeon X5550's performance is excellent in both cases.
We've included this final test largely to satisfy our own curiosity about how the different CPU architectures handle SSE extensions and the like. SiSoft Sandra's "multimedia" benchmark is intended to show off the benefits of "multimedia" extensions like MMX, SSE, and SSE2. According to SiSoft's FAQ, the benchmark actually does a fractal computation:
This benchmark generates a picture (640x480) of the well-known Mandelbrot fractal, using 255 iterations for each data pixel, in 32 colours. It is a real-life benchmark rather than a synthetic benchmark, designed to show the improvements MMX/Enhanced, 3DNow!/Enhanced, SSE(2) bring to such an algorithm.
The benchmark is multi-threaded for up to 64 CPUs maximum on SMP systems. This works by interlacing, i.e. each thread computes the next column not being worked on by other threads. Sandra creates as many threads as there are CPUs in the system and assignes [sic] each thread to a different CPU.
The benchmark contains many versions (ALU, MMX, (Wireless) MMX, SSE, SSE2, SSSE3) that use integers to simulate floating point numbers, as well as many versions that use floating point numbers (FPU, SSE, SSE2, SSSE3). This illustrates the difference between ALU and FPU power.
The SIMD versions compute 2/4/8 Mandelbrot point iterations at once - rather than one at a time - thus taking advantage of the SIMD instructions. Even so, 2/4/8x improvement cannot be expected (due to other overheads), generally a 2.5-3x improvement has been achieved. The ALU & FPU of 6/7 generation of processors are very advanced (e.g. 2+ execution units) thus bridging the gap as well.
We're using the 64-bit version of the Sandra executable, as well.
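To make the FAQ's description a bit more concrete, here is a minimal Python sketch of the same ideas: an escape-time Mandelbrot loop capped at 255 iterations, an integer variant that uses fixed-point arithmetic to stand in for floating point, and column interlacing across worker threads. This is not Sandra's code. The function names, the 24-bit fixed-point format, the mapping from pixels to the complex plane, and the static "every Nth column" assignment are all assumptions made for illustration; the FAQ's wording could equally describe a dynamic scheme in which each thread grabs the next free column.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ITER = 255            # iteration cap quoted in the FAQ
FRAC_BITS = 24            # assumed fixed-point precision (hypothetical)
ONE = 1 << FRAC_BITS      # the value 1.0 in fixed-point form


def escape_float(cr, ci, max_iter=MAX_ITER):
    """Floating-point escape-time count for the point c = cr + ci*i."""
    zr = zi = 0.0
    for n in range(max_iter):
        zr2, zi2 = zr * zr, zi * zi
        if zr2 + zi2 > 4.0:          # |z|^2 > 4 means the point escaped
            return n
        zr, zi = zr2 - zi2 + cr, 2.0 * zr * zi + ci
    return max_iter


def escape_fixed(cr, ci, max_iter=MAX_ITER):
    """Same loop, but with integers simulating floating-point values."""
    cr_q, ci_q = int(cr * ONE), int(ci * ONE)
    zr = zi = 0
    for n in range(max_iter):
        # Each product carries 2*FRAC_BITS fractional bits; shift back down.
        zr2 = (zr * zr) >> FRAC_BITS
        zi2 = (zi * zi) >> FRAC_BITS
        if zr2 + zi2 > 4 * ONE:
            return n
        zr, zi = zr2 - zi2 + cr_q, ((2 * zr * zi) >> FRAC_BITS) + ci_q
    return max_iter


def columns_for(thread_id, n_threads, width):
    """Interlaced assignment: thread t takes columns t, t+N, t+2N, ..."""
    return range(thread_id, width, n_threads)


def render(n_threads=4, width=640, height=480, escape=escape_float):
    """Render iteration counts, splitting columns across worker threads."""
    image = [[0] * width for _ in range(height)]

    def worker(thread_id):
        for x in columns_for(thread_id, n_threads, width):
            cr = -2.0 + 3.0 * x / width           # assumed view: [-2, 1]
            for y in range(height):
                ci = -1.0 + 2.0 * y / height      # assumed view: [-1, 1]
                image[y][x] = escape(cr, ci)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(worker, range(n_threads)))  # wait for every worker
    return image
```

Calling `render(n_threads=4, width=64, height=48)` returns a grid of iteration counts, and passing `escape=escape_fixed` swaps in the integer path. CPython's global interpreter lock keeps these threads from running the arithmetic in parallel, so the sketch shows only how the work is divided, not the speedup a native multithreaded or SIMD implementation would see.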
In our final benchmark, the Istanbul Opteron produces a bit of a surprise, besting the Xeon X5550 in all three tests and even outperforming the $1600-a-pop Xeon W5580 in the integer x8 test.