
Windows Media Encoder x64 Edition
Windows Media Encoder is one of the few popular video encoding tools that uses four threads to take advantage of quad-core systems, and it comes in a 64-bit version. Unfortunately, it doesn't appear to use more than four threads, even on an eight-core system. For this test, I asked Windows Media Encoder to transcode a 153MB 1080-line widescreen video into a 720-line WMV using its built-in DVD/Hardware profile. Because the default "High definition quality audio" codec threw some errors in Windows Vista, I instead used the "Multichannel audio" codec. Both audio codecs have a variable bitrate peak of 192Kbps.

Here, the Barcelona and Xeon quad-core processors at 2GHz go head to head, and the Xeon comes away victorious. Fortunately for AMD, the 2.5GHz Opteron 2360 SE makes a relatively stronger showing.

SiSoft Sandra Mandelbrot
Next up is SiSoft's Sandra system diagnosis program, which includes a number of different benchmarks. The one of interest to us is the "multimedia" benchmark, intended to show off the benefits of "multimedia" extensions like MMX, SSE, and SSE2. According to SiSoft's FAQ, the benchmark actually does a fractal computation:

This benchmark generates a picture (640x480) of the well-known Mandelbrot fractal, using 255 iterations for each data pixel, in 32 colours. It is a real-life benchmark rather than a synthetic benchmark, designed to show the improvements MMX/Enhanced, 3DNow!/Enhanced, SSE(2) bring to such an algorithm.

The benchmark is multi-threaded for up to 64 CPUs maximum on SMP systems. This works by interlacing, i.e. each thread computes the next column not being worked on by other threads. Sandra creates as many threads as there are CPUs in the system and assignes [sic] each thread to a different CPU.
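The FAQ's description maps onto a simple escape-time loop plus a shared queue of columns. Below is a hypothetical reconstruction in Python, not SiSoft's code. The 640x480 size, 255-iteration cap, and 32-colour palette come from the FAQ. The following are all assumptions: the view window (-2.0 to 1.0 on the real axis, -1.2 to 1.2 on the imaginary axis), the `count % 32` colour mapping, and the names `escape_count` and `render`. Each thread claims the next unclaimed column under a lock, which is one plausible reading of "interlacing." Pinning threads to CPUs is left out.

```python
import threading

WIDTH, HEIGHT = 640, 480  # picture size quoted in the FAQ
MAX_ITER = 255            # iterations per pixel, per the FAQ
COLOURS = 32              # palette size, per the FAQ


def escape_count(cx: float, cy: float, max_iter: int = MAX_ITER) -> int:
    """Iterate z = z*z + c from z = 0; return the number of steps completed before |z| > 2."""
    zx = zy = 0.0
    for i in range(max_iter):
        if zx * zx + zy * zy > 4.0:  # |z|^2 > 4 means the point has escaped
            return i
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return max_iter  # never escaped: treat as inside the set


def render(width: int = WIDTH, height: int = HEIGHT, n_threads: int = 4) -> list[list[int]]:
    """Fill image[row][col] with a colour index; threads claim whole columns."""
    image = [[0] * width for _ in range(height)]
    next_col = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal next_col
        while True:
            with lock:  # hand out the next column no other thread has claimed
                col = next_col
                next_col += 1
            if col >= width:
                return
            cx = -2.0 + 3.0 * col / width  # assumed view window: real axis [-2, 1)
            for row in range(height):
                cy = -1.2 + 2.4 * row / height  # assumed imaginary axis [-1.2, 1.2)
                image[row][col] = escape_count(cx, cy) % COLOURS  # assumed palette mapping

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image
```

CPython's global interpreter lock keeps these threads from doing the arithmetic in parallel. The sketch only illustrates how the columns get divided among threads, for example `render(n_threads=8)` on an eight-core box. The scheme scales in Sandra because each thread runs native code on its own CPU.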

We're using the 64-bit version of Sandra. The "Integer x16" version of this test uses integer numbers to simulate floating-point math. The floating-point version of the benchmark takes advantage of SSE2 to process up to eight Mandelbrot iterations in parallel.
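A common way to simulate floating-point math with integers is fixed-point arithmetic: each value is stored as an integer scaled by a power of two. The sketch below assumes a Q8.24 format (24 fractional bits). SiSoft doesn't say which format its "Integer x16" test uses, and the names `to_fixed`, `fixed_mul`, and `escape_count_fixed` are hypothetical. Each multiply yields 48 fractional bits, so the product is shifted right by 24 to restore the scale.

```python
FRAC_BITS = 24          # assumed Q8.24 format: 24 fractional bits
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0
FOUR = 4 << FRAC_BITS   # escape threshold for |z|^2


def to_fixed(x: float) -> int:
    """Convert a float to the scaled-integer representation."""
    return round(x * ONE)


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values; the raw product has 2*FRAC_BITS fractional bits."""
    return (a * b) >> FRAC_BITS  # arithmetic shift restores the scale (rounds toward -inf)


def escape_count_fixed(cx: int, cy: int, max_iter: int = 255) -> int:
    """Mandelbrot escape-time loop using only integer operations."""
    zx = zy = 0
    for i in range(max_iter):
        zx2, zy2 = fixed_mul(zx, zx), fixed_mul(zy, zy)
        if zx2 + zy2 > FOUR:  # |z|^2 > 4: the point has escaped
            return i
        zx, zy = zx2 - zy2 + cx, 2 * fixed_mul(zx, zy) + cy
    return max_iter
```

In C, the product inside `fixed_mul` would need a 64-bit intermediate to avoid overflow. Python's unbounded integers hide that detail.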

I've been looking forward to seeing the results of this little test, because it has the potential to demonstrate what Barcelona's single-cycle 128-bit SSE enhancements can do when given a simple, parallelizable task, just as it did when the Core microarchitecture arrived. The story that it tells is intriguing. We see huge improvements between the Opteron 2218 HE and the 2360 SE: nearly 4X in the integer test, with only twice as many cores on the 2360 SE. The magnitude of the gain in the floating-point test is lower, but still well past the doubled score one might expect from twice the cores in ideal conditions. Overall, though, the Xeons' per-clock throughput remains much higher than Barcelona's.
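Suppose, as an idealization, that the score scales linearly with core count. Under that assumption, the integer result implies that per-core throughput roughly doubled. Any clock-speed difference between the two chips is folded into that factor. In the equation below, $S$ is the Sandra score, $N$ the core count, and $r_{\text{core}}$ the per-core throughput ratio. These symbols are introduced here for illustration.

$$
\frac{S_{\text{2360 SE}}}{S_{\text{2218 HE}}} \approx 4
= \underbrace{\frac{N_{\text{2360 SE}}}{N_{\text{2218 HE}}}}_{=\,2} \times r_{\text{core}}
\quad\Longrightarrow\quad r_{\text{core}} \approx 2
$$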