In other words, they weren’t far from your average $1400 personal computer on display at Best Buy, with the obvious exception that today’s cheap PCs have gobs more computing power than those dinosaurs. Speaking of dinosaurs, yesterday’s workstation makers are now flirting with dinosaur status. SGI bled all its engineering talent to companies like 3dfx, ATI, and NVIDIA, and Sun is, well, not high on my list of stock picks, let’s just say. The workstation world ain’t what she used to be.
Over time, workstations themselves have been transformed—as have most servers, home computers, and—heh—pocket calculators—from proprietary boxes running proprietary software (various vendor-specific flavors of Unix) to x86-based systems running either the same Windows OS your grandma’s PC runs or the latter-day One True Unix, Linux. As a result, workstations are rather difficult to separate from your everyday, run-of-the-mill desktop Pee Cee. Indeed, the “workstation-class PC” has become an appendage of the larger PC market.
Through the magic of different drivers, purpose-engineered physical incompatibilities, and newly dreamed-up marketing names, pedestrian chips like GeForces, Athlons, Pentiums, and Radeons become exotic Quadros, Opterons, Xeons, and Fire GLs. The chips aren’t really that different, but the price differences can be rather impressive—and if you want a dual-processor system based on the Pentium 4 “Netburst” architecture, for instance, you’re gonna have to pony up the cash for a pair of Xeons.
Because the barriers between desktop- and workstation-class parts are semi-transparent, spotting the boundaries between the two worlds isn’t easy. Dell will sell you a Pentium 4-based Precision “workstation” with nearly all the same parts in it as in an Optiplex. In fact, Intel’s higher-end 875P chipset, with its exclusive Performance Acceleration Technology to improve memory access latency, is targeted at high-end desktop systems, enthusiasts, and—you guessed it—workstations.
So, you may be asking, what really distinguishes a workstation from a built-to-the-hilt desktop PC? The short answer: premium parts with higher price tags and, one hopes, some tangible benefits in terms of performance, capability, and reliability. For one thing, workstations tend to support ECC memory, in order to keep cosmic rays from scrambling your bits while you work on your CAD drawings. Also, workstations often have SCSI storage subsystems, with smarter drive control logic and higher spindle speeds. Then there are the aforementioned workstation chips. Workstation-class graphics chips tend to differ from their desktop counterparts in terms of drivers; the workstation cards’ drivers have optimized OpenGL code certified for use with high-end applications for tasks like design, engineering, and content creation. In CPUs, AMD and Intel have chosen to disable multiprocessing capabilities on their desktop processors, so they can charge premiums for CPUs that work in pairs. Workstation CPUs also sometimes get other enhancements, such as larger caches or wider paths to memory, in order to improve performance.
That’s about it, in a nutshell. Workstations are typically deployed where performance, stability, and compatibility are most important. They are also, as you might imagine, very smooth machines to use. Let’s take a look at several typical workstation-class PC configurations and see how they compare.
The processors
The systems we’re looking at today are based on x86 CPUs from AMD and Intel, the Opteron and Xeon, respectively.
$1176 worth of processors—at least back when we bought ’em
If you’re familiar with Pentium 4 processors, then Xeons ought to look mighty familiar. Today’s Xeon is essentially the same chip as the Pentium 4, sold under a different name. It has the same Netburst architecture as the Pentium 4, and it’s made on the same 0.13-micron fab process as the Pentium 4. Unlike the P4, however, Xeons are capable of running in multi-processor configurations. Xeons also come in a slightly larger package than the P4, with 604 pins to the Pentium 4’s current 478. The Xeon model we’re testing today is the 2.66GHz version with 512K of on-chip L2 cache and a 533MHz front-side bus.
At the time when we started work on this article, the Xeon 2.66GHz was exactly the same price—$294 a pop, American money—as the Opteron 240 to which we’ll be comparing it. Since then, Intel has moved to counter AMD’s Opteron with some aggressive moves, including cutting prices, adding a 1MB L3 cache to some models, and moving to an 800MHz front-side bus with dual channels of DDR400 memory. We invited Intel to participate in our comparison with newer Xeon models, but it elected not to do so. Newer Xeon models (and motherboards to support them) are just now becoming available, so you may not see too many pre-assembled workstation rigs based on them just yet, anyhow.
The Xeon’s 604-pin package
AMD’s Opteron chip may not be as familiar to most folks as the Xeon, because the desktop variant of the Opteron hasn’t yet hit store shelves. This new chip, based on AMD’s K8 “Sledgehammer” architecture, brings a number of enhancements over the K7 architecture in the Athlons XP and MP. The Opteron packs an on-chip, dual-channel memory controller to reduce memory access latencies and allow for better performance scaling as the number of processors in a system rises. Also, the Opteron supports Intel’s SSE2 instruction set (in addition to AMD’s own 3DNow!), enabling accelerated SIMD computations with double-precision floating point datatypes. Many workstation apps use SSE2, especially for 3D rendering, so this addition is important. The Opteron’s larger, 1MB L2 cache won’t hurt, either. On the speed front, AMD has lengthened the K8’s pipeline to 12 stages (from the K7’s 10) and moved to a new, 0.13-micron silicon-on-insulator fabrication process in order to help the chip run faster and cooler.
Finally and perhaps most importantly, the Opteron is a true 64-bit processor. Through the use of AMD’s 64-bit extensions to the x86 instruction set architecture (ISA), the Opteron can run 64-bit operating systems and applications. This 64-bit capability breaks several barriers, including the ability to address more than 4GB of memory directly. In the workstation market, the x86 PC’s traditional 4GB memory barrier can be a crippling problem, so AMD probably won’t have to work too hard to make the case for 64-bit computing here. Operating systems and applications will have to be recompiled in order to support AMD64, but that work is already happening in both the Windows and Linux universes. The AMD64 ISA also includes more registers, or temporary on-chip storage slots, than the 32-bit x86 ISA. Recompiled applications may show substantial performance gains on AMD64 even if they can’t take advantage of AMD64’s expanded memory address space, because the chip won’t have to resort to cache accesses as often.
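For a sense of where that 4GB figure comes from, here is a minimal sketch of the arithmetic. It assumes a flat address space with one address per byte and considers only the architectural pointer width; real chips wire up fewer physical address bits than the full 64, so treat the second number as a ceiling, not a spec. The function name is ours, purely for illustration.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(pointer_bits):
    """Bytes reachable with a flat pointer of the given width, one address per byte."""
    return 2 ** pointer_bits


# 32-bit x86 pointers top out at 4GB.
print(addressable_bytes(32) // GIB)  # 4

# A full 64-bit pointer could, in principle, reach 2**34 GB.
print(addressable_bytes(64) // GIB)  # 17179869184
```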
We haven’t yet tested the Opteron with 64-bit software, but you’ll see shortly that it performs quite well running 32-bit code. AMD’s 64-bit extensions haven’t diminished the K8’s 32-bit performance.
The Opteron’s imposing 940-pin underbelly
Asus SK8N: Single-processor Opteron with nForce3 Pro 150
The first of the three workstation platforms we’re comparing today is based on NVIDIA’s nForce3 Pro 150 chipset. This Opteron core-logic chipset is a single-chip solution intended only for single-processor systems. Asus was first to market with the nForce3 Pro, and as far as I know, the SK8N motherboard is still the only nForce3 Pro mobo available.
Asus’ SK8N has an unconventional but clean layout
Because we’re dealing with an Opteron here, the core-logic chipset arrangement is a bit different from a traditional system. The dual-channel Opteron memory controller is integrated on the processor, so there’s no conventional north bridge, and memory access doesn’t happen over the front-side bus. What’s more, NVIDIA has folded the remaining north bridge functions into a single chip along with the usual south bridge I/O capabilities. The Opteron processor communicates with the lone nForce3 Pro chip over a HyperTransport connection. HyperTransport provides the key plumbing for K8-based systems, offering a high-bandwidth transport over pairs of narrow, unidirectional links with high clock speeds. In the case of the nForce3 Pro 150, those links provide 3.6GB/s of peak bandwidth.
The nForce3 Pro supports most of the latest standards, including AGP 8X, ATA/133, and USB 2.0. This chipset doesn’t have a native Serial ATA controller, but NVIDIA says one of the chipset’s three ATA/133 channels can be overclocked and bridged to support two SATA devices (a master and a slave) at 150MB/s. Asus chose not to go that route, opting instead for a Promise RAID controller. Curiously, NVIDIA’s datasheet also says the nForce3 Pro supports RAID levels 0, 1, and 0+1, but the SK8N manual mentions only RAID via the Promise controller.
The SK8N lacks some of the high-end features present in many workstation systems today, like 64-bit/66MHz PCI or PCI-X slots and AGP Pro. Also, surprisingly, there’s no Gigabit Ethernet support, only a 10/100 Ethernet controller based on the Ethernet MAC in the nForce3 Pro. The SK8N isn’t certified for NVIDIA’s SoundStorm, either. Asus chose Realtek’s low-cost ALC650 codec for the SK8N, and supplies Realtek audio drivers with the board.
Unlike the memory controllers in most desktop systems, though, the one in the Opteron chip requires registered DIMMs in order to operate properly. (We tried to get the system to boot with non-registered DIMMs, to no avail.) Registered memory relieves some of the electrical load on the memory controller, but it does so at the expense of one clock cycle of additional memory access latency. Given the proximity of Opteron’s built-in memory controller, that’s a fair trade-off. The K8 memory controller also supports ECC memory for improved data integrity, though it doesn’t require ECC DIMMs.
All in all, the nForce3 Pro is a decidedly low-end workstation chipset, especially as implemented on the Asus SK8N. With the exception of its single-processor limitation, though, it fits in nicely with the other platforms we’re looking at here. Expect the desktop versions of nForce3 for the upcoming Athlon 64 to look very similar to the nForce3 Pro.
MSI’s 9130 K8T Master2: A K8T800-based dual Opteron
MSI’s 9130 K8T Master2 is a very interesting low-end workstation and server motherboard. It’s based on VIA’s do-everything K8T800 chipset, which can scale from 4- and 8-way servers to single-CPU desktop systems based on the Athlon 64. The first thing you’ll notice about the 9130 K8T Master2 mobo is those two matching 940-pin sockets for Opteron chips.
Dual sockets for dual-processor action
Yep, this puppy’s a dually. VIA’s K8T800 gives the MSI 9130 several advantages over the SK8N, and multiprocessor operation is one of them. VIA’s K8T800 chipset also has a more traditional layout than the nForce3 Pro, with separate north bridge and south bridge chips. VIA’s proprietary, HyperTransport-like V-Link interconnect links the two at a rate of 1.06GB/s. By breaking things out into two chips, VIA can update the south bridge silicon independently from the north bridge, or vice-versa. VIA’s 8237 south bridge includes a true Serial ATA drive controller with support for RAID 0 and 1, plus a pair of ATA/133 interfaces. The 8237 also has a six-channel AC’97 audio controller, which MSI pairs up with an ALC210A codec. (The audio ports themselves are located on a PCI slot plate that connects to the 9130 via a header.)
Much like the SK8N, the use of a Realtek codec prevents the 9130 from earning VIA’s Vinyl Audio designation. MSI bypasses VIA’s Fast Ethernet controller, as well, and incorporates a Broadcom GigE chip.
A block diagram of the K8T800. Source: VIA.
The MSI 9130 has only 32-bit/33MHz PCI slots, but the K8T800 is capable of supporting faster PCI standards, all the way up to PCI-X, by an unorthodox arrangement in which a VIA PCI controller chip hangs off of the north bridge’s AGP 8X port. This setup could be useful for servers, where PCI-based graphics would suffice, but not for workstations, where the AGP port would best be dedicated to a graphics card. Accordingly, MSI has given the 9130 an 8X AGP Pro slot.
The MSI 9130’s talents are considerable, but MSI chose a peculiar cost-saving measure in designing this board. Although the 9130 has dual Opteron processors, each with its own dual-channel DDR memory controller, MSI connected only one of the two CPUs to DIMM slots. The second CPU has no local memory, just as indicated in the diagram above. This is not a typical arrangement for multiprocessor Opteron systems, and VIA says the K8T800 works fine with multiple processors using multiple memory controllers. The MSI 9130’s performance isn’t bad, as you’ll see soon, but it has half the memory bandwidth of optimal dual-Opteron configurations, and the system’s second processor must always resort to non-local memory access. What’s more, the 9130’s multiprocessor config offers less redundancy. If CPU 0 fails, CPU 1 cannot access memory, and the system croaks. And last but not least, you’ll need to use 2GB DIMMs if you want to reach the 9130’s max of 8GB RAM, because it has only 4 DIMM slots.
The 9130 does have another leg up over its nForce3 Pro competition, though, thanks to VIA’s Hyper8 technology. Hyper8 is a complete implementation of the fastest link afforded by the HyperTransport spec. On the K8T800, the HyperTransport connection between the north bridge and the primary processor is 16 bits wide in each direction and runs at 800MHz, yielding 6.4GB/s of bandwidth. NVIDIA’s solution, by contrast, has only 3.6GB/s of peak bandwidth between the CPU and the nForce3 Pro chip. To highlight this difference, VIA has released a HyperTransport analysis tool, and as you can see in the screenshots, it indicates the nForce3 Pro has an 8-bit upstream link and a 16-bit downstream link, both of which run at 600MHz.
HT analyzer on the K8T800
HT analyzer on the nForce3 Pro
With memory already local to the processor, a faster HyperTransport link should primarily affect AGP performance, along with other forms of direct-memory-access I/O.
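The 6.4GB/s and 3.6GB/s figures fall out of simple arithmetic on link width and clock speed. Here’s a rough sketch, assuming HyperTransport’s double-data-rate signaling (two transfers per clock) and adding the two directions together to get the headline number. The function name is hypothetical; the widths and clocks are the ones quoted above.

```python
def ht_link_gb_per_s(width_bits, clock_mhz):
    """Peak one-way bandwidth of a HyperTransport link, in GB/s.

    Assumes double-data-rate signaling: two transfers per clock cycle.
    """
    transfers_per_second = clock_mhz * 1e6 * 2
    return transfers_per_second * (width_bits / 8) / 1e9


# K8T800: 16 bits each way at 800MHz.
k8t800 = ht_link_gb_per_s(16, 800) + ht_link_gb_per_s(16, 800)

# nForce3 Pro 150: 16 bits downstream, 8 bits upstream, both at 600MHz.
nforce3 = ht_link_gb_per_s(16, 600) + ht_link_gb_per_s(8, 600)

print(round(k8t800, 1), round(nforce3, 1))  # 6.4 3.6
```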
Tyan’s Tiger i7505: Intel E7505 chipset with dually Xeons
Tyan’s Tiger i7505 mobo is based on Intel’s E7505 chipset, a.k.a. Granite Bay, which is the direct precursor to the Canterwood chipset now at the top of Intel’s Pentium 4 lineup. The E7505 north bridge is very much like Canterwood, only it runs at lower frequencies, with a 533MHz bus and dual channels of DDR266 memory. The E7505, of course, also supports multiprocessor configurations.
The Tiger i7505 is loaded
Xeon systems are more conventional than Opterons in terms of chipset layout, with the memory controller residing on the north bridge chip, as ever. In the case of the E7505, the front-side bus and dual channels of DDR266 memory are matched at 4.3GB/s each. The Xeons share memory and bus bandwidth between themselves.
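That “matched at 4.3GB/s” claim is easy to check. The sketch below assumes a 64-bit (8-byte) data path for both the front-side bus and each DDR channel, and a 133.33MHz base clock that is quad-pumped on the bus and double-pumped on the memory. The helper name is ours.

```python
BASE_CLOCK_MHZ = 400 / 3  # 133.33MHz base clock


def peak_gb_per_s(base_mhz, transfers_per_clock, width_bytes, channels=1):
    """Peak bandwidth in GB/s for a bus with the given clocking and width."""
    return base_mhz * 1e6 * transfers_per_clock * width_bytes * channels / 1e9


# 533MHz front-side bus: quad-pumped, 8 bytes wide.
fsb = peak_gb_per_s(BASE_CLOCK_MHZ, 4, 8)

# DDR266: double-pumped, 8 bytes wide, two channels.
memory = peak_gb_per_s(BASE_CLOCK_MHZ, 2, 8, channels=2)

print(round(fsb, 1), round(memory, 1))  # 4.3 4.3
```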
The E7505 north bridge can connect directly to an Intel PCI-X chip, and a second PCI-X chip can hang off of that one, if needed. That’s a decent arrangement, especially since the connection between the north and south bridge chips on the E7505 is only 266MB/s. Tyan chose to include only 32-bit, 33MHz PCI slots on the i7505, though.
Nevertheless, the Tiger i7505 itself bristles with ports, slots, and sockets. Tyan’s characteristically conservative approach to motherboard design feels right at home in the workstation market, where overclocking and flamboyance aren’t exactly orders of the day. Unlike the Opteron boards, the Tiger i7505 doesn’t require—and indeed won’t operate with—registered DIMMs. The board does support both ECC and non-ECC memory types, though. As with all these dual-channel designs, DIMMs must be installed in pairs for optimal performance.
The i7505 includes both an AGP Pro slot and Gigabit Ethernet, the latter courtesy of an Intel PCI Ethernet chip. Like our other two contestants, the i7505 uses a Realtek codec chip to translate AC’97 audio from the south bridge. The older ICH4 south bridge chip on the Tiger only supports ATA/100, and no RAID. To augment the board’s disk I/O capabilities, Tyan has chosen a Promise Serial ATA RAID controller.
Despite its lack of 64-bitness, the dual Xeon i7505 is one sweet workstation-class system. With Hyper-Threading enabled, the dual Xeons show up as four logical processors in utilities like Windows Task Manager, causing a little twinge of excitement in the heart of any true geek.
About the tests
We’ll be testing the three workstation platforms we’ve just described against a pair of familiar foes: the Pentium 4 3.2GHz and Athlon XP 3200+. Now, before you market segmentation purists get your briefs bunched up, let me explain why. First, there’s the issue of price. A pair of Xeon 2.66GHz chips or Opteron 240s will set you back a little more than a single Athlon XP 3200+, or a little less than a Pentium 4 3.2GHz. The comparison seems only fair, in that regard. Then there’s the fact that at least one of these two platforms, the Pentium 4 with 875P chipset, is actually considered a workstation-class product by its manufacturer. (The P4 with an 865 chipset is Intel’s mainstream desktop combo.) We would, thus, be remiss not to include it.
The next facet of our tests to cause potential bunching effects is our choice of graphics cards, which is the decidedly non-workstation-class GeForce FX 5900 Ultra. Never mind that there’s a Quadro FX based on the same chip to match it, the card we’ve used for testing doesn’t fit the workstation mold, and we own up to it. Truth is, we blew our budget on the workstation-class processors, because they are the focus of our attention today. The better-tuned and certified OpenGL drivers that come with a Quadro would only have made a difference in a couple of our benchmarks, anyhow: SPECviewperf and the OpenGL portions of Cinebench. Otherwise, most of the tests we employed are platform-bound.
Finally, we threw in a few gaming tests at the end of our benchmarks, just for fun. I’m sure product managers everywhere will be aghast. We plead guilty to whatever labels you want to throw on us for this torrid act of sedition—even the “enthusiast” tag. We are bad, bad men. We don’t know how we sleep at night.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.
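In code terms, that policy boils down to something like the following sketch. The function name and the sample numbers are made up for illustration; they aren’t results from our test systems.

```python
from statistics import mean


def average_runs(results):
    """Average repeated benchmark runs, insisting on at least two of them."""
    if len(results) < 2:
        raise ValueError("need at least two runs to average")
    return mean(results)


# Two hypothetical runs of the same test.
print(average_runs([101.0, 103.0]))  # 102.0
```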
Our test systems were configured like so:
 | Athlon XP | K8T800 Opteron | nForce3 Opteron | Xeon | Pentium 4 |
Processor | Athlon XP ‘Barton’ 3200+ 2.2GHz | AMD Opteron 240 1.4GHz; 2 x AMD Opteron 240 1.4GHz | AMD Opteron 240 1.4GHz; 2 x AMD Opteron 240 1.4GHz | 2 x Xeon 2.66GHz | Pentium 4 3.2GHz |
Front-side bus | 400MHz (200MHz DDR) | HT 16-bit/800MHz downstream; HT 16-bit/800MHz upstream | HT 16-bit/600MHz downstream; HT 8-bit/600MHz upstream | 533MHz (133MHz quad-pumped) | 800MHz (200MHz quad-pumped) |
Motherboard | Asus A7N8X Deluxe v2.0 | MSI 9130 | Asus SK8N | Tyan Tiger i7505 | Abit IC7-G |
North bridge | nForce2 SPP | K8T800 | nForce3 Pro | E7505 MCH | 82875P MCH |
South bridge | nForce2 MCP-T | VT8237 | | 82801DB ICH4 | 82801ER ICH5R |
Chipset drivers | nForce Unified 2.45 | 4-in-1 v.4.49; AGP 4.42 | AGP 3.34; ATA 3.44; Audio 5.10.0.5100 | INF Update 5.0.2; IAA 2.3.0.2160; Audio 5.10.0.5250 | INF Update 5.0.1015; ATA 5.0.1007.0; Audio 5.10.0.5250 |
BIOS revision | 1005 | 1.0 | 1002 | 1.01 | 1.6 |
Memory size | 1GB (2 DIMMs) | 1GB (2 DIMMs) | 1GB (2 DIMMs) | 1GB (4 DIMMs) | 1GB (2 DIMMs) |
Memory type | Corsair TwinX XMS4000 DDR SDRAM at 400MHz | Infineon PC2700 registered ECC DDR SDRAM at 333MHz | Infineon PC2700 registered ECC DDR SDRAM at 333MHz | Corsair TwinX XMS3200LL DDR SDRAM at 266MHz | Corsair TwinX XMS4000 DDR SDRAM at 400MHz |
Hard drive | Seagate Barracuda V 120GB ATA/100 | Seagate Barracuda V 120GB SATA 150 | Seagate Barracuda V 120GB ATA/100 | Seagate Barracuda V 120GB ATA/100 | Seagate Barracuda V 120GB SATA 150 |
Audio | nForce2 MCP/ALC650 | Creative SoundBlaster Live! | nForce3 Pro/ALC650 | ICH4/ALC650 | ICH5/ALC650 |
Graphics | GeForce FX 5900 Ultra | GeForce FX 5900 Ultra | GeForce FX 5900 Ultra | GeForce FX 5900 Ultra | GeForce FX 5900 Ultra |
OS | Microsoft Windows XP Professional | Microsoft Windows XP Professional | Microsoft Windows XP Professional | Microsoft Windows XP Professional | Microsoft Windows XP Professional |
OS updates | Service Pack 1, DirectX 9.0b | Service Pack 1, DirectX 9.0b | Service Pack 1, DirectX 9.0b | Service Pack 1, DirectX 9.0b | Service Pack 1, DirectX 9.0b |
All tests on the Pentium 4 and Xeon systems were run with Hyper-Threading enabled.
One note about sound. The MSI 9130’s audio ports are located on a PCI slot plate that connects to the 9130 via a header. We first received the 9130 sans manual or audio connectors, so we tested it with a separate sound card. We used built-in audio solutions on the other platforms.
Thanks to Corsair for providing us with memory for our testing. If you’re looking to tweak out your system to the max and maybe overclock it a little, Corsair’s RAM is definitely worth considering.
The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
- Cachemem 2.65MMX
- SiSoft Sandra MAX3! (2003.7.9.73)
- Compiled binary of C Linpack port from Ace’s Hardware
- Discreet 3ds max 5.1 SP1
- NewTek Lightwave 7.5
- Cinebench 2003
- POV-Ray for Windows v3.5
- PICCOLOR v4.0 build 451
- SPECviewperf 7.1
- ScienceMark 2.0 beta (06SEP03-A build)
- Sphinx 3.3
- LAME 3.93.1 (build from mitiok.cjb.net)
- Xmpeg 5.0.1 with DivX Video 5.05
- FutureMark 3DMark03 build 330
- Comanche 4 demo
- Quake III Arena v1.31
- Serious Sam SE v1.07
- Unreal Tournament 2003 demo v.2206
- Wolfenstein: Enemy Territory v2.55
All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Benchmark results
Memory performance
Our synthetic memory tests should help show us how the Opteron benefits from its on-chip memory controller, among other things.
On the bandwidth front, the Opterons easily outrun the Athlon XP, although the Pentium 4 is still king. The dual Xeons, with their dual DDR266 memory config, don’t fare so well in Sandra. Cachemem’s bandwidth test, however, generally seems a little more indicative of real-world performance, and the Xeons do well there.
Notice how the dual Opteron K8T800 is slower than the same board with only one processor. That’s because the second CPU is saddled with non-local memory access.
Also note how the K8T800 pulls slightly ahead of the nForce3 Pro in these memory bandwidth tests. That’s an unexpected outcome, because the two systems share the same memory controller, embedded in the Opteron processor. VIA and MSI have pulled off a minor coup in beating out the nForce3 Pro here.
Linpack shows us the impact of the Opteron’s large 1MB L2 cache. Unfortunately, at only 1.4GHz, the Opteron 240 can’t quite keep up with the Athlon XP or the Xeons until their caches are exhausted. The Pentium 4 3.2GHz with dual DDR400 is a monster in Linpack, beating out the Opteron on larger matrix sizes, even with a smaller 512K L2 cache.
With its integrated memory controller, the Opteron shows us some very low latencies for memory access. Only the Pentium 4 is able to keep up by virtue of its very high internal clock frequencies, fast bus, and PAT-enabled north bridge chip.
The dual Opteron system suffers the effects of non-local memory access rather acutely here, as one might expect.
3ds max rendering
We begin our 3D rendering tests with Discreet’s 3ds max, one of the best known 3D animation tools around. 3ds max is both multithreaded and optimized for SSE2, making it a perfect playground for our Xeons and Opterons. We rendered a couple of different scenes at 1024×465 resolution, including the Island scene shown below. Our testing techniques were very similar to those described in this article by Greg Hess. In all cases, the “Enable SSE” box was checked in the application’s render dialog.
The single-processor Pentium 4 system ties the Xeon on the simpler Earth-Apollo.max scene, but the duallies pull ahead in the more complex Island scene. Intriguingly, the Opteron’s enhancements over the Athlon XP aren’t enough to make up for the 800MHz difference between the 1.4GHz Opteron 240 and the 2.2GHz Athlon XP 3200+.
Lightwave rendering
NewTek’s Lightwave is another popular 3D animation package that includes support for multiple processors and is heavily optimized for SSE2. Lightwave can render very complex scenes with realism, as you can see from the sample scene, “A5 Concept,” below.
Where 3ds max is self-tuning, though, Lightwave is not. Users may choose the number of rendering threads to execute in Lightwave. We tried a number of different configurations and settled on the following. The single-processor Opteron and Athlon systems always performed best with a single thread. The multi-processor and Hyper-Threaded systems were somewhat trickier, so we’ve reported scores with two, four, or eight threads, depending on the situation. In all cases, we reported the fastest score.
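The selection rule is simple enough to sketch: try each thread count, keep the quickest render. The function name and the timings below are placeholders for illustration, not measurements from our systems.

```python
def best_thread_count(render_seconds):
    """Return (threads, seconds) for the fastest render.

    render_seconds maps a thread count to the render time it produced.
    """
    threads = min(render_seconds, key=render_seconds.get)
    return threads, render_seconds[threads]


# Placeholder timings for a multi-processor system (not real results).
trial = {2: 120.0, 4: 110.0, 8: 115.0}
print(best_thread_count(trial))  # (4, 110.0)
```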
The Xeons perform exceptionally well here, but the dual Opterons are right on their heels. The Opteron’s ability to execute SSE2 code allows our single-Opteron systems to outrun the Athlon XP 3200+, which is the only processor in our round-up without SSE2 capability.
POV-Ray rendering
POV-Ray is the granddaddy of PC ray-tracing renderers, and it’s not multithreaded in the least. Don’t ask me why—seems crazy to me. POV-Ray also relies more heavily on x87 FPU instructions to do its work, because it contains only minor SIMD optimizations.
We’ve tested with a pair of scenes: the old “chess2.pov” scene we’ve been using forever, and the POV-Ray 3.5 benchmark scene, which apparently tests some of the SIMD-optimized codepaths in this new POV-Ray release.
Clock speed and FPU prowess are the keys here, and our Athlon XP and P4 systems have the advantage.
Cinebench 2003 rendering and shading
Cinebench is based on Maxon’s Cinema 4D modeling, rendering, and animation app. This revision of Cinebench measures performance in a number of ways, including 3D rendering, software shading, and OpenGL shading with and without hardware acceleration.
Cinema 4D’s renderer is multithreaded, so it takes advantage of Hyper-Threading and SMP. For the Athlon XP and the single-CPU Opteron systems, I’ve reported the single-processor results. For the rest of the systems, I’ve reported the multi-threaded results, which in all cases were notably faster.
In the CPU-based Cinema 4D renderer, the Xeons simply rule. This test has always responded well to Hyper-Threading, and doing it on two different processors at once results in a big performance win.
The OpenGL-based shading tests are captured by the fast single-CPU P4 and Athlon systems.
SPECviewperf workstation graphics
SPECviewperf simulates the graphics loads generated by various professional design, modeling, and engineering applications. It’s an interesting test for workstation-class systems, although, as we’ve noted, we don’t really have a workstation-class graphics card (or at least driver) in our test systems.
SPECviewperf doesn’t really benefit from a second CPU, which is obvious from the test results. Thus, the P4 and Athlon XP systems trade back and forth in the top two slots, for the most part.
One interesting note: in four of the six tests, the K8T800 outruns the nForce3 Pro, sometimes by a fair margin. We may be seeing VIA’s Hyper8 advantage in action.
ScienceMark
I’d like to thank Alex Goodrich for his help working through a few bugs in the 2.0 beta version of ScienceMark. Thanks to his diligent work, I was able to complete testing with this impressive new benchmark, which is optimized for SSE, SSE2, and 3DNow! and is multithreaded, as well.
In the interest of full disclosure, I should mention that Tim Wilkens, one of the originators of ScienceMark, now works at AMD. However, Tim has sought to keep ScienceMark independent by diversifying the development team and by publishing much of the source code for the benchmarks at the ScienceMark website. We are sufficiently satisfied with his efforts, and impressed with the enhancements to the 2.0 beta revision of the application, to continue using ScienceMark in our testing.
The molecular dynamics simulation models “the thermodynamic behaviour of materials using their forces, velocities, and positions”, according to the ScienceMark documentation. Sounds simple enough, right?
Primordia “calculates the Quantum Mechanical Hartree-Fock Orbitals for each electron in any element of the periodic table.” In our case, we used the default element, Argon.
The next test measures performance in AES encryption.
In the three tests above, the Athlon XP and Pentium 4 are consistently the best performers, even though ScienceMark obviously benefits from SMP systems. The dual Xeon and dual Opteron systems trade off in the third-place slot.
The Blas tests below measure matrix multiplication performance, much like Linpack. However, these tests are optimized in various ways, and we can see how the different codepaths perform. The SGEMM test measures single-precision floating-point math, where 3DNow! and SSE are able to help, while DGEMM is double-precision, so only SSE2 can accelerate these calculations.
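To see why the precision matters, it helps to count how many floating-point values each extension packs into one register. The sketch below uses the register widths as we understand them (64-bit registers for 3DNow!, 128-bit registers for SSE and SSE2) and considers only the floating-point datatype each extension introduced. The table and function names are ours.

```python
# Register width in bits for each SIMD extension.
REGISTER_BITS = {"3DNow!": 64, "SSE": 128, "SSE2": 128}

# Width of the packed floating-point type each extension introduced.
FLOAT_BITS = {"3DNow!": 32, "SSE": 32, "SSE2": 64}


def packed_lanes(extension):
    """Floating-point values processed per packed instruction."""
    return REGISTER_BITS[extension] // FLOAT_BITS[extension]


def can_accelerate(extension, precision_bits):
    """True if the extension's packed floats match the test's precision."""
    return FLOAT_BITS[extension] == precision_bits


for name in REGISTER_BITS:
    print(name, packed_lanes(name))  # 3DNow! 2, SSE 4, SSE2 2

# SGEMM is 32-bit math, DGEMM is 64-bit math.
print([n for n in REGISTER_BITS if can_accelerate(n, 32)])  # ['3DNow!', 'SSE']
print([n for n in REGISTER_BITS if can_accelerate(n, 64)])  # ['SSE2']
```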
The Pentium 4 and Xeon systems come out looking very good in these tests. The most interesting thing to note, however, is the relative performance on different codepaths. In SGEMM, the Athlon XP is faster in 3DNow!, while the Opteron is faster with SSE. On the DGEMM test, the Pentium 4 screams with SSE2 and packed data, but the Athlon XP nearly keeps up using only its FPU. The Opteron, meanwhile, performs just about the same with SSE2 packed, SSE2 scalar, and x87 assembly—oddly balanced across the board.
LAME MP3 encoding
We used LAME 3.92 to encode a 101MB 16-bit, 44kHz audio file into a very high-quality MP3. The exact command-line options we used were:
lame --alt-preset extreme file.wav file.mp3
Unfortunately, LAME isn’t multithreaded.
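If you want to time the same encode yourself, a bare-bones harness might look like the sketch below. It is not the tool we used for timing, just one way to wrap the command line above. It assumes a lame binary is on your PATH, and the function names are ours.

```python
import subprocess
import time


def lame_command(wav_path, mp3_path):
    """Build the argument list for the encode described above."""
    return ["lame", "--alt-preset", "extreme", wav_path, mp3_path]


def time_encode(wav_path, mp3_path):
    """Run the encode and return elapsed wall-clock seconds.

    Assumes a `lame` executable is installed and on the PATH.
    """
    start = time.perf_counter()
    subprocess.run(lame_command(wav_path, mp3_path), check=True)
    return time.perf_counter() - start
```

Calling time_encode("file.wav", "file.mp3") returns the elapsed seconds for one pass; run it a couple of times and average the results.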
As we’ve seen all along, our 1.4GHz Opterons just can’t keep up with the pack. The Xeons smoke ’em.
DivX video encoding
We tested DivX encoding using a new and different movie clip than the one we’ve used in past tests, so don’t let these scores throw you. Xmpeg is partially self-tuning, and we noted that it chose the SSE2 Optimized iDCT on the Opteron processors. Xmpeg and DivX are also multithreaded, so a second processor should speed things up.
These are remarkable results from the Opteron. The dual Opteron 240 system pulls out a rare victory over the dual Xeon 2.66GHz system, and more impressively, the single Opteron 240 systems nearly catch up with the Athlon XP 3200+. The Opteron’s integrated memory controller and SSE2 support give it way more clock-for-clock oomph than the Athlon XP, which was itself no slouch.
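“Clock-for-clock” just means dividing out the clock speed before comparing. A minimal sketch, assuming a higher-is-better score such as frames per second: the function names are ours, and the equal scores in the example are hypothetical, chosen only to show what a dead heat between a 1.4GHz chip and a 2.2GHz chip would imply.

```python
def per_clock(score, clock_ghz):
    """Higher-is-better score normalized by clock speed."""
    return score / clock_ghz


def clock_for_clock_advantage(score_a, ghz_a, score_b, ghz_b):
    """How many times more work chip A does per clock than chip B."""
    return per_clock(score_a, ghz_a) / per_clock(score_b, ghz_b)


# Hypothetical tie between a 1.4GHz chip and a 2.2GHz chip.
print(round(clock_for_clock_advantage(1.0, 1.4, 1.0, 2.2), 2))  # 1.57
```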
Sphinx speech recognition
Ricky Houghton first brought us the Sphinx benchmark through his association with speech recognition efforts at Carnegie Mellon University. Sphinx is a high-quality speech recognition routine that needs the latest computer hardware to run at speeds close to real-time processing. We use two different versions, built with two different compilers, in an attempt to ensure we’re getting the best possible performance.
There are two goals with Sphinx. The first is to run it faster than real time, so real-time speech recognition is possible. The second, more ambitious goal is to run it at about 0.8 times real time, where additional CPU overhead is available for other sorts of processing, enabling Sphinx-driven real-time applications.
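Both goals can be expressed as a ratio of processing time to audio length. Here’s a small sketch, with function names and sample durations of our own invention, that checks a run against the two thresholds described above.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time as a multiple of the audio's duration."""
    return processing_seconds / audio_seconds


def meets_goals(rtf):
    """Check a real-time factor against the two Sphinx goals."""
    return {
        "faster_than_real_time": rtf < 1.0,
        "headroom_for_other_work": rtf <= 0.8,
    }


# Hypothetical run: 40 seconds to process a 50-second recording.
rtf = real_time_factor(40.0, 50.0)
print(rtf, meets_goals(rtf))
```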
The Pentium 4 just throttles the competition here, taking the top spot by a wide margin. With a fast P4 system, Sphinx-driven speech recognition apps could well be a reality. I half expected the Opterons to set new records in Sphinx, but I suspect we’ll need DDR400 memory and higher clock speeds from the Opteron in order for it to match the Pentium 4.
PICCOLOR image analysis
We thank Dr. Reinert Muller with the FIBUS Institute for pointing us toward his PICCOLOR benchmark. This image analysis and processing tool is partially multithreaded, and it shows us the results of a number of simple image manipulation calculations. The overall score is indexed to a Pentium III 1GHz system based on a VIA Apollo Pro 133. In other words, the reference system would score a 1.0 overall.
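To illustrate what an indexed score means, here is a rough sketch of normalizing test times against a reference machine. It rests on several assumptions: the timings are invented, the helper names are hypothetical, and combining the per-test results with a geometric mean is our guess for illustration, not necessarily how PICCOLOR computes its overall number.

```python
import math


def index_scores(reference_times, measured_times):
    """Per-test index: reference time divided by measured time.

    A system identical to the reference gets 1.0 on every test;
    a system twice as fast gets 2.0.
    """
    return {name: reference_times[name] / measured_times[name]
            for name in reference_times}


def overall_index(per_test):
    """Combine per-test indexes with a geometric mean (an assumption)."""
    logs = [math.log(value) for value in per_test.values()]
    return math.exp(sum(logs) / len(logs))


# Invented timings in seconds; only the test names come from the article.
reference = {"Float Interpolation": 4.0, "ArrayIndex": 2.0}
measured = {"Float Interpolation": 1.0, "ArrayIndex": 1.0}

scores = index_scores(reference, measured)
print(scores)
print(round(overall_index(scores), 2))
```

Whatever the exact aggregation, the point of indexing is the same: the reference Pentium III system lands at exactly 1.0, and every other score reads as a multiple of it.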
Remarkably, the wildly different Opteron 240 and Xeon 2.66GHz systems tie exactly in terms of overall score. Here are the results of the individual tests…
The K8T800 system stumbles on the two video tests. I’m unsure what the problem was, but the low score was consistent across all of our benchmark runs. We’ll have to investigate further.
Otherwise, the individual scores are interesting. Some tests, like Float Interpolation, are clearly multithreaded. Others, like ArrayIndex, are not.
Just for fun: gaming performance
Quake III Arena
We were able to test at least one game, Ye Olde Quake III Arena, with multithreading. The “r_smp 1” console command invokes Q3’s multithreaded mode, which we tested alongside its regular mode.
Whoa. The dual Opteron K8T800 system darn near knocks off the Pentium 4 3.2GHz in Q3A—not exactly what I’d expected to see. Even with only one CPU, the Opteron 240/K8T800 system performs very well, nearly matching the Athlon XP 3200+. The nForce3 Pro is a fair bit slower than the K8T800 here, which suggests VIA’s faster HyperTransport link really can make a difference.
I’m going to throw the rest of the gaming results at you without much comment. These are workstation systems, after all.
3DMark03
Comanche 4
Serious Sam SE
Unreal Tournament 2003
Wolfenstein: Enemy Territory
Ok, I have to make some comment. The Xeon and Opteron systems are, at least with the GeForce FX 5900 card we’re using, very competent gaming machines. The Opterons and Xeons trade places enough that I can’t say either one is particularly better than the other, but they’re both pretty decent. The K8T800 earns special distinction for outrunning the nForce3 Pro pretty darn consistently in gaming and graphics benchmarks.
Conclusions
Workstations are typically very specialized machines, so you’ll have to decide for yourself which of these platforms makes the most sense for your particular needs. If you are faced with the choice of a top-of-the-line single-processor PC or a lower-end workstation-class system, you can see from our tests that the choice may be quite a dilemma. Clearly, in most tasks, the faster single-processor system outperforms a slower dual-processor workstation. However, for tasks that are easily parallelizable, like 3D rendering, multiprocessor systems can be worth investigating. Also, most workstations aren’t bought on the tightest of budgets, from what I gather, and dual-processor rigs promise the highest outright performance when price isn’t an object. Driver certifications and the like for your particular application may play into the decision, too.
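The single-versus-dual tradeoff can be roughed out with Amdahl's law. The sketch below is a back-of-the-envelope model, not something derived from our benchmark data: the parallel fractions and the 0.7 per-CPU speed ratio are made-up inputs, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Amdahl's law: speedup from running the parallel part on several CPUs."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


def multi_vs_single(parallel_fraction, per_cpu_speed_ratio, cpus=2):
    """Throughput of a multi-CPU box relative to one faster CPU.

    per_cpu_speed_ratio is how fast each CPU in the multi-CPU box is
    compared with the single fast CPU (0.7 means 70% as fast).
    A result above 1.0 means the multi-CPU box wins.
    """
    return per_cpu_speed_ratio * amdahl_speedup(parallel_fraction, cpus)


# Made-up inputs: each workstation CPU is 70% as fast as the desktop chip.
for fraction in (0.2, 0.6, 0.95):
    ratio = multi_vs_single(fraction, 0.7)
    print(f"parallel fraction {fraction:.2f}: {ratio:.2f}x the single CPU")
```

Under those assumed inputs, the dual box only pulls ahead once most of the work can run in parallel, which lines up with why rendering favors the dual rigs while single-threaded jobs like LAME encoding do not.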
If you’re choosing between Xeons and Opterons, the choice is also tough. Unfortunately, our dual Opteron test motherboard didn’t have DIMM slots for the second processor’s memory controller, so the Opteron may have been shortchanged a little bit. Then again, the version of Windows we used for testing wasn’t NUMA-aware, anyhow.
Clearly, the Xeons at 2.66GHz outran the dual Opteron 240 setup in most of our tests. With its faux-quad-processors via Hyper-Threading, our dual Xeon Tiger i7505 system ripped through multithreaded tests, absolutely devastating the competition in Cinema 4D rendering. As new Xeon DP chips with 1MB cache, higher core clock speeds, and 800MHz front-side bus speeds with dual-channel DDR400 memory become available, the Xeon should become even more formidable. Obviously, Intel is not taking the Opteron threat lightly.
However, the Opteron itself showed quite a bit of promise, tying or beating its older sibling, the Athlon XP 3200+, in several tests despite a cavernous 800MHz clock-frequency gap. With 64-bit applications, a 64-bit OS, and a proper memory configuration, the Opteron could be a deadly fast workstation config. The addition of an integrated memory controller and SSE2 puts the Opteron in good position to take on a charging pair of Xeons without getting gored. Most important for the Opteron, though, is getting the clock speed up. The new 2GHz Opteron chips should be much more potent competitors, even against Intel’s 3.2GHz Netburst-based processors. (I’m not kidding—keep an eye on TR for more soon.)
Among the Opteron chipsets, the K8T800 looks like the better workstation platform right now. The nForce3 Pro doesn’t have any great flaws, and its single-chip approach is elegant, but the K8T800 offers more. The K8T800 supports multiple CPUs and Serial ATA, and its faster HyperTransport link between CPU and chipset brings better performance whenever AGP is invoked. I’m not too enamored with the MSI 9130’s funky memory config, but its single-processor performance is impressive.