Xeons go quad-core, boggle the mind
We've already previewed the new quad-core Xeons, so I won't spend too much time restating their features. The key thing to know about "Clovertown" Xeons is that they're essentially two dual-core "Woodcrest" Xeon chips situated together in a single package. Since each Woodcrest has two execution cores, Clovertown has four; since each Woodcrest Xeon has 4MB of L2 cache onboard, Clovertown has a total of 8MB of L2 cache. It's really no more complicated than that.


Cosmetically, Clovertown looks like any other recent Xeon

The twin chips inside of this package have no real provision to communicate directly with one another. Instead, the two chips share the front-side bus with the Intel 5000P (Blackford) Memory Controller Hub, or north bridge. Each chip creates an electrical load and additional overhead on the bus, and each chip participates independently in the system's cache coherency scheme.


A block diagram of the Bensley platform. Source: Intel.

Fortunately, Intel's "Bensley" platform for Xeon processors looks to have sufficient capacity to handle this additional overhead. The Blackford MCH has dual, independent front-side busses, one for each CPU socket, and those FSBs run at a lofty 1333MHz when coupled with the fastest Xeons. At eight bytes per transfer, that works out to roughly 10.7GB/s per socket. Also, Blackford's four memory channels can host Fully Buffered DIMMs (FB-DIMMs) at clock speeds up to 667MHz, for the same 21.3GB/s of peak memory bandwidth that a dual Socket F Opteron system has. If it all works as planned, Clovertown Xeons should offer substantially improved performance over Woodcrest Xeons, despite being a simple drop-in replacement.
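For readers who want to check the bandwidth math, here is a back-of-the-envelope sketch. It assumes a 64-bit (8-byte) data path on both the front-side bus and each memory channel, and it treats the quoted 1333MHz and 667MHz figures as effective transfer rates. The function name and its parameters are ours, for illustration only. These are theoretical peaks, not measured throughput.

```python
# Peak theoretical bandwidth = transfers per second x bytes per transfer.
# Assumption: 8-byte (64-bit) data paths for the FSB and each memory channel.

def peak_bandwidth_gbs(mega_transfers: float, width_bytes: int = 8,
                       channels: int = 1) -> float:
    """Return peak bandwidth in GB/s, with 1 GB taken as 10^9 bytes."""
    return mega_transfers * 1e6 * width_bytes * channels / 1e9

fsb_per_socket = peak_bandwidth_gbs(1333)            # one front-side bus
fsb_total = peak_bandwidth_gbs(1333, channels=2)     # both independent busses
memory_total = peak_bandwidth_gbs(667, channels=4)   # four FB-DIMM channels

print(f"FSB, per socket:    {fsb_per_socket:.1f} GB/s")
print(f"FSB, both sockets:  {fsb_total:.1f} GB/s")
print(f"Memory, 4 channels: {memory_total:.1f} GB/s")
```

Under these assumptions, the two front-side busses together have about as much peak bandwidth as the four memory channels behind them, which is why the platform appears balanced on paper.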

The Clovertown processors we've tested are the top-of-the-line Xeon X5355 model, with a 120W TDP rating. However, Intel offers a range of quad-core Xeons, like so:

Model        Clock speed   L2 cache   Front-side bus   TDP    Price
Xeon E5310   1.60GHz       8MB        1066MHz          80W    $455
Xeon E5320   1.86GHz       8MB        1066MHz          80W    $690
Xeon E5345   2.33GHz       8MB        1333MHz          80W    $851
Xeon X5355   2.66GHz       8MB        1333MHz          120W   $1172

Almost miraculously, the lower speed grades of the Xeon 5300 series have a TDP of only 80W. By emphasizing parallelism and keeping clock speeds low, Intel has achieved a potentially potent combination of performance and power efficiency.
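To put rough numbers on that trade-off, the sketch below divides each model's aggregate clock (four cores times clock speed) by its TDP rating, using only the figures from the table above. The metric is our own crude illustration: aggregate gigahertz is not delivered performance, and a TDP rating is not measured power draw.

```python
# Clock speeds and TDP ratings copied from the Xeon 5300 table above.
# "Core-GHz per TDP watt" is an illustrative ratio, not a benchmark result.
CORES = 4  # every Clovertown model has four cores

models = {
    "Xeon E5310": {"ghz": 1.60, "tdp_w": 80},
    "Xeon E5320": {"ghz": 1.86, "tdp_w": 80},
    "Xeon E5345": {"ghz": 2.33, "tdp_w": 80},
    "Xeon X5355": {"ghz": 2.66, "tdp_w": 120},
}

def core_ghz_per_watt(spec: dict) -> float:
    """Aggregate clock across all cores, divided by the TDP rating."""
    return CORES * spec["ghz"] / spec["tdp_w"]

ratios = {name: core_ghz_per_watt(spec) for name, spec in models.items()}
best = max(ratios, key=ratios.get)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f} core-GHz per TDP watt")
print("Highest ratio:", best)
```

By this measure the 80W E5345 comes out ahead of the 120W X5355, which fits the general pattern that the top speed grade costs disproportionately more power.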

If you're about to compare those numbers to the TDP ratings of the Socket F Opterons, be aware that direct comparisons between them are somewhat controversial because AMD and Intel use different methods of rating their processors' TDP values. AMD claims the two methods aren't entirely comparable because its TDP estimates are more conservative. However, Intel points out that the two companies' TDP ratings are used similarly by system builders, and are thus functionally equivalent. Regardless of whose argument is more persuasive, we believe system-wide power draw is the more important consideration, and we've measured that ourselves in the following pages.


Colfax's Xeon system employs a ducted cooling design that vents air across the FB-DIMMs

Test notes
The tests we used for these processors are almost entirely based on widely multithreaded applications. We used many of them in our recent review of AMD's Quad FX platform, but I had originally intended them for this article. Not all of the applications are able to use eight threads, as you will see, but many are. I should mention that we had hoped to expand our range of tests here to include a couple of SPEC benchmarks, including CPU2006 and JBB2005. We requested these tests from SPEC, but unfortunately, they didn't arrive in time for inclusion in this article. Perhaps next time.

We're also using some new methods for measuring and quantifying power efficiency. We believe these methods offer a great deal more insight into power efficiency than other techniques we've seen to date, but we should acknowledge something up front about the mix of processors we're testing. Both the Xeon 5160 and Xeon X5355 are the top speed grades in their respective product lines, and as such, each has a higher TDP rating than the product one rung down. For example, the Xeon X5355 has a TDP of 120W, while the bulk of the 5300 series is rated at 80W. The Opteron 2218, meanwhile, comes from the other side of a similar divide in AMD's product line. The top speed grade of the Opteron lineup is the 2220 SE, which has a 120W TDP, but the 2218 we're testing is a regular 95W Opteron.

Higher CPU speed grades do offer more performance, but the top speed grade generally doesn't offer the best combination of power efficiency and performance together. As a result, we have limited our tests of power efficiency to just a couple of applications. We may take a more extensive look at power efficiency and power-efficient performance in a future article, and if we do so, we'd like to include a broader range of products, including some lower speed grades of Xeon processors.

Then again, going to a lower clock frequency does have performance penalties. Keep in mind as you see our performance results that AMD does produce a faster speed grade in the Opteron 2220 SE.

We should also address another issue: the Opteron's NUMA memory subsystem and operating system support. In our recent Quad FX article, we disabled the "node interleave" option in our test system's BIOS—per AMD's recommendation—because we were using Windows XP Pro x64 Edition, which is supposed to be NUMA-aware. Here, we're using another version of Windows derived from the same code base, Windows Server 2003 R2 Enterprise x64 Edition. This time around, though, we did some testing both with and without node interleave, and we found that enabling node interleave produced better performance in everything we tried, including synthetic memory tests and real applications. As a result, we've conducted our testing with node interleave enabled. Perhaps Windows Vista's improved NUMA support will produce better results on Opteron systems without node interleaving.
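For those unfamiliar with the option, the toy model below shows what node interleave changes about where memory lives. It is a sketch under assumptions of our own: two nodes, a hypothetical 4GB of DRAM per socket, and a hypothetical 4KB interleave granularity. It does not model our test systems or explain the performance results. It only shows how a physical address maps to a socket's memory controller in each mode.

```python
# Toy model of physical-address-to-node mapping on a two-socket NUMA system.
# NODES, NODE_BYTES, and INTERLEAVE_BYTES are illustrative assumptions.
NODES = 2
NODE_BYTES = 4 * 2**30     # hypothetical: 4GB of DRAM attached to each socket
INTERLEAVE_BYTES = 4096    # hypothetical interleave granularity

def home_node(addr: int, interleave: bool) -> int:
    """Return the node whose memory controller owns physical address addr."""
    if interleave:
        # Consecutive chunks alternate between the nodes.
        return (addr // INTERLEAVE_BYTES) % NODES
    # Without interleaving, each node owns one contiguous address range.
    return addr // NODE_BYTES

def local_fraction(start: int, length: int, cpu_node: int,
                   interleave: bool) -> float:
    """Fraction of chunks in [start, start + length) local to cpu_node."""
    chunks = range(start, start + length, INTERLEAVE_BYTES)
    local = sum(home_node(addr, interleave) == cpu_node for addr in chunks)
    return local / len(chunks)

BUFFER_BYTES = 64 * 2**20  # a 64MB buffer at the bottom of the address space
results = {}
for interleave in (False, True):
    for cpu in range(NODES):
        frac = local_fraction(0, BUFFER_BYTES, cpu, interleave)
        results[(interleave, cpu)] = frac
        mode = "on" if interleave else "off"
        print(f"interleave {mode}, CPU on node {cpu}: {frac:.0%} local")
```

In this model, a buffer that lands entirely on one node is all-local for one socket and all-remote for the other when interleaving is off. With interleaving on, both sockets see an even split, which trades the best case for a more uniform average when the operating system does not place memory near the threads that use it.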