An experiment
We knew going into this project that testing Nehalem-EX with our usual suite of benchmarks wouldn't suffice, but we had an idea for a test that might do a better job of pushing the limits of a system like the R810.
We'd long been looking for a test that involved virtualization performance in some way. Trouble is, most of the formal virtualization benchmarks we've seen have very steep requirements for a valid run, including large numbers of network clients, making them impractical for our use. Fortunately, we discovered that our friend Paul Venezia at InfoWorld had been working on a promising test of his own. After a quick conversation in which we offered a couple of suggestions, Paul went home and produced a very slick working benchmark setup, which he generously shared with us.
The basics are straightforward. The test setup, packaged as an OVF template, includes images for a number of virtual machines. Each VM hosts a portion of a fairly robust LAMP web hosting setup: up to four web servers, one or two database servers, and a load balancer. To keep things simple and bypass any network bottlenecks, the client also runs on a local VM. The core of the test is ApacheBench, which also reports the performance results. Because the VMs are packaged into a single template, the benchmark can be deployed easily and repeatably on any system running a VMware hypervisor. (In our case, we used VMware ESX 4.1.)
We found out during testing that our nefarious plan to use enterprise-class SLC solid-state disks to provide our local storage for this test wasn't going to fly. Our OCZ Vertex drives seemed ideal for a high-IOps scenario like this one, but the drives are only 60GB in size, so even a dual-drive RAID 0 was too small to house all eight of the VMs that comprise the test. In order to make this work, we needed similar performance but substantially more capacity.

Fortunately, the folks at Corsair offered us a solution in the form of a couple of Force-series F240 SSDs. Although these drives use slower MLC-style NAND flash, their SandForce SF-1200 controllers have proven capable of delivering exceptionally high IOps rates, making them a good fit here. A single drive in each of our test systems offered enough capacity and performance for our purposes.
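The capacity problem comes down to simple arithmetic. Two 60GB drives striped in RAID 0 yield about 120GB, which works out to 15GB per VM across eight VMs, while a single 240GB F240 doubles that budget. The sketch below assumes equal-size drives in a plain stripe and ignores filesystem and formatting overhead. The actual image sizes in Paul's template aren't given here, so it only computes the break-even average per VM. The function names are hypothetical.

```python
# Hypothetical capacity check for a striped (RAID 0) array.
# RAID 0 stripes data with no redundancy, so usable space is
# the number of members times the smallest member's capacity.

def striped_capacity_gb(drive_sizes_gb):
    """Usable capacity of a RAID 0 stripe (or a single drive), ignoring overhead."""
    if not drive_sizes_gb:
        raise ValueError("need at least one drive")
    return len(drive_sizes_gb) * min(drive_sizes_gb)


def max_average_vm_size_gb(drive_sizes_gb, vm_count):
    """Largest average VM image size that would still fit on the array."""
    return striped_capacity_gb(drive_sizes_gb) / vm_count


if __name__ == "__main__":
    # Two 60GB drives vs. one 240GB drive, eight VMs in the test.
    print(max_average_vm_size_gb([60, 60], 8))  # 15.0
    print(max_average_vm_size_gb([240], 8))     # 30.0
```

If the eight images average more than about 15GB apiece, the pair of 60GB drives can't hold them, regardless of how fast the drives are.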
With that issue settled, we proceeded to tune the benchmark for our two test systems, the R810 and our dual Westmere-EP box. The benchmark can be configured in various ways, and finding the right mix isn't easy. Eventually, we settled on the full complement of four web servers and two database servers, with a ratio of static to dynamic web requests of 994. Both systems appeared to reach their peak throughput at around 200 concurrent requests, so that became our standard. We found that longer tests tended to produce higher average response rates, so we decided to use 4 million requests in each test run. Tuned this way, the benchmark produced appropriately high CPU utilization across all of the VMs except the client and load balancer, which usually weren't fully taxed.
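For reference, ApacheBench takes the total request count through its `-n` flag and the number of simultaneous requests through `-c`. It also refuses to run when the concurrency exceeds the total. The sketch below builds such a command line using the 4-million-request, 200-concurrent settings described above. The target URL is a placeholder, and how Paul's harness actually invokes `ab` isn't known here. The static-to-dynamic mix isn't something `ab` controls directly, so it's left out; presumably the test setup handles it elsewhere. The helper name is hypothetical.

```python
import shlex

# Hypothetical helper: builds an ApacheBench command line with the
# request count (-n) and concurrency (-c) used in our runs.
# The URL is a placeholder, not the benchmark's real endpoint.

def ab_command(url, total_requests=4_000_000, concurrency=200):
    if concurrency > total_requests:
        # ab itself rejects a concurrency level above the total request count
        raise ValueError("concurrency cannot exceed total requests")
    return ["ab", "-n", str(total_requests), "-c", str(concurrency), url]


if __name__ == "__main__":
    cmd = ab_command("http://load-balancer.example/")
    print(shlex.join(cmd))  # ab -n 4000000 -c 200 http://load-balancer.example/
```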
Here's how the results came out.

The dual Westmere-EP system serviced more requests per second, on average, than the Xeon X7560 system. That's an unfortunate result for the Nehalem-EX, no doubt. However, the more detailed results reveal a different aspect of the story.

The Xeon X5670 services two-thirds of the requests substantially more quickly, on average, than the X7560. We'd attribute that outcome to a number of factors, including the X5670's higher clock frequencies, its higher measured memory bandwidth, and what we suspect are substantially lower memory access latencies, since there's no memory buffer chip in the mix. However, for the final 20% of the requests, the X5670's response times are much higher than the X7560's. Here, the EX box's higher core count and larger L3 caches appear to grant it an advantage.
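The "two-thirds" and "final 20%" breakpoints appear to correspond to the percentile table ApacheBench prints at the end of a run, which lists response times at the 50, 66, 75, 80, 90, 95, 98, 99, and 100 percent marks. The sketch below shows how such a comparison can be made from raw per-request latencies. The index rule only approximates how `ab` picks each entry, and the function names and sample data are hypothetical, not our measured results.

```python
# Sketch: an ApacheBench-style percentile table and a per-percentile comparison.
AB_PERCENTS = (50, 66, 75, 80, 90, 95, 98, 99, 100)


def percentile_table(latencies_ms, percents=AB_PERCENTS):
    """Latency at sorted index n*p/100 for each percent; 100% is the slowest request."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    if n == 0:
        raise ValueError("no samples")
    table = {}
    for p in percents:
        idx = n - 1 if p >= 100 else min(n - 1, n * p // 100)
        table[p] = ordered[idx]
    return table


def faster_percentiles(a_ms, b_ms, percents=AB_PERCENTS):
    """Percentiles at which system A's response time is strictly lower than B's."""
    ta = percentile_table(a_ms, percents)
    tb = percentile_table(b_ms, percents)
    return [p for p in percents if ta[p] < tb[p]]


if __name__ == "__main__":
    # Made-up samples: A is quick for most requests but has a slow tail.
    a = [1] * 80 + [100] * 20
    b = [5] * 100
    print(faster_percentiles(a, b))  # [50, 66, 75]
```

A system can win on average throughput and still lose at the high percentiles, which is the pattern described above.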
Depending on the sort of performance characteristics you value, the EX's showing may be the more impressive one. Avoiding those longer response times may be more desirable than simply serving more requests per second.
Then again, we expressly tuned the benchmark config to produce the best request rate averages. It's possible a different set of parameters might yield better response times overall on the Xeon X5670. We may have to experiment further in the future. Still, we're pleased to see that this new addition to our test suite offers us a different sort of insight into the performance of these two systems, at least giving us a hint of the Nehalem-EX's scalability advantage when running multiple VMs.