Intel’s Skulltrail dual-socket enthusiast platform

Skulltrail. If you hang around these parts, you’ve been hearing that codename bandied about for the better part of a year now. Not only does it have the distinction of being, quite probably, the single coolest codename known to all of geekdom, but it’s also attached to the sort of hardware required to back up the copious bravado it implies. You see, Skulltrail is a high-end desktop PC platform based on workstation-class chips taken from Intel’s Xeon parts shelf. We’re talking about some wicked numbers here, such as dual sockets, eight cores, four graphics card slots, and dual 1600MHz front-side buses with a total of 25.6GB/s of bandwidth.

Gulp.

Of course, Xeon-based workstations have long sported impressive stats, but they’ve rarely set PC hobbyists’ hearts aflutter for various reasons. Chief among them: buttoned-down motherboards with very little tweakability, foreign expansion options, and limited feature sets. We ran into these problems when we reviewed Intel’s powerful-yet-frustrating V8 “media creation platform” last May. When your mobo’s BIOS seemingly equates changing the CAS latency with opening up a liquor store in Riyadh, you know you’re in the wrong neighborhood. Miraculously, though, some folks inside of Intel managed to wrangle approval to do something about that problem, and Skulltrail is the result: a truly tweakable motherboard coupled with unlocked Core 2 Extreme QX9775 processors clocked at 3.2GHz, primed for use with both SLI and CrossFire.

Beautimous, isn’t it?

We probably have AMD to thank for Skulltrail’s existence. When it couldn’t keep up with Intel by delivering four cores per socket, the firm hatched its Quad FX scheme and pledged to make enthusiast-class dual-socket motherboards a part of its long-term technology direction. That direction was to include an upgrade to dual quad-core Phenom processors as soon as they became available. Of course, AMD has since canceled Quad FX and failed to provide the promised upgrade path for owners of Quad FX systems, but Skulltrail was already deep into development by the time AMD peed down its leg. End result: Intel makes good on its answer to AMD’s promises. I can live with that.

The D5400XS motherboard

Indeed, Skulltrail is very easy to live with thanks to a motherboard expressly tailored for this purpose. The technology’s workstation heritage is clear—it’s based on the Stoakley platform and 45nm Harpertown Xeons we reviewed last fall—but the motherboard is utterly devoid of PCI-X slots and the like. Instead, the Intel D5400XS bristles with the sort of amenities a desktop rig might need. Here’s a quick look at the key specs.

CPU support: Dual LGA771-based Core 2 Extreme and Xeon processors
North bridge: Intel 5400 MCH
South bridge: Intel 6321ESB ICH
Interconnect: PCIe x4 + DMI x4
Expansion slots:
    4 PCI Express x16 (PCIe 1.1) via dual Nvidia nForce 100 switches
    2 32-bit PCI
Memory:
    4 240-pin FB-DIMM sockets
    Maximum of 16GB of DDR2-667/800 FB-DIMM memory
Storage I/O:
    1 ATA/100 port
    6 Serial ATA 3Gbps ports with RAID 0, 1, 5, 10 support
Audio: 8-channel HD audio via 6321ESB and SigmaTel STAC9274D5 codec
Ports and headers:
    2 eSATA with RAID 0, 1 support via Marvell 88SE6121
    6 USB 2.0 with headers for 4 more
    1 RJ45 Ethernet 10/100/1000 via Intel 82573L
    1 IEEE1394 (FireWire) with header for 1 more via TI TSB43AB22A
    1 analog front out
    1 analog center/LFE out
    1 analog rear out
    1 analog surround out/line in
    1 analog mic in
    1 TOS-Link digital S/PDIF out
    1 HD Audio front-panel header for analog headphone out and mic in
    1 HD Audio Link header
    1 3-pin S/PDIF out header
    1 Consumer Infrared front-panel receiver header
    1 Consumer Infrared transmitter header
Form factor: EATX (13″ x 12″) with LGA775-style mounting holes for cooling

In a nutshell, this thing is loaded. Intel has backed up the specs with additional goodness where possible, too. For instance, the D5400XS’s eight-channel audio is Dolby Home Theater capable, which means it supports both Pro Logic II and Dolby Digital Live encoding.

In some ways, of course, the D5400XS’s workstation-class foundation is inescapable, but Intel has made accommodations where possible. Although it uses Xeon-style LGA771 processor sockets, the mounting holes around those sockets use LGA775-style spacing, so they should be compatible with desktop-class coolers, including the more exotic varieties involving liquid cooling or phase-change. Similarly, cramming all of the D5400XS’s features onto a standard ATX-sized board would be nearly impossible, but the 13″ x 12″ board is small enough to fit into an EATX-ready enclosure like the Cooler Master Cosmos.

One place where Skulltrail can’t escape its workstation heritage is in its use of fully buffered DIMMs. FB-DIMMs are typically more expensive than standard DDR2 memory, and they consume more power than standard DIMMs. FB-DIMMs also tend to have higher latency than DDR2 memory. They somewhat make up for these drawbacks in server and workstation settings by allowing tremendous amounts of bandwidth and lots and lots of DIMM slots with fewer traces on the motherboard.

The D5400XS has only four FB-DIMM slots, but each one of those slots is connected to a memory channel in the Intel 5400 north bridge. This north bridge has two separate memory “branches” with two channels each. Fully populated with 800MHz memory modules, this board can sustain a peak throughput of 25.6GB/s to main memory—enough to saturate its dual 1600MHz front-side buses. Our Skulltrail sample came from Intel with a pair of 800MHz FB-DIMMs, and that’s how we tested it. Adding two more memory modules would have increased the system’s peak bandwidth, but at the cost of additional memory access latency and power consumption.
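The bandwidth figures above come down to simple arithmetic, sketched here in Python. This is a rough illustration, assuming a 64-bit (8-byte) data path on each front-side bus and the equivalent of a 64-bit DDR2-800 interface behind each FB-DIMM channel, with 1GB counted as 10^9 bytes. The function name is ours, made up for the example.

```python
def peak_bandwidth_gbs(transfers_mhz, bytes_per_transfer=8, links=1):
    """Peak bandwidth in GB/s: million transfers/s x bytes per transfer x links.

    Assumption: every link (FSB or memory channel) moves 8 bytes per transfer.
    """
    return transfers_mhz * bytes_per_transfer * links / 1000


# Two 1600MHz front-side buses, one per socket.
fsb_total = peak_bandwidth_gbs(1600, links=2)
# Four FB-DIMM channels populated with 800MHz modules.
mem_four_dimms = peak_bandwidth_gbs(800, links=4)
# Two channels populated, as in the configuration we tested.
mem_two_dimms = peak_bandwidth_gbs(800, links=2)

print(f"FSB total:    {fsb_total:.1f} GB/s")       # 25.6 GB/s
print(f"Four DIMMs:   {mem_four_dimms:.1f} GB/s")  # 25.6 GB/s
print(f"Two DIMMs:    {mem_two_dimms:.1f} GB/s")   # 12.8 GB/s
```

Under those assumptions, our two-DIMM test config tops out at half the theoretical peak of a fully populated board, which is why adding modules would raise peak bandwidth.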

One thing you won’t find on the D5400XS is a legacy port—not a PS/2 mouse or keyboard plug, not a serial or parallel port, and no floppy drive connector. Those things are all missing. The only obvious concession Intel has made to older standards is the inclusion of an ATA connector presumably intended for optical drives. I can’t say I’d miss anything Intel has left out here, and it does make for a tidy port cluster, despite the board’s ample array of connections for standards invented in this century.

Even with the omission of legacy I/O and the use of the EATX form factor, real estate on this board is tight. We were only able to fit the brackets for these Zalman coolers onto the board in one specific orientation around each socket. In neither case did that orientation put the retention bracket’s larger, curved edge in the proper place to allow clearance for the CPU retention lever. As a result, we’ll have to remove the bracket in order to swap out either processor.

In the picture above, you can also see the board’s dual eight-pin power connectors situated between the sockets. Seeing these together offers a hint about something you may have suspected: a setup like this requires a potent power supply unit. Intel recommends a kilowatt or better PSU for a system with 4GB of memory, two GPUs, and two CPUs, and if you want to go for it all with four GPUs and 8GB of memory, they recommend a PSU rated for over 1400W. Holy moly. Fortunately, though, we were able to power our test rig quite well with “only” a 1kW unit, PC Power & Cooling’s Turbo-Cool 1kW-SR, which has dual eight-pin aux power connectors. Even more fortunately, you may be able to get by with a lesser PSU than the one we used, since Intel says that connecting both of those eight-pin plugs is only necessary for “extreme overclocking.”

Speaking of which, we should talk about precisely what Intel has wrought with the D5400XS’s BIOS. Here’s a look at the voltage ranges and basic tweaking options exposed in the BIOS menu.


Bus speeds:
    FSB: 133-550MHz in 1MHz increments
    PCIe: 100-120MHz in 1MHz increments
    DRAM: 667, 800MHz (Auto by SPD only)
Bus multipliers:
    CPU: 5x-40x
Voltages:
    CPU 0: 1.2875-1.6V in 0.0125V increments
    CPU 1: 1.2875-1.6V in 0.0125V increments
    DRAM (1.8V): 1.8-2.8V in 0.04V increments
    DRAM (1.5V): 1.5-2.5V in 0.04V increments
    FSB: 1.1-1.5V in 0.025V increments
    NB: 1.275-1.6V in 0.025V increments
Monitoring: Voltage, fan status, and temperature monitoring

The Skulltrail board’s voltage ranges and granularity of control are in league with some of the best enthusiast-class mobos on the market, with 1MHz increments for the FSB clocks and 0.0125V increments for CPU voltage. Not only that, but Intel augments the numbers you see above with additional options that add incremental steps of 300 mV and allow the CPU and FSB reference voltages to be modified further. Yes, the max CPU voltage could be somewhat higher, like the hair-raising 2.35V available on Gigabyte’s GA-X38-DQ6, but this board probably offers sufficient range for use with 45nm processors and all but the most exotic chilling schemes.

When coupled with the unlocked Core 2 Extreme QX9775 processors we used for testing, overclocking can be as simple as turning up the CPU multiplier. What may be more exciting, though, is the prospect of grabbing a pair of, say, Xeon E5410 processors for just over 300 bucks each, with a native clock speed of 2.33GHz and FSB speed of 1333MHz, and taking them up from there. Chips like these—or, even better, their new 45nm variants—could sustain some titanic overclocking exploits inside a Skulltrail rig.
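For anyone new to the math, the core clock is just the front-side bus base clock times the CPU multiplier, and the advertised bus speed is four times the base clock. The Python sketch below illustrates that relationship. The 8x and 7x multipliers are inferred from the clock and bus speeds quoted above rather than read from the BIOS, the function names are ours, and the overclocked settings are purely hypothetical examples, not results we achieved.

```python
def effective_bus_mhz(base_clock_mhz):
    """A quad-pumped FSB makes four transfers per base clock cycle."""
    return base_clock_mhz * 4


def core_clock_mhz(base_clock_mhz, multiplier):
    """Core clock = FSB base clock x CPU multiplier."""
    return base_clock_mhz * multiplier


# Stock QX9775: 400MHz base clock (1600MHz bus) at an inferred 8x multiplier.
print(effective_bus_mhz(400))   # 1600
print(core_clock_mhz(400, 8))   # 3200

# Hypothetical: raise only the unlocked multiplier to 9x.
print(core_clock_mhz(400, 9))   # 3600

# Hypothetical: a 2.33GHz Xeon with an inferred 7x multiplier,
# pushed from its stock 1333MHz bus to a 400MHz base clock.
print(core_clock_mhz(400, 7))   # 2800
```

With a locked multiplier, the FSB is the only lever, which is where the BIOS's 1MHz granularity would come into play.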

So Intel has given us plenty of leeway to fine-tune our abuse of its processors, but the BIOS’s knobs and dials offer less control on other fronts. The south bridge voltage can’t be modified. Although the menu allows tweaking of the four major memory timing parameters we all know and love, delving into the deep voodoo magic settings beyond those isn’t possible. Such things are likely to be strange and foreign territory for FB-DIMMs, anyhow, I suppose. The bigger disappointment is the utter lack of clock speed control for those FB-DIMMs. You’re driving at the SPD limit, like it or not.

The D5400XS does have extensive monitoring capabilities, with readouts for nine voltage values, eight fan speeds, and six temperature zones. The board uses those values to modulate the speed of some CPU and system fans, but here’s the catch: no speed control options are exposed in the BIOS menu. Nada. No enable/disable choices, no temperature targets, nothing. That fact is punctuated by the high-pitched drone of the board’s south bridge cooler. The south bridge’s voltage and temp aren’t shown in the BIOS monitoring screen, and this fan apparently doesn’t respond to changes in temperature at all. I’ve already started playing with sticking a manual control unit on this fan, because it annoys me. I know that a board like this one will probably involve lots of heat and noise by its nature, but to me, part of building a really good system is tweaking it out for the optimal mix of cooling and acoustics. I’d like to see the kind of control here we find on most enthusiast mobos.

Along similar lines, the pre-release D5400XS BIOS we used didn’t yet support either C1E halt or SpeedStep dynamic clock throttling. These things may not make a Skulltrail rig especially cool or quiet, but they’re still features I want. Intel says it will have them working by the time the board is released to the public. Skulltrail systems should become available within the next 30 days.

I should mention a few more things before we finish with the D5400XS. The picture above gives you a view of several nice touches. Contrary to how it may look in the picture, one of those nice touches is the placement of the board’s six SATA ports. Because this board is 13″ deep, those ports won’t interfere with the installation of extra-long graphics cards like the Radeon HD 3870 X2. The shrouded south bridge cooler is also quite obviously tailored for keeping things cool with a full slate of PCIe x16 slots populated, and it shouldn’t present any clearance problems, either. Two other touches that will warm any DIYer’s heart are the POST-code LEDs on the bottom corner of the board (two gray rectangular doodads in the picture) and the built-in power and reset buttons along the board’s edge—very helpful for testing and troubleshooting.

The deal with multi-GPU configs

One of the things that makes Skulltrail unique is its support for both Nvidia’s SLI multi-GPU scheme and AMD’s similar CrossFire technology. Both GPU makers tend to limit support for their multi-GPU capabilities on third-party chipsets in order to sell more of their own chipsets. ATI (now AMD’s graphics division) has long made an exception for Intel’s chipsets, but Nvidia has kept SLI largely exclusive to its own nForce lineup. Intel went to great lengths in order to incorporate SLI support into Skulltrail, and it affects the physical hardware on the board. Have a look at the block diagram below to see how.

The bit of the diagram we’re concerned with is the top left portion, where the north bridge (or MCH) connects to the PCIe x16 slots. In between the MCH and each pair of PCIe x16 graphics slots sits an nForce 100 PCI Express switch chip from Nvidia. Each nForce 100 has 16 lanes routed to each PCIe x16 slot and 16 lanes back to the MCH. This is by no means the optimal configuration, because Nvidia’s switch chip only supports PCIe version 1.1, whereas the Seaburg MCH supports PCIe 2.0, with twice the bandwidth per lane. A better configuration would route eight lanes of PCIe 2.0 connectivity directly from the MCH to each of the four PCIe x16 slots. The Seaburg MCH is designed to work this way, in fact, and such a configuration would yield fewer chips, fewer hops between chips, less complex trace routing, lower power consumption, lower costs, and twice the total bandwidth back to the MCH (via 32 lanes of PCIe 2.0 instead of 32 lanes of PCIe 1.1).
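To put numbers on that comparison, here is a small Python sketch. It assumes the usual per-lane, per-direction figures of 250MB/s for PCIe 1.1 and 500MB/s for PCIe 2.0 (after 8b/10b encoding overhead); the function and variable names are ours, for illustration only.

```python
# Usable bandwidth per lane, per direction, in MB/s (assumed figures).
PER_LANE_MBS = {"1.1": 250, "2.0": 500}


def link_bandwidth_gbs(version, lanes):
    """Peak one-way bandwidth of a PCIe link in GB/s."""
    return PER_LANE_MBS[version] * lanes / 1000


# As built: two nForce 100 switches, each with 16 PCIe 1.1 lanes to the MCH.
as_built = link_bandwidth_gbs("1.1", 2 * 16)
# Alternative: four slots, each with eight PCIe 2.0 lanes straight to the MCH.
direct = link_bandwidth_gbs("2.0", 4 * 8)

print(f"Via nForce 100 switches: {as_built:.1f} GB/s")  # 8.0 GB/s
print(f"Direct PCIe 2.0 lanes:   {direct:.1f} GB/s")    # 16.0 GB/s
```

Note that a single slot comes out the same either way in this model, since 16 lanes of PCIe 1.1 and eight lanes of PCIe 2.0 both work out to 4GB/s. The difference shows up in the aggregate path back to the MCH when several slots are busy at once.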

So why did Intel do things this way? Because by incorporating Nvidia’s chips, they could win Nvidia’s blessing for SLI configurations on this motherboard. You’d think that simply paying a license fee to Nvidia would be sufficient, but Nvidia apparently wanted to maintain the (technical) fig leaf covering its true (business) reasons for locking third-party chipsets out of SLI. Nvidia continues to claim intermittently that the PCI Express implementations on its nForce chipsets contain special sauce to make SLI work. I’m not buying it. CrossFire works quite well on Intel’s chipsets, and it’s a very, very similar thing.

Anyhow, Intel played along and put Nvidia’s PCIe 1.1 switch chips on the Skulltrail board—two of them, in fact, to make three- and four-way SLI possible. After all, doing so only makes a certain kind of sense when you’re setting out to build an “ultimate” motherboard like this one.

Here’s where things get sticky. I was all set to try out three-way SLI on our Skulltrail board, just for kicks, but when I asked Intel about this possibility, they told me I’d need to check with Nvidia about drivers to make it happen. When I asked Nvidia about it, I got this response:

3-way SLI is not supported on Skulltrail because the MCP bridge chip only supports communication between two graphics cards. There is no driver workaround for this.

Uh, yeah. At this point, I reminded the folks at Nvidia that Skulltrail’s dual nForce 100 configuration is, if anything, more elegant than the switch-chip-plus-south-bridge mish-mash on the nForce 780i. I told them I wasn’t buying it.

Then, I went back to Intel and got this official statement from them:

Mechanically and electrically, with 4 PCIe slots, Skulltrail can support up to 4 graphics cards. Drivers and validation and are up to the graphics card vendors as always.

Somewhere in a back room at Intel, as the PR rep uttered these words, an engineer must have bit his tongue in two.

A little later, I got an interesting reply from Nvidia essentially admitting its earlier explanation didn’t make any sense. I’m still awaiting a better rationale from Nvidia, but if I were betting, I’d say we’ll never see support for three- or four-way SLI on Skulltrail.

Skulltrail is a niche product, and it’s not one I care particularly about, but these shenanigans make me angrier than anything I’ve seen from this industry in a good while. This is the kind of crap that makes folks give up on PC gaming and go buy an Xbox. I don’t believe for a minute that this is about anything other than vindictiveness. I’ve heard credible rumors that Nvidia has seeded some PC makers with three-way SLI drivers that work perfectly on Skulltrail. I’ve also heard whispers that Intel is paying as much as $100 per motherboard for those nForce 100 chips. Yet Nvidia is still locking them out. Sheesh.

On the bright side, Skulltrail ought to work fine with AMD’s CrossFire X when it arrives—even though, heh, the D5400XS’s PCIe x16 slots are driven by Nvidia silicon. Imagine that.

Our testing methods

As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

Our test systems were configured like so:

Memory timings below are listed as CAS latency (CL), RAS to CAS delay (tRCD), RAS precharge (tRP), and cycle time (tRAS).

Processors: Core 2 Quad Q6600 2.4GHz, Core 2 Extreme QX6800 2.93GHz
    System bus: 1066MHz (266MHz quad-pumped)
    Motherboard: Gigabyte GA-P35T-DQ6
    BIOS revision: F1
    North bridge: P35 Express MCH
    South bridge: ICH9R
    Chipset drivers: INF Update 8.3.0.1013, Intel Matrix Storage Manager 7.5
    Memory size: 4GB (4 DIMMs)
    Memory type: Corsair TWIN3X2048-1333C9DHX DDR3 SDRAM at 1066MHz
    Memory timings: 8-8-8-20
    Audio: Integrated ICH9R/ALC889A with Realtek 6.0.1.5449 drivers

Processors: Core 2 Duo E6750 2.66GHz, Core 2 Extreme QX6850 3.00GHz, Core 2 Extreme QX9650 3.00GHz
    System bus: 1333MHz (333MHz quad-pumped)
    Motherboard: Gigabyte GA-P35T-DQ6
    BIOS revision: F1 (F4 for the Core 2 Extreme QX9650)
    North bridge: P35 Express MCH
    South bridge: ICH9R
    Chipset drivers: INF Update 8.3.0.1013, Intel Matrix Storage Manager 7.5
    Memory size: 4GB (4 DIMMs)
    Memory type: Corsair TWIN3X2048-1333C9DHX DDR3 SDRAM at 1333MHz
    Memory timings: 8-9-9-24
    Audio: Integrated ICH9R/ALC889A with Realtek 6.0.1.5449 drivers

Processor: Core 2 Extreme QX9770 3.2GHz
    System bus: 1600MHz (400MHz quad-pumped)
    Motherboard: Gigabyte GA-X38-DQ6
    BIOS revision: F6b
    North bridge: X38 Express MCH
    South bridge: ICH9R
    Chipset drivers: INF Update 8.3.0.1013, Intel Matrix Storage Manager 7.5
    Memory size: 4GB (4 DIMMs)
    Memory type: Corsair TWIN2X2048-8500C5D DDR2 SDRAM at 800MHz
    Memory timings: 4-4-4-18
    Audio: Integrated ICH9R/ALC889A with Realtek 6.0.1.5449 drivers

Processors: Dual Xeon X5365 3.00GHz
    System bus: 1333MHz (333MHz quad-pumped)
    Motherboard: Intel S5000VXN
    BIOS revision: S5000.86B.06.00.0076.0409200070751
    North bridge: 5000X MCH
    South bridge: 6321ESB ICH
    Chipset drivers: INF Update 8.3.0.1013, Intel Matrix Storage Manager 7.5
    Memory size: 4GB (4 DIMMs)
    Memory type: Samsung ECC DDR2-667 FB-DIMM at 667MHz
    Memory timings: 5-5-5-15
    Audio: Integrated 6321ESB/ALC260 with Realtek 6.0.1.5449 drivers

Processors: Dual Core 2 Extreme QX9775 3.2GHz
    System bus: 1600MHz (400MHz quad-pumped)
    Motherboard: Intel D5400XS
    BIOS revision: XS54010J.86A.0780.2008.0110.1956
    North bridge: 5400 MCH
    South bridge: 6321ESB ICH
    Chipset drivers: INF Update 8.5.0.1009, Intel Matrix Storage Manager 7.8
    Memory size: 4GB (2 DIMMs)
    Memory type: Micron ECC DDR2-800 FB-DIMM at 800MHz
    Memory timings: 5-5-5-18
    Audio: Integrated 6321ESB/STAC9274D5 with SigmaTel 6.10.5511.0 drivers

Processors: Athlon 64 X2 5600+ 2.8GHz, Athlon 64 X2 6000+ 3.0GHz, Athlon 64 X2 6400+ 3.2GHz
    System bus: 1GHz HyperTransport
    Motherboard: Asus M2N32-SLI Deluxe
    BIOS revision: 1201
    North bridge: nForce 590 SLI SPP
    South bridge: nForce 590 SLI MCP
    Chipset drivers: ForceWare 15.01
    Memory size: 4GB (4 DIMMs)
    Memory type: Corsair TWIN2X2048-8500 DDR2 SDRAM at ~800MHz
    Memory timings: 4-4-4-18
    Audio: Integrated nForce 590 MCP/AD1988B with Soundmax 6.10.2.6100 drivers

Processors: Dual Athlon 64 FX-74 3.0GHz
    System bus: 1GHz HyperTransport
    Motherboard: Asus L1N64-SLI WS
    BIOS revision: 0505
    North bridge: nForce 680a SLI
    South bridge: nForce 680a SLI
    Chipset drivers: ForceWare 15.01
    Memory size: 4GB (4 DIMMs)
    Memory type: Corsair TWIN2X2048-8500C5D DDR2 SDRAM at ~800MHz
    Memory timings: 4-4-4-18
    Audio: Integrated nForce 680a SLI/AD1988B with Soundmax 6.10.2.6100 drivers

Processor: Phenom 9600 2.3GHz
    System bus: 1GHz HyperTransport
    Motherboard: MSI K9A2 Platinum
    BIOS revision: VP.0B7 (No patch), V1.2B1 (TLB patch)
    North bridge: 790FX
    South bridge: SB600
    Memory size: 4GB (4 DIMMs)
    Memory type: Corsair TWIN2X2048-8500C5D DDR2 SDRAM at 800MHz
    Memory timings: 4-4-4-18
    Audio: Integrated SB600/ALC888 with Realtek 6.0.1.5532 drivers

Processor: Phenom engineering sample (ES) 2.6GHz
    System bus: 1GHz HyperTransport
    Motherboard: Asus M3A32-MVP Deluxe
    BIOS revision: 0307
    North bridge: 790FX
    South bridge: SB600
    Memory size: 4GB (4 DIMMs)
    Memory type: Corsair TWIN2X2048-8500C5D DDR2 SDRAM at 800MHz
    Memory timings: 4-4-4-18
    Audio: Integrated SB600/AD1988B with Soundmax 6.10.2.6180 drivers

Common to all systems:
    Hard drive: WD Caviar SE16 320GB SATA
    Graphics: GeForce 8800 GTX 768MB PCIe with ForceWare 163.11 and 163.71 drivers
    OS: Windows Vista Ultimate x64 Edition
    OS updates: KB940105, KB929777 (nForce/790FX systems only), KB938194, KB938979

Please note that testing was conducted in two stages. Non-gaming apps and Supreme Commander were tested with Vista patches KB940105 and KB929777 (nForce systems only) and ForceWare 163.11 drivers. The other games were tested with the additional Vista patches KB938194 and KB938979 and ForceWare 163.71 drivers.

Thanks to Corsair for providing us with memory for our testing. Their products and support are far and away superior to generic, no-name memory.

Our single-socket test systems were powered by OCZ GameXStream 700W power supply units. The dual-socket systems were powered by PC Power & Cooling Turbo-Cool 1KW-SR power supplies. Thanks to OCZ for providing these units for our use in testing.

Also, the folks at NCIXUS.com hooked us up with a nice deal on the WD Caviar SE16 drives used in our test rigs. NCIX now sells to U.S. customers, so check them out.

The test systems’ Windows desktops were set at 1280×1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled.

We used the following versions of our test applications:

The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Memory subsystem performance

We’ll start, as ever, with some quick synthetic tests of the memory subsystem, which will help give us the lay of the land before we get into our real-world benchmarks.

This useful little test gives us a look at L2 cache bandwidth. You’ll notice that it’s multithreaded, so systems with more cores show up as having higher L2 cache bandwidth. Not just one processor or cache is being measured. In fact, the four chips that house this Skulltrail rig’s eight cores have 6MB of L2 cache each, for a total of 24MB of L2 cache. All of that cache shows up in this bandwidth test, as Skulltrail’s two Core 2 Extreme QX9775 processors outstrip the previous-generation flagship 65nm Xeon X5365s at the 16MB test block size.

If we isolate the 1GB test block size, we can get a sense of the relative main memory throughput of these systems. Only AMD’s Phenom, with its integrated memory controller, achieves higher throughput here.

Incidentally, we did see higher throughput from a four-DIMM Xeon system in our review of the Stoakley platform, but it came at the expense of….

…higher latencies than we see with our Skulltrail rig. This isn’t bad performance for an FB-DIMM-based system, and the lower latencies are likely to pay dividends with desktop-style workloads.

Team Fortress 2

We’ll kick off our gaming tests with some Team Fortress 2, Valve’s class-driven multiplayer shooter based on the Source game engine. In order to produce easily repeatable results, we’ve tested TF2 by recording a demo during gameplay and playing it back using the game’s timedemo function. In this demo, I’m playing as the Heavy Weapons Guy, with a medic in tow, dealing some serious pain to the blue team.

We tested at 1024×768 resolution with the game’s detail levels set to their highest settings. HDR lighting and motion blur were enabled. Antialiasing was disabled, and texture filtering was set to trilinear filtering only. We used this relatively low display resolution with low levels of filtering and AA in order to prevent the graphics card from becoming a primary performance bottleneck, so we could show you the performance differences between the CPUs.

Notice the little green plot with four lines above the benchmark results. That’s a snapshot of the CPU utilization indicator in Windows Task Manager, which helps illustrate how much the application takes advantage of up to four CPU cores, when they’re available. I’ve included these Task Manager graphics whenever possible throughout our results. In this case, Team Fortress 2 looks like it probably only takes full advantage of a single CPU core, although Nvidia’s graphics drivers use multithreading to offload some vertex processing chores.

TF2 barely makes use of more than one CPU core, let alone eight. That’s not terribly surprising for today’s games, nor is it surprising to see the Skulltrail system performing well here anyhow. Intel’s top two 45nm desktop quad-core CPUs are slightly faster, probably because their DDR3 memory subsystem has lower latencies.

Lost Planet: Extreme Condition

Lost Planet puts the latest hardware to good use via DirectX 10 and multiple threads—as many as eight, in fact. Lost Planet’s developers have built a benchmarking tool into the game, and it tests two different levels: a snow-covered outdoor area with small numbers of large villains to fight, and another level set inside of a cave with large numbers of small, flying creatures filling the air. We’ll look at performance in each.

We tested this game at 1152×864 resolution, largely with its default quality settings. The exceptions: texture filtering was set to trilinear, edge antialiasing was disabled, and “Concurrent operations” was set to match the number of CPU cores available.

Lost Planet engages all eight of Skulltrail’s cores in its Cave level, producing the highest score we’ve ever recorded on this test.

BioShock

We tested BioShock by manually playing through a specific point in the game five times while recording frame rates using the FRAPS utility. The sequence? Me trying to fight a Big Daddy, or more properly, me trying not to die for 60 seconds at a pop.

This method has the advantage of simulating real gameplay quite closely, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
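The reason for taking the median of the lows rather than their mean is easy to show with a few lines of Python. The frame rates below are made up for illustration, not measured results, and the helper name is ours; we also assume here that the reported average is the mean of the per-session averages.

```python
from statistics import mean, median


def summarize(session_avgs, session_lows):
    """Return (mean of per-session averages, median of per-session lows)."""
    return mean(session_avgs), median(session_lows)


# Hypothetical results from five 60-second sessions.
avg_fps = [60, 62, 58, 61, 59]
low_fps = [31, 44, 29, 12, 33]  # one session hit an unusually bad dip

reported_avg, reported_low = summarize(avg_fps, low_fps)
print(reported_avg)   # 60
print(reported_low)   # 31
print(mean(low_fps))  # 29.8 -- the mean gets pulled down by the single 12
```

A single ugly dip in one session moves the mean of the lows noticeably, while the median stays put at the middle value.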

For this test, we largely used BioShock’s default image quality settings for DirectX 10 graphics cards, but again, we tested at a relatively low resolution of 1024×768 in order to prevent the GPU from becoming the main limiter of performance.

Rapture isn’t a happy place for our eight-way systems to visit. The Skulltrail system turns in an acceptable performance, but it’s not as quick here as the top desktop systems from Intel and AMD.

Supreme Commander

We tested performance using Supreme Commander’s built-in benchmark, which plays back a test game and reports detailed performance results afterward. We launched the benchmark by running the game with the “/map perftest” option. We tested at 1024×768 resolution with the game’s fidelity presets set to “High.”

Supreme Commander’s built-in benchmark breaks down its results into several major categories: running the game’s simulation, rendering the game’s graphics, and a composite score that’s simply composed of the other two. The performance test also reports good ol’ frame rates, so we’ve included those, as well.

We’ve had a heck of a time trying to tease out big performance differences between CPUs in this game. They don’t come easily and obviously aren’t very large. The Skulltrail system places just below its single-socket Core 2 Extreme siblings. The reality is that having eight cores in today’s games means having between four and six cores essentially idle.

Valve Source engine particle simulation

Next up are a couple of tests we picked up during a visit to Valve Software, the developers of the Half-Life games. They had been working to incorporate support for multi-core processors into their Source game engine, and they cooked up a couple of benchmarks to demonstrate the benefits of multithreading.

The first of those tests runs a particle simulation inside of the Source engine. Most games today use particle systems to create effects like smoke, steam, and fire, but the realism and interactivity of those effects are limited by the available computing horsepower. Valve’s particle system distributes the load across multiple CPU cores.

The Skulltrail rig excels here, showing the performance potential for its eight cores in future games.

Valve VRAD map compilation

This next test processes a map from Half-Life 2 using Valve’s VRAD lighting tool. Valve uses VRAD to precompute lighting that goes into games like Half-Life 2. This isn’t a real-time process, and it doesn’t reflect the performance one would experience while playing a game. Instead, it shows how multiple CPU cores can speed up game development.

This one is another clear win for Skulltrail, which finishes 10 seconds ahead of the dual Xeon X5365s and in under half the time of a Core 2 Quad Q6600.

WorldBench

WorldBench’s overall score is a pretty decent indication of general-use performance for desktop computers. This benchmark uses scripting to step through a series of tasks in common Windows applications and then produces an overall score for comparison. WorldBench also records individual results for its component application tests, allowing us to compare performance in each. We’ll look at the overall score, and then we’ll show individual application results alongside the results from some of our own application tests. Because WorldBench’s tests are entirely scripted, we weren’t able to capture Task Manager plots for them, as you’ll notice.

The Skulltrail system unexpectedly takes the top spot in this quintessentially desktop-oriented benchmark. Let’s look at WorldBench’s component tests to see why that is.

Productivity and general use software

MS Office productivity

This WorldBench component has a multitasking element, since several Office apps are in use at once. As a result, the systems with four or more cores are able to finish first, led by Skulltrail.

Firefox web browsing

The top two finishers in this test are the systems with 3.2GHz Intel processors and 1600MHz front-side buses.

Multitasking – Firefox and Windows Media Encoder

Here’s another WorldBench component with a multitasking element, but here, the dual QX9775s trail their single-socket counterpart.

WinZip file compression

The tables turn in WinZip, as the Skulltrail system beats its single-socket sibling.

Nero CD authoring

Performance in the Nero test is largely dependent upon the disk controller and is somewhat inconsistent from one run to the next. I wouldn’t put too much stock in it.

Image processing

Photoshop

Skulltrail performs well in Photoshop, finishing behind only the QX9770.

The Panorama Factory photo stitching

The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs. The program’s timer function captures the amount of time needed to perform each stage of the panorama creation process. I’ve also added up the total operation time to give us an overall measure of performance.

Here’s an application that uses eight threads well, giving Skulltrail the chance to shine—which it does.

picCOLOR image analysis

picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA. Eight of the 12 functions in the test are multithreaded, and in this latest revision, five of those eight functions use four threads.

Scores in picCOLOR, by the way, are indexed against a single-processor Pentium III 1 GHz system, so that a score of 4.14 works out to 4.14 times the performance of the reference machine.

With support for only four threads here, the QX9775 can’t overcome the single-socket quad cores that share its Penryn-derived processing cores. Dr. Müller is working on an eight-threaded version of picCOLOR, though. We’ll have to try it out next time.

Video encoding and editing

VirtualDub and DivX encoding with SSE4

Here’s a brand-new addition to our test suite that should allow us to get a first look at the benefits of SSE4’s instructions for video acceleration. In this test, we used VirtualDub as a front-end for the DivX codec, asking it to compress a 66MB MPEG2 source file into the higher compression DivX format. We used version 6.7 of the DivX codec, which has an experimental full-search function for motion estimation that uses SSE4 when available and falls back to SSE2 when needed. We tested with most of the DivX codec’s defaults, including its Home Theater base profile, but we enabled enhanced multithreading and, of course, the experimental full search option.

SSE4 brings formidable performance gains to this DivX motion-estimation algorithm, but only four of Skulltrail’s eight cores are able to help out here. Once again, it’s no faster than the single-socket quads of the same generation.

Windows Media Encoder x64 Edition video encoding

Windows Media Encoder is one of the few popular video encoding tools that uses four threads to take advantage of quad-core systems, and it comes in a 64-bit version. Unfortunately, it doesn’t appear to use more than four threads, even on an eight-core system. For this test, I asked Windows Media Encoder to transcode a 153MB 1080-line widescreen video into a 720-line WMV using its built-in DVD/Hardware profile. Because the default “High definition quality audio” codec threw some errors in Windows Vista, I instead used the “Multichannel audio” codec. Both audio codecs have a variable bitrate peak of 192Kbps.

Windows Media Encoder video encoding

Roxio VideoWave Movie Creator

The remainder of our video encoding tests all show the same thing: Skulltrail’s fast, but bringing eight cores to a quad fight is overkill.

LAME MT audio encoding

LAME MT is a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. Of course, multithreading works even better on multi-core processors. You can download a paper (in Word format) describing the programming effort.

Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. That means this test won’t really use more than two CPU cores.
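
For the curious, here’s a rough sketch of that two-stage pipeline in Python. It is not LAME MT’s actual code: the `analyze` and `encode` functions are hypothetical stand-ins, and Python threads won’t deliver the real speedup. The point is only the structure, with one thread running analysis ahead and buffering its results for the thread that does everything else.

```python
import queue
import threading


def analyze(frame):
    # Hypothetical stand-in for psycho-acoustic analysis: peak sample magnitude.
    return max(abs(sample) for sample in frame)


def encode(frame, analysis):
    # Hypothetical stand-in for the rest of the MP3 encoder.
    return (len(frame), analysis)


def pipelined_encode(frames):
    # Bounded buffer: the analysis thread can only run about one frame ahead.
    buffer = queue.Queue(maxsize=1)

    def analysis_stage():
        for frame in frames:
            buffer.put((frame, analyze(frame)))
        buffer.put(None)  # sentinel: no more frames

    producer = threading.Thread(target=analysis_stage)
    producer.start()

    encoded = []
    while True:
        item = buffer.get()
        if item is None:
            break
        frame, analysis = item
        encoded.append(encode(frame, analysis))
    producer.join()
    return encoded
```

However many cores the machine has, this arrangement only ever keeps two threads busy.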

We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here.

Here’s another result largely within expectations. Don’t overlook how the dual Xeon X5365 system finishes here, though; it’s well off the pace. The reality is that Skulltrail corrects for some of the older Xeon platform’s weaknesses. The Stoakley MCH’s faster front-side buses and more robust snoop filter, coupled with fewer and faster FB-DIMMs for lower latencies, have muted the negatives of its predecessor.

Cinebench rendering

Graphics is a classic example of a computing problem that’s easily parallelizable, so it’s no surprise that we can exploit a multi-core processor with a 3D rendering app. Cinebench is the first of those we’ll try, a benchmark based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs with just a single thread and then with as many threads as CPU cores are available.

This is the kind of application where a pair of Core 2 Extreme QX9775s can really come into their own. They even put some pretty good distance between themselves and the older-gen Xeons, probably thanks to the Penryn core’s clock-for-clock improvements in certain areas, like its faster divider.

POV-Ray rendering

We caved in and moved to the beta version of POV-Ray 3.7 that includes native multithreading. The latest beta 64-bit executable is still quite a bit slower than the 3.6 release, but it should give us a decent look at comparative performance, regardless.

3ds max modeling and rendering

The Skulltrail system’s dominance continues through our POV-Ray and 3dsmax rendering tests. For this sort of work, this system is ideal.

Folding@Home

Next, we have a slick little Folding@Home benchmark CD created by notfred, one of the members of Team TR, our excellent Folding team. For the unfamiliar, Folding@Home is a distributed computing project created by folks at Stanford University that investigates how proteins work in the human body, in an attempt to better understand diseases like Parkinson’s, Alzheimer’s, and cystic fibrosis. It’s a great way to use your PC’s spare CPU cycles to help advance medical research. I’d encourage you to visit our distributed computing forum and consider joining our team if you haven’t already joined one.

The Folding@Home project uses a number of highly optimized routines to process different types of work units from Stanford’s research projects. The Gromacs core, for instance, uses SSE on Intel processors, 3DNow! on AMD processors, and Altivec on PowerPCs. Overall, Folding@Home should be a great example of real-world scientific computing.

notfred’s Folding Benchmark CD tests the most common work unit types and estimates performance in terms of the points per day that a CPU could earn for a Folding team member. The CD itself is a bootable ISO. The CD boots into Linux, detects the system’s processors and Ethernet adapters, picks up an IP address, and downloads the latest versions of the Folding execution cores from Stanford. It then processes a sample work unit of each type.

On a system with two CPU cores, for instance, the CD spins off a Tinker WU on core 1 and an Amber WU on core 2. When either of those WUs is finished, the benchmark moves on to additional WU types, always keeping both cores occupied with some sort of calculation. Should the benchmark run out of new WUs to test, it simply processes another WU in order to prevent any of the cores from going idle as the others finish. Once all four of the WU types have been tested, the benchmark averages the points per day among them. That points-per-day average is then multiplied by the number of cores on the CPU in order to estimate the total number of points per day that CPU might achieve.
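
The final estimate boils down to simple arithmetic. Here’s a minimal sketch in Python. The function name is my own, and any numbers you feed it are up to you. None of this is taken from notfred’s scripts.

```python
def estimated_ppd(ppd_by_wu_type, cores):
    # Average the per-core points per day across the tested WU types...
    average = sum(ppd_by_wu_type.values()) / len(ppd_by_wu_type)
    # ...then scale by the number of cores to estimate the whole CPU.
    return average * cores
```

With made-up per-core results of 100, 200, 300, and 400 points per day for the four WU types, an eight-core system would be credited with 250 × 8 = 2000 points per day.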

This may be a somewhat quirky method of estimating overall performance, but my sense is that it generally ought to work. We’ve discussed some potential reservations about how it works here, for those who are interested. I have included results for each of the individual WU types below, so you can see how the different CPUs perform on each.

This thing is an absolute beast for Folding. Please, buy one and join Team TR. Please.

MyriMatch proteomics

Our benchmarks sometimes come from unexpected places, and such is the case with this one. David Tabb is a friend of mine from high school and a long-time TR reader. He recently offered to provide us with an intriguing new benchmark based on an application he’s developed for use in his research work. The application is called MyriMatch, and it’s intended for use in proteomics, or the large-scale study of proteins. I’ll stop right here and let him explain what MyriMatch does:

In shotgun proteomics, researchers digest complex mixtures of proteins into peptides, separate them by liquid chromatography, and analyze them by tandem mass spectrometers. This creates data sets containing tens of thousands of spectra that can be identified to peptide sequences drawn from the known genomes for most lab organisms. The first software for this purpose was Sequest, created by John Yates and Jimmy Eng at the University of Washington. Recently, David Tabb and Matthew Chambers at Vanderbilt University developed MyriMatch, an algorithm that can exploit multiple cores and multiple computers for this matching. Source code and binaries of MyriMatch are publicly available.
In this test, 5555 tandem mass spectra from a Thermo LTQ mass spectrometer are identified to peptides generated from the 6714 proteins of S. cerevisiae (baker’s yeast). The data set was provided by Andy Link at Vanderbilt University. The FASTA protein sequence database was provided by the Saccharomyces Genome Database.

MyriMatch uses threading to accelerate the handling of protein sequences. The database (read into memory) is separated into a number of jobs, typically the number of threads multiplied by 10. If four threads are used in the above database, for example, each job consists of 168 protein sequences (1/40th of the database). When a thread finishes handling all proteins in the current job, it accepts another job from the queue. This technique is intended to minimize synchronization overhead between threads and minimize CPU idle time.
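
To make the job-queue idea concrete, here’s a small Python sketch of the scheme as described. It is not MyriMatch’s source code: the function names and the `handle_protein` callback are hypothetical, and the only figures taken from the text are the ten-jobs-per-thread ratio and the 6714-protein database.

```python
import math
import queue
import threading


def split_into_jobs(proteins, threads, jobs_per_thread=10):
    # Aim for threads * 10 jobs; each job is a contiguous slice of the database.
    job_count = threads * jobs_per_thread
    job_size = max(1, math.ceil(len(proteins) / job_count))
    return [proteins[i:i + job_size] for i in range(0, len(proteins), job_size)]


def process_database(proteins, threads, handle_protein):
    jobs = queue.Queue()
    for job in split_into_jobs(proteins, threads):
        jobs.put(job)

    def worker():
        # Each thread keeps pulling whole jobs until the queue runs dry.
        while True:
            try:
                job = jobs.get_nowait()
            except queue.Empty:
                return
            for protein in job:
                handle_protein(protein)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
```

With four threads and 6714 proteins, `split_into_jobs` produces 40 jobs of up to 168 proteins each, which matches the 1/40th figure above. Handing out many small jobs rather than one big slice per thread means a thread that finishes early can pick up more work instead of sitting idle.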

The most important news for us is that MyriMatch is a widely multithreaded real-world application that we can use with a relevant data set. MyriMatch also offers control over the number of threads used, so we’ve tested with one to eight threads.

I should mention that performance scaling in MyriMatch tends to be limited by several factors, including memory bandwidth, as David explains:

Inefficiencies in scaling occur from a variety of sources. First, each thread is comparing to a common collection of tandem mass spectra in memory. Although most peptides will be compared to different spectra within the collection, sometimes multiple threads attempt to compare to the same spectra simultaneously, necessitating a mutex mechanism for each spectrum. Second, the number of spectra in memory far exceeds the capacity of processor caches, and so the memory controller gets a fair workout during execution.
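
A per-spectrum mutex might look something like the following Python sketch. The `Spectrum` class and the idea that each spectrum tracks its best-scoring peptide are my own assumptions for illustration. The quote above only tells us that each spectrum has its own lock.

```python
import threading


class Spectrum:
    def __init__(self, spectrum_id):
        self.spectrum_id = spectrum_id
        self.best_score = float("-inf")
        self.best_peptide = None
        self._lock = threading.Lock()  # one mutex per spectrum, not one global lock

    def offer(self, peptide, score):
        # Only threads touching this same spectrum contend for the lock.
        with self._lock:
            if score > self.best_score:
                self.best_score = score
                self.best_peptide = peptide
```

Because the lock lives on the individual spectrum, threads comparing against different spectra never wait on each other. They only stall in the occasional case where two of them hit the same spectrum at once.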

Here’s how the processors performed.

Skulltrail crunches through our MyriMatch test workload in about half the time it takes a Core 2 Quad Q6600 to do the same work. The dual QX9775s again achieve solid gains over the older Xeons, as well.

STARS Euler3d computational fluid dynamics

Charles O’Neill works in the Computational Aeroservoelasticity Laboratory at Oklahoma State University, and he contacted us to suggest we try the computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab. This benchmark has been available to the public for some time in single-threaded form, but Charles was kind enough to put together a multithreaded version of the benchmark for us with a larger data set. He has also put a web page online with a downloadable version of the multithreaded benchmark, a description, and some results here. (I believe the score you see there at almost 3Hz comes from our eight-core Clovertown test system.)

In this test, the application is basically doing analysis of airflow over an aircraft wing. I will step out of the way and let Charles explain the rest:

The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45º. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes.
The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes . . . . The benchmark executable advances the Mach 0.50 AGARD flow solution. A benchmark score is reported as a CFD cycle frequency in Hertz.

So the higher the score, the faster the computer. Charles tells me these CFD solvers are very floating-point intensive, but oftentimes limited primarily by memory bandwidth. He has modified the benchmark for us in order to enable control over the number of threads used. Here’s how our contenders handled the test with different thread counts.

We have a new all-time record holder for this benchmark, beating out the Xeon E5472 system we tested last fall.

SiSoft Sandra Mandelbrot

Next up is SiSoft’s Sandra system diagnosis program, which includes a number of different benchmarks. The one of interest to us is the “multimedia” benchmark, intended to show off the benefits of “multimedia” extensions like MMX, SSE, and SSE2. According to SiSoft’s FAQ, the benchmark actually does a fractal computation:

This benchmark generates a picture (640×480) of the well-known Mandelbrot fractal, using 255 iterations for each data pixel, in 32 colours. It is a real-life benchmark rather than a synthetic benchmark, designed to show the improvements MMX/Enhanced, 3DNow!/Enhanced, SSE(2) bring to such an algorithm.
The benchmark is multi-threaded for up to 64 CPUs maximum on SMP systems. This works by interlacing, i.e. each thread computes the next column not being worked on by other threads. Sandra creates as many threads as there are CPUs in the system and assignes [sic] each thread to a different CPU.

We’re using the 64-bit version of Sandra. The “Integer x16” version of this test uses integer numbers to simulate floating-point math. The floating-point version of the benchmark takes advantage of SSE2 to process up to eight Mandelbrot iterations in parallel.
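
Here’s a toy Python version of that interlaced scheme, purely to illustrate how the columns get divided up. It assumes a static split, with thread k taking columns k, k + N, k + 2N, and so on, which is one reading of SiSoft’s description. The image size, coordinate mapping, and function names are my own, and there’s no SSE2 in sight.

```python
import threading


def mandelbrot_iterations(c, max_iter=255):
    # Count iterations of z = z*z + c before |z| exceeds 2 (capped at max_iter).
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render_interlaced(width=64, height=48, threads=4, max_iter=255):
    image = [[0] * width for _ in range(height)]

    def worker(first_column):
        # Thread k handles columns k, k + threads, k + 2*threads, ...
        for x in range(first_column, width, threads):
            for y in range(height):
                c = complex(-2.0 + 3.0 * x / width, -1.0 + 2.0 * y / height)
                image[y][x] = mandelbrot_iterations(c, max_iter)

    pool = [threading.Thread(target=worker, args=(k,)) for k in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return image
```

Since every pixel is independent, the threads never need to talk to one another, which is why this sort of workload scales so readily with core count.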

Yikes. I suppose those results are self-explanatory.

Power consumption and efficiency

Now that we’ve had a look at performance in various applications, let’s bring power efficiency into the picture. Our Extech 380803 power meter has the ability to log data, so we can capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system—the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (We plugged the computer monitor into a separate outlet, though.) We measured how each of our test systems used power across a set time period, during which time we ran Cinebench’s multithreaded rendering test.

Almost all of the systems had their power management features (such as SpeedStep and Cool’n’Quiet) enabled during these tests via Windows Vista’s “Balanced” power options profile. The big exception here, of course, is Skulltrail, since its pre-release BIOS doesn’t support SpeedStep.

Anyhow, here are the results:

Let’s slice up the data in various ways in order to better understand them. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render.

Even without SpeedStep to dial the clock speed and voltage back at idle, the Skulltrail system draws considerably less power than the older dual Xeon rig, and it’s right in line with AMD’s Quad FX system.

Next, we can look at peak power draw by taking an average from the ten-second span from 30 to 40 seconds into our test period, during which the processors were rendering.

Skulltrail’s peak power draw of about 370W is surprisingly tame, all things considered. This is a nice improvement over the Xeon X5365s, even if it’s not exactly sipping power.

Another way to gauge power efficiency is to look at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, also known as joules.

All told, this eight-core system only uses slightly more energy during our test period than our Athlon 64 X2 6000+-based test rig.

We can quantify efficiency even better by considering the amount of energy used to render the scene. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve then computed the amount of energy used by each system to render the scene. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.
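
For those who want to see the math, here’s a short Python sketch of how these figures fall out of a logged power trace. It assumes one reading per second, which is my simplification, and the function names are mine as well.

```python
def average_power(samples_w, start_s, end_s):
    # Mean power in watts over the window [start_s, end_s), one sample per second.
    window = samples_w[start_s:end_s]
    return sum(window) / len(window)


def energy_joules(samples_w, start_s=0, end_s=None, interval_s=1.0):
    # Energy is power integrated over time: sum of (watts * seconds per sample).
    return sum(samples_w[start_s:end_s]) * interval_s
```

Take an invented 60-second log with 40 seconds of rendering at 300W followed by 20 seconds of idling at 150W. The average from 30 to 40 seconds is 300W, the whole period comes to 15,000 joules, and the render alone accounts for 12,000 joules. A faster system shrinks the render window, which is how it can draw more power at peak yet use less energy overall.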

Here’s the multicore system’s ace in the hole. Because it finishes the rendering work so quickly, Skulltrail requires less total energy than all but one of our other test systems.

Overclocking

With the unlocked multipliers on these Core 2 Extreme QX9775 processors, Skulltrail overclocking is embarrassingly easy. I was able to get the system to boot into Windows with both CPUs at 4GHz by setting their voltage to 1.35V, one-tenth of a volt above stock. However, the system wasn’t stable enough to run our eight-threaded Euler3D CFD benchmark without crashing at those settings. Raising both CPUs’ voltage to 1.375V didn’t quite cut it, but 1.4V seemed to do the trick. This is on air cooling using those Zalman coolers pictured earlier in the article. They’re nice coolers, but nothing too exotic.
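
The arithmetic behind a multiplier overclock is simple enough to sketch in Python. This assumes the bus stays at its stock 400MHz base clock (a quad-pumped 1600MHz front-side bus) and that the stock multiplier is 8. Neither number is spelled out above, so treat them as my assumptions.

```python
def core_clock_mhz(bus_clock_mhz, multiplier):
    # Core clock is simply the base bus clock times the CPU multiplier.
    return bus_clock_mhz * multiplier


FSB_TRANSFER_RATE_MHZ = 1600                 # effective, quad-pumped rate
BUS_CLOCK_MHZ = FSB_TRANSFER_RATE_MHZ / 4    # assumed 400MHz base clock
```

Under those assumptions, a multiplier of 8 gives the stock 3.2GHz, and bumping it to 10 yields 4GHz without touching the bus.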

4GHz wasn’t hard to attain. CPU-Z has the voltage wrong, though.

Let me just say: Holy crap! Here’s how the system performed at 4GHz.

That’s… acceptable. We need to try putting some lower-speed, less expensive Xeons into this board to see what they can do. I’ll give that a shot when I can. For now, though, it’s pretty clear this Skulltrail rig has some overclocking headroom built in, despite the QX9775 processor’s status as a top-of-the-line product. That’s unusual, and we’ll take it.

Conclusions

If you came into this review expecting that Skulltrail’s “more is better” approach would yield consistent performance dividends across a broad range of applications, you may be disappointed to find that this system’s eight fast CPU cores can’t always outrun the top single-socket quad-core systems. The reality is that only certain types of applications will fully harness the power of a system this exotic—and we are, by and large, using the best examples of widely multithreaded software we could find in their respective categories. Programs that don’t scale well beyond four threads are the norm, not the exception. That’s both a developer support problem and a computer science one; extracting performance from multiple CPU cores can be difficult.
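
One common way to reason about this limit is Amdahl’s law, which caps the speedup according to the portion of a program that can actually run in parallel. Here’s a quick Python sketch. The parallel fractions are illustrative guesses, not measurements from any of our tests.

```python
def amdahl_speedup(parallel_fraction, cores):
    # The serial part runs at the same speed no matter how many cores there are.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

If 80% of a program’s work can be spread across cores, four cores yield a 2.5x speedup, and eight cores manage only about 3.3x. Doubling the core count buys relatively little unless nearly all of the work is parallel.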

Many of us have seen this sort of thing before, though, and once you look beyond the scaling question, Skulltrail is exciting on a number of fronts. It is, in fact, the fastest computer ever to grace the dank confines of Damage Labs. Because it’s based on the Stoakley platform, Skulltrail avoids the performance pitfalls we saw in V8, Intel’s last attempt to push two sockets on the desktop. This system performs well at almost any task, even if six or seven of its cores are twiddling their little silicon thumbs, so to speak. (Ok, dork. -Ed.) We’re also immensely gratified to see that Intel has made a full-bore effort to deliver a truly acceptable enthusiast-class motherboard this time around. This is what we asked for back when we reviewed the V8, and Intel deserves major kudos for taking that criticism into account. We still wish this board’s BIOS offered better control over a number of things, like fan speeds and memory clocks, but this is an awful lot of progress in a short span of time.

Of course, performance and power this prodigious will come at a price. We don’t yet have official numbers from Intel, but Skulltrail-based systems should be available for sale in the next month or so. I wouldn’t be surprised to see the core components—the D5400XS motherboard, dual Core 2 Extreme QX9775 processors, and two 2GB 800MHz FB-DIMMs—add up to over four grand, when that day comes. If that sort of price tag doesn’t scare you off, you probably fall into one of two categories. Either you’re a professional who works in a field where this order of computing power can save you tremendous amounts of time—game development, digital content creation, scientific computing—or you’re a hobbyist with plenty of cash to burn and bragging rights on the line. Whatever the case, you can rest easy knowing that Skulltrail successfully bridges the gap between desktops and workstations, laying the foundation for the ultimate personal computer.

    • AndrewBurgess
    • 11 years ago

    “Adding two more memory modules would have increased the system’s peak bandwidth, but at the cost of additional memory access latency and power consumption.”

    With more memory I think the latency would be identical or 1/2 depending on the access pattern. With parallel access the bandwidth doubles and the latency per byte is 1/2.

    • Snake
    • 12 years ago

    It would be MORE interesting to read the review of the Mac implementation of this…because, apparently, this must be what the top Mac Pro is using. The specs line up.

    OSX, and the apps that run on it, seem to handle SMP much better than Windows ever has. So, therefore, remove the OS as the “limiting” factor and see what this puppy can do on alternate configurations.

    • Mr Bill
    • 12 years ago

    I wonder if this will galvanize AMD into doing something more with the FX platform, perhaps with an ATI chipset.

    • pogsnet
    • 12 years ago
      • moose17145
      • 12 years ago

      Oh there is an imminent NEED for it… actually an imminent need well beyond what this setup can do even. Just so happens that imminent need is not for gaming yet. lol. Like the ending conclusion said, you are either a hardware enthusiast who has too much money burning a hole in your pocket or you are a professional. Personally i could see professionals going for something like this that are also into gaming. In which case you have a great workstation that also happens to be awesome at gaming.

    • Bensam123
    • 12 years ago

    And yet very few games really make use of two cores, let alone one. Not even a reason to buy a Q6600 yet let alone one of these. Unless you don’t plan on upgrading for the next 4-5 years, which is probably right considering the amount of money you’ll have to spend on one. :l

    • carbondude
    • 12 years ago

    First of all, you don’t have to use the QX9775s. In fact, getting one of Intel’s extreme processor isn’t worth it already, getting two of them is like burning money. I personally think that dual E5410s overclocked to 1600FSB is a nice respectable 8 cores at 2.8Ghz vs the 3.2Ghz of the QX9775s. So about 10% slower. Dual E5410s cost about $550 for both, where as Dual QX9775s cost about $3000.

    Second, unless you want to SLI on this platform, any Seaburg chipset motherboard can give you the same performance. Wait for the Asus Z7S to come out to get the same overclocking for about half the price of the Intel board. Asus Z7S doesn’t support SLI though.

    To overcome the SLI problem, just wait for the Nvidia 9800GX2 to come out. Having one of them will make SLI automatically work for you.

    I would recommend the following system for anyone interested in Skulltrail (prices taken from http://www.froogle.com):

    Asus Z7S: $400
    Dual E5410s if you want to overclock to 1600FSB: $550
    Dual L5410s not overclocked: $700
    4x 1.5V Low voltage Supermicro 2gb DDR2-800 FB-DIMMs: $410
    Nvidia 9800GX2 coming out in two weeks: $450
    Samsung F1 750GB hard drive: $150
    Coolermaster Stacker case with 1000W power supply: $200
    DVD drive: $30

    That's a really really nice system for $2300 dollars or so. Hardly the 5000 dollar figure being thrown around by review sites simply because of the ridiculous pricing for the QX9775s.

    • PRIME1
    • 12 years ago

    http://en.wikipedia.org/wiki/List_of_computer_technology_code_names

      • PenGun
      • 12 years ago

      They left off my favorite … TWAIN or Technology Without An Important Name.

        • titan
        • 12 years ago

        Oh man, I thought it was going to be Technology Without An Imminent Need.

        • Mithent
        • 12 years ago

        Unfortunately that was a backronym :p

      • DrDillyBar
      • 12 years ago

      Silence — RIAA project to scan computers for MP3 files and delete them

      • UberGerbil
      • 12 years ago

      I always liked the OS for the Psion: EPOC — Electronic Piece of Cheese
      http://en.wikipedia.org/wiki/EPOC_(computing)

    • herothezero
    • 12 years ago

    Liquor store in Riyadh…that’s great stuff, right there.

    As for the platform, aside from e-peen contests, I don’t really see the point.

      • My Johnson
      • 12 years ago

      I fail to see the point of this platform also. You don’t need 4 graphics cards for a 3D rendering/development platform (CAD/CAM or flight simulator maybe.)

    • derFunkenstein
    • 12 years ago

    no idea what i’d do with this kind of hardware (in other words, there’s no way I’d put it to good enough use), but it’s fun to look at and daydream.

    • Looking for Knowledge
    • 12 years ago

    “Skulltrail was already deep into development by the time AMD peed down its leg.” Good stuff there! lol

      • flip-mode
      • 12 years ago

      I liked that line too.

      • cynan
      • 12 years ago

      One of the best metaphors on the site yet!

      Ohhh poor AMD…

    • liquidsquid
    • 12 years ago

    Wow, I would rather spend that kind of money on 4 separate rigs. At least if one went south, I would still have three left.

    No, wait, that is like having 4 Chevettes instead of a Corvette. Hmmm. I never knew of a Chevette getting the ladies excited.

      • flip-mode
      • 12 years ago

      But a Skulltrail will thrill the ladies?

        • eitje
        • 12 years ago

        the e-ladies will be e-interested in the extra e-peen.

    • eitje
    • 12 years ago

    while you did give a nod to how the system shipped, it would have been interesting to quantify the impact on memory & power consumption when you had the DIMM slots fully populated.

      • continuum
      • 12 years ago

      Agreed, seeing it fully populated with FB-DIMMs to keep all of the memory channels fed would have been a nice comparison.

      Much like AMD’s QuadFX or Intel’s V8, it is kinda pointless… but man, for distributed computing nerds, 3D rendering, or whatnot– holy crap, I’ll take one! Or two… or more!

      (if someone else is paying, that is… 😉 ).

    • Sargent Duck
    • 12 years ago

    I’m quite surprised in terms of the graphics components. Getting Crossfire to work seems really easy (it could be done on the Southbridge), but to get SLI it seems Intel had to really go out of its way to please Nvidia, in addition to paying them. Wonder why they simply just didn’t say “screw you Nvidia” (well, not in that exact phrase) and just go with Crossfire. Seems they’d be leaving a potential 3 people out in the cold.

      • eitje
      • 12 years ago

      i think intel’s trying to cater to the graphics card company that isn’t owned by their direct CPU competitor.

        • UberGerbil
        • 12 years ago

        And also the one that offered by far the best performance during the span when this was under development.

      • lethal
      • 12 years ago

      IIRC, Nvidia’s share of the multi GPU market dominates Ati’s by a 10:1 ratio, so without Nvidia there isn’t much multi GPU market to speak of.

    • Mithent
    • 12 years ago

    The FB-DIMMs seem to be this platform’s curse.. not only are they expensive, the slower memory architecture is presumably what’s making it slower than single-chip quad cores, except in the cases where more than four cores can be used.

    • Jigar
    • 12 years ago

    Crysis ????

      • ssidbroadcast
      • 12 years ago

      Yeah, same here.

    • deepthought86
    • 12 years ago

    Conclusion: much ado about nothing!

    This platform was conceived as a “me too!” response to 4×4. And it’s about as useful, too. For real workstation users max 4 mem dimms is a joke, and for gamers fb-dimms are a joke. Pay 3 times the price, get a 15% performance increase.

    “Step right up, there’s a sucker born every day!”

      • Nitrodist
      • 12 years ago

      Did you know that TR has a page titled Conclusions?

      http://www.techreport.com/articles.x/14052/19

        • deepthought86
        • 12 years ago

        Yes, yes I do. But my one-line summary was even more abbreviated, and a bit more pessimistic than the TR conclusion.

    • Krogoth
    • 12 years ago

    After reading the review, it seems that Skulltrail is Intel’s version of 4×4, except that its performance does not suck as much.

    Intel tried way too hard to convert their Xeon workstation platform to cater towards hardcore enthusiasts.

    WTF does it need FB-DIMMs and more importantly why does it only have four FB-DIMM slots? It effectively defeats the raison d’etre of FB-DIMM for multi-chip Xeons.

    I have already pointed out the stupidity of having two PCI slots and their placement on reference boards.

    It was painfully obvious since the day that Intel announced Skulltrail that gaming performance would be inferior to its single-chip counterparts. Again, that comes from the FB-DIMMs and from an underlying platform that is geared for stability, not performance.

    The only area where Skulltrail delivers its performance is professional level stuff like ray-tracing and scientific computing.

    It is too bad that Skulltrail platform is crippled in dumbfounded ways to be a viable mid to low-range alternative to real Xeon workstation platforms. Namely, only having 4 FB-DIMM slots.

    I also see Amdahl’s law at work with some of the multi-threaded benchmarks on 8-core systems.

      • Flying Fox
      • 12 years ago


    • Nitrodist
    • 12 years ago

    Four VMware installations each running a SMP client.

    Clocked at 4ghz…

    ~2500 PPD * 4…

    10k ppd.

      • Krogoth
      • 12 years ago

      Wrong, it is much closer to 2,500-3,000 points a day.

      Besides, the VM trick only worked when the SMP client wasn’t available for Windows.

      Windows SMP client performs just as well as the Linux SMP client.

        • Nitrodist
        • 12 years ago

        Wtf do you mean, wrong? I get 2k PPD at 3.2ghz with my E6600 vmwared setup. 4/3.2*2000 = 2500 ppd at 4ghz.

        Unless I’m mistaken, I’m sure that you can have more than one VMware installation and then run those installations concurrently, then assign each installation its affinity through windows.

        And that last one is an outright lie 😛 Times are decreased *[

          • Flying Fox
          • 12 years ago

          You don’t install multiple instances of VMware, you create multiple VMs on the box and run them at the same time. Given VMware server/workstation, you can create VMs with a maximum of 2 “CPUs”. So with 8 cores you can have 4 separate VMs and that’s 4 times the ppd of SMP clients running on dual core machines.

            • Nitrodist
            • 12 years ago

            Bleh, that’s what I mean 😛

          • Krogoth
          • 12 years ago

          Ok, I am looking back at my Q6600@2.93Ghz numbers again.

          It does get 2,500-4,000 PPD with Windows SMP client. However it depends on the system load during the day and what WU it is working on. I blame EOC’s 24 hours avg stat which states it is only 1,500-1,600 PPD.

          So 2xPenryn-based Xeons at 3.2Ghz theoretically could peak at 10,000+ PPD, but expect the load to be closer to 7,000-8,000 PPD.

            • Nitrodist
            • 12 years ago

            Try the linux client out. You’ll see a large increase in your %’s.

            When I had the WinSMP client going, I was struggling to meet 24 hours. Now I’m usually at 20 or lower.

            • Krogoth
            • 12 years ago

            Not a problem here, my Q6600@2.93GHz can crunch down almost two WUs a day if it does not do anything else.

            My performance matches what the databases at F@H forums would expect from my processor.

            The Windows SMP client has probably improved that much since you last tried it.

            • Nitrodist
            • 12 years ago

            Err, yeah, the SMP client hasn’t been updated in over a year.

            Just for fun, I started up the Windows SMP client again. The times are still slow (13-15 minutes/frame) while the Linux client is at 11-12 minutes/frame. At 90-95% usage, no less.

            • Krogoth
            • 12 years ago

            It might be because my client is running under *gasp* Vista 64.

            I get around 7-9 minutes per frame and the program does eat up 100% if it can.

            • Nitrodist
            • 12 years ago

            Hmm, you think so? I was running it on 32-bit XP Pro and a 64-bit (obviously) Ubuntu VMware install.

      • My Johnson
      • 12 years ago

      But why not just get a workstation/server?

      This setup is for multi-GPU graphics. And it is not compelling in that regard.

        • Nitrodist
        • 12 years ago

        Did I say that it was smart to do it or even economical? :/

    • Krogoth
    • 12 years ago

    Why the hell are there PCI slots on these boards, and most of all in between PCIe x16 slots 1 and 2? To be honest, there should only be one PCI slot for a high-end PCI soundcard, and it should be placed in the leftmost position.

    Anyway, PCI is officially dead. You can finally get any discrete peripheral in PCIe (yes, there are PCIe audio cards).

      • Dashak
      • 12 years ago

      My two cents: I use a desktop PCI wireless card in addition to a soundcard. Not sure if there are many PCIe cards for desktop wireless.

      • Ubik
      • 12 years ago

      There are, however, not any professional-grade audio cards in PCIe at all. And I know of several production suites that could benefit from the kind of horsepower Skulltrail offers (although I don’t know how many of them could use all eight cores), so I see the PCI slots being useful for at least part of the Skulltrail market.

    • adisor19
    • 12 years ago

    So umm, this is like the tweaked guts of the Mac Pro ? 😉

    Adi

      • DASQ
      • 12 years ago

      Do you ever not troll your Apple garbage?

        • adisor19
        • 12 years ago

        I’m sorry, what exactly constitutes trolling in my post? This entire platform IS a god damn dual-CPU Xeon board complete with FB-DIMMs and with added BIOS tweaks. It’s the exact same thing that’s in a Mac Pro except with the BIOS tweaks added to it.

        Now, take a look at your post and then think of who exactly is trolling here ? Hint : the inflammatory word “garbage”.

        Adi

    • UberGerbil
    • 12 years ago

    Every pic of that thing makes me want to climb on and drive around the Everglades.

      • data8504
      • 12 years ago

      HAHA that took me entirely too long to comprehend…

      *headdesk*

      • paulsz28
      • 12 years ago

      Lol. Yeah, taking an airboat with that much HP through the Everglades would be a fantastic excursion.

      • flip-mode
      • 12 years ago

      Good one, LOL.

    • Flying Fox
    • 12 years ago

    I like that Folding plug, but will people who have more money than sense buy this and Fold for us?

    • BoBzeBuilder
    • 12 years ago

    FIrssT post. Wow, that was disappointing. It’s barely faster than its four-cored counterpart.

      • enzia35
      • 12 years ago

      Doesn’t matter, it’s sexy!

      • jobodaho
      • 12 years ago

      …if you’re more concerned with gaming, the same rules apply to this as with all other dual/quad cores: games don’t take advantage of them yet. But, if you are like me and work with 3D rendering software a lot, then this is a cream dream, although completely unrealistic for a college student like myself. Is it way faster than a regular quad-core setup? Not in most apps. But will it handle anything you throw at it at any given time, including multiple resource-hungry programs? You betcha.
