
NVIDIA’s nForce4 Ultra chipset

Scott Wasson Former Editor-in-Chief
THE RACE TO DELIVER a core logic chipset with PCI Express support for the Athlon 64 has occupied much of our attention over the past few months. We first got a brief hands-on look at VIA’s K8T890 chipset, and then we did a full review of ATI’s Radeon Xpress 200. In between the two, NVIDIA announced its nForce4 lineup but wasn’t able to produce review hardware—until now. We’ve finally gotten our hands on an nForce4 reference motherboard from NVIDIA, and we’ve subjected it to a grueling battery of tests to see how it performs against the competition from VIA, ATI, and even Intel.

The nForce4 packs a number of innovative new features, including the aforementioned PCI Express capability, a firewall-fortified Gigabit Ethernet controller with hardware acceleration, and a robust implementation of the latest Serial ATA spec, including Native Command Queuing for SCSI-like disk I/O performance under heavy loads. The question is, do all of the marketers’ talking points add up to superior performance? We’ve tested these features, including ActiveArmor and SATA command queuing, and we have some answers. Read on to see what we found.

NVIDIA’s special sauce
The folks at NVIDIA have obviously worked hard to position the nForce4 chipset family as a distinctive option in the copycat world of core logic chipsets. That’s no simple goal to reach in a field where all of the products generally have to conform to the same basic set of standards at any given time. Some of the nForce4’s key features are simply shared with the competition, but others are worthy of note because they’re unique. Let’s have a look at the highlights.

    A single-chip design — Like the nForce3, the nForce4 is a single piece of silicon rather than a “chipset” proper with separate north and south bridge chips. Because the Athlon 64 (and AMD’s other K8-class processors, including the Opteron and some Semprons) has its own memory controller onboard, the single-chip approach works here. All of NVIDIA’s competitors have opted for two-chip configurations that will allow them to swap in new north or south bridge chips independently. The nForce4’s single-chip design should virtually eliminate any bottlenecks in chip-to-chip communication between the traditional north and south bridge function blocks.


    A block diagram of the nForce4. Source: NVIDIA.

    PCI Express — We’ve already covered the basics of PCI Express in our review of the world’s first PCI Express chipsets, the Intel 915G and 925X Express. The nForce4 packs 20 lanes of PCI Express connectivity, sixteen of which will be dedicated to graphics, leaving four lanes for use with PCI-E expansion slots or peripheral chips on the motherboard.

    SLI support — The most exciting application for the nForce4’s PCI Express lanes is probably NVIDIA’s SLI GPU teaming technology. I’ve listed this capability as a separate bullet point because I’m being generous. Truth is, SLI relies on PCI Express in order to work properly, and any decent PCI-E chipset ought to be able to handle SLI. However, NVIDIA claims the nForce4 is specially optimized for SLI. Whatever the technical merits of that claim, it is true that the first Athlon 64 motherboards with dual graphics slots will definitely be based on the nForce4 chipset. NVIDIA has worked with motherboard manufacturers to make SLI mobos happen, even pioneering a trick PCI-E connector card that will redirect PCI-E lanes as needed. (In a single-card config, 16 lanes go to a single PCI-E X16 slot. In a dual-card config, eight lanes go to each of two physical PCI-E X16 slots.) We’ll have more to say about SLI in a separate article shortly.

    Advanced storage options — In terms of feature set, the nForce4’s storage options are the class of the industry, rivaled only by Intel’s. NVIDIA’s design team has endowed the nForce4 with four Serial ATA ports and two channels of ATA/133 storage connectivity, for a total of eight possible connected devices.

    The SATA ports support the latest enhancements from the SATA II spec, including hot-swappable devices, transfer rates up to 300MB/s, and Native Command Queuing. Hard drives with support for 300MB/s transfer rates are still rare birds, but Native Command Queuing (NCQ) is becoming more common. NCQ replicates the sort of smart algorithms that have long been built into SCSI disk controllers on high-end server and workstation computers. By taking in multiple disk I/O requests and intelligently reordering their execution, an NCQ-enabled controller should be able to minimize the mechanical delays in a hard drive mechanism—things like head seeks and rotational latencies that can take milliseconds, a virtual eternity in computer time.


    NCQ in action. Source: NVIDIA

    The nForce4 should be the first Athlon 64 chipset to market with NCQ support; ATI’s Radeon Xpress doesn’t have it, and VIA’s K8T890 north bridge will likely have to wait for a new south bridge chip, in second-gen K8T890 motherboards, in order to gain that capability.

    Robust RAID — All eight of the devices connected to the nForce4’s storage controllers can participate in RAID arrays of various flavors, including RAID levels 0, 1, 0+1 and the ever-casual “just a bunch of disks.” NVIDIA’s RAID implementation is unique among Athlon 64 chipsets in its ability to span multiple disk controllers and disk types, so that an ATA/133 drive could be paired up in an array with a comparable SATA drive. Also, the slick NVRAID Windows utility allows users to perform a number of management tasks, including assigning hot-spare drives and viewing a graphical representation of the SATA port on the motherboard where a failed drive is connected. Most impressive, perhaps, is a feature NVIDIA calls RAID morphing; the nForce4 can convert one RAID array type to another on the fly (from RAID 0, say, to RAID 1) without the loss of data on the volume.

    Hardware-accelerated Gigabit Ethernet with a firewall — Other chipset makers are deemphasizing networking capabilities, preferring to let the newfound bandwidth of PCI Express enable third-party network controllers to handle those duties. NVIDIA, on the other hand, is extending its emphasis on integrated networking. The nForce4 has a Gigabit Ethernet controller and NVIDIA’s own home-cooked firewall software, as did the nForce3 250Gb before it. New this time around is actual hardware-level acceleration of packet handling, both for basic TCP/IP communications and for the stateful packet inspection conducted by the firewall. The nForce4 includes logic, designed in-house at NVIDIA, to offload such chores from the CPU. NVIDIA’s marketing types have dubbed this feature ActiveArmor. Despite the name, it is possible to use ActiveArmor’s TCP acceleration while NVIDIA’s firewall is disabled.


    The main admin screen for NVIDIA’s networking suite

    Eight-channel AC’97 audio — Another highlight, or perhaps the lowlight, of the nForce4’s feature set is audio. While building its hardware-accelerated networking capabilities, NVIDIA has moved in the opposite direction for audio, dropping the DSP-accelerated SoundStorm solution from the nForce2 in favor of basic AC’97 audio. The nForce4 won’t resurrect SoundStorm, and NVIDIA hasn’t elected to support Intel’s new High Definition Audio standard on nForce4, either, so the nForce4 can’t handle the higher sample and bit rates that standard brings. The nForce4 does have the ability to feed eight channels of AC’97 audio, however, and NVIDIA’s audio drivers support 3D positional audio via DirectSound3D and EAX.

    A 1GHz HyperTransport link — Despite rumors to the contrary, NVIDIA says the nForce4 will feature a 1GHz HyperTransport link to the CPU. In fact, the reference board that NVIDIA supplied us for review had a 1GHz HyperTransport link option in the BIOS, and we turned it on for testing without suffering any negative consequences.

    nTune tweaking software — NVIDIA has long been pushing its System Utility as a means of tweaking and overclocking nForce-based motherboards, but not all motherboard makers provided the right hooks in the BIOS to make this utility fully functional. For the nForce4, NVIDIA has renamed the utility to nTune and given it an auto-overclocking feature similar to the one in NVIDIA’s graphics drivers. The software will put the system through a series of tests to determine a “safe” set of overclocking settings. NVIDIA says the goal isn’t to squeeze every last possible ounce of performance out of the system like one might get with manual tuning, but to provide a quick, automated way to get 90% or so of the way there. I tried out nTune on the NVIDIA reference board, and after a lock-up and several restarts, it settled on a 210MHz HyperTransport speed, raising the clock speed of my Athlon 64 4000+ from its stock 2.4GHz to 2.52GHz.

    nTune can also aim for specific goals, like best graphics performance or “silent tuning” that might be more appropriate for home theater PCs or word processing work, and it can save and manage various profiles with these settings. The nTune software also has a monitoring tool to keep track of fan speeds and temperatures, plus a BIOS flash update tool that works from within Windows. I’m curious to see whether NVIDIA can convince motherboard makers to support nTune; they tend to prefer providing their own monitoring and tweaking software.
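The command reordering described in the NCQ item above can be illustrated with a toy model. This is a minimal sketch, assuming each pending request can be reduced to a hypothetical track number and that only seek distance matters; real NCQ drives also weigh rotational position, and the greedy shortest-seek-first rule used here is a stand-in, not NVIDIA’s or Maxtor’s actual algorithm. It compares total head travel for six queued requests serviced in arrival order versus reordered.

```python
# Toy model: hypothetical track numbers, seek distance only (no rotational latency).

def total_seek_distance(start, order):
    """Sum of head movements (in tracks) when servicing requests in the given order."""
    distance = 0
    position = start
    for track in order:
        distance += abs(track - position)
        position = track
    return distance


def shortest_seek_first(start, pending):
    """Greedy reordering: always service the pending request nearest the head."""
    remaining = list(pending)  # copy so the caller's queue is untouched
    order = []
    position = start
    while remaining:
        nearest = min(remaining, key=lambda t: abs(t - position))
        remaining.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


head = 50
queue = [90, 10, 80, 20, 70, 30]  # hypothetical requests, in arrival order

fifo = total_seek_distance(head, queue)
reordered = shortest_seek_first(head, queue)
ncq_like = total_seek_distance(head, reordered)
print(f"Arrival order: {queue} -> {fifo} tracks of head travel")
print(f"Reordered:     {reordered} -> {ncq_like} tracks of head travel")
```

With this made-up queue, reordering cuts head travel from 340 tracks to 120. A pure shortest-seek-first rule can starve distant requests under a steady stream of nearby ones, which is one reason real drive schedulers are more elaborate.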

Those are the highlights, and just the highlights, believe it or not. Chipsets are complex beasts.

NVIDIA will be making things even more complex by offering three different versions of the nForce4. The nForce4 Ultra, which we’re reviewing here, will be the middle-of-the-line version aimed at motherboards in the $100 to $150 range. The nForce4 SLI will have the same basic set of features, but it will, naturally, go on motherboards with multiple graphics slots for SLI. Expect those boards to cost $200 or more. And the cheapy version of the chip, just dubbed “nForce4” and nothing more, will be aimed at $50 to $80 mobos for Socket 754-based Athlon 64 and Sempron processors. The vanilla flavor of nForce4 will lack some of the fanciest features, including support for 1GHz HyperTransport speeds, SATA II transfer rates, and ActiveArmor acceleration. It will, however, still include a Gigabit Ethernet controller, firewall, and all the rest.

Sizing up the competition
Let’s compare the nForce4 to the competition by singling out some of the more pertinent north bridge features:

| | Intel 915G/925X | VIA K8T890 | ATI Radeon Xpress 200 | NVIDIA nForce4 Ultra |
|---|---|---|---|---|
| HyperTransport link | N/A | 16-bit/1GHz | 16-bit/1GHz | 16-bit/1GHz |
| PCI Express lanes | 16 | 20 | 22* | 20 |
| North/south bridge interconnect type | DMI | Ultra V-Link | 2 lanes PCI Express | N/A (single chip) |
| Peak theoretical interconnect bandwidth | 2GB/s | 1.06GB/s | 1GB/s | N/A (single chip) |

The nForce4 matches up almost exactly with the K8T890 and the Radeon Xpress 200 in terms of north bridge features, with the obvious exception that it’s a single-chip design. Note the asterisk next to the number of PCI-E lanes on the Radeon Xpress 200. That’s there because ATI dedicates a pair of PCI-E lanes to talking to the south bridge, so the effective number of lanes on the chipset is actually 20. Now, down to the south bridge, where more of the I/O capabilities lie.
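The 1GB/s and 2GB/s interconnect figures in the table above, and the x16-versus-x8 split used for SLI, follow from first-generation PCI Express signaling: 2.5 gigatransfers per second per lane with 8b/10b encoding, or 250MB/s of data per lane in each direction. A minimal sketch of that arithmetic, assuming DMI can be treated as a four-lane link at the same rate (as it is commonly described); VIA’s Ultra V-Link is a different bus and isn’t modeled. The function names are ours, for illustration only.

```python
# First-generation PCI Express: 2.5 gigatransfers/s per lane with 8b/10b
# encoding, so 8 of every 10 bits on the wire carry data.
TRANSFERS_PER_SEC = 2_500_000_000
DATA_BITS_PER_10_LINE_BITS = 8


def lane_mb_per_sec():
    """Peak data rate of one PCIe 1.x lane in one direction, in MB/s."""
    data_bits = TRANSFERS_PER_SEC * DATA_BITS_PER_10_LINE_BITS // 10
    return data_bits // 8 // 1_000_000  # 250 MB/s


def link_mb_per_sec(lanes, both_directions=True):
    """Peak data rate of a multi-lane link; PCIe links are full duplex."""
    per_direction = lanes * lane_mb_per_sec()
    return per_direction * 2 if both_directions else per_direction


links = {
    "Radeon Xpress 200 interconnect (2 lanes, both directions)": link_mb_per_sec(2),
    "DMI treated as 4 lanes, both directions": link_mb_per_sec(4),
    "Single graphics slot, x16, one direction": link_mb_per_sec(16, both_directions=False),
    "Each SLI slot, x8, one direction": link_mb_per_sec(8, both_directions=False),
}
for name, mb in links.items():
    print(f"{name}: {mb} MB/s")
```

Counting both directions is how the 1GB/s and 2GB/s peak figures are usually quoted; a single transfer in one direction only sees half of that.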

| | Intel ICH6R | VIA VT8251 | ATI Radeon Xpress 200 | NVIDIA nForce4 Ultra |
|---|---|---|---|---|
| PCI Express lanes | 4 | 2 | 0 | N/A (single chip) |
| SATA ports | 4 | 4 | 4 | 4 |
| SATA peak data rate | 150MB/s | 150MB/s | 150MB/s | 300MB/s |
| Native Command Queuing | Y | Y | N | Y |
| SATA RAID 0/1 | Y | Y | Y | Y |
| SATA RAID 0+1 | N | Y | N | Y |
| SATA Matrix RAID | Y | N | N | N |
| ATA channels | 1 | 2 | 2 | 2 |
| ATA RAID support | N | N | N | Y |
| Max audio channels | 8 | 8 | 8 | 8 |
| Audio standard | HD/AC’97 | HD/AC’97 | AC’97 | AC’97 |

Compared to its primary competitors, the ATI Radeon Xpress 200 and VIA K8T890, the nForce4 has a few nice advantages in the disk I/O department, like support for NCQ, 300MB/s Serial ATA transfer rates, and better RAID capabilities on several fronts. Unlike the competition, NVRAID supports ATA devices, can do RAID morphing, and includes NVIDIA’s nifty RAID software utility. When VIA’s VT8251 south bridge arrives, though, the nForce4 will fall behind on the audio front because it lacks support for High Definition Audio.

The nForce4’s most notable deficiency may be the lack of a version with integrated graphics. ATI’s Radeon Xpress 200 has a real DirectX 9-class graphics core, and it should do well in low-cost PCs, corporate desktop systems, and the like. Despite being primarily a graphics company, NVIDIA has nothing comparable.


NVIDIA’s nForce4 Ultra reference motherboard

Leveling the playing field
Thanks to the Athlon 64’s integrated memory controller, most Athlon 64 chipsets perform about the same on the memory-intensive benchmarks traditionally used to evaluate chipsets. However, we did find some measurable performance differences between the Radeon Xpress 200 and the VIA K8T800 Pro recently, and that piqued my curiosity. Turns out that the Radeon Xpress 200 reference motherboard supplied to us by ATI had the HyperTransport clock set at a very curious speed: 200.9MHz. That’s nearly 201MHz, and while it may not sound like much, it makes a real difference in CPU clock speeds. Have a look at what CPU-Z shows for the ATI reference board:


ATI’s Radeon Xpress 200 reference board

Sneaky, huh? The CPU clock speed is quite a bit higher than stock. What’s more, although the HyperTransport speed can be overclocked, there’s no way to adjust it down to a true 200MHz on that board.

Funny thing is, the NVIDIA reference board came to us with a 200.9MHz HyperTransport clock, as well.


NVIDIA’s nForce4 reference board

NVIDIA’s BIOS does offer the option of a true 200MHz HT link (as well as 200.3 and 200.6MHz), but I wanted a fair comparison. The only thing to do was to raise the HT clock on the Asus A8V Deluxe mobo we were using for the VIA K8T800 Pro results to a comparable speed, so that’s what I did:


Asus A8V Deluxe with K8T800 Pro

OK, so it’s not exact, but it’s as close as I could get.
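The clock arithmetic behind these small offsets is simple multiplication. A minimal sketch, assuming the Athlon 64 4000+ keeps its stock 12x multiplier (2.4GHz divided by 200MHz) and that the 1GHz HyperTransport link is 5x the same reference clock; the function name is ours, not anything reported by the BIOS or CPU-Z.

```python
STOCK_REF_MHZ = 200.0      # nominal HyperTransport reference clock
STOCK_CPU_MHZ = 2400.0     # Athlon 64 4000+ at stock speed
HT_LINK_MULTIPLIER = 5     # assumed: 5 x 200MHz gives the 1GHz HT link
CPU_MULTIPLIER = STOCK_CPU_MHZ / STOCK_REF_MHZ  # 12.0, assumed unchanged


def derived_clocks(ref_mhz):
    """Return (CPU MHz, HT link MHz, % CPU clock above stock) for a reference clock."""
    cpu_mhz = CPU_MULTIPLIER * ref_mhz
    ht_mhz = HT_LINK_MULTIPLIER * ref_mhz
    pct_over_stock = (cpu_mhz / STOCK_CPU_MHZ - 1.0) * 100.0
    return cpu_mhz, ht_mhz, pct_over_stock


# 200.0MHz: nominal; 200.9MHz: the ATI and NVIDIA boards as shipped;
# 210.0MHz: the setting nTune settled on.
for ref in (200.0, 200.9, 210.0):
    cpu, ht, pct = derived_clocks(ref)
    print(f"ref {ref:.1f}MHz -> CPU {cpu:.1f}MHz, HT link {ht:.1f}MHz, {pct:+.2f}%")
```

At 200.9MHz the CPU runs at about 2410.8MHz, roughly 0.45% over stock, and the same formula reproduces the 2.52GHz that nTune reached with a 210MHz setting.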

With the clock speed disparity out of the way, I proceeded to the Athlon 64 memory controller’s settings. Using the handy little utility A64 Tweaker and a few BIOS menu options, I was able to match up memory timing parameters almost exactly between the three boards, with the exception of some idle cycle settings on the Asus A8V Deluxe.


ATI’s Radeon Xpress 200 reference board


NVIDIA’s nForce4 reference board


Asus A8V Deluxe with K8T800 Pro

Tweaked like this, these boards should perform just about identically whenever the CPU or memory controller is the primary bottleneck.

Test notes
Note that our Pentium 4 systems use a slightly lower-grade CPU than the Athlon 64 4000+ in our AMD test systems. We have also included the P4 Extreme Edition, which we’ve used in conjunction with the Intel 925XE chipset and its fancy 1066MHz bus, but it’s really not much faster than the P4 560. We probably should have opted for a lower-end CPU for the Athlon 64 systems just to make it fair, but we didn’t want to hamstring them with a slower CPU.

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.

Our test systems were configured like so:

| Processor | Athlon 64 4000+ 2.4GHz | Athlon 64 4000+ 2.4GHz | Athlon 64 4000+ 2.4GHz | Pentium 4 560 3.6GHz | Pentium 4 Extreme Edition 3.46GHz |
|---|---|---|---|---|---|
| System bus | 1GHz HyperTransport | 1GHz HyperTransport | 1GHz HyperTransport | 800MHz (200MHz quad-pumped) | 1066MHz (266MHz quad-pumped) |
| Motherboard | Asus A8V Deluxe | ATI reference | NVIDIA reference | Abit AA8 DuraMax | Intel D925XECV2 |
| BIOS revision | 1008 beta 1 | B10 | 4.70 | 1.4 | CV92510A.86A.0338 |
| North bridge | K8T800 Pro | Radeon Xpress 200 | nForce4 Ultra | 925X MCH | 925XE MCH |
| South bridge | VT8237 | Radeon Xpress 200 | N/A (single chip) | ICH6R | ICH6R |
| Chipset drivers | 4-in-1 v.1.11 beta (9/7/04) | 10/31/04 beta | ForceWare 6.31 beta | INF Update 6.0.1.1002, IAA for RAID 4.5.0.6515 | INF Update 6.0.1.1002, IAA for RAID 4.5.0.6515 |
| Memory size | 1GB (2 DIMMs) | 1GB (2 DIMMs) | 1GB (2 DIMMs) | 1GB (2 DIMMs) | 1GB (2 DIMMs) |
| Memory type | OCZ PC3200 EL DDR SDRAM at 400MHz | OCZ PC3200 EL DDR SDRAM at 400MHz | OCZ PC3200 EL DDR SDRAM at 400MHz | OCZ PC2 5300 DDR2 SDRAM at 533MHz | OCZ PC2 5300 DDR2 SDRAM at 533MHz |
| CAS latency (CL) | 2 | 2 | 2 | 3 | 3 |
| RAS to CAS delay (tRCD) | 2 | 2 | 2 | 3 | 3 |
| RAS precharge (tRP) | 2 | 2 | 2 | 3 | 3 |
| Cycle time (tRAS) | 5 | 5 | 5 | 10 | 10 |
| Hard drive | Maxtor MaXLine III 250GB SATA 150 | Maxtor MaXLine III 250GB SATA 150 | Maxtor MaXLine III 250GB SATA 150 | Maxtor MaXLine III 250GB SATA 150 | Maxtor MaXLine III 250GB SATA 150 |
| Audio | Integrated VT8237/ALC850 with 3.64 drivers | Integrated with 5.00.30001.20 drivers | Integrated | Integrated ICH6R/ALC880 with 5.10.0.5022 drivers | Integrated ICH6R/ALC880 with 5.10.0.5032 drivers |
| Graphics 1 | GeForce 6800 GT 256MB AGP with ForceWare 66.81 drivers | GeForce 6800 GT 256MB PCI-E with ForceWare 66.81 drivers | GeForce 6800 GT 256MB PCI-E with ForceWare 66.81 drivers | GeForce 6800 GT 256MB PCI-E with ForceWare 66.81 drivers | GeForce 6800 GT 256MB PCI-E with ForceWare 66.81 drivers |
| Graphics 2 | Radeon X800 XT AGP 256MB with Catalyst 4.11 drivers | Radeon X800 XT PCI-E 256MB with Catalyst 4.11 drivers | Radeon X800 XT PCI-E 256MB with Catalyst 4.11 drivers | N/A | N/A |
| OS | Microsoft Windows XP Professional | Microsoft Windows XP Professional | Microsoft Windows XP Professional | Microsoft Windows XP Professional | Microsoft Windows XP Professional |
| OS updates | Service Pack 2, DirectX 9.0c | Service Pack 2, DirectX 9.0c | Service Pack 2, DirectX 9.0c | Service Pack 2, DirectX 9.0c | Service Pack 2, DirectX 9.0c |

All tests on the Pentium 4 systems were run with Hyper-Threading enabled.

Thanks to OCZ for providing us with memory for our testing. If you’re looking to tweak out your system to the max and maybe overclock it a little, OCZ’s RAM is definitely worth considering.

Also, all of our test systems were powered by OCZ PowerStream power supply units. The PowerStream was one of our Editor’s Choice winners in our latest PSU round-up.

The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.

We used the following versions of our test applications:

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Memory performance
We’ll start with synthetic memory bandwidth benchmarks, just to get them out of the way. Have our efforts to level the playing field with respect to clock speeds and memory timings had an impact?

Definitely. The three Athlon 64 chipsets all produce similar scores here, with very little variance.

Doom 3
Let’s get right down to the gaming results with Doom 3. We tested using a custom-recorded demo that should be fairly representative of most of the single-player gameplay in Doom 3. Doom 3 is running in High Quality mode here.

Far Cry
Our Far Cry demo takes place on the Pier level, in one of those massive, open outdoor areas so common in this game. Vegetation is dense, and view distances can be very long.

Both Doom 3 and Far Cry show the nForce4 performing almost exactly like the Radeon Xpress 200. The K8T800 Pro is slightly faster, but it’s nothing to write home about.

Unreal Tournament 2004
Our UT2004 demo shows yours truly putting the smack down on some bots in an Onslaught game.

Now here are some real differences between the scores! These results surprised me a little bit. I was testing UT2004 with “Hardware 3D audio” enabled, and I started to wonder whether that was the issue. Retesting with the game’s software 3D audio tightened things up quite a bit.

Obviously, the nForce4’s 3D audio drivers have a fair bit of overhead. I’m not sure that’s entirely a bad thing; it’s possible NVIDIA’s algorithms are doing positional audio more correctly than the competition. The Radeon Xpress 200’s scores barely changed when I turned off hardware 3D audio, but ATI’s drivers don’t support positional 3D audio via DirectSound3D or EAX.

3DMark05

3DMark runs without sound, and the scores here show that the nForce4’s performance with PCI Express graphics is right in line with the competition.

Radeon X800 XT performance
The previous round of gaming scores was recorded using GeForce 6800 GT graphics cards, as were all of our non-gaming scores. However, I wanted to see whether moving to ATI Radeon X800 XT cards would change things much, for two reasons. First, NVIDIA has been known to optimize its chipsets for its graphics cards, and I wanted to remove that factor. Second, the GeForce 6800 GT GPU has a native AGP interface, and PCI Express versions of the card use a bridge chip to enable compatibility with PCI-E motherboards. I wanted to be sure the bridge chip wasn’t exacting a performance penalty on our PCI Express chipsets. The Radeon X800 XT comes in native PCI-E and native AGP versions.

Doom 3

Far Cry

Unreal Tournament 2004

3DMark05

Moving to the Radeon X800 XT cards didn’t change things too much, with the exception of the 3DMark05 CPU scores, which are now much more even.

WorldBench overall performance
WorldBench uses scripting to step through a series of tasks in common Windows applications and produces an overall score. More impressively, WorldBench spits out individual results for its component application tests, allowing us to compare performance in each. We’ll look at the overall score, and then we’ll show individual application results alongside the results from some of our own application tests.

The two PCI Express chipsets come out ahead of the K8T800 Pro in WorldBench. We’ll see why as we look at some of the individual applications’ scores.

Audio editing and encoding

LAME MP3 encoding
We used LAME to encode a 101MB 16-bit, 44kHz audio file into a very high-quality MP3. The exact command-line options we used were:

lame --alt-preset extreme file.wav file.mp3

MusicMatch Jukebox

Audio encoding is all about CPU and memory performance, so the scores don’t really differ here.

Video encoding and editing

XMPEG DivX video encoding
We used the default settings for the DivX codec to encode a 3000-frame sequence from a DVD-formatted MPEG2 source file.

Windows Media Encoder

Adobe Premiere

VideoWave Movie Creator

The nForce4 and Radeon Xpress 200 are neck-and-neck in our video encoding tests, but the K8T800 Pro falls behind a little.

Image processing

Adobe Photoshop

ACDSee PowerPack

The differences between the scores here aren’t pronounced enough to mean much.

Multitasking and office applications

MS Office

Mozilla

Mozilla and Windows Media Encoder

Ditto for these scores, with the possible exception of the multitasking test with Mozilla and Windows Media Encoder, where the Radeon Xpress 200 seems to fare relatively well.

Other applications

Sphinx speech recognition
Ricky Houghton first brought us the Sphinx benchmark through his association with speech recognition efforts at Carnegie Mellon University. Sphinx is a high-quality speech recognition routine. We use two different versions, built with two different compilers, in an attempt to ensure we’re getting the best possible performance.

WinZip

Nero

WinZip and Sphinx are both primarily memory/CPU bound applications, and all the Athlon 64 chipsets are bunched up tight once more in these tests. Nero, however, is more dependent on the disk controller, and none of the Athlon 64 systems can touch the Intel 900-series chipsets in this test.

3D modeling and rendering

Cinebench 2003
Cinebench is based on Maxon’s Cinema 4D modeling, rendering, and animation app. This revision of Cinebench measures performance in a number of ways, including 3D rendering, software shading, and OpenGL shading with and without hardware acceleration. Cinema 4D’s renderer is multithreaded, so it takes advantage of Hyper-Threading, as you can see in the results.

The OpenGL shading tests reveal some difference between the chipsets, because they rely on the graphics card (and thus the graphics interface) to help with some of the work. The nForce4 is just a little slower than the other guys here.

3ds max
We have used 3ds max in the past for CPU testing, but most of those tests have consisted of rendering only. WorldBench’s 3ds max tests replicate an entire modeling and animation work session, stressing the graphics card as well as the CPU and the rest of the system.

Like Cinebench, 3ds max also pushes lots of polygons through the graphics subsystem, and the nForce4 Ultra performs about like the competition.

Ethernet performance
We evaluated Ethernet performance using the NTttcp tool from the Microsoft Windows DDK. We used the following command line options on the server machine:

ntttcps -m 4,0,192.168.1.25 -a

...and the same basic thing on each of our test systems acting as clients:

ntttcpr -m 4,0,192.168.1.25 -a

We used an Abit IC7-G-based system as the server for these tests. It has an Intel NIC in it that’s attached to the north bridge via Intel’s CSA connection, and it’s proven very fast in prior testing. The server and client talked to each other over a Cat 6 crossover cable.

We tested the nForce4 several different ways, with and without NVIDIA’s Firewall enabled, and with and without ActiveArmor acceleration. We also tested the nForce4 with the firewall included in Microsoft’s Windows XP Service Pack 2, just to see how it compared.

The Radeon Xpress 200 system here is using a PCI Express Gigabit Ethernet card from SysKonnect (based on a Marvell chip) that ATI supplied to go along with its reference board. This is the first PCI Express X1 card I’ve ever seen. Check it out:

Innit cute?

Since ATI has included no Ethernet controller in its chipset, the use of an external PCI-E Ethernet controller chip should be standard practice with the Radeon Xpress 200. The K8T800 Pro system, meanwhile, is using a Marvell Ethernet chip embedded on the Asus A8V Deluxe motherboard and connected to the PCI bus. This, too, is a typical GigE solution for this chipset. Our Pentium 4 system uses the unfortunate combination of a PCI Express chipset and a PCI network chip, unnecessarily hampering its performance.

NVIDIA’s ActiveArmor works more or less as promised, delivering much lower CPU utilization at the price of about 20Mbps of total throughput. Not a bad tradeoff, all things considered. That gives the nForce4 Ultra the best combination of Ethernet throughput and CPU utilization of the lot, even with NVIDIA’s firewall active. The Radeon Xpress 200, with the Marvell PCI-E chip, achieves even higher throughput, but it requires about another 20% of CPU time in order to do so.

I was curious to see what would happen to NTttcp throughput and CPU use if I enabled jumbo frames, a provision in Gigabit Ethernet implementations that could potentially improve throughput and lower CPU utilization. I turned on frame sizes up to 9K bytes for the server, the Radeon Xpress 200 system, and the nForce4 Ultra. (None of them had jumbo frames enabled by default.) Here’s what I found.

Jumbo frames can indeed boost throughput and lower CPU utilization, but NVIDIA’s ActiveArmor acceleration has a throughput problem with jumbo frames. NVIDIA says that ActiveArmor should be able to handle jumbo frames just fine, and they’re looking into this problem.
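A rough way to see why larger frames help: much of the cost of network processing is per packet, so fewer, bigger frames mean less work for the same amount of data. A back-of-the-envelope sketch, assuming a 9,000-byte jumbo MTU, standard Ethernet framing overhead (header, FCS, preamble, and inter-frame gap), and IPv4/TCP headers without options:

```python
LINE_RATE_BPS = 1_000_000_000            # Gigabit Ethernet
ETH_OVERHEAD_BYTES = 14 + 4 + 8 + 12     # header, FCS, preamble+SFD, inter-frame gap
TCP_IP_HEADER_BYTES = 20 + 20            # IPv4 + TCP, no options (assumed)


def frames_per_second(mtu):
    """Maximum full-size frames per second at line rate for a given MTU."""
    wire_bytes = mtu + ETH_OVERHEAD_BYTES
    return LINE_RATE_BPS / (wire_bytes * 8)


def payload_efficiency(mtu):
    """Fraction of line rate left for TCP payload in full-size frames."""
    return (mtu - TCP_IP_HEADER_BYTES) / (mtu + ETH_OVERHEAD_BYTES)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {frames_per_second(mtu):,.0f} frames/s, "
          f"{payload_efficiency(mtu):.1%} of line rate is TCP payload")
```

On these assumptions, a 9,000-byte MTU needs roughly one-sixth as many frames per second to fill the pipe, which is the usual explanation for the lower CPU utilization. It doesn’t explain why ActiveArmor’s throughput suffers with jumbo frames.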

Single-drive Iometer performance
We haven’t had time to conduct a full set of RAID testing yet, but I wanted to see the impact of NVIDIA’s support for Native Command Queuing, so I ran a few tests on a single drive. The drive, in this case, was the same model as the system disk we used for all the rest of our tests, a Maxtor MaXLine III SATA drive with support for NCQ built right in.

Hmm. With NCQ enabled, the nForce4 is faster than all the other Athlon 64 chipsets in terms of operations per second and response times, but there are a few problems. The nForce4’s performance with NCQ doesn’t seem to scale right above a queue depth of 32, and at a queue depth of 256, Iometer consistently crashes. Also, the nForce4 can’t touch the performance of the Intel 925X chipset, and CPU utilization on the nForce4 is markedly higher than everything else.

Let’s see what happens with Iometer’s database access pattern.

The database pattern shows us the same things. NCQ undeniably boosts performance on the nForce4, but there’s a scaling problem with higher loads that reminds me of what we saw out of the nForce3 250Gb in our chipset RAID comparo. I’m unsure whether this is some sort of hardware limitation or simply a problem with NVIDIA’s drivers.
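Iometer reports both an I/O rate and an average response time, and with a fixed number of outstanding I/Os those two figures are linked by Little’s law: mean response time equals outstanding I/Os divided by throughput. That is why a controller whose operations per second stop climbing past a queue depth of 32 must show response times that grow roughly in proportion to queue depth. A small sketch; the 250 IOPS plateau below is a made-up number for illustration, not one of our measured results.

```python
def mean_response_ms(outstanding_ios, iops):
    """Little's law: mean time in system = items in system / throughput (in ms)."""
    return outstanding_ios / iops * 1000.0


# Hypothetical controller whose throughput stops improving past a queue depth of 32.
plateau_iops = 250.0
for depth in (32, 64, 128):
    print(f"queue depth {depth}: {mean_response_ms(depth, plateau_iops):.0f} ms average response time")
```

Doubling the queue depth against a flat I/O rate doubles the average response time, so a scaling wall shows up in both of Iometer’s metrics at once.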

SATA performance
I used HDTach for these benchmarks. They’re largely intended to find a problem with throughput, CPU utilization, or both. I used HDTach’s 8MB zone size setting for all tests. The hard drive was a Seagate Barracuda V with SATA interface.

The nForce4 Ultra’s performance mirrors that of the competition, and we don’t see any major problems here.

ATA performance
These tests are HDTach again, but with a Seagate Barracuda V with ATA/100 interface.

ATA performance is much the same story. For most intents and purposes, the three Athlon 64 chipsets perform identically, and the Intel 925X is quite similar, too.

USB performance
For the USB tests, I used a DiamondMax D740X hard drive in a USB 2.0-compatible enclosure and, you guessed it, HDTach with an 8MB zone size.

The nForce4 escapes our USB test without having any problems. The same can’t be said for the Radeon Xpress 200, whose USB implementation seems to be a bit of a CPU hog.

Audio performance

As we saw in our Unreal Tournament tests, the nForce4’s CPU utilization for audio is rather high. Even for basic audio, as in the DirectSound2D tests, the nForce eats up more CPU time than any of the other competitors. Once we get into 3D positional audio, the CPU use becomes substantial, peaking at 14% with EAX. Gamers will want to consider a discrete sound card with DSP capabilities like the Audigy 2 in order to avoid taking the performance hit.

Conclusions
NVIDIA had ambitious goals when they set out to create the nForce4, and in part, they’ve succeeded. The PCI Express implementation has been trouble-free for us, and the chipset’s all-around performance in benchmarks that stress the CPU, memory, and graphics is roughly equivalent to that of its competitors. Also, the features that make nForce4 unique, like ActiveArmor network/firewall acceleration and Serial ATA with Native Command Queuing, are able to demonstrate their effectiveness in our benchmarks.

However, NVIDIA’s success is only partial, because those same unique features can’t quite stand up to scrutiny when pushed really hard. The disk controller’s NCQ support clearly makes it faster than competing AMD K8 chipsets, but performance doesn’t scale properly; it hits a wall at larger queue depths. That’s quite the contrast to Intel’s NCQ disk controller, which is faster still and scales nicely. Similarly, NVIDIA’s ActiveArmor Ethernet packet handling acceleration is able to provide low CPU utilization in combination with decent throughput, but when we turn on jumbo frame sizes, it chokes.

These are the features that are supposed to make the nForce4 stand out in a crowded market, and they almost manage to do so, but they’re not quite right. What’s more, we have no good sense yet about whether NVIDIA will be able to rectify these problems through driver updates or minor tweaks to nForce4 silicon. It’s possible we’re looking at hardware-level deficiencies that can’t be easily or simply corrected.

Also, in a poetic twist, one of the nForce4’s outstanding weaknesses turns out to be the CPU overhead of its SoundStorm-free audio implementation. The nForce4’s standing in our gaming tests with audio (Far Cry, UT2004) is a step or two behind its standing in those without sound (Doom 3, 3DMark05), and frame rates take an especially big dip in UT2004 when we rely on NVIDIA’s drivers to provide “hardware” 3D positional audio. Our RightMark audio tests confirm the problem, showing the nForce4 to have higher CPU utilization than any of the other solutions, even when doing straight DirectSound2D audio.

Partial success at achieving ambitious goals doesn’t always mean failure, of course. The nForce4 isn’t perfect, but it doesn’t have any glaring, showstopper problems, either. It’s quite possible, perhaps even likely, that the nForce4 will become the chipset of choice for enthusiast motherboards in the coming months. NVIDIA’s RAID is still more flexible than most, the nForce4 will likely be the best choice for SLI motherboards, and the nTune software suite is kind of nifty. Also, ActiveArmor and NCQ in their current states are still, on balance, worth having. However, thanks to a few problems with those features, NVIDIA has left an opening for the competition. We’ll have to see whether that opening closes when final nForce4 motherboards and drivers arrive.


Scott Wasson is a veteran in the tech industry and the former Editor-in-Chief at Tech Report. With a laser focus on tech product reviews, Wasson's expertise shines in evaluating CPUs and graphics cards, and much more.