Multitasking with Native Command Queuing

INTELLIGENTLY REORDERING I/O REQUESTS in order to minimize the performance impact of a hard drive’s mechanical latency—otherwise known as command queuing—is unquestionably the right thing to do. However, though command queuing has long proven to be a valuable asset to SCSI drives faced with multi-user and enterprise-class workloads, the performance benefits of Native Command Queuing (NCQ) in desktop Serial ATA drives have been harder to illustrate. Unfortunately, most commonly used hard drive benchmarks don’t play to NCQ’s strengths, and those that do involve server-style workloads that are hardly indicative of desktop environments.

In order to test Native Command Queuing’s performance potential on the desktop, something completely different is needed—and we have just the thing. Join me as we explore a new collection of hard drive tests that showcase why NCQ really does matter on the desktop.

Command queuing in brief
Hard drives use a mechanical drive head to write and read data to and from a spinning platter. Often, though, the drive head sits idle waiting for the platter to rotate to the correct position for a pending read or write operation. This idle delay is referred to as the drive’s rotational latency, and the lower the better. Drive makers have long sought to reduce rotational latencies by spinning platters faster, and they’ve managed to hit dizzying 15,000-RPM spindle speeds with the latest high-end SCSI drives. Unfortunately, spinning a drive at 15,000 or even 10,000 RPM isn’t as easy as giving it more gas. It’s not cheap, either, which is why relatively low-cost desktop drives are stuck at 7,200 RPM.
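To put numbers on those spindle speeds, here is a minimal sketch of the standard back-of-the-envelope calculation: one rotation takes 60,000 ms divided by the RPM, and on average the drive waits half a rotation for the target sector to arrive under the head. The function names are ours, chosen for illustration.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector to reach the head: half a rotation."""
    return rotation_time_ms(rpm) / 2.0


if __name__ == "__main__":
    for rpm in (7_200, 10_000, 15_000):
        print(f"{rpm:>6} RPM: {rotation_time_ms(rpm):.2f} ms per rotation, "
              f"{avg_rotational_latency_ms(rpm):.2f} ms average rotational latency")
```

By this arithmetic, a 7,200-RPM drive averages roughly 4.17 ms of rotational latency, against 3 ms at 10,000 RPM and 2 ms at 15,000 RPM.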

While faster spindle speeds are a brute-force approach to reducing rotational latency, Native Command Queuing is a more cerebral and ultimately less expensive alternative. Instead of blindly executing I/O requests on a first-come, first-served basis, NCQ allows a hard drive to intelligently reorder multiple requests based on their proximity to the drive head’s position. Here’s an example of how it works:


Source: NVIDIA

Without command queuing, the drive must complete two rotations to execute requests one through four in the order that they are received. However, with a little intelligent reordering, an NCQ-equipped drive can fulfill all four requests in only a single rotation. Fewer rotations translate to milliseconds saved—a virtual eternity within a modern PC. Because of the nature of data organization on a hard disk, which often involves close proximity for pockets of related data (in a single file or group of files, for instance), command queuing can potentially pay big dividends, especially when multiple tasks are operating simultaneously on separate data sets.
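The rotation-counting argument can be sketched in a few lines of code. This is a deliberately simplified toy model, not how any drive’s firmware actually schedules requests: it assumes all four requests sit on one track (so seek time is ignored), that the platter spins in one direction only, and that each request is identified by a hypothetical angular position in degrees. The request angles below are made up for illustration and are not taken from NVIDIA’s diagram.

```python
def degrees_travelled(start: float, angles: list[float]) -> float:
    """Total platter rotation, in degrees, needed to service the
    requests in the order given, starting with the head at `start`."""
    total, head = 0.0, start
    for angle in angles:
        # The platter spins one way only, so a sector that has just
        # passed the head costs nearly a full rotation to reach again.
        total += (angle - head) % 360
        head = angle
    return total


def reorder(start: float, angles: list[float]) -> list[float]:
    """Service requests in the order they will pass under the head."""
    return sorted(angles, key=lambda a: (a - start) % 360)


if __name__ == "__main__":
    arrival_order = [90, 270, 180, 350]  # hypothetical request positions
    fifo = degrees_travelled(0, arrival_order)
    queued = degrees_travelled(0, reorder(0, arrival_order))
    print(f"first-come, first-served: {fifo / 360:.2f} rotations")
    print(f"reordered:                {queued / 360:.2f} rotations")
```

With these made-up positions, servicing the requests in arrival order costs about two rotations, while the reordered sequence finishes in just under one. Real drives also have to weigh seek distance between tracks, which this sketch leaves out.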

Although it’s a much smarter approach than turning up spindle speeds, command queuing has some limitations. Most notably, it can only improve performance if consecutive I/O requests reference different regions of a disk. Command queuing can’t improve the scheduling of streaming I/O requests that arrive at the drive in order, either. In fact, queuing I/O requests may introduce enough overhead to reduce a drive’s performance in some instances. However, in environments with more random I/O profiles, such as multitasking or multi-user loads, NCQ has considerable potential. In fact, command queuing is such a valuable asset in multi-user environments that high-end SCSI drives have been using it for years.

A new approach to testing
Since Native Command Queuing’s ability to intelligently reorder I/O requests depends on more random I/O profiles, a new approach to testing is required. Most hard drive benchmarks rely on streaming transfers that don’t benefit from command queuing, so they won’t do. IOMeter presents a best-case scenario for command queuing by hammering drives with randomized access patterns and up to 256 simultaneous requests, but that kind of extreme load is meant to simulate multi-user enterprise environments, not single-user desktop workloads.

At best, desktop PCs are only doing a few things at once. Often those tasks, such as gaming, rendering, image editing, word processing, web surfing, and crunching work units for distributed computing projects, aren’t even disk-bound. A few are, though. Creating and extracting compressed files, importing and exporting Outlook PST files, importing MPEG video into VirtualDub for editing, and copying files are all disk-intensive activities that I put my PC through on a regular basis. We’ve distilled those activities into the following disk-intensive tasks:

  • VirtualDub import — A 6.86GB Hell’s Kitchen MPEG2 video file was opened using VirtualDub MPEG2.
  • File copy — The 6.86GB MPEG2 file was copied to another location on the disk.
  • Outlook import — A folder with ~20,000 email messages was imported into Outlook XP from a PST file.
  • Outlook export — A folder with ~20,000 email messages was exported from Outlook XP to a PST file.
  • Compress create — A ZIP file was created with 512MB of text files, images, and Excel spreadsheets.
  • Compress extract — 512MB of text files, images, and Excel spreadsheets were extracted from a compressed ZIP file.

Since dual-core processors and multitasking are all the rage these days, we’ll be combining these tasks into a series of multitasking tests. We chose to classify our file copy and VirtualDub import tasks as foreground activities and our Outlook and compression operations as ones we’d probably do in the background. We then combined each foreground task with each background task, giving us a total of eight multitasking loads. For kicks, we also threw in a dual file copy test that involved copying 512MB of text files, images, and Excel spreadsheets alongside our 6.86GB MPEG2 transfer.
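For anyone keeping count, the test matrix can be enumerated with a short sketch. The task names come from the list above; the variable names and the label for the dual file copy test are ours.

```python
from itertools import product

foreground = ["VirtualDub import", "File copy"]
background = ["Outlook import", "Outlook export",
              "Compress create", "Compress extract"]

# Every foreground task paired with every background task: 2 x 4 = 8 loads.
workloads = [f"{fg} + {bg}" for fg, bg in product(foreground, background)]

# The extra dual file copy test: the 6.86GB MPEG2 copy alongside
# a copy of 512MB of mixed files.
workloads.append("File copy + File copy (512MB of mixed files)")

if __name__ == "__main__":
    for number, name in enumerate(workloads, start=1):
        print(f"{number}. {name}")
```

That gives nine workloads in total: eight foreground/background pairings plus the dual file copy.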

Performance in multitasking tests is difficult to quantify, and test methods are difficult to replicate exactly, which is where Intel’s iPEAK Storage Performance Toolkit comes in. iPEAK allows us to record a trace of all the I/O requests in our multitasking tests and play them back on various drives. iPEAK reports a drive’s mean service time for each test, giving us results that are easy to compare.
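As a rough illustration of what a mean service time represents, the sketch below averages the time each replayed request spends between being issued to the drive and completing. This is our assumption about the metric’s general shape, not iPEAK’s actual implementation or trace format; the `CompletedRequest` record and its field names are hypothetical, and the sample timings are invented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CompletedRequest:
    """One replayed I/O request (hypothetical record layout)."""
    issued_ms: float     # when the request was handed to the drive
    completed_ms: float  # when the drive reported it finished


def mean_service_time_ms(requests: list[CompletedRequest]) -> float:
    """Average of (completion time - issue time) across all requests."""
    return mean(r.completed_ms - r.issued_ms for r in requests)


if __name__ == "__main__":
    # Invented timings, purely to exercise the function.
    sample = [
        CompletedRequest(issued_ms=0.0, completed_ms=4.0),
        CompletedRequest(issued_ms=1.0, completed_ms=9.0),
        CompletedRequest(issued_ms=2.0, completed_ms=5.0),
    ]
    print(f"mean service time: {mean_service_time_ms(sample):.2f} ms")
```

Boiling each trace down to one number like this is what makes drives, and NCQ settings, easy to compare side by side.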

In an attempt to reproduce the contents of a typical hard drive, we recorded our traces on a 40GB Windows XP partition filled with program and data files, music, movies, and games. The partition had 10GB of free space and was defragmented before the traces were recorded. Because we’re doing multitasking tests, we stopped recording each trace when one of the two tasks was completed. In the end, each test ended up representing between two and 10 minutes’ worth of I/O requests.

 

Our testing methods
To test the benefits of Native Command Queuing, we ran through our test suite with a couple of NCQ-capable hard drives running with command queuing enabled and disabled.

All tests were run at least twice, and their results were averaged, using the following test system.

Processor: Intel Pentium 4 Extreme Edition 3.4GHz
Front-side bus: 800MHz
Motherboard: Asus P5WD2 Premium WiFiTV Edition
BIOS revision: 0422
North bridge: Intel 955X
South bridge: Intel ICH7R
Chipset drivers: Intel 7.2.1.1003
Memory size: 1GB (2 DIMMs)
Memory type: Micron DDR2 SDRAM at 533MHz
CAS latency (CL): 3
RAS to CAS delay (tRCD): 3
RAS precharge (tRP): 3
Cycle time (tRAS): 8
Hard drives: Maxtor DiamondMax Plus 9 160GB PATA; Maxtor DiamondMax 10 300GB SATA; Seagate Barracuda 7200.8 400GB SATA
Storage drivers: Intel RAID/AHCI 5.1.0.1022
Audio: ICH7R/ALC882D
Audio driver: Realtek HD 1.22
Graphics: ATI Radeon X700 Pro 256MB with CATALYST 5.4 drivers
OS: Microsoft Windows XP Professional
OS updates: Service Pack 2, DirectX 9.0c

iPEAK trace playback requires an empty disk, so our system’s OS and applications are running off a Maxtor DiamondMax Plus 9 ATA/133 hard drive.

Our test systems were powered by OCZ PowerStream power supply units. The PowerStream was one of our Editor’s Choice winners in our latest PSU round-up.

We used the following versions of our test applications:

  • Intel iPEAK Storage Performance Toolkit 3.0

The test systems’ Windows desktop was set at 1280×1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.

All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

 

iPEAK multitasking performance
The mean service time of each drive is reported in milliseconds, with lower values representing better performance.

Both drives enjoy a healthy performance boost from NCQ in our dual file copy test, with the DiamondMax drive’s performance improving by an impressive 19%.
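For readers who want to work out gains like these from a pair of mean service times, here is one common way to do it. Because lower service times are better, the sketch expresses the gain as the reduction in service time relative to the NCQ-disabled result. That definition is an assumption on our part for illustration, and the sample values are invented rather than measured.

```python
def percent_improvement(ncq_off_ms: float, ncq_on_ms: float) -> float:
    """Reduction in mean service time with NCQ enabled, as a percentage
    of the NCQ-disabled time. Negative means NCQ made things slower."""
    return (ncq_off_ms - ncq_on_ms) / ncq_off_ms * 100.0


if __name__ == "__main__":
    # Invented service times, not results from our tests.
    print(f"{percent_improvement(2.0, 1.5):+.1f}%")   # NCQ helps
    print(f"{percent_improvement(1.0, 1.25):+.1f}%")  # NCQ hurts
```

A negative result flags the cases where queuing overhead outweighs the benefit of reordering.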

NCQ also proves valuable during the creation of a compressed file. Performance improves when copying files and loading an MPEG2 video into VirtualDub, although the Maxtor drive doesn’t get much faster at the latter.

Results are more mixed when we turn our multitasking to compressed file extraction. Here, the DiamondMax 10 is barely faster with NCQ enabled during a VirtualDub import, and actually slower during a file copy. The Barracuda performs better with NCQ in both instances, although the NCQ performance advantage in the file copy test is pretty slim.

 

iPEAK multitasking performance – cont’d

Moving to an Outlook PST export, NCQ proves to have little impact on the Maxtor drive’s performance. However, the Barracuda’s performance improves by an impressive 12% with a VirtualDub import as our second task. Of course, the DiamondMax drive is still faster overall, perhaps because it’s equipped with a larger 16MB cache.

Our final tests involve an Outlook PST import, and here we see some of our biggest performance gains. Note the DiamondMax drive’s huge performance jump in the file copy test, where NCQ improves performance by an incredible 32%.

 

Conclusions
Native Command Queuing can definitely improve the performance of desktop systems. Across our nine tests, NCQ performance gains average around 9% for both the Maxtor and Seagate drives. That’s pretty good, especially considering that we saw a handful of much more impressive gains in some of our tests.

Of course, we weren’t taking it easy on the drives. Our multitasking traces represent demanding, disk-intensive loads, but we’ve done our best to ensure that none of the multitasking scenarios we’ve portrayed are outlandish or unrealistic for a single-user desktop. In fact, as dual-core processors encourage more frequent and demanding multitasking, our scenarios might even start to look a little conservative.

Count on seeing more of these multitasking tests in our upcoming storage reviews. We’re eager to see how new hard drives and even multi-drive RAID arrays handle our iPEAK workloads. 

blubje
12 years ago

Unfortunately, nvidia still doesn’t know how to write a driver properly (let alone design hardware). There is still a problem with their nforce sata driver after all these years, for example with NCQ enabled in the latest driver you can occasionally get crap like “Reset to device, \Device\RaidPort0, was issued.” (with 1-minute system freezes), and yes it’s misleading, when NOT using raid at all (completely disabled in bios). lol

indeego
13 years ago
Reply to  Buub

Not all the benefits. You use up CPU and power draw. Some people do not leave their machines idle (Myself, for one, I always sleep the machine when not in use.)

In practice, a far better method of reducing fragmentation is to use systems where fragmentation isn’t a concern: i.e. get drives far larger than your datasets.
