Multitasking with Native Command Queuing

INTELLIGENTLY REORDERING I/O REQUESTS in order to minimize the performance impact of a hard drive’s mechanical latency—otherwise known as command queuing—is unquestionably the right thing to do. However, though command queuing has long proven to be a valuable asset to SCSI drives faced with multi-user and enterprise-class workloads, the performance benefits of Native Command Queuing (NCQ) in desktop Serial ATA drives have been harder to illustrate. Unfortunately, most commonly used hard drive benchmarks don’t play to NCQ’s strengths, and those that do involve server-style workloads that are hardly indicative of desktop environments.

In order to test Native Command Queuing’s performance potential on the desktop, something completely different is needed—and we have just the thing. Join me as we explore a new collection of hard drive tests that showcase why NCQ really does matter on the desktop.

Command queuing in brief
Hard drives use a mechanical drive head to write and read data to and from a spinning platter. Often, though, the drive head sits idle waiting for the platter to rotate to the correct position for a pending read or write operation. This idle delay is referred to as the drive’s rotational latency, and the lower the better. Drive makers have long sought to reduce rotational latencies by spinning platters faster, and they’ve managed to hit dizzying 15,000-RPM spindle speeds with the latest high-end SCSI drives. Unfortunately, spinning a drive at 15,000 or even 10,000 RPM isn’t as easy as giving it more gas. It’s not cheap, either, which is why relatively low-cost desktop drives are stuck at 7,200 RPM.
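
To put numbers to that, average rotational latency is roughly half the time of one platter revolution, since on average the target sector is half a turn away from the head. A quick back-of-the-envelope sketch in Python (purely illustrative):

    # Average rotational latency is about half a revolution's worth of time:
    # on average, the requested sector is half a turn away from the head.
    for rpm in (7200, 10000, 15000):
        ms_per_revolution = 60_000 / rpm   # milliseconds per full rotation
        print(f"{rpm:>6} RPM: {ms_per_revolution / 2:.2f} ms average rotational latency")

    #   7200 RPM: 4.17 ms average rotational latency
    #  10000 RPM: 3.00 ms average rotational latency
    #  15000 RPM: 2.00 ms average rotational latency

Those few milliseconds per request are exactly what NCQ tries to claw back by other means.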

While faster spindle speeds are a brute-force approach to reducing rotational latency, Native Command Queuing is a more cerebral and ultimately less expensive alternative. Instead of blindly executing I/O requests on a first-come, first-served basis, NCQ allows a hard drive to intelligently reorder multiple requests based on their proximity to the drive head’s position. Here’s an example of how it works:


[Diagram: an NCQ-equipped drive reordering four outstanding requests so they complete in a single rotation. Source: NVIDIA]

Without command queuing, the drive must complete two rotations to execute requests one through four in the order that they are received. However, with a little intelligent reordering, an NCQ-equipped drive can fulfill all four requests in only a single rotation. Fewer rotations translate to milliseconds saved—a virtual eternity within a modern PC. Because of the nature of data organization on a hard disk, which often involves close proximity for pockets of related data (in a single file or group of files, for instance), command queuing can potentially pay big dividends, especially when multiple tasks are operating simultaneously on separate data sets.
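
To make the reordering concrete, here’s a toy model in Python. It’s a minimal sketch under simplifying assumptions (a single track, requests expressed as angular positions, a head that only sweeps forward as the platter spins), not a model of any real drive’s firmware:

    # Toy single-track model: each request is an angular position on the
    # platter, measured in fractions of a rotation. The head reaches a
    # request only when the platter spins it around to the head.
    def rotations_needed(head, requests):
        # Total rotations spent servicing the requests in the given order.
        total = 0.0
        for pos in requests:
            total += (pos - head) % 1.0   # forward distance to the next request
            head = pos
        return total

    queue = [0.8, 0.1, 0.6, 0.3]                  # four requests, in arrival order

    fifo = rotations_needed(0.0, queue)           # service in arrival order
    ncq = rotations_needed(0.0, sorted(queue))    # sweep past all four in one pass

    print(f"FIFO order: {fifo:.1f} rotations")    # 2.3 rotations
    print(f"Reordered:  {ncq:.1f} rotations")     # 0.8 rotations

Sorting by position is only a stand-in for the real thing; an actual drive also weighs seek distance across tracks and must keep reordered requests from starving.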

Although it’s a much smarter approach than turning up spindle speeds, command queuing has some limitations. Most notably, it can only improve performance if consecutive I/O requests reference different regions of a disk. Command queuing can’t improve the scheduling of streaming I/O requests that arrive at the drive in order, either. In fact, queuing I/O requests may introduce enough overhead to reduce a drive’s performance in some instances. However, in environments with more random I/O profiles, such as multitasking or multi-user loads, NCQ has considerable potential. In fact, command queuing is such a valuable asset in multi-user environments that high-end SCSI drives have been using it for years.

A new approach to testing
Since Native Command Queuing’s ability to intelligently reorder I/O requests depends on more random I/O profiles, a new approach to testing is required. Most hard drive benchmarks rely on streaming transfers that don’t benefit from command queuing, so they won’t do. IOMeter presents a best-case scenario for command queuing by hammering drives with randomized access patterns and up to 256 simultaneous requests, but that kind of extreme load is meant to simulate multi-user enterprise environments, not single-user desktop workloads.

Even at their busiest, desktop PCs are only doing a few things at once. Often those tasks, such as gaming, rendering, image editing, word processing, web surfing, and crunching work units for distributed computing projects, aren’t even disk-bound. A few are, though. Creating and extracting compressed files, importing and exporting Outlook PST files, importing MPEG video into VirtualDub for editing, and copying files are all disk-intensive activities that I put my PC through on a regular basis. We’ve distilled those activities into the following disk-intensive tasks:

  • VirtualDub import — A 6.86GB Hell’s Kitchen MPEG2 video file was opened using VirtualDub MPEG2.
  • File copy — The 6.86GB MPEG2 file was copied to another location on the disk.
  • Outlook import — A folder with ~20,000 email messages was imported into Outlook XP from a PST file.
  • Outlook export — A folder with ~20,000 email messages was exported from Outlook XP to a PST file.
  • Compress create — A ZIP file was created with 512MB of text files, images, and Excel spreadsheets.
  • Compress extract — 512MB of text files, images, and Excel spreadsheets were extracted from a compressed ZIP file.

Since dual-core processors and multitasking are all the rage these days, we’ll be combining these tasks into a series of multitasking tests. We chose to classify our file copy and VirtualDub import tasks as foreground activities and our Outlook and compression operations as ones we’d probably do in the background. We then combined each foreground task with each background task, giving us a total of eight multitasking loads. For kicks, we also threw in a dual file copy test that involved copying 512MB of text files, images, and Excel spreadsheets alongside our 6.86GB MPEG2 transfer.
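
For clarity, here’s the full matrix those combinations produce (a trivial sketch; the task names match the list above):

    # Each foreground task is paired with each background task, and the
    # dual file copy test rounds out the suite: 2 x 4 + 1 = 9 workloads.
    from itertools import product

    foreground = ["VirtualDub import", "File copy"]
    background = ["Outlook import", "Outlook export",
                  "Compress create", "Compress extract"]

    loads = [f"{fg} + {bg}" for fg, bg in product(foreground, background)]
    loads.append("File copy + 512MB file copy")   # the dual file copy test

    for name in loads:
        print(name)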

Performance in multitasking tests is difficult to quantify, and test methods are difficult to replicate exactly, which is where Intel’s iPEAK Storage Performance Toolkit comes in. iPEAK allows us to record a trace of all the I/O requests in our multitasking tests and play them back on various drives. iPEAK reports a drive’s mean service time for each test, giving us results that are easy to compare.
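
The metric itself is simple: service time is the interval between an I/O request being issued and the drive completing it, and the mean is taken over every request in the trace. A minimal sketch of the arithmetic (the trace format here is an assumption for illustration, not iPEAK’s actual trace layout):

    # Mean service time over a trace of (issue_ms, completion_ms) pairs.
    # The tuple format is illustrative, not iPEAK's real trace format.
    def mean_service_time(trace):
        return sum(done - issued for issued, done in trace) / len(trace)

    trace = [(0.0, 4.2), (1.0, 9.7), (2.5, 6.1), (3.0, 12.4)]
    print(f"{mean_service_time(trace):.2f} ms")   # 6.48 ms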

In an attempt to reproduce a typical hard drive, we recorded our traces on a 40GB Windows XP partition filled with program and data files, music, movies, and games. The partition had 10GB of free space and was defragmented before the traces were recorded. Because we’re doing multitasking tests, we stopped recording each trace when one of the two tasks was completed. In the end, each test represented between two and ten minutes’ worth of I/O requests.

 

Our testing methods
To test the benefits of Native Command Queuing, we ran through our test suite with a couple of NCQ-capable hard drives running with command queuing enabled and disabled.

All tests were run at least twice, and their results were averaged, using the following test system.

Processor: Intel Pentium 4 Extreme Edition 3.4GHz
Front-side bus: 800MHz
Motherboard: Asus P5WD2 Premium WiFiTV Edition
BIOS revision: 0422
North bridge: Intel 955X
South bridge: Intel ICH7R
Chipset drivers: Intel 7.2.1.1003
Memory size: 1GB (2 DIMMs)
Memory type: Micron DDR2 SDRAM at 533MHz
CAS latency (CL): 3
RAS to CAS delay (tRCD): 3
RAS precharge (tRP): 3
Cycle time (tRAS): 8
Hard drives: Maxtor DiamondMax Plus 9 160GB PATA
             Maxtor DiamondMax 10 300GB SATA
             Seagate Barracuda 7200.8 400GB SATA
Storage drivers: Intel RAID/AHCI 5.1.0.1022
Audio: ICH7R/ALC882D
Audio driver: Realtek HD 1.22
Graphics: ATI Radeon X700 Pro 256MB with CATALYST 5.4 drivers
OS: Microsoft Windows XP Professional
OS updates: Service Pack 2, DirectX 9.0c

iPEAK trace playback requires an empty disk, so our system’s OS and applications are running off a Maxtor DiamondMax Plus 9 ATA/133 hard drive.

Our test systems were powered by OCZ PowerStream power supply units. The PowerStream was one of our Editor’s Choice winners in our latest PSU round-up.

We used the following versions of our test applications:

  • Intel iPEAK Storage Performance Toolkit 3.0

The test systems’ Windows desktop was set at 1280×1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.

All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

 

iPEAK multitasking performance
The mean service time of each drive is reported in milliseconds, with lower values representing better performance.
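
When we quote percentage gains below, they’re the reduction in mean service time relative to the NCQ-disabled result. A one-liner makes the arithmetic explicit (the service times in this sketch are placeholders, not measured data):

    # Percent improvement from enabling NCQ: lower service time is better,
    # so the gain is the reduction relative to the NCQ-disabled figure.
    def ncq_gain(disabled_ms, enabled_ms):
        return (disabled_ms - enabled_ms) / disabled_ms * 100

    print(f"{ncq_gain(10.0, 8.1):.0f}% faster with NCQ")   # 19% faster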

Both drives enjoy a healthy performance boost from NCQ in our dual file copy test, with the DiamondMax drive’s performance improving by an impressive 19%.

NCQ also proves valuable during the creation of a compressed file. Performance improves when copying files and loading an MPEG2 video into VirtualDub, although the Maxtor drive doesn’t get much faster at the latter.

Results are more mixed when we turn our multitasking to compressed file extraction. Here, the DiamondMax 10 is barely faster with NCQ enabled during a VirtualDub import, and actually slower during a file copy. The Barracuda performs better with NCQ in both instances, although the NCQ performance advantage in the file copy test is pretty slim.

 

iPEAK multitasking performance – cont’d

Moving to an Outlook PST export, NCQ proves to have little impact on the Maxtor drive’s performance. However, the Barracuda’s performance improves by an impressive 12% with a VirtualDub import as our second task. Of course, the DiamondMax drive is still faster overall, perhaps because it’s equipped with a larger 16MB cache.

Our final tests involve an Outlook PST import, and here we see some of our biggest performance gains. Note the DiamondMax drive’s huge performance jump in the file copy test, where NCQ improves performance by an incredible 32%.

 

Conclusions
Native Command Queuing can definitely improve the performance of desktop systems. Across our nine tests, NCQ performance gains averaged around 9% for both the Maxtor and Seagate drives. That’s pretty good, especially considering that we saw a handful of much more impressive gains in some of our tests.

Of course, we weren’t taking it easy on the drives. Our multitasking traces represent demanding, disk-intensive loads, but we’ve done our best to ensure that none of the multitasking scenarios we’ve portrayed are outlandish or unrealistic for a single-user desktop. In fact, as dual-core processors encourage more frequent and demanding multitasking, our scenarios might even start to look a little conservative.

Count on seeing more of these multitasking tests in our upcoming storage reviews. We’re eager to see how new hard drives and even multi-drive RAID arrays handle our iPEAK workloads. 

Comments closed
    • blubje
    • 10 years ago

    Unfortunately, nvidia still doesn’t know how to write a driver properly (let alone design hardware). There is still a problem with their nforce sata driver after all these years; for example, with NCQ enabled in the latest driver you can occasionally get crap like “Reset to device, \Device\RaidPort0, was issued.” (with 1-minute system freezes) — and yes it’s misleading, when NOT using raid at all (completely disabled in bios). lol

    • Naito
    • 15 years ago

    tell that to the windows installer
    just installing the OS on a clean 80GB drive will have it end up with red all over the built-in defrag utility. and with 95% free space.

    no, files SHOULDN’T be fragmented if you copy them to a clean drive. But this is Windows, and NTFS. *sigh*

    to be fair, the only filesystems that I know of that have no fragmentation problems are OS X’s new journaled HFS, and Reiser4. But that’s only cuz they have a built-in background defragmenter too.

    edit: oops, this was in reply to #33

    • domsmith
    • 15 years ago

    If you want to do a defragment test, it’s best to use something like leech ftp and copy from one HD to another. I find it really fragments files up on a HD and brings any defragger to its knees. I had to defrag a system that was split into over a million fragments. Diskeeper took 8 hours to fix it.

    I had one file split into 17,622 pieces alone

      • indeego
      • 15 years ago

      Fragmentation has to do with filesize, number of files, and available slack space–not just individual files. A gig of data is not going to fragment on a clean 500 gig drive.

      In all cases, spending the money to increase slack space (i.e. get a larger drive) is a far better method than defragmenting. You want to prevent the fragmentation in the first place.
      Tips:
      If you are defragmenting regularly you are wasting your time, as it’s just going to fragment right back. Again, put your data on more drives or get a bigger drive. I recommend never filling a drive more than 50% if you can help it.

      The less free space you have, the quicker your drive will get fragmented and the more your system performance will suffer (split I/Os).

        • domsmith
        • 15 years ago

        Fragmentation has more to do with how well a program has been written to use the filesystem.

        In the case of the program I was using (leechftp) the way it writes to the disk is extremely poor, creating…

        • Buub
        • 15 years ago

        That’s good in theory. Too bad it doesn’t work that way in practice.

        Any good defragger should be able to schedule defrag passes when you aren’t using your machine, so none of your time is wasted, with all the benefits.

          • indeego
          • 11 years ago

          Not all the benefits. You use up CPU and power draw. Some people do not leave their machines idle (myself, for one: I always sleep the machine when not in use).

          In practice, a far better method of reducing fragmentation is to use systems where fragmentation isn’t a concern: i.e. get drives far larger than your datasets.

      • Buub
      • 15 years ago

      Well, 8 hours isn’t saying much since Diskeeper sucks. Did it even completely defrag in one pass? Try PerfectDisk next time.

        • domsmith
        • 15 years ago

        Yep, one pass. 200 gigs of files on a 300 gig HD; each file was split into 60-65k fragments caused by the way the program wrote to disk. 80-90% fragmentation.

    • crose
    • 15 years ago

    How does this compare with a 74GB Raptor w/TCQ? Does nForce4 support SATA TCQ??

    It would have been great if you had compared it with one or two SCSI disks too (10k/15k rpm).

    • madclicker
    • 15 years ago

    How do I disable/enable NCQ on a DiamondMax?

    • albundy
    • 15 years ago

    I cannot comment too much on this since I have been an all-SCSI user since, well, since ATA pissed me off soooo much 13 years ago. Super high latency with a very high failure rate and very low spindle speeds really turned me away from ATA drives permanently. The first thing I noticed when I switched over to the dark side was how snappy everything started and copied. Could have been NCQ on my first 4.5gb 7200rpm brick drive, or it could have been the low latency. dunno. I can only imagine what ATA users must be going through as I am now with 2 15k drives. I really notice the difference when I go to work every day trying to work with the same applications on an ATA system. One of the benefits I have encountered is that I have more resources available due to FDD, Primary and Secondary ATA controllers being disabled in my BIOS. The OS wont install what it dont see! Dont know if they are still reserved!

    • DrDillyBar
    • 15 years ago

    I love reading this kind of article, because I think of this kind of thing, and am very pleased to see you tackle this kind of scenario.
    *dons the Mr. Happy shirt in celebration*

    • Prototyped
    • 15 years ago

    About time someone focused on the…

    • continuum
    • 15 years ago

    Woah… holy disk intensive tasks!!

    The SR guys have already noticed.

    http://forums.storagereview.net/index.php?showtopic=20476

    • nonegatives
    • 15 years ago

    Glad to see a test that looks more like how I actually use a computer! Mixing 10 audio tracks, editing several 6-megapixel images, uploading to a webserver and burning CDs. All on an ancient XP2000+ on Nforce220 with onboard video. Looking forward to the full review to help build an X2 replacement.

    • indeego
    • 15 years ago

    Couple of other points:

    No baseline for a drive without NCQ capability. Yes, you used drives with NCQ disabled, but that is not the same as comparing to a drive with no capacity for NCQ. Just for sh*ts and giggles, and to be an annoying TR reader: you test in software mode for your 3D cards… why not HDDs?

    You reached a conclusion at the beginning of your review. While this may have been reached at the end of your testing (but before you wrote it up), my concern is that you are striving to prove something that in your mind is already a foregone conclusion. You are probably fine here, the tests look sound; it’s just a dangerous assumption to be making throughout your testing process. (Indeterminacy of theory under empirical testing)

    http://en.wikipedia.org/wiki/Scientific_method#Indeterminacy_of_theory_under_empirical_testing

      • Dissonance
      • 15 years ago

      This article is just a freebie. I was in the process of trying to develop some tests for our storage reviews that would take advantage of NCQ. Figured that multitasking should do the trick, but before testing a stack of drives for an upcoming hard drive review, decided to test the test itself with just a couple of drives and NCQ enabled/disabled.

      The results were interesting enough to share without making you guys wait a couple of weeks for the next hard drive review, which will include more drives, including one or more that don’t support NCQ at all.

    • liquidsquid
    • 15 years ago

    *duh* Deleted stupidity.

    Wow that Diamond Max is fast! More than double the speed of my drives. Go figure.

    -LS

      • sigher
      • 15 years ago

      pity maxtor drives are shitty drives that fail very quickly, in my experience and many others I asked.

    • R2P2
    • 15 years ago

    Was there a problem with the Seagate drive? In some of the tests, the difference between the Seagate and the Maxtor seems awfully large. I’m only used to seeing differences like that in 7200RPM vs 10k RPM comparisons.

      • Klyith
      • 15 years ago

      The Seagate 7200.8 is not a particularly fast drive. Its seek time is pretty pathetic, though it does have great sustained transfer. There are a number of different theories about it:

      1. There is some way to change to a higher performance mode. Seagate drives used to have a utility for changing “Acoustic Management Mode” but it doesn’t work on the new drives. Some people think that running in NCQ mode changes to a faster mode, but this test shows that isn’t likely.

      2. The ultra-high density of the platters is causing problems for the physical seek operation. This is my feeling, and we’ll see if it’s true when other HD makers catch up with Seagate’s 133GB per platter.

      The upshot is that it’s a great drive for mass storage or something like an HTPC (video capture likes STR). But it would suck for a games drive or the primary drive of a desktop.

    • drfish
    • 15 years ago

    “crunching work units for distributed computing projects”

    Sweet! That statement was begging for a link to our folding page though… 😉

    • muyuubyou
    • 15 years ago

    I’m guessing these tests were run on “clean” installs, and I’m wondering what difference NCQ would make on a heavily fragmented HD… my guess is a lot more.

    How to benchmark that? Maybe making ghost images of fragmented drives?

      • Usacomp2k3
      • 15 years ago

      You’d have to make sure it was a bit-wise copy, and not a file-copy. (like g4u as opposed to ghost)

      • indeego
      • 15 years ago

      Use the split I/O bench in Windows’ perfmon.

    • Vrock
    • 15 years ago

    Still waiting for a good reason to upgrade from ATA-133….

      • Krogoth
      • 15 years ago

      Good riddance to those annoying ’80s-era ribbons. That, and the fact that SATA consumes far fewer CPU cycles, are IMHO the biggest reasons to upgrade from PATA. If bandwidth were the only concern, then ATA66 would still be good enough for 90% of the PATA drives out there.

        • Anomymous Gerbil
        • 15 years ago

        I like SATA, but why are the new/thin/sexy cables so stiff, especially around the connectors? It seems like a lot of the advantage of those cables was thrown out by making them so stiff, or is it just me?

          • Krogoth
          • 15 years ago

          I had no problems dealing with SATA cabling. I do admit that they aren’t quite as flexible as ribbons, but then again they don’t really need to be unless you’ve got one of those cases with a 90-degree sideways HDD cage. In that case you can either get SATA cabling with a 90-degree connector or stick with PATA/ribbons.

    • totoro
    • 15 years ago

    No Nforce 4? Is there still a problem w/NCQ?

      • liquidsquid
      • 15 years ago

      Not that I can tell. Didn’t even realize I had it! Neat.

    • indeego
    • 15 years ago

    What was the difference, percentage-wise, for just the DiamondMax tests alone?

      • Dissonance
      • 15 years ago

      From the conclusion…

      “Across our nine tests, NCQ performance gains averaged around 9% for both the Maxtor and Seagate drives.”

        • indeego
        • 15 years ago

          I am curious about the performance difference for just the drives split out, not a combined series; it would be helpful to tell if this technology is universally helpful and consistent across all drive manufacturers or just, say, Maxtor. 9% is great and all, but I have concern with the file copy/compress test, as that is a large majority of what I actually do, as opposed to Outlook exports/imports. Also, I spend the majority of my time in non-multitasking situations where NCQ would only be beneficial in a dedicated role, say a NAS/SAN. No big deal, I can figure it out manually if need be.

          My point is that if there is no performance hit for all situations, that is great, NCQ is sweet, but I’m thinking there are certain situations where NCQ can harm the performance, and those cases could be the majority–thereby it’s two steps forward, one step back. This is how I feel about LCDs. Yeah, they are great for weight, text clarity, size, but they suck on color reproduction, black screen production, refresh, etc.

          • Usacomp2k3
          • 15 years ago

          You could always do the math yourself 😉
          (disabled – enabled)/disabled
