
Multitasking with Native Command Queuing


Why NCQ matters, even on the desktop
— 12:00 AM on August 3, 2005

INTELLIGENTLY REORDERING I/O REQUESTS in order to minimize the performance impact of a hard drive's mechanical latency—otherwise known as command queuing—is unquestionably the right thing to do. However, though command queuing has long proven to be a valuable asset to SCSI drives faced with multi-user and enterprise-class workloads, the performance benefits of Native Command Queuing (NCQ) in desktop Serial ATA drives have been harder to illustrate. Unfortunately, most commonly used hard drive benchmarks don't play to NCQ's strengths, and those that do involve server-style workloads that are hardly indicative of desktop environments.

In order to test Native Command Queuing's performance potential on the desktop, something completely different is needed—and we have just the thing. Join me as we explore a new collection of hard drive tests that showcase why NCQ really does matter on the desktop.


Command queuing in brief
Hard drives use a mechanical drive head to write and read data to and from a spinning platter. Often, though, the drive head sits idle waiting for the platter to rotate to the correct position for a pending read or write operation. This idle delay is referred to as the drive's rotational latency, and the lower the better. Drive makers have long sought to reduce rotational latencies by spinning platters faster, and they've managed to hit dizzying 15,000-RPM spindle speeds with the latest high-end SCSI drives. Unfortunately, spinning a drive at 15,000 or even 10,000 RPM isn't as easy as giving it more gas. It's not cheap, either, which is why relatively low-cost desktop drives are stuck at 7,200 RPM.
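Here is the spindle-speed arithmetic. One revolution takes 60/RPM seconds. Average rotational latency is conventionally taken as half a revolution, assuming the target sector is equally likely to be anywhere on the track. The short Python sketch below applies that arithmetic to the three spindle speeds mentioned above and ignores seek time entirely.

```python
def rotation_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm

def avg_rotational_latency_ms(rpm):
    """Assumes the target sector is, on average, half a revolution away."""
    return rotation_ms(rpm) / 2

for rpm in (7_200, 10_000, 15_000):
    print(f"{rpm:>6,} RPM: {rotation_ms(rpm):.2f} ms per revolution, "
          f"{avg_rotational_latency_ms(rpm):.2f} ms average rotational latency")
```

By this arithmetic, a 7,200-RPM drive averages roughly 4.2 ms of rotational latency per request, compared with 3 ms at 10,000 RPM and 2 ms at 15,000 RPM.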

While faster spindle speeds are a brute-force approach to reducing rotational latency, Native Command Queuing is a more cerebral and ultimately less expensive alternative. Instead of blindly executing I/O requests on a first-come, first-served basis, NCQ allows a hard drive to intelligently reorder multiple outstanding requests based on their proximity to the drive head's position. Here's an example of how it works:


[Figure: servicing four requests with and without command queuing. Source: NVIDIA]

Without command queuing, the drive must complete two rotations to execute requests one through four in the order they are received. With a little intelligent reordering, however, an NCQ-equipped drive can fulfill all four requests in a single rotation. Fewer rotations translate to milliseconds saved, which is a virtual eternity for a modern PC. Related data, such as a single file or a group of files, tends to sit close together on the disk. Command queuing can therefore pay big dividends, especially when multiple tasks work simultaneously on separate data sets and their requests arrive at the drive interleaved.
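A toy model can sketch the reordering idea. It is a simplified illustration, not NVIDIA's diagram and not how any drive's firmware actually schedules requests. It assumes all four requests sit on the same track, so seek time is zero, and places them at made-up angular positions. It then applies a greedy "nearest request next" rule and counts how far the platter must turn in arrival order versus reordered order, assuming a 7,200-RPM spindle.

```python
ROTATION_MS = 60_000 / 7_200  # one revolution of a 7,200-RPM platter, in ms

# Hypothetical request positions (degrees around one track), keyed by arrival order.
REQUESTS = {1: 36, 2: 216, 3: 108, 4: 288}

def degrees_to_reach(head, target):
    """How far the platter must turn before `target` is under the head."""
    return (target - head) % 360

def reorder_nearest_first(requests, head=0):
    """Greedy reordering: always service the request that reaches the head soonest."""
    pending = dict(requests)
    order = []
    while pending:
        nxt = min(pending, key=lambda r: degrees_to_reach(head, pending[r]))
        order.append(nxt)
        head = pending.pop(nxt)
    return order

def total_degrees(requests, order, head=0):
    """Total platter rotation (in degrees) needed to service requests in `order`."""
    total = 0
    for r in order:
        total += degrees_to_reach(head, requests[r])
        head = requests[r]
    return total

for label, order in (("arrival order", sorted(REQUESTS)),
                     ("reordered", reorder_nearest_first(REQUESTS))):
    deg = total_degrees(REQUESTS, order)
    print(f"{label}: {order} -> {deg / 360:.1f} rotations, "
          f"{deg / 360 * ROTATION_MS:.1f} ms")
```

With these hypothetical positions, arrival order takes 1.8 rotations (about 15 ms at 7,200 RPM). The reordered sequence 1, 3, 2, 4 takes 0.8 rotations (about 6.7 ms). Real drives also weigh the seek distance between tracks, so their reordering logic is more involved than this greedy rule.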

Although it's a much smarter approach than turning up spindle speeds, command queuing has some limitations. Most notably, it can only improve performance when the drive has several outstanding I/O requests that reference different regions of the disk. It also can't improve the scheduling of streaming I/O requests that already arrive at the drive in order. In fact, queuing I/O requests may introduce enough overhead to reduce a drive's performance in some instances. However, NCQ has considerable potential in environments with more random I/O profiles, such as multitasking or multi-user loads. Command queuing is such a valuable asset in multi-user environments that high-end SCSI drives have been using it for years.
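The same kind of toy model shows both limitations. In this hypothetical sketch, requests on a single track arrive one after another. The drive can hold at most `queue_depth` of them at once and always services the nearest queued request next. A queue depth of 1 stands in for a drive without command queuing. The positions are made up for illustration.

```python
def degrees_to_reach(head, target):
    """How far the platter must turn before `target` is under the head."""
    return (target - head) % 360

def service(positions, queue_depth, head=0):
    """Service requests (angular positions, in arrival order) with at most
    `queue_depth` outstanding at once, always picking the nearest queued one.
    Returns the total platter rotation in degrees."""
    arrivals = list(positions)
    queue, total = [], 0
    while arrivals or queue:
        # Refill the drive's queue from the pending arrivals.
        while arrivals and len(queue) < queue_depth:
            queue.append(arrivals.pop(0))
        nxt = min(queue, key=lambda p: degrees_to_reach(head, p))
        queue.remove(nxt)
        total += degrees_to_reach(head, nxt)
        head = nxt
    return total

streaming = [30, 60, 90, 120]   # hypothetical sequential requests, already in order
scattered = [36, 216, 108, 288] # hypothetical requests spread around the track

for name, reqs in (("streaming", streaming), ("scattered", scattered)):
    print(name, {depth: service(reqs, depth) for depth in (1, 4)})
```

The in-order streaming requests need the same 120 degrees of rotation at a queue depth of 1 or 4, so reordering buys nothing. The scattered requests drop from 648 degrees at depth 1 to 288 degrees at depth 4. The model ignores command overhead, so it can't show the small slowdowns that queuing may cause.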

A new approach to testing
Because the benefits of Native Command Queuing's intelligent reordering depend on a more random I/O profile, a new approach to testing is required. Most hard drive benchmarks rely on streaming transfers that don't benefit from command queuing, so they won't do. IOMeter presents a best-case scenario for command queuing by hammering drives with randomized access patterns and up to 256 simultaneous requests. However, that kind of extreme load is meant to simulate multi-user enterprise environments, not single-user desktop workloads.

Most of the time, desktop PCs are doing only a few things at once. Many of those tasks, such as gaming, rendering, image editing, word processing, web surfing, and crunching work units for distributed computing projects, aren't even disk-bound. A few are, though. Creating and extracting compressed files, importing and exporting Outlook PST files, importing MPEG video into VirtualDub for editing, and copying files are all disk-intensive activities that I put my PC through on a regular basis. We've distilled those activities into the following disk-intensive tasks:

  • VirtualDub import — A 6.86GB Hell's Kitchen MPEG2 video file was opened using VirtualDub MPEG2.

  • File copy — The 6.86GB MPEG2 file was copied to another location on the disk.

  • Outlook import — A folder with ~20,000 email messages was imported into Outlook XP from a PST file.

  • Outlook export — A folder with ~20,000 email messages was exported from Outlook XP to a PST file.

  • Compress create — A ZIP file was created with 512MB of text files, images, and Excel spreadsheets.

  • Compress extract — 512MB of text files, images, and Excel spreadsheets were extracted from a compressed ZIP file.

Since dual-core processors and multitasking are all the rage these days, we'll be combining these tasks into a series of multitasking tests. We chose to classify our file copy and VirtualDub import tasks as foreground activities and our Outlook and compression operations as ones we'd probably do in the background. We then combined each foreground task with each background task, giving us a total of eight multitasking loads. For kicks, we also threw in a dual file copy test that involved copying 512MB of text files, images, and Excel spreadsheets alongside our 6.86GB MPEG2 transfer.
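The size of the test matrix follows directly from the pairing rule: two foreground tasks times four background tasks gives eight loads, plus the extra dual file copy. A short Python enumeration makes that explicit. The task names come from the list above, and the "foreground + background" labels are only illustrative.

```python
from itertools import product

foreground = ["VirtualDub import", "File copy"]
background = ["Outlook import", "Outlook export", "Compress create", "Compress extract"]

# Every foreground task paired with every background task: 2 x 4 = 8 loads.
loads = [f"{fg} + {bg}" for fg, bg in product(foreground, background)]

# Plus the extra test: 512MB of mixed files copied alongside the 6.86GB MPEG2 file.
loads.append("Dual file copy")

print(len(loads))
for load in loads:
    print(load)
```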

Performance in multitasking tests is difficult to quantify, and the tests themselves are difficult to replicate exactly. Intel's iPEAK Storage Performance Toolkit solves both problems. iPEAK allows us to record a trace of all the I/O requests in our multitasking tests and play them back on various drives. iPEAK reports a drive's mean service time for each test, giving us results that are easy to compare.
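iPEAK's trace format and its exact definition of service time aren't described here, so the following is only a hypothetical sketch. It assumes each trace record stores when a request was issued to the drive and when the drive reported it complete. It treats service time as the difference between the two and averages over the trace. The `TraceRecord` type and the sample numbers are invented for illustration. Because every drive replays the same recorded trace, a lower mean service time means that drive serviced the same requests faster on average.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class TraceRecord:
    """Hypothetical trace entry (not iPEAK's actual file format)."""
    lba: int             # starting logical block address of the request
    sectors: int         # request length in sectors
    is_write: bool
    issued_ms: float     # time the request was issued to the drive
    completed_ms: float  # time the drive reported completion

def mean_service_time_ms(trace):
    """Average of (completion - issue) over every request in the trace."""
    return mean(r.completed_ms - r.issued_ms for r in trace)

# Made-up sample: three requests replayed against one drive.
sample = [
    TraceRecord(lba=1_000, sectors=128, is_write=False, issued_ms=0.0, completed_ms=5.0),
    TraceRecord(lba=90_000, sectors=8, is_write=True, issued_ms=1.0, completed_ms=9.5),
    TraceRecord(lba=1_128, sectors=128, is_write=False, issued_ms=2.0, completed_ms=14.0),
]
print(f"mean service time: {mean_service_time_ms(sample):.2f} ms")
```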

To approximate a typical desktop hard drive, we recorded our traces on a 40GB Windows XP partition filled with program and data files, music, movies, and games. The partition had 10GB of free space and was defragmented before the traces were recorded. Because these are multitasking tests, we stopped recording each trace as soon as the first of the two tasks completed. As a result, each trace represents between two and 10 minutes' worth of I/O requests.