For years, SSDs thrived under the Serial ATA banner. New generations got bigger, faster, and cheaper. Prices continue to fall and densities continue to increase, but performance has been stagnating for a while. The problem is the SATA 6Gbps interface, whose limited bandwidth keeps modern flash storage from living up to its true potential.
Fortunately, SSDs aren’t inexorably tied to Serial ATA. They can tap into PCI Express, which promises substantially more bandwidth through faster signaling and multiple lanes. They can also ditch Serial ATA’s dated AHCI protocol for an all-new NVM Express protocol architected with solid-state storage in mind.
PCIe SSDs have actually existed for a while, but most early implementations used bridge chips tied to the very same controllers found in SATA drives. Those initial efforts are very different from the native solutions that have crept into the market over the past year. This new breed employs updated controllers that meld PCI Express and NAND interfaces on a single chip. Native connectivity promises to elevate SSDs to new heights, setting the stage for the next revolution in PC storage.
To prepare for the incoming tide of PCIe SSDs, we’ve revamped our storage test rigs with new hardware and benchmarks. We’ve gathered a boatload of data on three PCIe drives and a stack of SATA SSDs, and we’ve learned some interesting lessons along the way. In some respects, the next PC storage revolution is already upon us. In others, though, things haven’t changed one bit.
All aboard the PCI Express
Before we dive into our performance results, let’s get acquainted with our posse of PCIe drives: Plextor’s M6e 256GB, Samsung’s XP941 256GB, and Intel’s DC P3700 800GB. The M6e is the most consumer-friendly solution, so that’s the best place to start.
Released last year, the M6e was the first native PCIe SSD to be widely available to end users. The double-sided M.2 2280 “gumstick” is a perfect fit for the mini SSD slots in most 9-series Intel motherboards, as is the dual-lane Gen2 interface. For systems that lack M.2 connectivity, Plextor sells versions of the M6e mounted to PCIe expansion cards that plug into full-sized slots.
Because it’s a standard AHCI device, the M6e doesn’t require separate drivers. Any modern operating system should recognize it. The M6e also works with both UEFI and legacy BIOSes, which should ensure broad compatibility with new and old motherboards alike. Boot support is pretty much guaranteed.
Marvell provides the M6e’s controller, while Toshiba kicks in the 19-nm MLC NAND. Plextor claims this pairing is good for sequential reads up to 770MB/s, a fair bit faster than SATA’s top speed. The fastest incarnation of the M6e is only rated for 625MB/s sequential writes, though, and our 256GB sample tops out at just 580MB/s. The M6e’s ~100k IOps random I/O ratings aren’t anything to write home about, either. Plenty of SATA drives boast similar specs.
Samsung’s XP941 shares the same M.2 2280 form factor as the M6e, but only one side of its circuit board is populated. Unlike the Plextor, it was never targeted directly at consumers. Samsung designed the XP941 specifically for notebook makers, and the drive is a somewhat less natural fit for desktops as a result.
The XP941’s homegrown controller uses the AHCI protocol, so at least OS support isn’t an issue. However, booting from the drive requires UEFI-compatible firmware equipped with the appropriate option ROM. Notebook makers can easily integrate the necessary code into systems that use the XP941. So can mobo makers, but they have less incentive to support a drive that’s rare even in PC enthusiast circles. For what it’s worth, our test rigs’ Asus Z97-Pro motherboards have no issues booting from the XP941.
The Samsung controller’s four-lane Gen2 interface provides another clue that the XP941 wasn’t meant for desktops. This wider pipe offers double the bandwidth available to the M6e, but it also requires more PCIe lanes than are available in most M.2 slots. Fortunately, four-lane adapter cards are capable of mating the XP941 with full-sized PCIe slots. The XP941 works in dual-lane M.2 slots, too, just not at full speed.
Speaking of which, the XP941 is rated to hit up to 1170MB/s with sequential reads and 930MB/s with writes. The 256GB version we’ve been testing is specced at 1080MB/s and 800MB/s, respectively, which still gives it a substantial leg up on the M6e. Random reads clock in at an impressive 120k IOps, but random writes are rated for only half that, and even the top model is rated for just 72k IOps. Samsung’s budget-minded 850 EVO SSD has better specs on that front.
Both of the M.2 drives are nearly a dollar per gig: the XP941 256GB sells for $229.99 at Newegg, while the equivalent M6e runs $215.99. That’s peanuts compared to the ringer we drafted to illustrate the true potential of PCIe SSDs. Intel’s DC P3700 800GB SSD costs over $2,400, or three bucks a gig, putting this datacenter-grade SSD on a whole ’nother level.
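The per-gig figures are simple division of price by advertised capacity. Here's a quick sketch using the prices quoted above. Because the P3700 costs "over" $2,400, that entry uses $2,400 as a floor, so its result is a lower bound.

```python
# Prices (USD) and advertised decimal capacities (GB) quoted in the text.
# The P3700 price is the "over $2,400" floor, so its figure is a lower bound.
DRIVES = {
    "Samsung XP941 256GB": (229.99, 256),
    "Plextor M6e 256GB": (215.99, 256),
    "Intel DC P3700 800GB": (2400.00, 800),
}


def dollars_per_gb(price_usd: float, capacity_gb: int) -> float:
    """Return price divided by advertised capacity."""
    return price_usd / capacity_gb


for name, (price, capacity) in DRIVES.items():
    print(f"{name}: ${dollars_per_gb(price, capacity):.2f}/GB")
```

That works out to roughly $0.90/GB for the XP941, $0.84/GB for the M6e, and at least $3.00/GB for the P3700.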
The P3700 is way too much SSD for an M.2 gumstick. It’s only available in a half-height expansion card, as pictured above, or in an extra-thick 2.5″ enclosure. Both implementations have substantial heatsinks, and the hunks of finned metal aren’t just for show. The 800GB drive has an 18W power rating, while the 2TB flagship consumes up to 25W.
Intel evidently puts all that power to good use. The P3700 is rated to hit 2800MB/s in sequential reads and 1900MB/s in writes. Random reads peak at an astounding 460k IOps, according to the spec sheet, while random writes top out at an impressive 175k IOps (and 90k IOps for the 800GB unit we tested). SSD makers use different metrics to rate their drives, so don’t get too caught up comparing the claimed numbers. The important takeaway is that the P3700 is a very different class of PCIe SSD.
The drive even features a different class of host interface. Intel’s custom controller features a four-lane Gen3 interface with twice the bandwidth of an equivalent Gen2 link. On top of that, the chip conforms to the NVM Express protocol rather than AHCI. Otherwise known as NVMe, this protocol has much lower overhead than AHCI, and it’s built to scale up with massively parallel flash arrays. Where AHCI is limited to a single command queue 32 entries deep, NVMe supports up to 64k parallel queues, each of which can be up to 64k slots deep.
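The bandwidth claims above follow from each link's signaling rate and line encoding. SATA 6Gbps and PCIe Gen2 use 8b/10b encoding, while Gen3 switches to the leaner 128b/130b scheme. The sketch below computes post-encoding bandwidth only. It ignores packet and protocol overhead, so real drives top out somewhat lower.

```python
# Raw signaling rate per lane (billions of transfers per second) and the
# fraction of bits left after line encoding. Packet and protocol overhead
# are ignored, so these are ceilings rather than achievable speeds.
LINKS = {
    "SATA 6Gbps": (6.0, 8 / 10),     # 8b/10b encoding, single link
    "PCIe Gen2": (5.0, 8 / 10),      # 8b/10b encoding
    "PCIe Gen3": (8.0, 128 / 130),   # 128b/130b encoding
}


def usable_mb_per_s(link: str, lanes: int = 1) -> float:
    """Post-encoding bandwidth in decimal MB/s for a link with `lanes` lanes."""
    rate_gt, efficiency = LINKS[link]
    bits_per_s = rate_gt * 1e9 * efficiency * lanes
    return bits_per_s / 8 / 1e6  # bits -> bytes -> megabytes


for label, link, lanes in [
    ("SATA 6Gbps", "SATA 6Gbps", 1),
    ("Plextor M6e (Gen2 x2)", "PCIe Gen2", 2),
    ("Samsung XP941 (Gen2 x4)", "PCIe Gen2", 4),
    ("Intel DC P3700 (Gen3 x4)", "PCIe Gen3", 4),
]:
    print(f"{label}: {usable_mb_per_s(link, lanes):.0f} MB/s")
```

This gives 600MB/s for SATA, 1000MB/s for the M6e's link, 2000MB/s for the XP941's, and about 3938MB/s for the P3700's. A Gen3 lane carries about 985MB/s versus 500MB/s for Gen2, which is where the "twice the bandwidth" figure comes from. The 600MB/s SATA ceiling also explains why the M6e's 770MB/s read rating is out of reach for any SATA drive.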
NVMe support is baked into Windows 8.1, and a hotfix is available to add support to Windows 7. Third-party drivers aren’t required for those operating systems as a result. Intel still provides its own drivers for the P3700, and it claims they’re faster than the ones that come with the OS. We’ve been using the Intel drivers for all our testing.
Native OS support makes it easy to get NVMe SSDs working as secondary storage. Booting from these drives requires UEFI firmware that explicitly supports NVMe boot devices, though. Support is a little spotty, as far as I can tell, and the Z97-Pro only received the necessary firmware update in February. Your mileage may vary with other makes and models.
With the PCIe drives covered, it’s time to introduce the cadre of SATA SSDs we’ve gathered to provide comparative context. The field includes a diverse assortment of drives from some of the biggest names in the business: Crucial’s BX100 and MX200, Intel’s 335 and 730 Series, OCZ’s Vector 180, and Samsung’s 850 EVO and 850 Pro. For nostalgia, we’ve also tested a vintage Intel X25-M G2 SSD that dates back to 2009. That drive is so old that its SATA interface is capped at 3Gbps speeds, but don’t rule it out entirely. Turns out the old girl still has a few tricks up her sleeve.
Now, on to the benchmarks…
IOMeter — Sequential and random performance
IOMeter fuels much of our new storage suite, including our sequential and random tests. These tests are run across the full extent of the drive at two queue depths. The QD1 tests simulate a single thread, while the QD4 results emulate a more demanding desktop workload. (87% of the requests in our old DriveBench 2.0 trace of real-world desktop activity have a queue depth of four or less.)
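A statistic like "87% of requests have a queue depth of four or less" is just a cumulative share over a trace. The sketch below assumes a hypothetical trace reduced to one queue-depth value per request. It does not model DriveBench's actual trace format.

```python
def share_at_or_below(queue_depths, threshold=4):
    """Fraction of requests whose queue depth is <= threshold."""
    depths = list(queue_depths)
    if not depths:
        raise ValueError("trace is empty")
    return sum(1 for qd in depths if qd <= threshold) / len(depths)


# Hypothetical queue depths for ten requests (illustrative, not real data).
sample = [1, 1, 2, 1, 4, 3, 8, 1, 2, 16]
print(f"{share_at_or_below(sample):.0%}")  # 8 of 10 requests -> 80%
```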
Our sequential tests use a relatively large 128KB block size. The results are color-coded to make the PCIe SSDs easier to spot, and the X25-M G2 is clad in a darker shade to set it apart from the other SATA drives.
In this first batch of tests, Intel’s DC P3700 trounces the field regardless of the queue depth. Its lead is particularly pronounced in writes, where the drive nearly triples the sequential speeds posted by the Samsung XP941. The read results are much closer, but the P3700 still leads by around 400MB/s.
Samsung easily takes the race for second place. The XP941 outpaces the M6e by a wide margin in sequential reads, and it’s over 200MB/s ahead with writes. The M6e’s write speed at QD1 is actually slower than for a handful of SATA SSDs.
The drives generally hit higher transfer rates at QD4, but that test isn’t demanding enough for the P3700 to reach top speed. Intel recommends a queue depth of 128 for peak performance. That’s a little outside the realm of what we’d consider typical for a consumer PC, though we have confirmed the drive reaches its potential at higher queue depths.
Before moving on, notice the X25-M languishing at the back of the pack. The drive barely breaks 100MB/s with writes, and its read speeds are only about half those of the fastest Serial ATA drives. SATA SSDs have come a long way over the past six years.
Next, we’ll turn our attention to performance with 4KB random I/O. We’ve reported average response times rather than raw throughput, which we think makes sense in the context of system responsiveness.
Once again, Intel’s datacenter SSD takes the top spot with both queue depths. But its margins of victory are slim at best, and the PCIe drives don’t have a definitive advantage over their SATA peers. The M6e and XP941 are right in the thick of things with the other SSDs.
Random read response times increase marginally with the queue depth. There are bigger slowdowns with random writes, but note the scale. Those differences still amount to less than a millisecond for everything but the X25-M.
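Response times and throughput are two views of the same data. If a benchmark keeps a fixed number of requests outstanding, which is how IOMeter's queue-depth setting behaves, Little's law links them: I/O rate equals outstanding requests divided by average response time. The response times below are hypothetical round numbers, not our measured results.

```python
def iops_at_queue_depth(queue_depth: int, avg_response_ms: float) -> float:
    """Little's law: completions per second = outstanding I/Os / mean response time."""
    if avg_response_ms <= 0:
        raise ValueError("response time must be positive")
    return queue_depth / (avg_response_ms / 1000.0)


def throughput_mb_s(iops: float, block_bytes: int = 4096) -> float:
    """Convert an I/O rate into decimal MB/s for a fixed block size."""
    return iops * block_bytes / 1e6


# Hypothetical figures: 0.1 ms at QD1 and 0.4 ms at QD4.
for qd, ms in [(1, 0.1), (4, 0.4)]:
    iops = iops_at_queue_depth(qd, ms)
    print(f"QD{qd}: {iops:,.0f} IOps, {throughput_mb_s(iops):.1f} MB/s")
```

Both cases land at 10,000 IOps, or about 41MB/s with 4KB blocks. Quadrupling the queue depth buys nothing if response times quadruple too. That's why a drive needs both low latency and enough internal parallelism to exploit deeper queues, and it hints at why the P3700 needs very deep queues to reach its peak.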
The preceding tests are based on the median of three consecutive three-minute runs. SSDs typically deliver consistent sequential and random read performance over that period, but random write speeds worsen as the drive’s overprovisioned area is consumed by incoming writes. We explore that decline on the next page.
IOMeter — Sustained and scaling I/O rates
Our sustained IOMeter test hammers drives with 4KB random writes for 30 minutes straight. It uses a queue depth of 32, which should result in higher speeds that saturate each drive’s overprovisioned area more quickly. This lengthy—and heavy—workload isn’t indicative of typical PC use, but it provides a sense of how the drives react when they’re pushed to the brink.
We’re reporting IOps rather than response times for these tests.
More domination from the P3700. To be fair, the deck is stacked heavily in the drive’s favor. The P3700 sets aside 25% of its total flash capacity as overprovisioned area, a much higher percentage than the ~7% reserved by consumer-grade drives like the M6e and XP941. That advantage is compounded by the P3700’s monstrous capacity, resulting in much more overprovisioned area than the smaller PCIe drives. The data for the different 850 EVO and Vector 180 sizes illustrate how higher-capacity SSDs can deliver better random write performance than their lower-capacity siblings.
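Percentages undersell the gap, because the reserve scales with capacity. The sketch below treats both overprovisioning figures as shares of total raw flash and takes "~7%" as exactly 7%. Both are simplifying assumptions, and real drives' raw NAND capacities may differ.

```python
def spare_flash_gb(user_gb: float, op_share_of_total: float) -> float:
    """Flash held in reserve when op_share_of_total of the raw NAND is set aside."""
    if not 0 <= op_share_of_total < 1:
        raise ValueError("share must be in [0, 1)")
    total_gb = user_gb / (1 - op_share_of_total)  # user space is the remainder
    return total_gb - user_gb


p3700 = spare_flash_gb(800, 0.25)  # roughly 267 GB in reserve
m6e = spare_flash_gb(256, 0.07)    # roughly 19 GB in reserve
print(f"P3700: {p3700:.1f} GB, M6e: {m6e:.1f} GB, ratio {p3700 / m6e:.1f}x")
```

Under these assumptions, the P3700 has nearly 14 times as much spare flash to absorb incoming random writes as the 256GB M6e.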
Despite having half the PCIe bandwidth of the XP941, the M6e easily beats the Samsung SSD. It peaks higher at the beginning of the test, and it maintains more consistent write speeds across the test’s full extent. Plenty of the SATA SSDs hit even higher peak and steady-state speeds, suggesting that interface bandwidth has little impact on performance here.
To show the data in a slightly different light, we’ve also graphed the peak random write rate and the average, steady-state speed over the last minute of the test.
Yeah, the P3700 is in another league, especially as the test wears on. So is the X25-M G2, which is handicapped by dated components and a relatively small 160GB capacity. Long-term performance consistency wasn’t a hot topic back when this older drive debuted, but it’s become a bigger focus in recent generations, especially among SSDs with enterprise aspirations.
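The two summary figures, peak rate and the steady-state average over the final minute, can be pulled from any sampled IOps log. This sketch assumes a hypothetical log with one IOps sample per second. It does not parse IOMeter's own result files.

```python
from statistics import mean


def summarize_sustained(samples_iops, sample_interval_s=1.0, tail_s=60.0):
    """Return (peak IOps, average IOps over the final tail_s seconds)."""
    if not samples_iops:
        raise ValueError("no samples")
    tail_count = max(1, int(round(tail_s / sample_interval_s)))
    peak = max(samples_iops)
    steady = mean(samples_iops[-tail_count:])  # last minute only
    return peak, steady


# Hypothetical 30-minute run: a fast burst, then a slower steady state.
log = [80_000] * 120 + [20_000] * 1_620 + [10_000, 30_000] * 30
print(summarize_sustained(log))
```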
Our final IOMeter test examines performance scaling across a broad range of queue depths. We ramp all the way up to a queue depth of 128. Don’t expect AHCI-based drives to scale past 32, though; that’s the depth of their native command queue.
We use a database access pattern comprising 66% reads and 33% writes, all of which are random. The test runs after 30 minutes of continuous random writes that put the drives in a simulated used state.
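A mixed random workload like this one boils down to a stream of block-aligned offsets, each tagged as a read or a write. The generator below is a hypothetical illustration, not IOMeter's implementation. The 4KB block size is an assumption, because the text doesn't state one for this test, and the article's 66%/33% split is taken here as two-thirds/one-third.

```python
import random


def database_pattern(n_ops, span_bytes, block_bytes=4096, read_share=2 / 3, seed=0):
    """Return (kind, offset) pairs for random, block-aligned reads and writes."""
    rng = random.Random(seed)  # seeded so the sequence is repeatable
    n_blocks = span_bytes // block_bytes
    if n_blocks < 1:
        raise ValueError("span must hold at least one block")
    ops = []
    for _ in range(n_ops):
        kind = "read" if rng.random() < read_share else "write"
        offset = rng.randrange(n_blocks) * block_bytes  # uniformly random, aligned
        ops.append((kind, offset))
    return ops


ops = database_pattern(10_000, span_bytes=1 << 30)
reads = sum(1 for kind, _ in ops if kind == "read")
print(f"{reads / len(ops):.1%} reads")
```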
The P3700’s peak throughput is five times higher than the mark set by the next-fastest drive, the Vector 180 240GB. Using the same scale to compare all the drives would diminish the differences between them, so this first set of graphs is limited to the two leaders and the other PCIe SSDs.
With the proper context established, we can move on to results for the individual drives. The following graphs show total, read, and write IOps together. The expanded scale remains for the P3700 plot, but a more appropriate one is used for the rest.
While the other SSDs predictably peak at a queue depth of 32 or lower, the P3700 keeps scaling all the way up to 128 concurrent requests. Its I/O rate ramps aggressively up to a queue depth of 64 and then rises only slightly at QD128.
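A simplified model shows why the AHCI drives flatten out. The drive itself can hold at most 32 outstanding AHCI commands. Anything beyond that waits on the host side, so raising the benchmark's queue depth past 32 adds no parallelism at the drive. The NVMe limits below use the article's round "64k" figures rather than exact spec values.

```python
AHCI_SLOTS = 32                   # one command queue, 32 entries deep
NVME_QUEUES = 64 * 1024           # article's round "64k" queue count
NVME_SLOTS_PER_QUEUE = 64 * 1024  # article's round "64k" queue depth


def outstanding_at_drive(requested_qd: int, protocol: str) -> int:
    """How many of `requested_qd` requests the drive can hold at once (simplified)."""
    limits = {"AHCI": AHCI_SLOTS, "NVMe": NVME_QUEUES * NVME_SLOTS_PER_QUEUE}
    if protocol not in limits:
        raise ValueError(f"unknown protocol: {protocol}")
    return min(requested_qd, limits[protocol])


for qd in (1, 4, 32, 64, 128):
    print(qd, outstanding_at_drive(qd, "AHCI"), outstanding_at_drive(qd, "NVMe"))
```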
Clicking through the rest of the graphs highlights the fact that the M6e and XP941 pose little threat to their SATA brethren here. Even Crucial’s budget-minded BX100 offers higher throughput across the board.
TR RoboBench — Real-world transfers
RoboBench trades synthetic tests with random data for real-world transfers with a range of file types. Developed by our in-house coder, Bruno “morphine” Ferreira, this benchmark relies on the multi-threaded robocopy command built into Windows. We copy files to and from a wicked-fast RAM disk to measure read and write performance. We also cut the RAM disk out of the loop for a copy test that transfers the files to a different location on the SSD.
In its multithreaded mode, robocopy uses eight threads by default, and we’ve also run it with a single thread. Our results are split between two file sets, whose vital statistics are detailed below. The compressibility percentage is based on the size of the file set after it’s been crunched by 7-Zip.
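The idea behind a multithreaded copy is simple: hand each file to a pool of workers and time the whole batch. This sketch is an illustration of that idea, not RoboBench or robocopy. The function name is hypothetical, and it doesn't flush OS caches the way a careful benchmark should, so cached data can inflate its numbers.

```python
import shutil
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def threaded_copy(src_dir: Path, dst_dir: Path, threads: int = 8) -> float:
    """Copy every file under src_dir into dst_dir; return throughput in MB/s."""
    files = [p for p in src_dir.rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)

    # Recreate the directory tree up front so worker threads only move data.
    for p in files:
        (dst_dir / p.relative_to(src_dir)).parent.mkdir(parents=True, exist_ok=True)

    def copy_one(path: Path) -> None:
        shutil.copyfile(path, dst_dir / path.relative_to(src_dir))

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(copy_one, files))  # list() re-raises any worker exception
    elapsed = time.perf_counter() - start
    return total_bytes / 1e6 / elapsed if elapsed > 0 else float("inf")
```

Calling it once with `threads=1` and once with `threads=8` mirrors the two RoboBench configurations. Python threads work here because file I/O releases the interpreter lock while the OS does the work.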
|Number of files||Average file size||Total size||Compressibility|
The media set is made up of large movie files, high-bitrate MP3s, and 18-megapixel RAW and JPG images. There are only a few hundred files in total, and the data set isn’t amenable to compression. The work set comprises loads of TR files, including documents, spreadsheets, and web-optimized images. It also includes a stack of programming-related files associated with our old Mozilla compiling test and the Visual Studio test on the next page. The average file size is measured in kilobytes rather than megabytes, and the files are mostly compressible.
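A compressibility figure can be approximated with Python's standard `lzma` module, which implements the LZMA family of algorithms that 7-Zip uses by default. This is a sketch under two assumptions. First, "compressibility" is read as the percentage of space saved, since the text doesn't define the direction. Second, the results will only roughly track 7-Zip's, because container formats and settings differ.

```python
import lzma


def compressibility(data: bytes, preset: int = 9) -> float:
    """Percent of space saved by LZMA compression (negative if the data grows)."""
    if not data:
        raise ValueError("empty input")
    compressed = lzma.compress(data, preset=preset)
    return 100.0 * (1 - len(compressed) / len(data))


print(f"{compressibility(b'quarterly sales, region, total' * 5000):.1f}%")
```

Repetitive documents and spreadsheets score near the top of the scale, while already-compressed movies, MP3s, and JPGs hover around zero, which matches the split between the work and media sets.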
RoboBench’s write and copy tests run after the drives have been put into a simulated used state with 30 minutes of 4KB random writes. The pre-conditioning process is scripted, as is the rest of the test, ensuring that drives have the same amount of time to recover.
Read speeds are up first.
Although the P3700 continues to beat down its rivals, the XP941 is more competitive than one might expect, especially with the larger files in the media test. The M6e isn’t nearly as fast at reading those files, but the gap narrows considerably in the work tests.
Even the mighty P3700 struggles to distance itself from the SATA field in the single-threaded work test. The read speeds in that test are low across the board, and upping the thread count delivers big gains. That said, the top speeds with eight threads are still well short of what the drives achieve with even a single thread in the media test.
RoboBench’s write results paint a somewhat similar picture. Speeds are much higher in the media tests, and the PCIe pecking order is unchanged. However, the M6e gets nipped by a couple of SATA drives in the eight-thread media test, and it’s near the bottom of the pile in the work tests.
This time around, the P3700 has a much greater advantage over the XP941 in the media tests. The Intel drive roughly doubles the write speeds of the Samsung regardless of the thread count. Its lead in the work tests remains slim, though.
As one might expect, the copy results look like a combination of the read and write scores. The P3700 maintains its lead over the XP941, which in turn trumps the M6e. The Plextor SSD is a smidgen faster than the SATA drives in three of four tests, but it’s buried in the middle of the pack when copying work files with a single thread. To be fair, most of the drives are evenly matched there.
The old X25-M G2 struggles to keep up throughout RoboBench, and it’s especially slow when dealing with the larger media files. The load-time tests on the next page show that the drive is still plenty quick for some tasks, though.
Thus far, all of our tests have been conducted with the SSDs connected as secondary storage. This next batch uses the SSDs as system drives.
We’ll start with boot times measured two ways. The bare test depicts the time between hitting the power button and reaching the Windows desktop, while the loaded test adds the time needed to load four applications—Avidemux, LibreOffice, GIMP, and Visual Studio Express—automatically from the startup folder. Our old boot tests focused just on the time required to load the OS, but these new ones cover the entire process, including drive initialization.
Betcha didn’t see that coming. The Plextor M6e turns in the fastest boot times in both tests, edging out the Samsung XP941 by a few seconds and the Intel DC P3700 by a lot more. The P3700 boots slower than most of the SATA drives, including the X25-M G2.
To be fair, we had to use slightly different motherboard firmware to test boot and load times on the P3700. That change could account for the slower boot times, as may the steps required to initialize NVMe boot devices. All of the other SSDs use the AHCI protocol.
Next, we’ll tackle load times with two sets of tests. The first group focuses on the time required to load larger files in a collection of desktop applications. We open a 790MB 4K video in Avidemux, a 30MB spreadsheet in LibreOffice, and a 523MB image file in GIMP. In the Visual Studio Express test, we open a 159MB project containing source code for the LLVM toolchain. Thanks to Rui Figueira for providing the project code.
Like the boot tests, these are hand-timed with a stopwatch.
So much for that idea. The PCIe and SATA drives are evenly matched, with little more than a second separating the fastest from the slowest examples. The variance inherent to hand-timed tests makes it difficult to get worked up about such minute differences.
What about games? We fired up a trio of recent titles to see how long it took to resume their single-player campaigns.
The drives are separated by a few seconds in Shadow of Mordor, but that game takes forever to load, making the deltas less noticeable. Good luck detecting a difference between the drives in Tomb Raider and Arkham Origins, where the gaps are less than half a second. Even the old X25-M is within striking distance of the fastest drives.
For a while, I’ve worried that our last batch of load-time tests showed little difference between SSDs because the games were too old. These newer titles behave similarly, which is vindicating but also a little disappointing. We’ll probably have to bench some mechanical drives to see any meaningful differences in these tests.
Test notes and methods
Here are the essential details for all the drives we tested:
|Crucial BX100 500GB||SATA 6Gbps||Silicon Motion SM2246EN||16-nm Micron MLC|
|Crucial MX200 500GB||SATA 6Gbps||Marvell 88SS9189||16-nm Micron MLC|
|Intel X25-M G2 160GB||SATA 3Gbps||Intel PC29AS21BA0||34-nm Intel MLC|
|Intel 335 Series 240GB||SATA 6Gbps||SandForce SF-2281||20-nm Intel MLC|
|Intel 730 Series 480GB||SATA 6Gbps||Intel PC29AS21CA0||20-nm Intel MLC|
|Intel DC P3700 800GB||PCIe Gen3 x4||Intel CH29AE41AB0||20-nm Intel MLC|
|Plextor M6e 256GB||PCIe Gen2 x2||Marvell 88SS9183||19-nm Toshiba MLC|
|Samsung 850 EVO 250GB||SATA 6Gbps||Samsung MGX||32-layer Samsung TLC|
|Samsung 850 EVO 1TB||SATA 6Gbps||Samsung MEX||32-layer Samsung TLC|
|Samsung 850 Pro 500GB||SATA 6Gbps||Samsung MEX||32-layer Samsung MLC|
|Samsung XP941 256GB||PCIe Gen2 x4||Samsung S4LN053X01||19-nm Samsung MLC|
|OCZ Vector 180 240GB||SATA 6Gbps||Indilinx Barefoot 3 M10||A19-nm Toshiba MLC|
|OCZ Vector 180 960GB||SATA 6Gbps||Indilinx Barefoot 3 M10||A19-nm Toshiba MLC|
All the SATA SSDs were connected to the motherboard’s Z97 chipset. The M6e was connected to the Z97 via the motherboard’s M.2 slot, which is how we’d expect most folks to run that drive. Since the XP941 requires more lanes, it was connected to the CPU via a PCIe adapter card. The DC P3700 was hooked up to the CPU via the same full-sized PCIe slot.
We used the following system for testing:
|Processor||Intel Core i5-4690K 3.5GHz|
|Platform hub||Intel Z97|
|Platform drivers||Chipset: 10.0.0.13
|Memory size||16GB (2 DIMMs)|
|Memory type||Adata XPG V3 DDR3 at 1600 MT/s|
|Audio||Realtek ALC1150 with 220.127.116.1144 drivers|
|System drive||Corsair Force LS 240GB with S8FM07.9 firmware|
|Storage||Crucial BX100 500GB with MU01 firmware
Crucial MX200 500GB with MU01 firmware
Intel 335 Series 240GB with 335u firmware
Intel 730 Series 480GB with L2010400 firmware
Intel DC P3700 800GB with 8DV10043 firmware
Intel X25-M G2 160GB with 8820 firmware
Plextor M6e 256GB with 1.04 firmware
OCZ Vector 180 240GB with 1.0 firmware
OCZ Vector 180 960GB with 1.0 firmware
Samsung 850 EVO 250GB with EMT01B6Q firmware
Samsung 850 EVO 1TB with EMT01B6Q firmware
Samsung 850 Pro 500GB with EMXM01B6Q firmware
Samsung XP941 256GB with UXM6501Q firmware
|Power supply||Corsair Professional Series AX650 650W|
|Operating system||Windows 8.1 Pro x64|
Thanks to Asus for providing the systems’ motherboards, Intel for the CPUs, Adata for the memory, and Corsair for the system drives and PSUs. And thanks to the drive makers for supplying the rest of the SSDs.
We used the following versions of our test applications:
- IOMeter 1.1.0 x64
- TR RoboBench 0.2a
- Avidemux 2.6.8 x64
- LibreOffice 4.3.2
- GIMP 2.8.14
- Visual Studio Express 2013
- Batman: Arkham Origins
- Tomb Raider
- Middle-earth: Shadow of Mordor
Some further notes on our test methods:
- To ensure consistent and repeatable results, the SSDs were secure-erased before every component of our test suite. For the IOMeter database, RoboBench write, and RoboBench copy tests, the drives were put in a simulated used state that better exposes long-term performance characteristics. Those tests are all scripted, ensuring an even playing field that gives the drives the same amount of time to recover from the initial used state.
- We run virtually all our tests three times and report the median of the results. Our sustained IOMeter test is run a second time to verify the results of the first test and additional times only if necessary. The sustained test runs for 30 minutes continuously, so it already samples performance over a long period.
- Steps have been taken to ensure the CPU’s power-saving features don’t taint any of our results. All of the CPU’s low-power states have been disabled, effectively pegging the frequency at 3.5GHz. Transitioning between power states can affect the performance of storage benchmarks, especially when dealing with short burst transfers.
The test systems’ Windows desktop was set at 1920×1200 at 60Hz. Most of the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Our fresh look at SSD performance shows the considerable promise of PCI Express SSDs, but we’re not finished yet. We’re working to bring our DriveBench simulation of real-world I/O to the new rigs, and we’re eyeing several other PCIe drives to run through the gauntlet. Samsung is sending us its SM951, the Gen3 successor to the XP941. Kingston’s HyperX Predator is also on our radar, and Intel is promising an SSD revolution for April 2. Stay tuned.
As I said in the intro, the revolution is already here in some respects. The Intel DC P3700’s awe-inspiring performance with both sequential and random I/O highlights the benefits of both PCIe’s greater interface bandwidth and the NVMe protocol’s streamlined, scalable design. This datacenter-grade drive is incredibly expensive, of course, but cutting-edge technology tends to start at the top before trickling down to the masses.
Besides, even the relatively affordable Samsung XP941 and Plextor M6e can deliver huge gains with sequential transfers. The XP941 is generally the faster of the two, as one would expect given its fatter PCIe pipe, but both M.2 drives fail to live up to the random I/O performance of the top SATA alternatives. Even though Serial ATA hinders performance in some situations, the host interface clearly isn’t the limiting factor for all workloads.
One need look no further than our load-time tests to see that PCIe SSDs aren’t always faster than their SATA counterparts. Some drives boot a few seconds faster than others, but we didn’t find any meaningful differences in application or game load times. Heck, the nearly six-year-old X25-M G2 had little trouble keeping up in those tests.
While power users and other folks with demanding storage workloads can extract a lot of additional performance from PCIe SSDs, there are fewer benefits for typical PC users right now, especially when one considers the additional cost. Revolutions tend to start on the fringes before amassing mainstream support, though. Prices will surely fall as more PCIe drives enter the market, and software developers will hopefully do a better job of exploiting solid-state storage as its installed base grows. We’ll be watching closely.