A closer look at RAPID DRAM caching on the Samsung 840 EVO SSD

Samsung’s 840 EVO is a breath of fresh air in a sea of largely cookie-cutter SSDs. It’s a true original—the first solid-state drive to combine TLC main storage with a faster SLC write cache. Thanks in part to this TurboWrite cache, the EVO is quick enough to keep pace with high-end SSDs. In some tests, it’s even faster than Samsung’s flagship 840 Pro.

Yet the 840 EVO is priced firmly in budget territory. The 250GB model has dropped to $185 already, and the terabyte variant sells for 65 cents per gig. Meanwhile, the 500GB drive we reviewed last month is under $400.

Our first look at the 840 EVO was admittedly put together in a bit of a hurry. We got the drive only a few days before the product launch, and there was barely enough time to test the thing, let alone write about it. As a result, we weren’t able to explore thoroughly the EVO’s secondary caching layer, otherwise known as RAPID mode. This optional, software-based solution commandeers a portion of system memory for use as a separate drive cache. It’s also coming to Samsung’s 840 Pro later this year.

Since the EVO review, we’ve been putting RAPID mode through its paces across our entire storage test suite. We now have a better sense of where this reimagined RAM disk improves performance—and where it has the opposite effect.

Before diving into our results, let’s spend a moment to, ahem, refresh our memory about what RAPID mode is all about. RAPID stands for Real-time Accelerated Processing of I/O Data, so we should probably honor the all caps. You can enable the feature via Samsung’s SSD Magician utility, and you’ll need to be running Windows 7 or 8 for it to work. When enabled, RAPID mode takes up to a gigabyte of system memory. DRAM is even faster than the flash memory used in SSDs, so there’s some wisdom in using it as a high-speed cache for solid-state drives.

Samsung says RAPID mode is used primarily to accelerate read performance. Data is speculatively loaded into the cache based on user access patterns. The caching intelligence considers several factors, including how frequently and recently the data has been accessed. It also discriminates against large media files to avoid polluting the cache with data that may not benefit from quicker access times.
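
Samsung hasn't published how its caching intelligence weighs these factors, so treat the following as a rough sketch rather than a description of RAPID itself. It scores data by hit count with an exponential recency decay and refuses anything above a size cutoff. The class name, the half-life, the score threshold, and the 64MB cutoff are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoff for "large media files"; Samsung has not disclosed a real value.
MAX_CACHEABLE_BYTES = 64 * 1024 * 1024


@dataclass
class AccessStats:
    hits: int = 0
    last_access_s: float = 0.0


class ReadCachePolicy:
    """Toy admission policy combining frequency, recency, and file size."""

    def __init__(self, half_life_s=300.0):
        self.half_life_s = half_life_s  # assumed decay rate for recency
        self.stats = {}                 # key -> AccessStats

    def record_read(self, key, now_s):
        entry = self.stats.setdefault(key, AccessStats())
        entry.hits += 1
        entry.last_access_s = now_s

    def score(self, key, now_s):
        entry = self.stats.get(key)
        if entry is None:
            return 0.0
        age_s = now_s - entry.last_access_s
        # Frequency counts for less the longer the data has gone untouched.
        return entry.hits * 0.5 ** (age_s / self.half_life_s)

    def should_cache(self, key, file_size_bytes, now_s, threshold=2.0):
        if file_size_bytes > MAX_CACHEABLE_BYTES:
            return False  # skip large media files to avoid polluting the cache
        return self.score(key, now_s) >= threshold
```

A policy like this would explain why repeated small reads get promoted quickly while a one-off pass over a large video file never enters the cache.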

If all of this sounds familiar, you may be thinking of Windows’ SuperFetch routine, which does something similar. However, Samsung says SuperFetch only considers application data. RAPID mode looks at each and every read request, and it’s capable of caching both application and user data.

In addition to accelerating read performance, RAPID mode offers “write optimization.” Caching writes in DRAM before moving them to the 840 EVO’s flash-based TurboWrite cache helps maintain performance at high queue depths, according to Samsung. This approach evidently conveys other benefits, too, but Samsung isn’t talking specifics. It’s possible RAPID mode collates incoming data and writes it to the flash in larger blocks to make more efficient use of the NAND’s limited endurance.

Of course, caching writes in volatile DRAM introduces the potential for data loss due to an unexpected power failure. RAPID mode transfers the contents of its write cache to the SSD every time the Windows write cache is flushed, so it doesn’t hang on to the data for too long. There’s still some risk attached, which is probably why RAPID mode is disabled by default.
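
The idea of collating small writes into larger ones and holding them only until the next flush can be sketched roughly as follows. This is a toy model, not Samsung's implementation: the class, its methods, and the backing_write callback are hypothetical, and it assumes writes never overlap.

```python
class CoalescingWriteCache:
    """Toy write-back cache that merges adjacent writes before flushing."""

    def __init__(self, backing_write):
        # backing_write(offset, data) stands in for a write to the SSD.
        self.backing_write = backing_write
        self.pending = {}  # offset -> bytes held in volatile memory

    def write(self, offset, data):
        # Data sits in DRAM until flush(); a power loss here would lose it.
        self.pending[offset] = data

    def flush(self):
        """Merge contiguous writes into runs and send each run to the drive.

        Returns the number of writes actually issued.
        """
        runs = []
        for offset in sorted(self.pending):
            data = self.pending[offset]
            if runs and runs[-1][0] + len(runs[-1][1]) == offset:
                start, merged = runs[-1]
                runs[-1] = (start, merged + data)  # extend the previous run
            else:
                runs.append((offset, data))
        for start, merged in runs:
            self.backing_write(start, merged)
        self.pending.clear()
        return len(runs)
```

In this model, four 4KB writes where three are contiguous reach the drive as two larger writes. The window of risk is whatever sits in pending between flushes.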

Between reboots, Samsung’s software automatically copies the contents of the RAPID cache to main storage. This seamless step preserves the contents of the cache, but it’ll cost you a gig of SSD capacity.

Although RAPID mode is limited to 1GB right now, Samsung tells us future versions of the software may allow users to allocate even more memory for caching. Plenty of enthusiast rigs have gobs of RAM, and it would be nice to be able to dedicate more of it to the RAPID cache. The software already uses compression to make the most of the available space, though.

Now that we’ve covered the basics, let’s see what RAPID mode can do. We tested the Samsung 840 EVO with and without the RAPID cache enabled. Both configs were tested using the same system described on this page of our 840 EVO review.

Load times

Load time tests are usually prime candidates for cache-based acceleration, but there’s a caveat attached for RAPID mode. Because the cache relies on software that loads after the OS, it can’t speed up the boot process. Our Windows 7 boot duration test highlights this fact nicely.

The RAPID config actually takes a fraction of a second longer than the standard 840 EVO. We measure the Win7 boot duration using the OS’s built-in performance monitoring tools, which tell us how long it takes for the system to become idle after the OS first begins to load. The extra time required to load Samsung’s Magician utility in the background probably explains the 0.4-second delay associated with RAPID mode in this test.

RAPID mode isn’t meant to accelerate OS load times, but it should load games faster… right?

Not these ones, at least according to our stopwatch. There was essentially no difference between the EVO’s standard and RAPID configs in our usual level load tests. We repeated the tests five times with the RAPID config, providing ample opportunity for the software to pick up on the repetitive access pattern. The later runs weren’t consistently faster than the earlier ones, though.

Note that all the SSDs are pretty evenly matched in these tests. They’re all within about a second of each other, which suggests that moving away from mechanical storage may effectively eliminate storage as a bottleneck.

Now, let’s see what happens in our other storage tests.

HD Tune — Transfer rates

HD Tune lets us present transfer rates in a couple of different ways. Using the benchmark’s “full test” setting gives us a good look at performance across the entire drive rather than extrapolating based on a handful of sample points.

RAPID mode doesn’t improve the 840 EVO’s read performance in HD Tune, but it definitely speeds up writes. The extra caching layer pushes the SSD’s average write speed to 420MB/s—a 6% improvement. The RAPID config actually starts the write speed test at 665MB/s, but the burst is short-lived; performance quickly falls and levels off.

Note that the standard 840 EVO config has a slightly higher write rate for the first portion of the test. I suspect that’s a benefit of the drive’s flash-based TurboWrite cache.

HD Tune runs on unpartitioned drives, with no file system in place, so it may not be an ideal candidate for RAPID’s caching intelligence. For another take on sequential speed, we’ll turn to CrystalDiskMark, which runs on partitioned drives. We used the benchmark’s sequential test with the default 1GB transfer size and randomized data.

When Samsung first demonstrated RAPID mode to the press, it used CrystalDiskMark. No wonder. The 840 EVO’s sequential transfer rates more than double when the DRAM-based caching scheme is enabled.

Interestingly, the first run in the read test registered only 536 MB/s. RAPID mode only kicked reads into high gear for subsequent runs, which all clocked in around 1150 MB/s. Writes were through the roof right off the bat, though.

HD Tune — Random access times

In addition to letting us test transfer rates, HD Tune can measure random access times. We’ve tested with four transfer sizes and presented all the results in a couple of line graphs. For readability’s sake, those graphs only have a handful of results. We also busted out bar graphs that provide a broader selection of results for the 4KB and 1MB transfer sizes.

In the 4KB results, the RAPID cache improves read access times by a factor of about five.

Interestingly, RAPID mode only helps in the 4KB test. There’s essentially no change in the EVO’s read access times in the 512-byte, 64KB, or 1MB tests.

The benefits of RAPID mode are more universal in the write speed tests. There, DRAM caching accelerates access times across the board. The speedups range from 2.8 to 9.8X, depending on the transfer size.

TR FileBench — Real-world copy speeds

Concocted by resident developer Bruno “morphine” Ferreira, FileBench runs through a series of file copy operations using Windows 7’s xcopy command. Using xcopy produces nearly identical copy speeds to dragging and dropping files using the Windows GUI, so our results should be representative of typical real-world performance. We tested using the following five file sets—note the differences in average file sizes and their compressibility. We evaluated the compressibility of each file set by comparing its size before and after being run through 7-Zip’s “ultra” compression scheme.

File set   Number of files   Average file size   Total size   Compressibility
Movie      6                 701MB               4.1GB        0.5%
RAW        101               23.6MB              2.32GB       3.2%
MP3        549               6.48MB              3.47GB       0.5%
TR         26,767            64.6KB              1.7GB        53%
Mozilla    22,696            39.4KB              923MB        91%

The names of most of the file sets are self-explanatory. The Mozilla set is made up of all the files necessary to compile the browser, while the TR set includes years’ worth of the images, HTML files, and spreadsheets behind my reviews. Those two sets contain much larger numbers of smaller files than the other three. They’re also the most amenable to compression.
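
For anyone wanting to estimate a similar compressibility figure for their own files, here is a minimal sketch. It assumes compressibility means the percentage by which the data shrinks, and it uses Python's built-in lzma module as a stand-in for 7-Zip's "ultra" setting, so the numbers won't match ours exactly. The function name is our own invention.

```python
import lzma


def compressibility_percent(data: bytes) -> float:
    """Return how much smaller the data gets after LZMA compression, in percent."""
    if not data:
        return 0.0
    # Highest preset plus the "extreme" flag; an approximation of 7-Zip "ultra".
    compressed = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)
    saved = 1.0 - len(compressed) / len(data)
    # Incompressible data can grow slightly due to container overhead; clamp to zero.
    return max(0.0, saved * 100.0)
```

Already-compressed media such as movies and MP3s should land near zero, while source code and HTML should score much higher.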

To get a sense of how aggressively each SSD reclaims flash pages tagged by the TRIM command, we run FileBench with the solid-state drives in two states. We first test the SSDs in a fresh state after a secure erase. They’re then subjected to a 30-minute IOMeter workload, generating a tortured used state ahead of another batch of copy tests. We haven’t found a substantial difference in the performance of mechanical drives between these two states. Let’s start with the fresh-state results.

The Samsung 840 EVO is incredibly fast in FileBench, but RAPID mode slows it down. In each and every test, the RAPID config pulls up short of the standard drive.

Putting the EVO into a used state doesn’t change the outcome, either. RAPID mode is consistently slower, regardless of the file set.

There are a couple of interesting things going on behind the scenes. The RAPID config’s copy speed goes up after the first run in just about every test, suggesting that the caching scheme is learning something—just not enough to keep up with the standard setup. The only exceptions to that rule are the movie tests, which slow down after the first run. Looks like RAPID mode is smart enough to ignore the large video files in the movie set, at least after its first encounter with them. Too bad ignoring those files still results in slower performance than disabling the caching scheme completely.

TR DriveBench 1.0 — Disk-intensive multitasking

TR DriveBench allows us to record the individual IO requests associated with a Windows session and then play those results back as fast as possible on different drives. We’ve used this app to create a set of multitasking workloads that combine common desktop tasks with disk-intensive background operations like compiling code, copying files, downloading via BitTorrent, transcoding video, and scanning for viruses. The individual workloads are explained in more detail here.

Below, you’ll find an overall average followed by scores for each of our individual workloads. The overall score is an average of the mean performance score for each multitasking workload.
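
To be explicit about the arithmetic, the overall score is a mean of means. A minimal sketch follows; the function name is ours, and the dictionary layout is assumed for illustration.

```python
from statistics import mean


def overall_score(workload_runs):
    """Average the per-workload mean scores.

    workload_runs maps a workload name to a list of scores for that workload.
    """
    return mean(mean(scores) for scores in workload_runs.values())
```

Averaging per workload first means a workload with more samples doesn't carry more weight than the others. Pooling every score into one big average would give a different number whenever the workloads have unequal sample counts.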

DriveBench looks like a complete disaster for RAPID mode. Perhaps we can find a silver lining among the individual test results.

Nope. The RAPID config is several times slower than the standard 840 EVO regardless of the workload. I don’t quite know what to make of the results, but it’s worth reiterating that DriveBench crunches I/O at warp speed, without any of the idle time in the original traces. RAPID mode appears to be choking on the resulting torrent of I/O.

TR DriveBench 2.0 — More disk-intensive multitasking

As much as we like DriveBench 1.0’s individual workloads, the traces cover only slices of disk activity. Because we fire the recorded I/Os at the disks as fast as possible, the solid-state drives also have no downtime during which to engage background garbage collection or other optimization algorithms. DriveBench 2.0 addresses both of those issues with a much larger trace that spans two weeks of typical desktop activity peppered with multitasking loads similar to those in DriveBench 1.0. We’ve also adjusted our testing methods to give solid-state drives enough idle time to tidy up after themselves. More details on DriveBench 2.0 are available right here.
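
The difference between the two playback styles comes down to what happens in the gaps between recorded requests. The sketch below is only a model of that idea. The function, its parameters, and the idle-time cap are assumptions, and DriveBench's actual handling of idle time may differ.

```python
def replay_delays(timestamps_s, preserve_idle=False, max_idle_s=1.0):
    """Return the wait inserted before each request after the first.

    With preserve_idle=False, requests are fired back to back, so the drive
    gets no downtime. With preserve_idle=True, the recorded gaps are kept,
    capped at max_idle_s so long idle stretches don't dominate the run time.
    """
    delays = []
    for previous, current in zip(timestamps_s, timestamps_s[1:]):
        gap = current - previous
        delays.append(min(gap, max_idle_s) if preserve_idle else 0.0)
    return delays
```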

Instead of looking at a raw IOps rate, we’re going to switch gears and explore service times—the amount of time it takes drives to complete an I/O request. We’ll start with an overall mean service time before slicing and dicing the results.

Our second-generation disk trace shows RAPID mode in a better light. Caching requests in DRAM cuts the EVO’s mean service time almost exactly in half, propelling the drive to the top of the standings.

We can sort DriveBench 2.0 service times into reads and writes to learn more about RAPID mode.

Samsung claims RAPID mode is designed primarily to speed read performance. In this test, however, the DRAM cache has a much more profound impact on writes. RAPID mode lowers the EVO’s mean read service time by a modest margin, but it cuts the write service time by nearly two orders of magnitude. No other SSD even comes close to the RAPID config’s write performance in DriveBench 2.0.

There are millions of I/O requests in this trace, so we can’t easily graph service times to look at the variance. However, our analysis tools do report the standard deviation, which can give us a sense of how much service times vary from the mean.
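
For reference, the two summary statistics can be computed as in this short sketch. The function name is ours, and we're assuming a population standard deviation since the trace contains every request rather than a sample.

```python
from statistics import mean, pstdev


def summarize_service_times(times_ms):
    """Return the mean and population standard deviation of service times."""
    return {"mean_ms": mean(times_ms), "stdev_ms": pstdev(times_ms)}
```

Keep in mind that a handful of very long requests among millions of short ones will inflate the standard deviation, so it's best read alongside the distribution rather than on its own.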

RAPID mode substantially reduces the variance in the 840 EVO’s write service times. It doesn’t move the needle much on the read front, though.

We can’t easily graph all the service times recorded by DriveBench 2.0, but we can sort them. The graphs below plot the percentage of service times that fall below various thresholds.
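
Sorting the service times makes this kind of threshold breakdown cheap to compute. Here's a minimal sketch of the approach, with a function name and input format of our own choosing. It counts only times strictly below each threshold.

```python
from bisect import bisect_left


def percent_below(times_ms, thresholds_ms):
    """Return the percentage of service times strictly below each threshold."""
    if not times_ms:
        raise ValueError("need at least one service time")
    ordered = sorted(times_ms)
    total = len(ordered)
    # bisect_left gives the count of values strictly less than the threshold.
    return {t: 100.0 * bisect_left(ordered, t) / total for t in thresholds_ms}
```

Subtracting the result for a 100-millisecond threshold from 100 gives the share of requests at or above that mark, which is the other end of the distribution.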

The RAPID cache handles pretty much all of DriveBench 2.0’s write requests in under 0.1 milliseconds. The exact percentage is 99.99%.

The read distribution is admittedly less exciting. Enabling RAPID mode only delivers a slight increase in the percentage of service times under each threshold.

What about RAPID mode’s impact on extremely long service times over 100 milliseconds?

The caching scheme has no effect on the number of extremely long read service times. However, it nearly doubles the number of write requests that take longer than 100 ms to execute. Fortunately, the percentage is still very low overall.

IOMeter

Our IOMeter workloads feature a ramping number of concurrent I/O requests. Most desktop systems will only have a few requests in flight at any given time (87% of DriveBench 2.0 requests have a queue depth of four or less). We’ve extended our scaling up to 32 concurrent requests to reach the depth of the Native Command Queuing pipeline associated with the Serial ATA specification. Ramping up the number of requests also gives us a sense of how the drives might perform in more demanding enterprise environments.

The web server test consists entirely of read requests, and the RAPID cache is of little assistance. Let’s see what happens in our remaining IOMeter tests, which mix reads and writes.

Ignore the outlier in the file server test for a moment. The remaining results show the RAPID config slightly ahead of the vanilla 840 EVO. The benefits of the caching scheme seem to increase slightly as the load scales up to 16 simultaneous requests, after which the gap between the two setups shrinks.

The spike at the start of the file server test is something we’ve observed before. We’ve seen numerous SSDs exhibit similar spikes early in IOMeter, which is why we run our test five times and toss out the first two sets of results. The spike associated with RAPID mode persisted through all five runs, albeit to varying degrees.
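
Tossing out the early runs is simple enough to express in a few lines. In this sketch, the function name and the warmup parameter are ours, and we're assuming the remaining runs are averaged.

```python
from statistics import mean


def steady_state_mean(run_results, warmup_runs=2):
    """Average the results that remain after discarding the first warmup runs."""
    if len(run_results) <= warmup_runs:
        raise ValueError("not enough runs left after discarding warmup")
    return mean(run_results[warmup_runs:])
```

This filters out a transient spike that fades after a run or two. It can't hide one that shows up in every run, which is what we saw with RAPID mode.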

Conclusions

We can summarize storage performance with an overall score derived from a subset of our benchmark results. Without RAPID mode, the Samsung 840 EVO sits high in the standings. Enabling the DRAM cache drops the drive way down the list, though.

Of course, as we’ve seen, the overall score doesn’t tell the whole story. RAPID mode improves performance in some of our tests—in several instances by massive margins. There’s no getting around the fact that a DRAM cache can be a lot faster than a flash-based SSD, especially for writes.

But the RAPID config also proves slower in some tests, substantially so in the case of DriveBench 1.0. The caching system clearly doesn’t accelerate all workloads. There’s some danger associated with caching writes in volatile DRAM, too.

Even if we ignore the pitfalls, I can’t help but wonder whether RAPID mode’s performance benefits would be perceptible to typical users. We didn’t see an improvement in load times, and our test system didn’t feel any snappier with the RAPID cache enabled. SSDs already have near-instantaneous access times measured in fractions of a millisecond. Lowering those access times even further may have diminishing benefits for desktop workloads.

Although I wouldn’t recommend that folks enable RAPID mode as it exists right now, I am encouraged to see Samsung exploring new ways to speed up its SSDs. RAPID mode definitely has the potential to become more appealing as it matures, and it won’t be restricted to the 840 EVO for long.

Update — We have a full suite of performance results for the latest version of RAPID in our 850 Pro review.

Comments closed
    • Wirko
    • 6 years ago

    Regarding the TR DriveBench results: the data consists of a couple of very large delays plus millions of extremely small ones. Such a dataset is extremely skewed and it makes little sense to calculate the standard deviation. What is really meaningful here is the graph of 100+ ms times.

    Also, with the data you have, it would be possible to graph the (cumulative) distribution of very large delays (>1 ms or >10 ms). That would be a greatly magnified upper right corner of the time distribution graphs. Sure it would be fancy looking, too.

    • Parallax
    • 6 years ago

    Geoff (or someone at TR): There’s a small problem with your graphs on page 7 (the IOMeter results). The first and third graphs show up noticeably blurry in my browser. Looks like the images are 615px wide, while the HTML states they’re 614px, thus enabling image scaling. Definitely not a severe problem, but thought you’d want to know.

    • cjava2
    • 6 years ago

    I don’t think that this article describes just *how good* the Samsung 840 EVO really is like this video does:

    https://www.youtube.com/watch?v=-y3XuhMJQ28

      • kravo
      • 6 years ago

      I actually watched it. It was painful.

      But hey, I do have an 840 EVO, and it works great.
      ***blinks an eye and smiles showing a gazillion superwhite teeth***

    • Bensam123
    • 6 years ago

    You guys really should try out Fancycache as an alternative… I’d highly recommend it. I know you guys don’t normally do software reviews, but it’s almost right on top of this solution and you don’t need to buy an SSD to use it. There is no limit to the memory it uses either.

    As far as sequentials go, it’s also faster… About 3GB/s.

      • TO11MTM
      • 6 years ago

      Indeed.

      Heck, if you’re cheap, run Dataram’s free version of RamDisk and set it as a readyboost. Only problem is it’s volatile so you’ll lose the cache on shutdown, unless you can be patient enough to let it save/restore state.

      But if you’ve got that much ram to spare, your disk is probably already considered ‘too fast’ for readyboost and then we are back to fancycache.

    • drfish
    • 6 years ago

    Hmm… I hear that different RAM disk software implementations can have a pretty huge impact on the speed of the RAM disk or in this case the cache… I would be very interested to read a TR article pitting a number of the RAM disk solutions against each other…

    • iatacs19
    • 6 years ago

    I guess the next step is overclocking the controller, bus speed and power/TDP like intel is going to do with their SSDs. Or we could wait for SATA over PCIe.

    • HisDivineOrder
    • 6 years ago

    This is a great system optimized entirely and solely for benchmarks. Great. Too bad they didn’t use this money and dedicate it toward actual performance improvements.

    • phez
    • 6 years ago

    “Interestingly, the first run in the read test registered only 536 MB/s. RAPID mode only kicked reads into high gear for subsequent runs, which all clocked in around 1150 MB/s. Writes were through the roof right off the bat, though.”

    should emphasize this. with bold letters. bold, possibly large letters.

      • albundy
      • 6 years ago

      should also emphasize this:

      We repeated the tests five times with the RAPID config, providing ample opportunity for the software to pick up on the repetitive access pattern. The later runs weren’t consistently faster than the earlier ones, though.

    • CheetoPet
    • 6 years ago

    Good idea but windows already has its own disk cache. Mebbe the added complexity of the extra IO code path is killing the real world performance tests?

      • Aliasundercover
      • 6 years ago

      Adding another layer of RAM cache is a bit like another round of compression.

    • Meadows
    • 6 years ago

    Ah, another setting meant for benchmarks. I believe people should just stick to SuperFetch and that’s it. No chance of data loss either, as it only ever reads from drives.

      • weapau
      • 6 years ago

      I’m not sure this is such a terrible idea. At work we use several systems with ZFS. The aggressive disk caching in ram under linux combined with a ZFS intent log running on a fast SSD has dramatically improved our user experience when heavily interacting with our storage array.

      Edit: Worse run-on sentence than currently displayed.

        • Meadows
        • 6 years ago

        This product is not made for professional users. (Even if it has theoretical advantages for them.)

      • Firestarter
      • 6 years ago

      isn’t superfetch automatically disabled on SSDs?

        • Meadows
        • 6 years ago

        Not to my knowledge, but Samsung’s Magician software can disable it with an “optimization” switch. Exactly why this is optimal for anything is anyone’s guess.

          • Ryu Connor
          • 6 years ago

          Windows Vista and 8 leave SuperFetch on with an SSD.

          Windows 7 will cut it off: http://blogs.msdn.com/b/e7/archive/2009/05/05/support-and-q-a-for-solid-state-drives-and.aspx

            • Meadows
            • 6 years ago

            In that case, I hope it’s intelligently selective, because the majority of SSD-equipped desktop PCs are some sort of hybrid with a secondary HDD somewhere. Then again, it’s not a problem with W8, which is ironic, because W8 is a problem in and of itself in its current state.

            • Ryu Connor
            • 6 years ago

            It’s the UI of Windows 8 that is largely the area of consternation. By comparison the underlying guts of the OS are a step forward versus 7.

            For those that can’t stand the UI changes, 8 plus one of the 3rd party start menus is a worthwhile direction.

    • weapau
    • 6 years ago

    It appears to me that the drive bench, in an effort to reduce benchmark run time, runs several weeks worth of drive activity as fast as possible.

    It seems to me there might be some disconnect between this type of benchmark and real-world use for the following reasons:

    RAPID utilizes compression in the write cache, requiring some additional CPU cycles. Is it possible the nature of the test might stress the CPU in ways that will not be encountered in normal use?

    RAPID also takes advantage of the “bursty” nature of client writes, i.e. it is thought the data will have had time to be flushed to the drive DRAM cache, then to the SLC write cache, then to the TLC NAND by the time another write comes through. Is it possible this benchmark does not correctly simulate that behavior?

    Just some thoughts, thanks for considering!

      • UberGerbil
      • 6 years ago

      Yeah, my first thought was to look for CPU utilization graph(s) to go along with those. It’s quite possible that something in the caching layer can’t keep up (there’s likely some serialized operations that end up bottlenecking on a single core).

      • Alereon
      • 6 years ago

      Yeah not to call anyone out but this looks like a benchmark issue more than a reflection of actual drive behavior. Anandtech found in their testing that the Samsung 840 Evo is fast enough to overflow a 32-bit integer used to store benchmark results, resulting in wildly low scores. I’m not saying that this is definitely what is happening here, but you need to be VERY careful when you benchmark the 840 Evo to make sure that the results you are getting make sense and are internally consistent. For example, the mean response time results for the 840 Evo tell a COMPLETELY different story than the throughput graphs, which seems to confirm at least one set of results is wrong (for the other drives it roughly corresponds to their position in the throughput benchmark, and I don’t think the fast responses from the DRAM cache could fully explain this).

    • Firestarter
    • 6 years ago

    Well for the negligible benefit, I really don’t like the idea of adding another layer of volatile caching. *Especially* since the 840 EVO already employs a non-volatile caching scheme.

    • 5150
    • 6 years ago

    Wonderful article. Thank you!

    /startsslowclap

    Edit: I guess no one likes showing some appreciation for a topic that was on my mind.

      • Kevsteele
      • 6 years ago

      That’s the problem with slow claps – it can be both a profound appreciation of the subject, as well as a derisive mocking of the subject.
