[ZDNet] How SSD Power faults scramble your data

Posted on Sat Mar 02, 2013 5:57 pm

Source

The good news: of 6 expected failures, only 5 were observed; and 2 of the devices behaved as expected. The bad news: 13 of the devices had poor failure behavior.

Every failed device lost some amount of data or became massively corrupted under power faults.

Bit corruption hit 3 devices; 3 had shorn writes; 8 had serializability errors; one device lost 1/3 of its data; and 1 SSD bricked. The low-end hard drive had some unserializable writes, while the high-end drive had no power fault failures.

The 2 SSDs that had no failures? Both were MLC 2012 model years with a mid-range - $1.17/GB - price.



Some interesting research here; it seems you need a UPS for your rig to avoid this issue. Has anyone experienced SSD issues after a power loss like the ones this study showed? Discuss.
System: i7-3770K/PH-TC14PE/ 2X 4 G Samsung MV-3V4G3D/US/Gigabyte Z77X-UP5-TH/EVGA GTX 780/Samsung 840 Pro256GB/2X Seagate Baracuda 2TB/Seasonic Platinum 660 W /Phantom 630
BoilerGamer
Gerbil
 
Posts: 21
Joined: Wed Aug 22, 2012 9:03 pm

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Sat Mar 02, 2013 7:45 pm

It would have been nice if they actually showed what they did, what drives they used, or really anything at all.

The article basically says, "We cut power to some drives and all but two had problems, but we won't tell you which two or why they were maybe okay."

Big caps? A fluke? Did they run the test more than once? What's going on?
odizzido
Gerbil
 
Posts: 59
Joined: Fri May 06, 2005 6:10 am

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Sat Mar 02, 2013 7:59 pm

I agree. As worrying as the implications sound, I really don't understand their strangely misplaced sense of "not naming names"... as though they're trying to protect the manufacturers' privacy or something. Why on earth would anyone withhold the details of such a test?

Maybe a better source would care to replicate their efforts in a neutral and well planned way, and publish the full results for us...
Tech Report? :-P
GrimDanfango
Gerbil
 
Posts: 71
Joined: Sun May 10, 2009 9:53 am

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Sat Mar 02, 2013 9:27 pm

Many large studies don't name names.

For example, this study found that one major CPU vendor's chips are 20x more likely to fail when overclocked, and the other's 4x more likely.

Who's worse, AMD or Intel? They refused to name names.
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor
Global Moderator
Gold subscriber
 
 
Posts: 3528
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Sat Mar 02, 2013 9:51 pm

Interesting (but not terribly surprising). Vendors tend to optimize for cost and performance, and need to hit their time-to-market window if they are to be competitive. Behavior under unusual operating conditions probably gets second (or third) priority.

I'm a big believer in UPSes (as well as ECC RAM, but that's a topic for another thread).

As far as the "not naming names" thing goes... I think that's standard practice for academic papers. Hopefully review sites will start paying attention to this issue. The review site has to be willing to risk bricking their review samples though... presumably, testing of this nature would need to be the *last* test run!
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37673
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Sun Mar 03, 2013 12:59 am

The ZDNet article links to a .pdf with the full details of these tests (except the names/brands of the SSDs and HDDs)... Those details are not very useful for anyone except the SSD/HDD manufacturers. The TL;DR version of these tests is a very obvious "some SSDs and HDDs can corrupt data during sudden power losses, so use UPS units, guys!" :wink:
In my primary system with an SSD drive I haven't had any such issues during power losses, even during that nasty hurricane we had here, but it's always been connected to one of these: http://www.amazon.com/CyberPower-CP1500 ... 00429N19W/ with a software utility set to shut down my PC if the outage is longer than 5 minutes, so even if my SSD model/brand is the one that failed most of their tests - it has very little practical relevance to me (well, at least until my PSU dies suddenly... but that's less likely to happen compared to utility power loss, and I do full backups to external HDD enclosure which has its own PSU).
My subscription allows you people to exist on this site and makes me a better human being than you'll ever be
JohnC
Gerbil Jedi
Gold subscriber
 
 
Posts: 1881
Joined: Fri Jan 28, 2011 2:08 pm
Location: NY/NJ/FL

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Sun Mar 03, 2013 2:42 am

I don't feel like reading the pdf. I wonder whether they had write-cache buffer flushing (or the equivalent - that's what it's called in Windows) enabled or not. I don't know whether it would matter in a true sudden power loss during actual writes; maybe someone else can chime in. I always disable it though because I run a UPS and want MOAR PERFORMENCE

MS 'more information' help article:

Disk write-caching is a performance improvement feature, which is available on most disk drives. It allows applications to run faster by allowing them to proceed without waiting for data write-requests to be written to the disk. You can enable or disable this feature through the Disk Management snap-in.

Because write-caching does not actually write data to the hard disk drive until sometime after sending a "write done" message to the system, a power failure, or other ill-timed or inadvertent system shutdown may result in data loss. Use this setting if the possibility of data loss is an acceptable risk compared to the increased performance associated with writing to the cache and then to the hard disk instead of directly to the hard disk.

Information about Disk Management and write-caching is available on the web. Please see the following page in the Windows Server Technical Library:

http://go.microsoft.com/fwlink/p/?LinkId=238144


Microsoft has moved information about this technology to the TechNet website so that up-to-date information is available to you.
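
From the application side, the usual way to avoid trusting a "write done" that only means "it's in a cache" is to request an explicit flush. A minimal Python sketch of that idea follows. The helper name `durable_write` and the file name are made up for illustration; `os.fsync` is the standard library call. Note the caveat in the comment: the call only asks the OS to flush, and what the drive does with its own volatile cache afterwards is exactly the behavior the study was probing.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes, then ask the OS to push them out to the storage device."""
    with open(path, "wb") as f:
        f.write(data)          # data sits in Python's userspace buffer
        f.flush()              # userspace buffer -> OS cache
        os.fsync(f.fileno())   # ask the OS to flush its cache to the device
    # NOTE: whether the drive commits its own internal write cache before
    # reporting success is up to the drive and its cache settings.


# Hypothetical file name, written to a fresh temp directory.
path = os.path.join(tempfile.mkdtemp(), "journal.bin")
durable_write(path, b"important record\n")

with open(path, "rb") as f:
    print(f.read())
```

The trade-off is the one the help text describes: every forced flush costs performance, which is why write caching is on by default.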
MadManOriginal
Graphmaster Gerbil
 
Posts: 1413
Joined: Wed Jan 30, 2002 7:00 pm
Location: In my head...

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Sun Mar 03, 2013 8:45 am

JohnC wrote:... so even if my SSD model/brand is the one that failed most of their tests - it has very little practical relevance to me (well, at least until my PSU dies suddenly...

Or the UPS battery starts to go south, and the UPS's battery monitoring fails to detect that fact. I discovered this the hard way last time we had a breaker trip at work. UPS thought I had ~30 minutes of runtime, but it cut out suddenly after only 5. Bottom line: Change your UPS battery every ~3 years, whether you think it needs it or not!
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37673
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Sun Mar 03, 2013 8:57 am

I had a blackout when a car hit a nearby power pole, with my Corsair SSD running and no protection. Didn't lose any data.
Core i7 4770K | eVGA GTX770 SC ACX | 16GB DDR3 2133mhz | Asus Z87-PLUS | Corsair HX650 | Fractal Define R4 | Samsung 840 Pro 256GB | Windows 8 x64
yogibbear
Gerbil Elite
Gold subscriber
 
 
Posts: 668
Joined: Fri Feb 08, 2008 11:30 am

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Sun Mar 03, 2013 10:40 am

If I were a cynic I'd say this might really be an advert for UPSs - possibly straight out of the hands of a PR rep, considering how little data they provide about their 'study'.
4670K@4.5GHz | Asus Z87-A | G.Skill 8GB 2400MHz CL10 | R9 290 4GB | Samsung 840 120GB |Thermalright Macho | Lancool PC-K59
puppetworx
Gerbil XP
Silver subscriber
 
 
Posts: 488
Joined: Tue Dec 02, 2008 5:16 am

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Mon Mar 04, 2013 2:11 pm

Ehh...this is why some of the pricier SSDs, including the Intel 320 series, have power failure capacitor(s) that store enough energy to permit the drive a graceful management of its data cache and most recent operating instructions before going dark.
He who laughs last, laughs first next time.
ludi
Gerbil Elder
 
Posts: 5431
Joined: Fri Jun 21, 2002 10:47 pm
Location: Sunny Colorado front range

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Mon Mar 04, 2013 4:04 pm

ludi wrote:Ehh...this is why some of the pricier SSDs, including the Intel 320 series, have power failure capacitor(s) that store enough energy to permit the drive a graceful management of its data cache and most recent operating instructions before going dark.

It's not like this even needs to increase the cost by much. 1 Farad supercaps are less than $2 in bulk. Charge one from the 5V rail and you've got enough stored energy to operate an SSD for several seconds -- which should be long enough to flush any internal caches and do a clean shutdown of the drive's firmware.
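
A rough back-of-the-envelope check of those figures, as a Python sketch. It assumes an ideal capacitor (no ESR or converter losses), a hypothetical 2 W drive draw, and a hypothetical 3 V cutoff below which the drive's regulator can no longer run. Neither of those two numbers comes from this thread; only the 1 F and 5 V do.

```python
def holdup_seconds(capacitance_f, v_start, v_cutoff, load_watts):
    """Seconds an ideal capacitor can feed a constant-power load.

    Usable energy is the difference in stored energy (1/2 * C * V^2)
    between the starting voltage and the cutoff voltage.
    """
    usable_joules = 0.5 * capacitance_f * (v_start**2 - v_cutoff**2)
    return usable_joules / load_watts


# 1 F charged from the 5 V rail, as in the post above.
# ASSUMPTIONS: the 3 V cutoff and 2 W draw are illustrative values only.
total_joules = 0.5 * 1.0 * 5.0**2
seconds = holdup_seconds(1.0, 5.0, 3.0, 2.0)
print(f"stored: {total_joules} J, hold-up: {seconds} s")
# prints "stored: 12.5 J, hold-up: 4.0 s"
```

Under those assumptions the "several seconds" estimate holds up; a real part would deliver somewhat less once losses are counted.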
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37673
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Wed Mar 06, 2013 3:39 pm

It would cost a lot of money. Not because adding an array of capacitors bumps the bill of materials (BoM) to any significant degree.

Instead, it costs a lot of money if the much more expensive enterprise products are being replaced by cheaper consumer-level products with much lower price tags. One could say that it is in the interest of the manufacturers to not make their consumer-level products too reliable. Otherwise, all computers would also be supplied with ECC memory, which also adds only marginally to cost while vastly increasing reliability. Enterprise customers need reliable computing and are ready to pay for it. But if they end up paying consumer SSD prices, a large market is lost.

Thus, the result is to deliberately restrict certain features to more expensive products meant for enterprise and nearline customers, while consumers get fast but not-too-reliable products that enterprise users will not dare to use.

However, aside from the already very dated Intel 320, the new Crucial M500 will have a special place in history, thanks to its many protections including power-safe capacitors, while being targeted as a consumer product.
sub.mesa
Gerbil
 
Posts: 14
Joined: Sat Apr 11, 2009 3:35 pm

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Fri Mar 08, 2013 12:32 pm

sub.mesa wrote:It would cost a lot of money. Not because adding an array of capacitors bumps the bill of materials (BoM) to any significant degree.

Product segmentation is an explanation, but not the only one. Typically the BOM for any consumer device will be shaved down to the last marginal penny, not because an extra $2 makes a big difference on a per-device basis, but because somebody is looking at what it will cost to produce them in 10k or 100k lots, versus the total expected revenue of selling 10k or 100k units after accounting for both fixed and variable costs. Variable costs will include DOA returns, long-term warranty support, sales and promotions, competition, unexpected market variations, etc., any of which could prove fatal to the product's margins or marketable life.

Power failure capacitors are a great idea, but most of the time the customer can't actually see them. However, they will see that your competitor's product is cheaper, and purchase accordingly.
He who laughs last, laughs first next time.
ludi
Gerbil Elder
 
Posts: 5431
Joined: Fri Jun 21, 2002 10:47 pm
Location: Sunny Colorado front range

Re: [ZDNet] How SSD Power faults scramble your data

Posted on Fri Mar 08, 2013 12:45 pm

Or they'll just buy a UPS, which is a good idea anyway, and have a backup for their whole system that will last years.
MadManOriginal
Graphmaster Gerbil
 
Posts: 1413
Joined: Wed Jan 30, 2002 7:00 pm
Location: In my head...

incoming rant on reliability

Posted on Fri Mar 08, 2013 2:46 pm

sub.mesa wrote:It would cost a lot of money. Not because adding an array of capacitors bumps the bill of materials (BoM) to any significant degree.

Instead, it costs a lot of money if the much more expensive enterprise products are being replaced by cheaper consumer-level products with much lower price tags. One could say that it is in the interest of the manufacturers to not make their consumer-level products too reliable. Otherwise, all computers would also be supplied with ECC memory, which also adds only marginally to cost while vastly increasing reliability. Enterprise customers need reliable computing and are ready to pay for it. But if they end up paying consumer SSD prices, a large market is lost.

Thus, the result is to deliberately restrict certain features to more expensive products meant for enterprise and nearline customers, while consumers get fast but not-too-reliable products that enterprise users will not dare to use.

However, aside from the already very dated Intel 320, the new Crucial M500 will have a special place in history, thanks to its many protections including power-safe capacitors, while being targeted as a consumer product.


I can't stand big government stuff, but I would love to see a 20% "deliberately defective product" tax on non-ECC RAM, with the funds set aside to cover data recovery.
A lot of the blame belongs to Intel too, so I'd also put this on non-ECC-capable chipsets/CPUs. (The memory controller has moved over the years.)
Volatile RAM is not stable; it's well proven.

The suppliers would quickly do some math and realize including 12.5% more chips would mean they could avoid the tax and still be cheaper on the market.
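
The 12.5% figure comes from the 8 extra check bits an ECC DIMM carries alongside each 64 data bits (72 bits total). A quick Python sketch of the comparison, assuming module cost scales linearly with chip count and using a made-up base price of 100; the 20% is the tax proposed above.

```python
DATA_BITS = 64   # data width of a standard non-ECC DIMM
ECC_BITS = 8     # extra check bits on a 72-bit ECC DIMM

overhead = ECC_BITS / DATA_BITS
print(f"extra memory chips for ECC: {overhead:.1%}")  # prints 12.5%

# ASSUMPTIONS: cost scales linearly with chip count; base price is hypothetical.
base_cost = 100.0
taxed_non_ecc = base_cost * 1.20          # non-ECC module plus the 20% tax
ecc_module = base_cost * (1 + overhead)   # ECC module, no tax

print(ecc_module < taxed_non_ecc)
```

This ignores the cost of ECC-capable chipsets and CPUs, which the post also wants covered.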

Other industries have had to toe the line (electrical/fire/safety codes etc.; no, you can't run 20A over that string), but computers have been blissfully avoiding it for too long, and everything is on a massive price spiral to the bottom anyway. There is huge pressure in all industries to use commodity/consumer (COTS) stuff in place of actually reliable systems. FFS, the latest Russian probe probably died to this kind of mentality. (Let's send this to space, who needs rad hardening? Two of everything should cover it.)

Back to the OP: regular hard drives suck too. Randomly power them off and you'll get occasional data corruption. If your OS/file system doesn't assume this at every possible layer, then you get bit rot. Enjoy!
(sadly most don't, hence things like ZFS being born)
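
As a rough illustration of "assume corruption at every layer": keep a checksum separate from each block and verify it on every read. This Python sketch is only in the spirit of ZFS-style checksumming, not ZFS's actual code; CRC32 is swapped in for simplicity, and the block size and function names are made up.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size for the sketch


def store_block(payload):
    """Return the block plus a CRC32 that is kept separately from the data."""
    return payload, zlib.crc32(payload)


def verify_block(payload, expected_crc):
    """True if the block read back still matches its stored checksum."""
    return zlib.crc32(payload) == expected_crc


block, crc = store_block(b"A" * BLOCK_SIZE)

# Simulate bit corruption: flip a single bit in the first byte.
flipped = bytes([block[0] ^ 0x01]) + block[1:]

# Simulate a shorn write: only the first half of the block holds the
# intended data, the second half holds other bytes.
shorn = block[: BLOCK_SIZE // 2] + b"B" * (BLOCK_SIZE // 2)

print(verify_block(block, crc))    # intact block -> True
print(verify_block(flipped, crc))  # single-bit error -> False
print(verify_block(shorn, crc))    # shorn block
```

Detection is only half the job; repairing the bad block needs a redundant copy (mirror, parity, or backup) to read from.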
blah blah blah signature blah blah blah
Bauxite
Gerbil Elite
 
Posts: 609
Joined: Sat Jan 28, 2006 12:10 pm
Location: electrolytic redox smelting plant

Re: incoming rant on reliability

Posted on Fri Mar 08, 2013 6:04 pm

Bauxite wrote:I can't stand big government stuff, but I would love to see a 20% "deliberately defective product" tax on non-ECC RAM, with the funds set aside to cover data recovery.
A lot of the blame belongs to Intel too, so I'd also put this on non-ECC-capable chipsets/CPUs. (The memory controller has moved over the years.)
Volatile RAM is not stable; it's well proven.

That's a really horrible idea. You're effectively making everyone pay the data recovery costs for people who had critical data but were too stupid to back it up or pay for a system that had the kind of reliability they needed.

Educating users is the answer, not taxing everyone to cover ignorant users' asses.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37673
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: incoming rant on reliability

Posted on Fri Mar 08, 2013 6:06 pm

just brew it! wrote:
Bauxite wrote:I can't stand big government stuff, but I would love to see a 20% "deliberately defective product" tax on non-ECC RAM, with the funds set aside to cover data recovery.
A lot of the blame belongs to Intel too, so I'd also put this on non-ECC-capable chipsets/CPUs. (The memory controller has moved over the years.)
Volatile RAM is not stable; it's well proven.
That's a really horrible idea. You're effectively making everyone pay the data recovery costs for people who had critical data but were too stupid to back it up or pay for a system that had the kind of reliability they needed.

Educating users is the answer, not taxing everyone to cover ignorant users' asses.

Agreed. Given my usage pattern ECC RAM would be a waste of money. Besides, everything I care about is also multiply backed-up.
It is one of the blessings of old friends that you can afford to be stupid with them. Ralph Waldo Emerson.
Captain Ned
Global Moderator
Gold subscriber
 
 
Posts: 20215
Joined: Wed Jan 16, 2002 7:00 pm
Location: Vermont, USA

