 
JohnC
Gerbil Jedi
Topic Author
Posts: 1924
Joined: Fri Jan 28, 2011 2:08 pm
Location: NY/NJ/FL

Crucial M4 SSD failure

Fri Oct 04, 2013 11:23 am

Well, after a few years of usage my 512GB SSD finally decided to give up. Unfortunately it did so without any obvious warning - I only noticed when I recently attempted a full backup and every utility I tried, including the built-in Windows backup tool, failed, producing random "The IO operation at logical block address e953b98 for Disk 0 was retried" error messages in the Event Log (and before you say anything - yes, I did try connecting this SSD to different controllers in different PCs; that didn't change anything, and my other drives still work perfectly on the same controller). Fortunately I was still able to manually copy most of the data (a few files actually got corrupted, but they were not very important) and I also had old images of this drive on my external enclosure. Interestingly enough, CheckDisk (chkdsk) does not find any errors regardless of the switches used with it, and the "S.M.A.R.T" (more like "STUPID") thingie still shows the drive as "healthy" even with some suspicious-looking "Raw Values":

Image
Gifter of Nvidia Titans and countless Twitch donation extraordinaire, nothing makes me more happy in life than randomly helping random people
 
Waco
Maximum Gerbil
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Crucial M4 SSD failure

Fri Oct 04, 2013 11:43 am

Does Crucial have a tool to read the vendor-specific fields? It'd be interesting if their tool also reported healthy...
Victory requires no explanation. Defeat allows none.
 
JohnC
Gerbil Jedi
Topic Author
Posts: 1924
Joined: Fri Jan 28, 2011 2:08 pm
Location: NY/NJ/FL

Re: Crucial M4 SSD failure

Fri Oct 04, 2013 12:15 pm

No they do not. At least not publicly available.
Gifter of Nvidia Titans and countless Twitch donation extraordinaire, nothing makes me more happy in life than randomly helping random people
 
Waco
Maximum Gerbil
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: Crucial M4 SSD failure

Fri Oct 04, 2013 3:09 pm

I wonder if they'll shed some light on those fields and what they mean if you contact their tech support. Corsair was terrible in my experience, but perhaps Crucial will be more forthcoming.
Victory requires no explanation. Defeat allows none.
 
JohnC
Gerbil Jedi
Topic Author
Posts: 1924
Joined: Fri Jan 28, 2011 2:08 pm
Location: NY/NJ/FL

Re: Crucial M4 SSD failure

Fri Oct 04, 2013 3:53 pm

Well, they are not defined in the official documents (this is why SMART utilities also give a generic description to them):
http://www.micron.com/~/media/Documents ... ibutes.pdf
Most likely these are related to the amount of data written to the drive.
Gifter of Nvidia Titans and countless Twitch donation extraordinaire, nothing makes me more happy in life than randomly helping random people
 
jurc11
Gerbil In Training
Posts: 4
Joined: Fri Dec 06, 2013 7:21 am

Re: Crucial M4 SSD failure

Fri Dec 06, 2013 7:31 am

The amount of data written can be guesstimated from the (AD) value: 17 full cycles, which is really low. That is less than 1% of lifetime (see the (CA) attribute).

What I find useful about the screenshot is the (05) attribute. There were 8192 reallocated sectors, which is a nice round power of 2. Apparently there are 8192 overprovisioned blocks available for reallocation, the drive exhausted them all and now it cannot reallocate the 8193rd faulty block. It still works, but cannot read/use the faulty one.

I'm no expert, but my guess would be one of the chips died catastrophically (at least a part of it) and the drive ate up the overprovisioned blocks, probably in one go.
 
Krogoth
Emperor Gerbilius I
Posts: 6049
Joined: Tue Apr 15, 2003 3:20 pm
Location: somewhere on Core Prime

Re: Crucial M4 SSD failure

Fri Dec 06, 2013 7:52 am

Sounds like file system corruption to me.

Are you overclocking the original system by chance?
Gigabyte X670 AORUS-ELITE AX, Raphael 7950X, 2x16GiB of G.Skill TRIDENT DDR5-5600, Sapphire RX 6900XT, Seasonic GX-850 and Fractal Define 7 (W)
Ivy Bridge 3570K, 2x4GiB of G.Skill RIPSAW DDR3-1600, Gigabyte Z77X-UD3H, Corsair CX-750M V2, and PC-7B
 
SecretSquirrel
Minister of Gerbil Affairs
Posts: 2726
Joined: Tue Jan 01, 2002 7:00 pm
Location: North DFW suburb...

Re: Crucial M4 SSD failure

Fri Dec 06, 2013 10:41 am

jurc11 wrote:
What I find useful about the screenshot is the (05) attribute. There were 8192 reallocated sectors, which is a nice round power of 2. Apparently there are 8192 overprovisioned blocks available for reallocation, the drive exhausted them all and now it cannot reallocate the 8193rd faulty block. It still works, but cannot read/use the faulty one.


Or it means that the drive has failed one bad flash page (4KB) and relocated it and is reporting how many 512 byte sectors it moved when it did that.

Assuming that the drive hasn't had anything done to it, the SMART values are a little odd. If it suffered from a read failure due to a bad flash block, there should be a non-zero pending sector count as the drive should be waiting for the next write of data to the failed sectors to relocate them. Of course that assumes that it behaves, from a SMART perspective, similar to a mechanical drive. No guarantees there of course.

Patrick M
 
jihadjoe
Gerbil Elite
Posts: 835
Joined: Mon Dec 06, 2010 11:34 am

Re: Crucial M4 SSD failure

Fri Dec 06, 2013 10:48 am

SecretSquirrel wrote:
Or it means that the drive has failed one bad flash page (4KB) and relocated it and is reporting how many 512 byte sectors it moved when it did that.

Assuming that the drive hasn't had anything done to it, the SMART values are a little odd. If it suffered from a read failure due to a bad flash block, there should be a non-zero pending sector count as the drive should be waiting for the next write of data to the failed sectors to relocate them. Of course that assumes that it behaves, from a SMART perspective, similar to a mechanical drive. No guarantees there of course.

Patrick M


Actually there is a non-zero pending sector count. There are 2 sectors pending reallocation.

I agree with the hypothesis that the reallocation is failing because the drive had already exhausted its entire supply of 8192 spares.
 
anotherengineer
Gerbil Jedi
Posts: 1688
Joined: Fri Sep 25, 2009 1:53 pm
Location: Northern, ON Canada, Yes I know, Up in the sticks

Re: Crucial M4 SSD failure

Fri Dec 06, 2013 11:25 am

Wow 512GB that sucks.

Will a zero write or a format get you a working drive??? Is it under warranty still?
Life doesn't change after marriage, it changes after children!
 
jurc11
Gerbil In Training
Posts: 4
Joined: Fri Dec 06, 2013 7:21 am

Re: Crucial M4 SSD failure

Fri Dec 06, 2013 3:32 pm

SecretSquirrel wrote:
Or it means that the drive has failed one bad flash page (4KB) and relocated it and is reporting how many 512 byte sectors it moved when it did that.


Just to clarify, my assumption that all the spares were exhausted stems from the OP's description of how a single logical address is appearing in the Event Log. Re-reading his post I see that he doesn't specifically say it's the same address every time.

If it's the same address, then the spares may very well be exhausted. If there were spares available, that logical address would be reallocated to a different (working) physical address and it would no longer get reported.

Even if it's not the same address every time, if a large part of a flash chip is faulty, you'd get errors on many logical addresses, since none of them can reallocate anymore.
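
One way to settle the "same address every time or not" question is to tally the addresses in the logged messages. The following is a minimal sketch, assuming the messages have already been exported as plain text in the form quoted in the first post and that the address is hexadecimal. The function name and the third sample message are made up for illustration.

```python
import re
from collections import Counter

# Matches the message format quoted in the first post; an optional "0x"
# prefix is tolerated. Treating the address as hexadecimal is an assumption.
LBA_PATTERN = re.compile(
    r"logical block address (?:0x)?([0-9a-fA-F]+) for Disk (\d+)"
)

def count_retried_lbas(messages):
    """Count how often each (disk, LBA) pair appears in retry messages."""
    counts = Counter()
    for message in messages:
        match = LBA_PATTERN.search(message)
        if match:
            lba = int(match.group(1), 16)
            disk = int(match.group(2))
            counts[(disk, lba)] += 1
    return counts

messages = [
    "The IO operation at logical block address e953b98 for Disk 0 was retried",
    "The IO operation at logical block address e953b98 for Disk 0 was retried",
    # Hypothetical second address, not taken from the original post:
    "The IO operation at logical block address e953ba0 for Disk 0 was retried",
]

counts = count_retried_lbas(messages)
for (disk, lba), hits in counts.most_common():
    print(f"Disk {disk}, LBA {lba:#x}: {hits} retries")
```

A single address repeating would fit the "cannot be reallocated" reading, while many scattered addresses would point more toward a larger region of flash having failed.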


jihadjoe wrote:
I agree with the hypothesis that the reallocation is failing because the drive had already exhausted its entire supply of 8192 spares.


Hey, somebody agreeing with me! Wish my wife did that from time to time.

What I'm curious about is how many reallocations do I get on my 128 gig M4. I'm on AD = 180 (6% of lifetime), with zero (01) and (05). Hope it stays that way for the next 31 years that I'll need to get to 100%.
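
The lifetime numbers above can be cross-checked with a rough sketch. It assumes that (AD) is the average erase count, that (CA) is the percentage of rated lifetime used, and that wear accumulates linearly over time. None of that is confirmed by the documentation linked earlier in the thread. The 6% figure is rounded, so the derived cycle rating is only approximate, and the 2-year drive age is a hypothetical input.

```python
def rated_cycles(avg_erase_count, lifetime_pct):
    """Rated erase cycles implied by an erase count and its lifetime percentage."""
    return avg_erase_count * 100 / lifetime_pct

def lifetime_pct_used(avg_erase_count, rated):
    """Percentage of rated lifetime consumed at a given erase count."""
    return 100 * avg_erase_count / rated

def years_remaining(elapsed_years, lifetime_pct):
    """Linear extrapolation: years left until 100% if wear continues at the same rate."""
    return elapsed_years * (100 - lifetime_pct) / lifetime_pct

# 128GB M4 from this post: AD = 180 at 6% of lifetime.
rated = rated_cycles(180, 6)

# 512GB M4 from the first post: AD = 17.
failed_drive_pct = lifetime_pct_used(17, rated)

# Hypothetical drive age of 2 years (an assumption, not stated in the thread).
remaining = years_remaining(2, 6)

print(f"Implied rating: about {rated:.0f} cycles")
print(f"Failed drive: about {failed_drive_pct:.2f}% of lifetime used")
print(f"Years remaining at the same rate: about {remaining:.1f}")
```

Under these assumptions the failed drive was nowhere near worn out, which fits the idea that the failure was not caused by ordinary write wear.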
 
SecretSquirrel
Minister of Gerbil Affairs
Posts: 2726
Joined: Tue Jan 01, 2002 7:00 pm
Location: North DFW suburb...

Re: Crucial M4 SSD failure

Fri Dec 06, 2013 6:55 pm

jihadjoe wrote:
Actually there is a non-zero pending sector count. There are 2 sectors pending reallocation.


I think you are off by a line on the data. There are two relocation events logged, but no sectors pending.

jihadjoe wrote:
I agree with the hypothesis that the reallocation is failing because the drive had already exhausted its entire supply of 8192 spares.


Micron flash uses 2048 byte page sizes and 128K block sizes, so two failed pages would equal 8192 standard 512 byte sectors. 8192 blocks would be 1GB. If that is the case, then it sounds more like an entire flash chip failed. That would be the easiest way to get 1GB of relocated sectors in only two events.

It is all kinda academic as the real question is "is the drive toast, and if not, can you trust it?" My answers are: 1) hard to say without more details and 2) nope.

--SS
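
The unit conversions behind the competing readings of the raw value 8192 can be checked with a few lines. This is only arithmetic on the sizes quoted in the thread (512-byte sectors, a 4 KB or 2048-byte page, a 128K block). Whether the M4 firmware counts attribute (05) in sectors, pages, or blocks is not established here, and the variable names are illustrative.

```python
SECTOR_BYTES = 512
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def sectors_in(num_bytes):
    """Number of whole 512-byte sectors in a given byte count."""
    return num_bytes // SECTOR_BYTES

# Page-sized failures expressed in 512-byte sectors.
sectors_per_4k_page = sectors_in(4 * KIB)
sectors_in_two_2k_pages = sectors_in(2 * 2048)
sectors_per_128k_block = sectors_in(128 * KIB)

# What a raw count of 8192 amounts to under each unit.
bytes_if_sectors = 8192 * SECTOR_BYTES
bytes_if_blocks = 8192 * 128 * KIB

print(f"One 4 KB page       = {sectors_per_4k_page} sectors")
print(f"Two 2048-byte pages = {sectors_in_two_2k_pages} sectors")
print(f"One 128K block      = {sectors_per_128k_block} sectors")
print(f"8192 sectors        = {bytes_if_sectors // MIB} MiB")
print(f"8192 blocks         = {bytes_if_blocks // GIB} GiB")
```

Only the reading in which 8192 counts 128K blocks reaches the 1GB figure. Read as 512-byte sectors, the same count covers a few megabytes.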
