Is RAID 5/6 dead due to large drive capacities?

All things storage here: hard drives, DVD RW drives, little wicker baskets.

Moderators: morphine, Steel

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 2:18 am

Definitely not dead-- but the question is, how many people need more capacity than 4TB or so?

Among those who need more than that, it's very much alive. Data use has gone up substantially in many areas, and RAID6 will almost always have its place in those installations. Many of our customers use all the space they can reasonably afford. Space gets 10x cheaper? Great, then they can increase their data resolution by 10x! 20x cheaper? Then a 20x resolution increase! etc...
continuum
Gerbil First Class
Gold subscriber
 
 
Posts: 168
Joined: Mon Jun 09, 2003 1:42 am
Location: California

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 9:45 am

Ryu Connor wrote:Yeah, and I noted all those factors.

I did say storage, performance, price, and fault tolerance are all choices to be made between the RAID levels. You are picking that number based on your requirements.

When addressing the OP question, these factors mean that growing storage has not killed off RAID 5 or 6.


Yeah, but that's kind of not really addressing the question, I think. Here's where I'm coming from:

http://queue.acm.org/detail.cfm?id=1670144
http://www.zdnet.com/blog/storage/why-r ... n-2009/162
http://www.zdnet.com/blog/storage/why-r ... n-2019/805
http://www.smbitjournal.com/2012/05/whe ... -reliable/
http://www.raid-failure.com/raid5-failure.aspx

Now you can do some googlin' on this and find a bunch of back-and-forth arguments about the math and what kind of defect rate is REALLY accurate for drives, etc. And apparently the rebuild process is also very stressful: it will expose any drive flaws and crap out any drives that were marginal.

I'm no storage expert; I don't have any buddies who write RAID card firmware at big companies; I don't know people who manage large arrays. I'm trying to learn what I can from online sources but, well, you see what I've run into: people with math predicting bad stuff. And typically I'm a fan of quantifiable metrics.

That's why I'm looking for feedback from people in the field, so to speak, to see if they are running TB level RAID 5/6 and seeing these disasters as were predicted. All my experience with RAID 5/6 is with 146 GB or 300 GB SAS drives which are below the level of OMG FAILURE that these doomsayers indicate will bring pain.

The contention is that the larger drive capacities have explicitly made RAID 5 worthless. Fault tolerance is worthless if a rebuild will automatically cause the entire array to fail.
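For reference, the model those linked calculators use is easy to reproduce. A hedged sketch in Python, assuming the commonly quoted 1-in-10^14-bits consumer URE spec and fully independent bit errors (which real drives do not exhibit, hence all the back-and-forth):

```python
# Back-of-the-envelope model behind the RAID-5 "death" calculators linked
# above: probability of finishing a rebuild without a single URE, assuming
# every bit read fails independently at the quoted spec rate.  Real drives
# fail in clusters and some controllers ride through UREs, so treat this
# as the doomsayers' model, not a prediction.

def rebuild_survival(drive_tb, surviving_drives, ure_rate=1e-14):
    """P(no URE while reading `surviving_drives` drives of `drive_tb` TB)."""
    bits_read = surviving_drives * drive_tb * 1e12 * 8  # TB -> bits
    return (1.0 - ure_rate) ** bits_read

# Five 4 TB drives in RAID 5: a rebuild must read the 4 surviving drives.
print(f"consumer (1 in 10^14):   {rebuild_survival(4, 4):.0%}")
print(f"enterprise (1 in 10^15): {rebuild_survival(4, 4, 1e-15):.0%}")
```

Those inputs land near 28% for the consumer spec and 88% for the enterprise spec; whether the independence assumption reflects reality is exactly what's being argued about.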
Scrotos
Graphmaster Gerbil
 
Posts: 1043
Joined: Tue Oct 02, 2007 12:57 pm
Location: Denver, CO.

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 10:01 am

Ryu Connor wrote:
Waco wrote:
Ryu Connor wrote:A bigger drive doesn't change that RAID1 doesn't have performance and it has terrible storage for the price.

What does RAID 1 not have in performance? Any proper software or hardware controller will read from all the drives optimistically and write speeds are the same as a single drive.


RAID1 isn't striping (you don't set a block size), so it can't give you amazing reads. Even when there is a boost, it is a far cry from the double-performance reads of actual striping.

http://techreport.com/review/2525/real- ... explored/3
http://patrick.wagstrom.net/weblog/2011 ... windows-7/
http://www.maximumpc.com/article/raid_done_right

RAID1 performance is either no better or only slightly better than single drive in the examples.

If all I want is fault tolerance this is fine. If I want performance and storage for the price it isn't.

RAID 1 is RAID 0 with an infinitely small to [size of the drive] large stripe size. Sure, Windows 7 software RAID only reads from one drive, but last I checked, almost everything else will read from all drives in a RAID 1 array simultaneously.

Since all drives are identical you can read any stripe you wish from each disk.
Z68XP-UD4 | 2700K @ 4.7 GHz | 16 GB | GTX 780 SLI | PCP&C Silencer 950 | XSPC RX360 | Heatkiller R3 | D5 + RP-452X2 | HAF 932 | 480 GB Extreme Pro
Waco
Gerbil Elite
 
Posts: 818
Joined: Tue Jan 20, 2009 4:14 pm

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 10:48 am

Scrotos wrote:That's why I'm looking for feedback from people in the field, so to speak, to see if they are running TB level RAID 5/6 and seeing these disasters as were predicted. All my experience with RAID 5/6 is with 146 GB or 300 GB SAS drives which are below the level of OMG FAILURE that these doomsayers indicate will bring pain.

OK, a couple of anecdotal data points from the field:

I have a RAID-5 array on my home server. Three 500 GB drives (1 TB array capacity). Array was set up in August of 2009, and has been trouble-free.

At work we have a RAID-5 array that was also set up in 2009, 2 TB array capacity. Also trouble-free for over 3 years.

Now, maybe we've just been lucky; or maybe I've just jinxed things. And I'm staying away from RAID-5 in the future, because of the potential "write hole" issue. But I think the predictions of doom and gloom are overblown; I will not hesitate to use RAID-6 in the future, when the situation calls for it.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 38123
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 11:44 am

I'm somewhat repeating the thread, and for that I apologize, but I think I address the OP pretty nicely.

I have two main storage systems, my media server and my WD Lifebook Duo thing.

My media server runs a flexible array type that's somewhat akin to JBOD. Pretty much any drive failure equals data loss, but the data on that server is highly regenerable; I'd just need to reshuffle my Netflix queue for a while, etc, etc.

My deskside backup is RAID1, in a cheap little NAS box. Things that aren't recoverable go there, and are backed up to a second source (BD-R) periodically. Things there are family photos, tax records, etc, etc.

Having a very clear picture of what data you have and what that is worth is the first step in choosing a RAID level, and further in choosing a backup scenario.
Siglessness is boring.
Forge
Lord High Gerbil
Silver subscriber
 
 
Posts: 8061
Joined: Wed Dec 26, 2001 7:00 pm
Location: SouthEast PA

with big data comes big problems

Posted on Mon Mar 18, 2013 12:18 pm

Ryu Connor wrote:
Waco wrote:
Ryu Connor wrote:A bigger drive doesn't change that RAID1 doesn't have performance and it has terrible storage for the price.

What does RAID 1 not have in performance? Any proper software or hardware controller will read from all the drives optimistically and write speeds are the same as a single drive.


RAID1 isn't striping (you don't set a block size), so it can't give you amazing reads. Even when there is a boost, it is a far cry from the double-performance reads of actual striping.

http://techreport.com/review/2525/real- ... explored/3
http://patrick.wagstrom.net/weblog/2011 ... windows-7/
http://www.maximumpc.com/article/raid_done_right

RAID1 performance is either no better or only slightly better than single drive in the examples.

If all I want is fault tolerance this is fine. If I want performance and storage for the price it isn't.


YMMV with windows, fakeraid, various hardware etc:

In ZFS and other schemes that are designed correctly, a mirror of N drives can read like a RAID0 of N drives, and write slightly slower than a RAID0 of N/2 drives.
The RAID5/6-type equivalents can read like a RAID0 of N-1 (or N-2) drives, though they write like a slightly higher-latency single drive per array; they're recommended only if capacity trumps all.

I run a 12-drive SAN with a RAID10 ZFS equivalent for this reason: it can keep quite a few gigabit NICs screaming, and I got all the drives at ~$30/TB, so $60/TB of usable, fast, yet resilient* storage was plenty cheap. (OK, I also cheat with L2 cache SSDs and such, but I digress.)

*BTW, does everyone have a SHA256 hash tree for all your files and metadata, checked at least once a week and on every read/write/modify? Yeah, thought so.
NTFS is defective by design for anyone entrusting lots of data to Windows; don't do it.
Also, some hardware RAID card ROMs have hidden defects found out later by unhappy victims. Caveat emptor.
blah blah blah signature blah blah blah
Bauxite
Gerbil Elite
 
Posts: 619
Joined: Sat Jan 28, 2006 12:10 pm
Location: electrolytic redox smelting plant

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 12:21 pm

Waco wrote:
Ryu Connor wrote:
Waco wrote:What does RAID 1 not have in performance? Any proper software or hardware controller will read from all the drives optimistically and write speeds are the same as a single drive.


RAID1 isn't striping (you don't set a block size), so it can't give you amazing reads. Even when there is a boost, it is a far cry from the double-performance reads of actual striping.

http://techreport.com/review/2525/real- ... explored/3
http://patrick.wagstrom.net/weblog/2011 ... windows-7/
http://www.maximumpc.com/article/raid_done_right

RAID1 performance is either no better or only slightly better than single drive in the examples.

If all I want is fault tolerance this is fine. If I want performance and storage for the price it isn't.

RAID 1 is RAID 0 with an infinitely small to [size of the drive] large stripe size. Sure, Windows 7 software RAID only reads from one drive, but last I checked, almost everything else will read from all drives in a RAID 1 array simultaneously.

Since all drives are identical you can read any stripe you wish from each disk.

Right. We're talking hardware RAID controllers here (at least I was). Even some software RAID implementations can do interleaved access on RAID 1, but your results will vary.

Regardless, many RAID controllers can do interleaved access across RAID 1 sets.
Buub
Maximum Gerbil
Silver subscriber
 
 
Posts: 4214
Joined: Sat Nov 09, 2002 11:59 pm
Location: Seattle, WA

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 1:28 pm

People keep saying a hardware RAID controller makes the difference for RAID1, but I don't see it.

Areca ARC-1220 8-Port PCIe RAID6-Controller

Areca ARC-1220 8-Port PCIe RAID6-Controller

The performance for RAID1 on that card is still basically a single drive. That is a real RAID controller.
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor
Global Moderator
Gold subscriber
 
 
Posts: 3598
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 2:33 pm

Ryu Connor wrote:People keep saying a hardware RAID controller makes the difference for RAID1, but I don't see it.

Areca ARC-1220 8-Port PCIe RAID6-Controller

Areca ARC-1220 8-Port PCIe RAID6-Controller

The performance for RAID1 on that card is still basically a single drive. That is a real RAID controller.


You're quoting 7 year old reviews?

Linux/UNIX do read balancing even with the most basic software RAID. I would argue that you shouldn't use any hardware-specific RAID, ever, with the exception of using a good RAID controller to get a battery-backed write cache and good IOPS (basically, use it as a dumb controller with a cache).


If you're running Windows on your fileserver, then yes, RAID 1 can be as slow as a single drive.
Z68XP-UD4 | 2700K @ 4.7 GHz | 16 GB | GTX 780 SLI | PCP&C Silencer 950 | XSPC RX360 | Heatkiller R3 | D5 + RP-452X2 | HAF 932 | 480 GB Extreme Pro
Waco
Gerbil Elite
 
Posts: 818
Joined: Tue Jan 20, 2009 4:14 pm

Re: with big data comes big problems

Posted on Mon Mar 18, 2013 2:58 pm

Bauxite wrote:In ZFS and other schemes that are designed correctly, a mirror of N drives can read like a RAID0 of N drives, and write slightly slower than a RAID0 of N/2 drives.

Yeah, but ZFS has... what? About 0.01% of the file system market share? It's not exactly common or widely supported.

Waco wrote:Linux/UNIX do read balancing even with the most basic software RAID.

Well I just did a quick disk throughput check of the Linux RAID-1 array on my home desktop, and the read and write speeds were nearly identical (100 MB/sec for writes, 101 MB/sec for reads). So at least in my case, I'm not seeing the effects of this read balancing.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 38123
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 3:42 pm

Waco wrote:You're quoting 7 year old reviews?


Finding modern RAID1 reviews is no peach.

Regardless, RAID1 hasn't exactly changed in the last seven years. If a real RAID controller didn't do it then, I'm not sure why it would suddenly do it now.

My own personal anecdotal experience with the technology over the years backs that up (I've never seen a substantial boost). I've never implemented RAID1 with any expectation of performance.
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor
Global Moderator
Gold subscriber
 
 
Posts: 3598
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: with big data comes big problems

Posted on Mon Mar 18, 2013 3:45 pm

just brew it! wrote:
Waco wrote:Linux/UNIX do read balancing even with the most basic software RAID.

Well I just did a quick disk throughput check of the Linux RAID-1 array on my home desktop, and the read and write speeds were nearly identical (100 MB/sec for writes, 101 MB/sec for reads). So at least in my case, I'm not seeing the effects of this read balancing.

That's odd. I was under the impression that mdadm automatically enabled read-balancing on RAID 1 arrays.

Time to dig.


EDIT: It performs read-balancing on multiple accesses. A single application probably won't see any performance benefit, but doing two dd's at once should be substantially faster. At least, that's how I understand it.
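That distinction can be captured in a toy model (this is only the arithmetic of the intuition above, not mdadm's actual scheduler):

```python
# Toy model of why read balancing on a mirror only shows up with
# concurrent readers: a single sequential stream stays on one copy,
# while independent streams can be spread across the copies.  The
# per-drive speed of 100 MB/s is an arbitrary illustrative number.

def mirror_read_throughput(streams, copies, drive_mbs=100):
    """Aggregate MB/s: each stream pins to one copy, idle copies add nothing."""
    return min(streams, copies) * drive_mbs

print(mirror_read_throughput(1, 2))  # one dd: single-drive speed
print(mirror_read_throughput(2, 2))  # two dd's at once: both spindles busy
```

Which matches the measurement above: one sequential reader sees single-drive throughput, and only concurrent streams expose the balancing.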
Z68XP-UD4 | 2700K @ 4.7 GHz | 16 GB | GTX 780 SLI | PCP&C Silencer 950 | XSPC RX360 | Heatkiller R3 | D5 + RP-452X2 | HAF 932 | 480 GB Extreme Pro
Waco
Gerbil Elite
 
Posts: 818
Joined: Tue Jan 20, 2009 4:14 pm

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 3:56 pm

Scrotos wrote:The contention is that the larger drive capacities have explicitly made RAID 5 worthless. Fault tolerance is worthless if a rebuild will automatically cause the entire array to fail.


This still comes from an argument that no one apparently needs more space, which isn't true.
It still seems to rest on the superstition that rebuilds kill drives.
And it ignores that during the recreation of a mirror after a failure, the "good" drive might die before completion, causing the array to fail.

Backups, they matter, yo.
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor
Global Moderator
Gold subscriber
 
 
Posts: 3598
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 5:27 pm

Ryu Connor wrote:
Scrotos wrote:The contention is that the larger drive capacities have explicitly made RAID 5 worthless. Fault tolerance is worthless if a rebuild will automatically cause the entire array to fail.


This still comes from an argument that no one apparently needs more space, which isn't true.
It still seems to rest on the superstition that rebuilds kill drives.
And it ignores that during the recreation of a mirror after a failure, the "good" drive might die before completion, causing the array to fail.

Backups, they matter, yo.


I don't understand exactly what you're saying. I would like more space. I think most everyone in the universe wants more space. The arguments that I linked to say that because of the URE rate on drives, you're certain to get a failure during a rebuild, and this is due primarily to each drive having a larger capacity. By that token, it also seems that reading a 2 TB drive 6 times would also give you a URE, but it wouldn't matter in that instance because you're not trying to rebuild an array from data and parity; you're just losing a sector. In a RAID rebuild, that would cause the entire thing to die.

It's not that you choose RAID 5/6 to maximize space; that's not the issue. A 4 TB drive doesn't mean you're going to skip making a five-drive 1 TB RAID 5 to get the space. It means that if you make a five-drive 4 TB RAID 5, then during a rebuild from a failed drive you'll get an error that causes the entire RAID to fail. I think maybe that's where you're getting confused. It isn't a "large drives mean you can use a single large drive instead of a RAID of smaller ones" issue; it's making the RAID out of big drives that's the problem.

I get the feeling that the math doesn't match up to reality, but I don't have access to something like that Google hard drive survey to give real-world experience. So all I got is the math to feed my fear.

The naysayers don't address mirrors, just parity. I guess mirrors are perfect in every way and don't have the whole rebuild issue. Parity is slow and causes errors and is scary.

I know backups matter. RAID is not a backup solution. I get it. I understand. Not the question/issue at all.
Scrotos
Graphmaster Gerbil
 
Posts: 1043
Joined: Tue Oct 02, 2007 12:57 pm
Location: Denver, CO.

Re: with big data comes big problems

Posted on Mon Mar 18, 2013 7:36 pm

Waco wrote:EDIT: It performs read-balancing on multiple accesses. A single application probably won't see any performance benefit, but doing two dd's at once should be substantially faster. At least, that's how I understand it.

Ahh, that makes sense.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 38123
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 9:14 pm

URE is an estimate. I think what we are really trying to determine here is the difference between theory and practice. I think this would be like figuring out MTBF though, it's also just an estimate.

From what I know of failed RAID 5 rebuilds, the array will simply fail to rebuild; the failure won't break the array itself. What I'm kind of curious about is whether a URE guarantees a failed rebuild. I'm not an expert on RAID parity, but is it not possible that the block that needs to be recovered could be found in parity on another drive, or does it require the missing member to help with that?
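On the parity question: RAID 5 keeps one XOR parity block per stripe, so a single missing block in a stripe is recoverable from the survivors, but a second hole in the same stripe (the dead drive plus a URE on a survivor) leaves one equation with two unknowns. A toy sketch:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together -- RAID 5's per-stripe parity."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# One stripe of a hypothetical 4-drive RAID 5: three data blocks plus parity.
data = [b'AAAA', b'BBBB', b'CCCC']
parity = xor_blocks(data)

# Drive holding data[1] dies: its block is rebuilt from survivors + parity.
assert xor_blocks([data[0], data[2], parity]) == data[1]

# But if a surviving drive also throws a URE in this same stripe, two
# blocks of the stripe are gone, and one parity equation can't recover
# both -- that's the case where a RAID 5 rebuild comes up short.
```

So a URE doesn't doom the whole rebuild by itself; it only dooms reconstruction when it lands in a stripe that has already lost its redundancy, and what happens next depends on the controller.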

Off the top of my head, I think enterprise-class SAS drives can be rated over a petabyte for URE. I know they are way over the ~13TB mark, that's for sure. So any examples people have would need to be of consumer-grade drives.

I looked into this briefly myself as I am building a NAS, but I'm only at 8TB in RAID 5.

Customers of mine have well over 13TB in arrays but they are nested and are using enterprise class drives.

I've had brand new drives fail in the middle of a rebuild. I've had two disks die at once on more than one occasion. I've had mass failures of a particular brand/model all occur within weeks of each other at multiple sites. So losing another disk during a rebuild is not that uncommon in my personal experience. I'm sure other people have never seen it happen.
Tachyonic Karma: Future decisions traveling backwards in time to smite you now.
Convert
Grand Gerbil Poohbah
Gold subscriber
 
 
Posts: 3134
Joined: Fri Nov 14, 2003 6:47 am

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 9:21 pm

Convert wrote:URE is an estimate. I think what we are really trying to determine here is the difference between theory and practice. I think this would be like figuring out MTBF though, it's also just an estimate.

Agreed. I think the OP is misinterpreting a median statistical estimate as an event that is guaranteed to happen once the stated number of reads or writes has occurred.
Life is hard; but it's harder if you're stupid. Big Al.
Captain Ned
Global Moderator
Gold subscriber
 
 
Posts: 20647
Joined: Wed Jan 16, 2002 7:00 pm
Location: Vermont, USA

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 9:59 pm

A couple of comments here.

1) I have a 16TB RAID6 array sitting upstairs. It is eight 2TB drives on an Areca RAID controller.

2) To get that kind of space out of a RAID 10 setup would require four more drives. Losing two drives to parity is painful enough. Losing six, even at $150 apiece, is crazy. Not buying those four drives effectively paid for the RAID controller.

3) Even with gigabit to my workstation, the network is the limiting factor for writes to the array.

4) Drive failure during rebuild is a red herring. Yes, it can happen. No, it's not likely to. I spent ten years working with storage that in the end measured multiple petabytes, spread across hundreds of RAID 5 and RAID 6 arrays, perhaps 10,000 disks. We failed disks all the time -- once or more a week, somewhere. In all that time, never a failure during rebuild. Yes, it's anecdotal.

5) You have to be careful when setting up your RAID 10 array. Get it backwards and you are just as susceptible to drive failure during rebuild as a RAID 5 array.

Does the "average" yahoo need 16TB of storage? No. Does that mean large drives have killed RAID arrays, certainly not.
SecretSquirrel
Gerbil Jedi
Gold subscriber
 
 
Posts: 1738
Joined: Tue Jan 01, 2002 7:00 pm
Location: The Colony, TX (Dallas suburb)

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 10:09 pm

Drives fail, doesn't matter if it is one drive, multiple drives, or one of the RAIDs.

Small drive vs big drive doesn't change that. Rebuilds don't change that. We also can't calculate your luck. Taking the safe route might not end like you suspect. Streaks are a fickle thing, just visit Vegas.
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor
Global Moderator
Gold subscriber
 
 
Posts: 3598
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 10:13 pm

SecretSquirrel wrote:3) Even with gigabit to my workstation, the network is the limiting factor for writes to the array.

So ~110 MB/s is impressive for eight drives? :o

Also - backwards RAID 10 (01?) isn't any more susceptible to failure. You just change which drives can fail before you lose the whole array, right? A mirror of stripes can lose half of the drives and a stripe of mirrors can lose half of them as well.
Z68XP-UD4 | 2700K @ 4.7 GHz | 16 GB | GTX 780 SLI | PCP&C Silencer 950 | XSPC RX360 | Heatkiller R3 | D5 + RP-452X2 | HAF 932 | 480 GB Extreme Pro
Waco
Gerbil Elite
 
Posts: 818
Joined: Tue Jan 20, 2009 4:14 pm

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 10:33 pm

Scrotos wrote:I don't understand exactly what you're saying. I would like more space. I think most everyone in the universe wants more space. The arguments that I linked to say that because of the URE rate on drives, you're certain to get a failure during a rebuild, and this is due primarily to each drive having a larger capacity. By that token, it also seems that reading a 2 TB drive 6 times would also give you a URE, but it wouldn't matter in that instance because you're not trying to rebuild an array from data and parity; you're just losing a sector. In a RAID rebuild, that would cause the entire thing to die.

It's not that you choose RAID 5/6 to maximize space; that's not the issue. A 4 TB drive doesn't mean you're going to skip making a five-drive 1 TB RAID 5 to get the space. It means that if you make a five-drive 4 TB RAID 5, then during a rebuild from a failed drive you'll get an error that causes the entire RAID to fail. I think maybe that's where you're getting confused. It isn't a "large drives mean you can use a single large drive instead of a RAID of smaller ones" issue; it's making the RAID out of big drives that's the problem.

I get the feeling that the math doesn't match up to reality, but I don't have access to something like that Google hard drive survey to give real-world experience. So all I got is the math to feed my fear.


I was pretty much going to pass along the same sentiment, but I had to... well, work. :) I have two issues with RAID 5. I axed the TL;DR version.
Basically it's this.

A) RAID 5's fault tolerance sucks compared to the alternatives once you get beyond a certain size. If another drive dies during the rebuild for any reason (UREs, a black hole appears in the server room, Mumm-Ra takes another drive back to her lair), the array is toast. The probability of the first one increases with the size of the drive. RAID 6? No problem. RAID 5? Well, the data you need is on the drive that died.

B) The second reason is that the write penalty for today's RAID 6 isn't the chasm it once was. It used to be hundreds of megs per second slower. Now it's about 12% off the write performance of RAID 5 (depending on the controller, give or take). Think about that for a second: you are going to risk your data for a 12% increase in write speed?! Really? If every nugget of performance is that necessary, put it in RAID 0, 1, or 10 and call it a day.

In a nutshell, RAID 5 isn't worth it in a lot of scenarios when you consider the availability of RAID 10 and RAID 6. Is there a workload that would make sense for it today? Probably, if I thought long and hard about it. But databases don't need it: it's slower than what RAID 10 will give you, and to my knowledge I've never encountered a silently corrupted database (MS Access is an outlier) that backs itself up every night; it either works or it doesn't. If you have tons of files, then you need tons of storage, and if a drive dies during rebuild you'll lose tons of your stuff. Probably the only thing I can think of is VMs written directly to a block device rather than a file. But even that has me eyeing RAID 10 with a decent backup plan in tow.
Core i7 920 @stock - 6GB OCZ Mem - Adaptec 5805 - 2 x Intel X25-M in RAID1 - 5 x Western Digital RE4 WD1003FBYX 1TB in RAID 6 - Nvidia GTX 460
kc77
Gerbil Team Leader
 
Posts: 242
Joined: Sat Jul 02, 2005 2:25 am

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 10:38 pm

Waco wrote:Also - backwards RAID 10 (01?) isn't any more susceptible to failure. You just change which drives can fail before you lose the whole array, right? A mirror of stripes can lose half of the drives and a stripe of mirrors can lose half of them as well.


An issue of perspective I suppose.

With 0+1, losing one drive means an entire set has failed. If you replace that drive and the rebuild from the alternate set then suffers a failure, the rebuild cannot complete. The only way I get two drives of protection is if both failed drives are in the same set.

With 10, losing one drive does not mean an entire set has failed. Losing another drive from the alternate set during the rebuild also wouldn't hurt. With RAID10 I'd have to lose both drives in the exact same set for the array to collapse. I get two drives of protection as long as the failures land in different sets.

If lightning strikes just right, 10 or 0+1 is little better than RAID5. RAID6 isn't picky about which two drives decide to croak.
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor
Global Moderator
Gold subscriber
 
 
Posts: 3598
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 10:58 pm

Ryu Connor wrote:
Waco wrote:Also - backwards RAID 10 (01?) isn't any more susceptible to failure. You just change which drives can fail before you lose the whole array, right? A mirror of stripes can lose half of the drives and a stripe of mirrors can lose half of them as well.


An issue of perspective I suppose.

With 0+1, losing one drive means an entire set has failed. If you replace that drive and the rebuild from the alternate set then suffers a failure, the rebuild cannot complete. The only way I get two drives of protection is if both failed drives are in the same set.

It is an issue of probabilities.

Take an 8-disk array. If you mirror first, then stripe, and you lose a disk, you can safely lose any of the six drives not involved in the failed mirror; so your odds of a second drive failure during rebuild killing your array are 1 in 7. If you stripe first and lose a drive, you can now only afford to lose one of the three remaining drives in the stripe that has already failed; so your odds of a second drive failure killing your array are 4 in 7. This assumes failures are independent, random events, yadda yadda...
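Those 1-in-7 and 4-in-7 odds are easy to verify by brute-force enumeration; a quick sketch, assuming (as above) independent and equally likely failures:

```python
from itertools import permutations

# Brute-force check of the odds above.  `fatal(a, b)` says whether
# losing disks a and b together destroys the array.
def p_second_failure_fatal(n_disks, fatal):
    """P(second random failure kills the array | one disk already failed)."""
    pairs = list(permutations(range(n_disks), 2))
    return sum(fatal(a, b) for a, b in pairs) / len(pairs)

# Mirror first, then stripe (RAID 10): four pairs (0,1) (2,3) (4,5) (6,7).
# Fatal only when both halves of one mirror die.
raid10 = p_second_failure_fatal(8, lambda a, b: a // 2 == b // 2)

# Stripe first, then mirror (RAID 0+1): stripe sets {0..3} and {4..7}.
# Fatal once each stripe set has lost a disk.
raid01 = p_second_failure_fatal(8, lambda a, b: (a < 4) != (b < 4))

print(raid10)  # 1/7
print(raid01)  # 4/7
```

The enumeration reproduces the 1-in-7 versus 4-in-7 split exactly, under the same independence caveat.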

Most decent RAID controllers enforce the mirror-then-stripe ordering, but there are a lot of lousy controllers floating around, and if you are doing it in software, all bets are off.

BTW, a decent RAID controller will not fail a drive on an NRE. It will only fail a drive once SMART reports no more sectors available for remap. In the case of an NRE on a drive, the array still has valid data (this is where RAID5 in a rebuild is at risk). SMART requires a write of fresh data in order to complete the remap. The array will reconstruct the block that contains the failed sector from the rest of the array, and write it back to the drive with the NRE, allowing it to remap. If the remap is successful, the array continues on. If it isn't, then it fails the drive.

So, in the above case, an NRE during a RAID5 rebuild could potentially fail the array, depending on how the controller handles the situation.

--SS

SecretSquirrel
Gerbil Jedi
Gold subscriber
 
 
Posts: 1738
Joined: Tue Jan 01, 2002 7:00 pm
Location: The Colony, TX (Dallas suburb)

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 10:59 pm

Ryu Connor wrote:Drives fail, doesn't matter if it is one drive, multiple drives, or one of the RAIDs.

Small drive vs big drive doesn't change that. Rebuilds don't change that. We also can't calculate your luck. Taking the safe route might not end like you suspect. Streaks are a fickle thing, just visit Vegas.

UREs are built on the probability of failure in relation to the number of bits read. The size matters because, on your average controller, the entire drive is read/written during initialization and during a rebuild. So while the URE rate is generally static across an entire line or model, the size isn't: the larger the drive, the higher the probability that a read error will occur.
Core i7 920 @stock - 6GB OCZ Mem - Adaptec 5805 - 2 x Intel X25-M in RAID1 - 5 x Western Digital RE4 WD1003FBYX 1TB in RAID 6 - Nvidia GTX 460
kc77
Gerbil Team Leader
 
Posts: 242
Joined: Sat Jul 02, 2005 2:25 am

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Mon Mar 18, 2013 11:39 pm

I'm familiar with what they are.

WD RE wrote:<10 in 10^16


Less than 10 in 10,000,000,000,000,000 (10PB).

Yeah, I'm not concerned about a 4TB rebuild with those numbers. Writing 4TB is .0004% of that capacity? Of course you wouldn't write the full capacity of the drive as that would imply the array is running out of space.

That's less than one per petabyte and it's a variable unknown. One drive might adhere to that spec and hit nine of them in 10PB or a drive might hit one or none of them in 10PB. We also have no clue if it will hit them in streaks or spaced out.

Presuming the array was going to hit the defects in a nice, consistent, every-drive-in-sync, perfectly spaced pattern, and presuming that nine of them were going to occur, that means one every ~1.11 PB.

I could refill each of the 4TB drives of this perfectly sync'd array 277.78 times before each drive hit its first read error.
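For anyone who wants to check the arithmetic, here it is spelled out, taking the spec the way the post reads it (10 errors per 10^16 bytes, ~10 PB). Treat the numbers as illustrative; drive datasheets quote the rate per bits read, so the real window may differ.

```python
# Re-running the arithmetic above, reading the quoted spec as at most
# 10 errors per 10^16 bytes (~10 PB). Illustrative numbers only.
spec_bytes = 10**16        # the ~10 PB window from the quoted spec
errors = 9                 # the post assumes nine errors in that window
drive_bytes = 4 * 10**12   # one 4 TB drive

spacing = spec_bytes / errors            # bytes read between errors
fills_per_error = spacing / drive_bytes  # full 4 TB reads between errors

print(f"one error every ~{spacing / 1e15:.2f} PB")       # ~1.11 PB
print(f"~{fills_per_error:.2f} drive fills per error")   # ~277.78
```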

Of course none of us can really know how this infinitesimal error rate might hit. As SS notes, it might not even tank the array; it might instead be handled gracefully. Discussions of how rare these errors are in real life should be had as well.

It's pedantic **** like this that makes this a bad thread and this discussion sophomoric. It buries the primary differences between the RAID levels in a well of numbers that impact everyone differently. The very variability of this defect makes a RAID of any size vulnerable, and more drives certainly increases the odds you'll find it. The Vegas odds might tell you you're safe using smaller drives and fewer of them, but I've got news: fate will come calling. These things are all sounding incredibly familiar. They've been oft repeated - along with other facts - making arguing this a circular affair that will lead to sniping.

This might be something we'd bother to calculate for a risk analysis, but it's not something that should even be a passing thought when putting this technology to work for the needs of the user or the business.

I think this thread has run its course and is heading for trouble.
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor
Global Moderator
Gold subscriber
 
 
Posts: 3598
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Tue Mar 19, 2013 9:00 am

Convert wrote:From what I know about failed RAID 5 rebuilds, the rebuild will simply fail; it won't break the array. What I'm kind of curious about is whether a URE guarantees a failed rebuild. I'm not an expert on RAID parity, but isn't it possible that the block that needs to be recovered could be found in parity on another drive, or does it require the missing member to help with that?

I think you're misunderstanding the failure scenario:

1. Drive fails, and gets kicked from the array.

2. Failed drive is replaced with a spare, and a rebuild started.

3. During the rebuild, one of the other drives in the array throws a URE.

At this point you're completely hosed, because you cannot recover the data that would've been reconstructed using the sector that failed to read. The rebuild fails, *and* the array is now broken (because you had a second failure while the array was still degraded). Game over.

RAID-5 has only one level of redundancy; it can't tolerate two (or more) failures across different drives. That's where RAID-6 comes to the rescue -- it can tolerate two failures and still successfully recover the array.
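The scenario above also explains why the URE math gets applied to rebuilds specifically: a RAID-5 rebuild must read every surviving member in full, and a single URE on any of them ends it. A rough estimate, again assuming independent errors, a flat per-bit rate, and an assumed consumer-class 1e-14 spec:

```python
# Back-of-envelope estimate for the rebuild scenario above: a RAID-5
# rebuild reads every surviving member in full, and one URE on any of
# them fails the rebuild. Assumes independent errors at a flat
# (assumed) 1e-14-per-bit rate; real drives won't match this exactly.

def p_rebuild_fails(n_drives, drive_bytes, ure_rate_bits=1e-14):
    surviving = n_drives - 1                  # one member already failed
    bits_read = surviving * drive_bytes * 8   # every survivor read in full
    return 1 - (1 - ure_rate_bits) ** bits_read

# e.g. 5 x 4 TB in RAID 5 with the assumed consumer-class spec
print(f"{p_rebuild_fails(5, 4e12):.0%}")  # roughly 72% under these assumptions
```

Under the same assumptions an enterprise-class 1e-15 spec, or fewer/smaller drives, pulls that number way down, which is exactly why the inputs to this calculation get argued about so much.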
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 38123
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Wed Mar 20, 2013 11:25 am

After an impassioned plea, I've decided to see if my expectation that this will spiral into trouble is wrong.

The conversation has reached the point of pedantry. I suspect this focus on the odds will obscure the primary features that make each RAID level unique and will eventually result in sniping. I'd like to be proven wrong.

The thread has had a cooling period. Before the conversation continues, I want to pose a few questions to try to set the tone/direction of this thread. I don't want replies to this; I'm looking for these questions to flavor replies to others.

The questions:

What can be said about the probabilities that exist that hasn't been said already?

What more can the math of odds really tell us? It gives a % value that varies depending on a number of factors, but it can't quantify that into anything more than a dice roll. The percentages are fascinating theory, but averages manifest in life in rather unpredictable ways. You could argue that, due to the way humans perceive life and remember events, all odds, regardless of their actual %, distill into a simple dichotomy: did happen, didn't happen.

Where will this thread go next? The pros and cons of damn near all the RAID levels have been discussed.

What is left to contribute here that hasn't already been said or detailed as a roll of the dice?

Again, I don't want replies/answers. I want you to consider these elements before diving into this thread.
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor
Global Moderator
Gold subscriber
 
 
Posts: 3598
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Wed Mar 20, 2013 1:05 pm

Thanks Ryu!

I really think all Scrotos is waiting for are examples of real-life setups. Hopefully someone has a RAID 5 array over 13TB and has experienced some failures (well, hopefully you haven't, but you get what I mean!) and can share results of the rebuild. Unfortunately, I can only personally offer my 8TB array as an example; all of my customers are using different RAID levels for the larger arrays, and those are enterprise-class drives anyway.

I personally have some other RAID-specific questions regarding this URE issue, but I suppose it would be best not to ask them here; I would like Scrotos to get what he was after. Plus I'm curious about the real-life scenarios too.

This reminds me of the problem with SSDs and how many times the flash can be written to. There's a lot of math you can run on that problem too. There are people out there, though, who have been running programs that constantly write data to their SSDs to test their longevity. Obviously these are two completely different things, but it's interesting to see how real life correlates with estimated values: http://www.xtremesystems.org/forums/sho ... nm-Vs-34nm (scroll down for more graphs)

Too bad rebuilds with such a large array would take so much time it wouldn’t be very practical to test this for fun.
Tachyonic Karma: Future decisions traveling backwards in time to smite you now.
Convert
Grand Gerbil Poohbah
Gold subscriber
 
 
Posts: 3134
Joined: Fri Nov 14, 2003 6:47 am

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Wed Mar 20, 2013 1:57 pm

Convert wrote:This reminds me of the problem with SSDs and how many times the flash can be written to. There's a lot of math you can run on that problem too. There are people out there, though, who have been running programs that constantly write data to their SSDs to test their longevity. Obviously these are two completely different things, but it's interesting to see how real life correlates with estimated values: http://www.xtremesystems.org/forums/sho ... nm-Vs-34nm (scroll down for more graphs)

As we've discussed previously in other threads, this sort of testing doesn't actually tell you much. As flash wears, its data retention time will become shorter. So unless there's something being done to evaluate the degree to which data retention time of the drive degrades, saying "I wrote 3x as much data to the drive as the endurance spec and the drive still works!" is pretty meaningless. Sure, I was just able to fill the drive with another batch of test data. But if I then leave the drive alone for a month and try to read that data back, will it still be there?
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 38123
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Is RAID 5/6 dead due to large drive capacities?

Posted on Wed Mar 20, 2013 2:49 pm

You are definitely right, but some of the drives were tested with that in mind: they were left powered off for periods of time, then read back, and that retention time was recorded. It's not a definitive answer, and neither will be any personal experiences regarding the RAID 5 question Scrotos has, but it's interesting to see some real-life experiences (to me at least). The SSD endurance testing was done on such a small scale that I wouldn't lean on it too much by itself. It just reminds me of it, is all: there's a theoretical number out there, and sometimes there's just no way to really know without a large enough sample size and proper testing, and even then you might still end up being one of the outliers!
Tachyonic Karma: Future decisions traveling backwards in time to smite you now.
Convert
Grand Gerbil Poohbah
Gold subscriber
 
 
Posts: 3134
Joined: Fri Nov 14, 2003 6:47 am
