just brew it!
Administrator
Topic Author
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 9:15 am

I learned something fascinating about software RAID-1 on Linux yesterday. This all grew out of some messages I saw logged by the monthly scrub pass that Linux does on all active RAID arrays (it runs a scrub on the first Sunday of every month in the middle of the night).

The mystery began when I noticed that the scrub had logged a dozen or so mirror mismatches. Yet the array status still showed as healthy. My first thought was basically, "WTF? Is my array getting corrupted? Is a drive failing? And why does the scrub show mismatches, but the array still shows as healthy?"
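
For reference, the counter behind those scrub messages can also be read directly from sysfs. A minimal Python sketch, assuming the array is named `md0` (a placeholder) and using a hypothetical helper `read_mismatch_count`; the `sysfs_root` parameter exists only so the function can be pointed at a test directory:

```python
from pathlib import Path


def read_mismatch_count(md_name, sysfs_root="/sys/block"):
    """Return the mismatch counter the md driver keeps for one array.

    md_name is a placeholder such as "md0"; the value is a count of
    sectors that differed during the last check or repair pass.
    """
    counter_file = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(counter_file.read_text().strip())


if __name__ == "__main__":
    # Assumption: an array called md0 exists on this machine.
    print(read_mismatch_count("md0"))
```

A non-zero count says only that some sectors differed between members. It does not say where they are or whether they matter, which is why the array can still be reported as healthy.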

The messages logged by the scrub did not give actual block addresses, so the first task was to figure out if the mismatches were real, and identify the block addresses associated with them. I wrote a short program that calculates MD5 hashes for each 1MB chunk of a raw partition, ran this program against both devices, and diffed the output to identify the 1MB spans where the mismatches were located. Then I ran the same program with a 4KB chunk size over just those 1MB ranges, to get a list of disk block offsets to each discrepancy.
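
A minimal Python sketch of that two-pass comparison (not the original program; the function names are illustrative, and the device paths mentioned afterwards are placeholders):

```python
import hashlib

MIB = 1024 * 1024


def chunk_hashes(path, chunk_size=MIB, start=0, length=None):
    """Return [(offset, md5_hex), ...] for consecutive chunks of a file or device."""
    results = []
    with open(path, "rb") as f:
        f.seek(start)
        offset = start
        remaining = length
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            data = f.read(want)
            if not data:
                break  # end of file/device
            results.append((offset, hashlib.md5(data).hexdigest()))
            offset += len(data)
            if remaining is not None:
                remaining -= len(data)
    return results


def mismatched_offsets(path_a, path_b, chunk_size=MIB, start=0, length=None):
    """Offsets of chunks whose hashes differ between the two inputs."""
    a = chunk_hashes(path_a, chunk_size, start, length)
    b = chunk_hashes(path_b, chunk_size, start, length)
    return [off for (off, ha), (_, hb) in zip(a, b) if ha != hb]


def locate_mismatches(path_a, path_b, coarse=MIB, fine=4096):
    """Coarse pass over everything, then a fine pass over only the bad spans."""
    blocks = []
    for off in mismatched_offsets(path_a, path_b, coarse):
        blocks.extend(mismatched_offsets(path_a, path_b, fine, start=off, length=coarse))
    return blocks
```

Called as `locate_mismatches("/dev/sdX1", "/dev/sdY1")` with root privileges, this would return the byte offsets of the 4KB blocks that differ between the two members. MD5 is fine here because the hashes only need to detect differences, not resist tampering.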

Upon examination of the contents of the suspect blocks, I discovered that the two drives of the mirror always contained similar, but not identical data. In every case I examined, the block from drive A would have some non-zero data, followed by zeros. The corresponding block from drive B would also have some non-zero data followed by zeros, but the point at which the zeros started would be different. The non-zero data always matched, up to the point where the zeros started in the "shorter" block.
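
That pattern is simple enough to test for mechanically. A small Python sketch, with hypothetical helper names, that flags a pair of blocks as fitting it:

```python
def data_length(block):
    """Length of the block up to and including its last non-zero byte."""
    return len(block.rstrip(b"\x00"))


def looks_like_append_race(block_a, block_b):
    """True if the blocks differ only in where the trailing zeros begin.

    Both blocks must be "data followed by zeros", and the data in the
    shorter block must be a prefix of the data in the longer one.
    """
    if block_a == block_b:
        return False
    len_a, len_b = data_length(block_a), data_length(block_b)
    shorter = min(len_a, len_b)
    return len_a != len_b and block_a[:shorter] == block_b[:shorter]
```

A pair that fails this test (for example, two blocks whose non-zero data differs partway through) would not fit the pattern described here and would deserve a closer look.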

I then used debugfs to examine the mounted file system, and discovered that all of those mismatched blocks corresponded to free space in the file system. None of the mismatched blocks contained data belonging to a valid file.
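
Mapping a byte offset on a member partition to a file system block takes a little arithmetic before debugfs can be asked about it. A rough Python sketch, with loud assumptions: the file system sits directly on the md device (no LVM or partition table in between), the member's data offset has been read from `mdadm --examine` (it is 0 for metadata formats that keep the superblock at the end of the device), and the helper names and `/dev/md0` are placeholders. `testb` and `icheck` are the debugfs requests for "is this block in use?" and "which inode owns it?".

```python
def member_offset_to_fs_block(byte_offset, data_offset_sectors=0,
                              fs_block_size=4096, sector_size=512):
    """Convert a byte offset on a RAID-1 member into a file system block number.

    data_offset_sectors is an assumption the caller must supply from
    `mdadm --examine`; fs_block_size must match the file system.
    """
    array_offset = byte_offset - data_offset_sectors * sector_size
    if array_offset < 0:
        raise ValueError("offset lies in the md metadata area, not in array data")
    return array_offset // fs_block_size


def debugfs_commands(fs_block, device="/dev/md0"):
    """Build (but do not run) the debugfs invocations for one block."""
    return [
        ["debugfs", "-R", f"testb {fs_block}", device],   # in use or free?
        ["debugfs", "-R", f"icheck {fs_block}", device],  # owning inode, if any
    ]
```

Each returned list can be passed to `subprocess.run` as root. A block that `testb` reports as not in use is free space as far as the file system is concerned.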

After doing some Googling and reading about Linux's RAID-1 implementation, I believe I've figured out what happened. If you have an application which is appending to a file piecemeal, you can have a race condition where the file system decides to commit a block from the OS's cache to physical media just as the application is about to append additional data. Since the writes to the two drives of the RAID-1 mirror don't occur at exactly the same instant, one drive can get a slightly newer version of that block than the other one. Normally this discrepancy would not persist for long, since the second application write marks the cache block as dirty (again), and this will cause another physical write to get queued up, committing the updated (and consistent) data for that block to both devices in the array.

But what happens if the file gets deleted before this second physical write gets queued? Well, the corresponding blocks in the OS's disk cache get dropped, the second physical write never happens, and the last block of the (now deleted) file is left in an inconsistent state on the underlying RAID media!
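
The sequence is easier to see in a toy model. This Python sketch only illustrates the theory above; it is not how the md driver is actually written, and the `ToyMirror` class and its methods are made up for the example. The key assumption is that each member write copies whatever is in the cached block at that moment.

```python
class ToyMirror:
    """One cached block plus two mirror members (toy model, not real md code)."""

    def __init__(self, block_size=16):
        self.cache = bytearray(block_size)  # the OS cache copy of the block
        self.dirty = False
        self.disks = [bytes(block_size), bytes(block_size)]

    def append(self, offset, data):
        self.cache[offset:offset + len(data)] = data
        self.dirty = True

    def flush_member(self, index):
        # No snapshot is taken: the member gets the cache's current contents.
        self.disks[index] = bytes(self.cache)

    def delete_file(self):
        # The cached block is dropped, so nothing is left to write back.
        self.dirty = False


def simulate(delete_before_second_flush):
    """Return True if both members end up with identical data."""
    m = ToyMirror()
    m.append(0, b"AAAA")
    m.dirty = False           # write-back of the block begins
    m.flush_member(0)         # member 0 gets "AAAA"
    m.append(4, b"BBBB")      # the application appends mid-write-back
    m.flush_member(1)         # member 1 gets "AAAABBBB"
    if delete_before_second_flush:
        m.delete_file()
    if m.dirty:               # normal case: a second write-back fixes things
        m.flush_member(0)
        m.flush_member(1)
        m.dirty = False
    return m.disks[0] == m.disks[1]
```

In this model `simulate(False)` leaves the members in sync, while `simulate(True)` leaves member 1 holding more of the file's tail than member 0.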

Any application that creates temporary files which are then deleted a few seconds later could potentially hit this hole. But since the mismatch only ever happens with data belonging to deleted files, it is "mostly harmless". It may even result in a small performance gain in certain situations, since data belonging to temporary files which are created and quickly deleted never needs to be flushed to physical media.

It certainly has the potential to cause confusion and panic for sysadmins who don't understand that the RAID mismatches are "normal", though. In effect, it results in "false positives" from the scrub pass, since the scrub pass does not know anything about the file system sitting on top of the RAID array.

I also confirmed my theory by writing zeros to all of the free space on the mounted file system. After doing this, all of the RAID mismatches disappeared.
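
One way to do such a zero fill, sketched in Python under some assumptions: the function name, the temporary file name, and the `max_bytes` cap (handy for a dry run) are all made up for this example, and filling a live file system to the brim can break other processes that need space, so treat it as an illustration rather than a tool.

```python
import errno
import os


def zero_fill_free_space(directory, chunk_size=1024 * 1024, max_bytes=None,
                         filename="zero.fill.tmp"):
    """Write zeros into a temporary file until the disk is full (or the cap
    is reached), force them to disk, then delete the file.

    Returns the number of bytes of zeros written.
    """
    path = os.path.join(directory, filename)
    zeros = bytes(chunk_size)
    written = 0
    try:
        # Unbuffered, so an out-of-space error surfaces on write, not on close.
        with open(path, "wb", buffering=0) as f:
            try:
                while max_bytes is None or written < max_bytes:
                    want = chunk_size if max_bytes is None else min(
                        chunk_size, max_bytes - written)
                    count = f.write(zeros[:want])
                    if not count:
                        break
                    written += count
            except OSError as exc:
                if exc.errno != errno.ENOSPC:
                    raise  # anything other than "disk full" is a real error
            # Push the zeros to the physical devices before deleting the file.
            os.fsync(f.fileno())
    finally:
        if os.path.exists(path):
            os.remove(path)
    return written
```

The `fsync` before the delete matters: if the theory above is right, zeros that are still only in the cache when the file is removed could be dropped without ever reaching either drive.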

Bottom line: Linux RAID-1 interacts with the file system in non-obvious ways. The upshot of this is that under certain conditions, free space on the file system may have inconsistent data on the underlying RAID devices.

Edit: Corrected a typo and clarified a couple of things.
Nostalgia isn't what it used to be.
 
chuckula
Minister of Gerbil Affairs
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 9:31 am

We need an ABC after-school special about this.

I suggest that Scott Baio star as JBI.
4770K @ 4.7 GHz; 32GB DDR3-2133; Officially RX-560... that's right AMD you shills!; 512GB 840 Pro (2x); Fractal Define XL-R2; NZXT Kraken-X60
--Many thanks to the TR Forum for advice in getting it built.
 
DragonDaddyBear
Gerbil Elite
Posts: 985
Joined: Fri Jan 30, 2009 8:01 am

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 9:33 am

That's very interesting. Thanks for sharing. If I understand this, then it sounds like there is no lock put on the data set before it is written to disk, allowing for a race condition. That seems like a bug that could cause some issues. I apologize if I missed it, but what file system are you using?
 
chuckula
Minister of Gerbil Affairs
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 9:38 am

DragonDaddyBear wrote:
That's very interesting. Thanks for sharing. If I understand this, then it sounds like there is no lock put on the data set before it is written to disk, allowing for a race condition. That seems like a bug that could cause some issues. I apologize if I missed it, but what file system are you using?


I'd be curious to see if the race condition is in the filesystem itself or at the MD RAID layer. Based on his analysis it sounds like it might be agnostic to the underlying filesystem.
4770K @ 4.7 GHz; 32GB DDR3-2133; Officially RX-560... that's right AMD you shills!; 512GB 840 Pro (2x); Fractal Define XL-R2; NZXT Kraken-X60
--Many thanks to the TR Forum for advice in getting it built.
 
DragonDaddyBear
Gerbil Elite
Posts: 985
Joined: Fri Jan 30, 2009 8:01 am

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 9:55 am

chuckula wrote:
I'd be curious to see if the race condition is in the filesystem itself or at the MD RAID layer. Based on his analysis it sounds like it might be agnostic to the underlying filesystem.


Good point. I wonder if this happens with newer "next-gen" file systems with their fancy checksums and whatnot.
 
morphine
TR Staff
Posts: 11600
Joined: Fri Dec 27, 2002 8:51 pm
Location: Portugal (that's next to Spain)

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 10:02 am

Niiiice, that's some impressive sleuthing and an easy-to-understand explanation. I second the idea of JBI doing an educational show :)
There is a fixed amount of intelligence on the planet, and the population keeps growing :(
 
just brew it!
Administrator
Topic Author
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 10:06 am

@DragonDaddyBear - File system is ext4.

@chuckula - I believe the race exists with any file system that uses the Linux block cache to stage data and does not immediately flush dirty blocks belonging to a file when the file is closed. This is probably true of most modern file systems with a native Linux implementation, for performance reasons.

Based on my investigation, I believe this race cannot corrupt user data; if the file doesn't get deleted, then the on-disk copies will (eventually) end up back in sync. This behavior does, however, make investigating potential RAID anomalies more difficult, since you need to have knowledge of the file system structure (and know how to use the debugfs tool) to determine whether a discrepancy is in a block which is part of a valid file or not.

It might also open up some additional scenarios where things can end up in a weird state in the event of a power cut. But in that situation you really can't trust the contents of files which were being actively written anyway -- RAID or not, data may have been sitting in a volatile write cache internal to the HDD or SSD, waiting to be written to physical media.

I also suspect you will never see this situation on SSD-based arrays where the file system is mounted with the on-the-fly TRIM option (discard), since TRIMmed blocks should read back as all zeros, at least on drives that return deterministic zeros after TRIM. (AFAIK Linux RAID-1 supports TRIM pass-through.)
Nostalgia isn't what it used to be.
 
DragonDaddyBear
Gerbil Elite
Posts: 985
Joined: Fri Jan 30, 2009 8:01 am

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 10:16 am

morphine wrote:
Niiiice, that's some impressive sleuthing and an easy-to-understand explanation. I second the idea of JBI doing an educational show :)

I vote the first episode be about beer brewing!
 
Forge
Lord High Gerbil
Posts: 8253
Joined: Wed Dec 26, 2001 7:00 pm
Location: Gone

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 10:27 am

This is why the fs journal exists. In case of a power cut mid-write, the journal can be replayed to get the fs to a sane state. It would get the md into a sane state as well.
Please don't edit my signature for me. Thanks.
 
just brew it!
Administrator
Topic Author
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 10:29 am

Forge wrote:
This is why the fs journal exists. In case of a power cut mid-write, the journal can be replayed to get the fs to a sane state. It would get the md into a sane state as well.

By default ext4 only journals meta-data, not file contents. You can configure it to journal everything, but this results in a substantial performance hit.
Nostalgia isn't what it used to be.
 
Forge
Lord High Gerbil
Posts: 8253
Joined: Wed Dec 26, 2001 7:00 pm
Location: Gone

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 10:33 am

That's a good point. I was figuring the actual file data would be a loss, since ext4 isn't COW, and was thinking more that the journal replay would flag the partial file as being no good. You're right, though, I don't think the default journalling setup would work the way I'm thinking.

Use ZFS. It's more better.
Please don't edit my signature for me. Thanks.
 
just brew it!
Administrator
Topic Author
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 10:36 am

Forge wrote:
Use ZFS. It's more better.

Yes, ZFS is on my long (and ever growing) "I need to learn about that" list.
Nostalgia isn't what it used to be.
 
Forge
Lord High Gerbil
Posts: 8253
Joined: Wed Dec 26, 2001 7:00 pm
Location: Gone

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 10:40 am

just brew it! wrote:
Forge wrote:
Use ZFS. It's more better.

Yes, ZFS is on my long (and ever growing) "I need to learn about that" list.


Spare machine of any config, at least 3-4 disks of a size (or real close) and an afternoon. FreeNAS makes a great start, and once you know the terms and concepts, ZOL (ZFS On Linux) will let you apply it anywhere the disks are available. It's very good stuff, I trust a lot of my most important files to it (family photos and such).
Please don't edit my signature for me. Thanks.
 
Topinio
Gerbil Jedi
Posts: 1839
Joined: Mon Jan 12, 2015 9:28 am
Location: London

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 10:41 am

Very nice, and worth knowing about.

just brew it! wrote:
By default ext4 only journals meta-data, not file contents. You can configure it to journal everything, but this results in a substantial performance hit.

I thought that (by default, data=ordered) this wasn't a problem because the writes are in transactions with the data blocks written to storage first, just before the metadata?
Desktop: 750W Snow Silent, X11SAT-F, E3-1270 v5, 32GB ECC, RX 5700 XT, 500GB P1 + 250GB BX100 + 250GB BX100 + 4TB 7E8, XL2730Z + L22e-20
HTPC: X-650, DH67GD, i5-2500K, 4GB, GT 1030, 250GB MX500 + 1.5TB ST1500DL003, KD-43XH9196 + KA220HQ
Laptop: MBP15,2
 
Topinio
Gerbil Jedi
Posts: 1839
Joined: Mon Jan 12, 2015 9:28 am
Location: London

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 10:44 am

Forge wrote:
Spare machine of any config, at least 3-4 disks of a size (or real close) and an afternoon. FreeNAS makes a great start, and once you know the terms and concepts, ZOL (ZFS On Linux) will let you apply it anywhere the disks are available. It's very good stuff, I trust a lot of my most important files to it (family photos and such).
This. I've run ZFS on several Solaris fileservers at work since 2006 (on a StorageTek 3510 FC array!) and it's the business. FreeNAS at home, in a ProLiant MicroServer 8)
Desktop: 750W Snow Silent, X11SAT-F, E3-1270 v5, 32GB ECC, RX 5700 XT, 500GB P1 + 250GB BX100 + 250GB BX100 + 4TB 7E8, XL2730Z + L22e-20
HTPC: X-650, DH67GD, i5-2500K, 4GB, GT 1030, 250GB MX500 + 1.5TB ST1500DL003, KD-43XH9196 + KA220HQ
Laptop: MBP15,2
 
just brew it!
Administrator
Topic Author
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 11:07 am

Topinio wrote:
just brew it! wrote:
By default ext4 only journals meta-data, not file contents. You can configure it to journal everything, but this results in a substantial performance hit.

I thought that (by default, data=ordered) this wasn't a problem because the writes are in transactions with the data blocks written to storage first, just before the metadata?

There's still no guarantee the data isn't sitting in a cache inside the drive when the power cut happens.

Topinio wrote:
Forge wrote:
Spare machine of any config, at least 3-4 disks of a size (or real close) and an afternoon. FreeNAS makes a great start, and once you know the terms and concepts, ZOL (ZFS On Linux) will let you apply it anywhere the disks are available. It's very good stuff, I trust a lot of my most important files to it (family photos and such).

This. I've run ZFS on several Solaris fileservers at work since 2006 (on a StorageTek 3510 FC array!) and it's the business. FreeNAS at home, in a ProLiant MicroServer 8)

Other than the drives, I should have hardware on hand that I could use to cobble together another home server box. I'm pretty sure there's an AM3 motherboard, a couple of sticks of ECC DDR3, and a 1090T somewhere in the hoard.

However, the front of the build queue is now a gaming rig (my first one in around a decade), using the goodies I got in the TR holiday giveaway last winter (yes, I'm finally doing this). CPU and RAM were ordered on Monday, I'm just waiting for them to arrive. After that build is complete I can start thinking about a ZFS box (assuming gaming doesn't start sucking up too much of my time). :wink:

Edit: The gaming build will also be the first Windows build I've done for myself in quite a few years. I've put together a few Windows systems for family members, but my personal desktop and laptop have been Linux-based since around 2009 or so.
Nostalgia isn't what it used to be.
 
Waco
Maximum Gerbil
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 11:32 am

Nice sleuthing! It doesn't surprise me that mdraid isn't perfect, and it also doesn't surprise me that filesystems designed for speed over integrity might leave the underlying disks in an inconsistent state. In any other situation you'd be guessing at which disk is "correct", which is why 3-way mirrors are the only mirrors I tend to run.

ZFS, if you care, doesn't use the Linux block cache and is always consistent (in terms of filesystem integrity) on disk. In the event of a sudden power loss you will always have a consistent state on disk - if partial transactions are committed, you will be informed of them at pool import time.
Victory requires no explanation. Defeat allows none.
 
just brew it!
Administrator
Topic Author
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 12:04 pm

Waco wrote:
Nice sleuthing! It doesn't surprise me that mdraid isn't perfect, and it also doesn't surprise me that filesystems designed for speed over integrity might leave the underlying disks in an inconsistent state. In any other situation you'd be guessing at which disk is "correct", which is why 3-way mirrors are the only mirrors I tend to run.

With this particular race, a 3-way mirror could very well end up with 3 different versions of the block though! :lol:
Nostalgia isn't what it used to be.
 
DragonDaddyBear
Gerbil Elite
Posts: 985
Joined: Fri Jan 30, 2009 8:01 am

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 12:28 pm

So, I read your write up one more time and have what may be a stupid question. Why doesn't the job just read the file system and scrub those sectors? Would it not be faster and avoid these kinds of harmless alarms?
 
just brew it!
Administrator
Topic Author
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 12:44 pm

DragonDaddyBear wrote:
So, I read your write up one more time and have what may be a stupid question. Why doesn't the job just read the file system and scrub those sectors? Would it not be faster and avoid these kinds of harmless alarms?

I suppose it could do that (not sure why you think it would be faster though). But in the general case, when there's a discrepancy on a 2-device RAID-1 the RAID subsystem really has no way of determining which block is correct. For this particular case, the RAID subsystem could assume that the block with more data in it is the "good" one; but it doesn't have any way of knowing whether we're in this situation or not, without figuring out whether the block is currently part of a valid file or on the free list.

An interesting alternative approach would be to have an option for ext4 to do a background wipe of free blocks. This would eliminate the RAID-1 mismatch issue, and have the side benefit of enhancing security by making it impossible to recover contents of deleted files using standard forensic techniques.

As I also noted above, this issue probably doesn't exist for situations where TRIM is enabled. As more of the world shifts to solid state storage, this race condition -- which is already more of a curiosity and/or a caveat for paranoid sysadmins than a genuine problem -- will become even less important.
Nostalgia isn't what it used to be.
 
DragonDaddyBear
Gerbil Elite
Posts: 985
Joined: Fri Jan 30, 2009 8:01 am

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 1:01 pm

just brew it! wrote:
DragonDaddyBear wrote:
So, I read your write up one more time and have what may be a stupid question. Why doesn't the job just read the file system and scrub those sectors? Would it not be faster and avoid these kinds of harmless alarms?

I suppose it could do that (not sure why you think it would be faster though). But in the general case, when there's a discrepancy on a 2-device RAID-1 the RAID subsystem really has no way of determining which block is correct. For this particular case, the RAID subsystem could assume that the block with more data in it is the "good" one; but it doesn't have any way of knowing whether we're in this situation or not, without figuring out whether the block is currently part of a valid file or on the free list.

An interesting alternative approach would be to have an option for ext4 to do a background wipe of free blocks. This would eliminate the RAID-1 mismatch issue, and have the side benefit of enhancing security by making it impossible to recover contents of deleted files using standard forensic techniques.

As I also noted above, this issue probably doesn't exist for situations where TRIM is enabled. As more of the world shifts to solid state storage, this race condition -- which is already more of a curiosity and/or a caveat for paranoid sysadmins than a genuine problem -- will become even less important.


I think it would be significantly faster on a less-full volume: 1GB to scrub vs. TBs.
I don't suppose you feel like writing a cron job for testing that background wipe with something like zerofree on Saturday. I think it would be rather interesting to see the results.
 
just brew it!
Administrator
Topic Author
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 1:12 pm

DragonDaddyBear wrote:
I don't suppose you feel like writing a cron job for testing that background wipe with something like zerofree on Saturday. I think it would be rather interesting to see the results.

The problem with zerofree is that the file system needs to be unmounted (or at least mounted read-only) in order for it to run. Automating that via cron isn't going to work unless it can be guaranteed that nothing will have any directories or files open on the file system at the scheduled time.
Nostalgia isn't what it used to be.
 
Waco
Maximum Gerbil
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 2:24 pm

This also means I now get to check the mirrors at work on systems I didn't build.

I appreciate the effort you put into this!
Victory requires no explanation. Defeat allows none.
 
just brew it!
Administrator
Topic Author
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 2:54 pm

Waco wrote:
This also means I now get to check the mirrors at work on systems I didn't build.

I wouldn't stress over it if I were you. As noted above, it seems to be harmless from a data integrity standpoint, since it only affects deleted files. The main effects of it are that a scrub pass may report mismatches, and it complicates the investigation if you're looking into a suspected RAID corruption issue.

Waco wrote:
I appreciate the effort you put into this!

It was really bugging me, since (prior to figuring this stuff out) I was worried that my RAID array was going south. So my goal was to confirm that either 1) there's a hardware/system problem, or B) that it's a false alarm. Along the way, I learned some surprising things about how ext4 and md RAID interact.
Nostalgia isn't what it used to be.
 
SuperSpy
Minister of Gerbil Affairs
Posts: 2403
Joined: Thu Sep 12, 2002 9:34 pm
Location: TR Forums

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 2:58 pm

just brew it! wrote:
But in the general case, when there's a discrepancy on a 2-device RAID-1 the RAID subsystem really has no way of determining which block is correct.


This is precisely why I started putting everything on ZFS (checksum all the things!). If the drive falls off the face of the earth, sure md can fix it. But if the drives can't agree, you/md basically have to flip a coin to decide who to trust.

I need to do a bit of research on Windows software RAID, because I tend to use that on important machines of the Microsoft variety, and it is probably even more susceptible to such disagreements as it seems to do a full resync every time the box boots with an array marked dirty. :roll:
Desktop: i7-4790K @4.8 GHz | 32 GB | EVGA GeForce 1060 | Windows 10 x64
Laptop: MacBook Pro 2017 2.9GHz | 16 GB | Radeon Pro 560
 
blahsaysblah
Gerbil Elite
Posts: 581
Joined: Mon Oct 19, 2015 7:35 pm

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 4:21 pm

SuperSpy wrote:
just brew it! wrote:
But in the general case, when there's a discrepancy on a 2-device RAID-1 the RAID subsystem really has no way of determining which block is correct.


This is precisely why I started putting everything on ZFS (checksum all the things!). If the drive falls off the face of the earth, sure md can fix it. But if the drives can't agree, you/md basically have to flip a coin to decide who to trust.

I need to do a bit of research on Windows software RAID, because I tend to use that on important machines of the Microsoft variety, and it is probably even more susceptible to such disagreements as it seems to do a full resync every time the box boots with an array marked dirty. :roll:

Don't look at Windows software RAID; it's very old tech. A simple mirror will, for example, always read from one drive only. It's not even random at boot: one drive had massive reads and the other had almost none (these were brand-new drives I'd put into a simple mirror). After some research, my conclusion is that any sane assumption about how it behaves is just that: an assumption.

If you consider Storage Spaces, you will very likely need to use PowerShell to create everything manually, because the GUI chooses some opinionated defaults. And again, don't assume. For example, if you want to create RAID 10, verify that it's really 2x2. I haven't checked recently; I think it did get this right with newer Win 10 updates. Either way, you have to go through a checklist and manually verify that it does what you think it does in the background.


Some things I used to do for ReFS (you would think they'd be the default for a simple mirror or RAID 10) before I moved on:

# Enable and enforce integrity streams on the volume root
Set-FileIntegrity -FileName E:\ -Enable $true -Enforce $true
Get-Item -Path E:\ | Set-FileIntegrity -Enable $true -Enforce $true
# Check the resulting setting
Get-Item -Path E:\ | Get-FileIntegrity

# List the storage pools
Get-StoragePool

Set-StoragePool -FriendlyName "Storage pool" -ClearOnDeallocate $true
# Fixed provisioning takes a long time for bigger pools
Set-StoragePool -FriendlyName "Storage pool" -ProvisioningTypeDefault Fixed
 
Topinio
Gerbil Jedi
Posts: 1839
Joined: Mon Jan 12, 2015 9:28 am
Location: London

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 4:37 pm

I noped out of Windows' software RAID on first encounter, on discovering the implemented idea of needing a manually-created fault tolerant boot floppy containing the files necessary to boot from the "other" disk, i.e. the mirror is a copy not an equivalent.

Sadly, this first encounter was someone's DC. That I needed to get back up...
Desktop: 750W Snow Silent, X11SAT-F, E3-1270 v5, 32GB ECC, RX 5700 XT, 500GB P1 + 250GB BX100 + 250GB BX100 + 4TB 7E8, XL2730Z + L22e-20
HTPC: X-650, DH67GD, i5-2500K, 4GB, GT 1030, 250GB MX500 + 1.5TB ST1500DL003, KD-43XH9196 + KA220HQ
Laptop: MBP15,2
 
Waco
Maximum Gerbil
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 5:40 pm

just brew it! wrote:
Waco wrote:
This also means I now get to check the mirrors at work on systems I didn't build.

I wouldn't stress over it if I were you. As noted above, it seems to be harmless from a data integrity standpoint, since it only affects deleted files. The main effects of it are that a scrub pass may report mismatches, and it complicates the investigation if you're looking into a suspected RAID corruption issue.

That's actually the part I'm worried about - there are some admin teams that will just replace a drive on any sign of a problem, so I'd like to skip the unneeded hardware replacements and subsequent rebuilds. :)

SuperSpy wrote:
I need to do a bit of research on Windows software RAID, because I tend to use that on important machines of the Microsoft variety, and it is probably even more susceptible to such disagreements as it seems to do a full resync every time the box boots with an array marked dirty. :roll:

Like others have said, don't even bother. Windows *always* trusts the primary drive, and if it ever goes offline, the array isn't even bootable any more. If the primary starts to fail and you end up with a dirty reboot, Windows will happily copy junk over your second drive.
Victory requires no explanation. Defeat allows none.
 
blahsaysblah
Gerbil Elite
Posts: 581
Joined: Mon Oct 19, 2015 7:35 pm

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 7:45 pm

Just FYI, to get FreeNAS 11.0-U2 working under Windows 10 Hyper-V, I had to create a Gen 1 VM with Guest services off. Remember to take the disks offline in Windows to make them available for Hyper-V disk pass-through if you want to give ZFS real disks (SMART, ...) instead of VHDs.
 
just brew it!
Administrator
Topic Author
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: A Linux RAID-1 mystery (and some answers)

Wed Aug 09, 2017 7:58 pm

Waco wrote:
just brew it! wrote:
I wouldn't stress over it if I were you. As noted above, it seems to be harmless from a data integrity standpoint, since it only affects deleted files. The main effects of it are that a scrub pass may report mismatches, and it complicates the investigation if you're looking into a suspected RAID corruption issue.

That's actually the part I'm worried about - there are some admin teams that will just replace a drive on any sign of a problem, so I'd like to skip the unneeded hardware replacements and subsequent rebuilds. :)

Ahh, gotcha. Always gotta take the human element into account. :wink:

But in a situation like this, which drive would they replace? Just pick one at random? :roll: (I can see it now... "We replaced one of the drives, and the errors went away after the rebuild, so we must have guessed right!" :lol:)

In your shoes, I'd probably just let 'em do the drive swaps. As long as the other drive doesn't fail during the rebuild, it should be harmless... :lol:
Nostalgia isn't what it used to be.
