While bitrot is a real thing, HDD sectors are ECC protected, so it is (fortunately) a rare event. It is therefore unlikely that the problem Ars described was due to real bitrot.
The ECC protection should detect and recover most errors. Basically, three scenarios are possible:
- detected single-bit error: it is detected _and_ corrected. The event is logged by the HDD firmware but causes no harm, as the OS receives correct data;
- detected multi-bit error: this results in a "bad sector" error and the HDD firmware refuses to pass the data to the OS;
- undetected multi-bit error: this is the most dangerous condition. The HDD firmware passes BAD data to the OS.
A "real" bitrot happens when a correctly-written information is altered due to external condition and/or due to the intrisic limits of magnetic storage. This kind of bitrot will generally alter a single bit, so that the ECC code will recover it harmlessy. A periodic check of HDD SMART data will highlight some ECC read errors, so the HDD can be swapped out before a real problem happens. Even multi-bit rot have good probability to be spotted by the ECC check, so the OS will receive a "bad sector" error and can re-read the same data from the RAID array.
The real problem is when a multi-bit rot is _not_ detected, so the OS can receive bad data. However, under normal conditions (and on monitored systems) this should be a really rare event.
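As an aside, the periodic SMART check mentioned above can be as simple as the following sketch (it assumes smartmontools is installed and root privileges; the attribute names are common ones, but they vary between drive vendors, so treat the list as illustrative only):

```python
# Sketch: dump the SMART attributes most related to ECC / read errors.
# Assumes smartmontools ("smartctl") is installed and the script runs as root.
# Attribute names vary between vendors, so treat the list as illustrative only.
import subprocess

INTERESTING = (
    "Raw_Read_Error_Rate",
    "Reallocated_Sector_Ct",
    "Hardware_ECC_Recovered",
    "Reported_Uncorrect",
    "Offline_Uncorrectable",
)

def check_smart(device="/dev/sda"):
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if any(attr in line for attr in INTERESTING):
            print(line)

if __name__ == "__main__":
    check_smart()
```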
Another potential source of problems is described by the "bit error rate" (BER), which is the probability that, due to the intrinsic limits of the magnetic recording process, the HDD writes bad data on the first attempt _and_ that the ECC code detects but does not correct the error. The BER of common consumer disks is one error every 10^14 bits, or one unrecoverable bit error every ~12.5 TB of data (LINK: http://www.wdc.com/wdproducts/library/S ... 771438.pdf
), while server-grade disks are 10x-100x better. The probability that such an error is not only unrecoverable but also undetected is quite small. Any redundancy-based RAID level (eg: 1, 10) should have no problem recovering from this kind of error, while parity-based levels (eg: 4, 5, 6) are in big trouble.
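To put the 10^14 figure in perspective, here is a quick back-of-the-envelope calculation (a sketch only: it assumes the nominal URE rate is accurate and that errors are independent, which real drives do not strictly honour):

```python
# Back-of-the-envelope math for a 1-in-10^14 consumer-class drive.
# Assumptions: the nominal rate is accurate and errors are independent.
BER = 1e-14                  # unrecoverable errors per bit

bits_per_tb = 8 * 1e12       # 1 TB (decimal) expressed in bits
tb_per_ure = 1 / (BER * bits_per_tb)
print(f"~1 unrecoverable error every {tb_per_ure:.1f} TB")    # ~12.5 TB

# Chance of reading an 8 TB disk end-to-end without hitting a URE
# (relevant to long rebuilds, where the surviving disks must be read in full):
bits_read = 8 * bits_per_tb
p_clean = (1 - BER) ** bits_read
print(f"P(no URE over 8 TB read) ~= {p_clean:.0%}")           # roughly 53%
```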
Anyway, a MUCH bigger problem is the probability that the to-be-written data gets corrupted while still in RAM, and this is likely the problem observed by Ars. The point is that data corruption happening in RAM is totally undetectable by the HDD: after all, the data arrives at the drive _already changed_, and the HDD's calculated ECC code will be consistent with the inconsistent (!) data being written. ECC-protected RAM is therefore absolutely critical in storage units.
Don't get me wrong: end-to-end data checksumming is a very valuable feature, and high-end SAN units implement it. At the same time, ECC-protected servers should only very rarely suffer from bitrot and undetectable bit errors (but avoid parity-based RAID, please). Now let's look at the same thing on the consumer side: as consumer PCs have no ECC RAM and a single HDD, you might think they would greatly benefit from the additional error recovery granted by data checksumming. In some measure this is true: after all, Ars's article just proved that. However, a ZFS or BTRFS scrub on a non-ECC system with even a single problematic memory location will lead to catastrophic data corruption: http://forums.freenas.org/index.php?thr ... zfs.15449/
So, to recap:
- Ars' problem was likely caused by a flipped RAM bit (flipped _after_ the data checksumming, or during the transfer to the SATA port);
- on a classic filesystem this causes a file corruption (as happened), while on a data-checksumming filesystem the error will be detected and (hopefully) corrected;
- with ECC RAM, the problem is basically solved;
- in the event of a bad stick of non-ECC RAM, a scrub operation (on a scrub-capable filesystem such as BTRFS or ZFS) will catastrophically corrupt your data.
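To make the end-to-end checksumming idea above concrete, here is a toy application-level sketch (file names are hypothetical; it only *detects* corruption, and it is of course no substitute for what a checksumming filesystem does in the background):

```python
# Toy end-to-end checksumming at the application level: hash the data when it
# is produced, store the digest next to it, verify again after reading it back.
# File names are hypothetical; this only *detects* corruption, it cannot repair it.
# Note: if RAM flips a bit *before* the hash is computed, the checksum will
# happily match the already-bad data - exactly the ECC RAM point made above.
import hashlib
from pathlib import Path

def write_with_checksum(path: str, data: bytes) -> None:
    Path(path).write_bytes(data)
    Path(path + ".sha256").write_text(hashlib.sha256(data).hexdigest())

def read_and_verify(path: str) -> bytes:
    data = Path(path).read_bytes()
    expected = Path(path + ".sha256").read_text().strip()
    if hashlib.sha256(data).hexdigest() != expected:
        raise IOError(f"checksum mismatch on {path}: possible corruption")
    return data

if __name__ == "__main__":
    write_with_checksum("payload.bin", b"important data")
    print(read_and_verify("payload.bin"))
```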
Now, a small performance-related detour. A COW-based filesystem will show a BIG slowdown on rewrite-intensive workloads when used on classic HDDs. Databases and virtual machines will perform much worse, due to the ever-increasing file fragmentation. I wrote something on the subject here (http://www.ilsistemista.net/index.php/l ... ml?start=5
) and here (http://www.ilsistemista.net/index.php/l ... -look.html
). I don't know about ZFS, but even with COW disabled, BTRFS was noticeably slower than EXT4 or XFS.
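For what it's worth, on BTRFS the usual mitigation for rewrite-heavy files (VM images, database files) is to disable COW before the files are created, and filefrag gives a quick view of how fragmented a file has become. A small sketch, using a hypothetical path and assuming the chattr and filefrag utilities are available; keep in mind that +C also disables BTRFS data checksumming for the affected files:

```python
# Sketch: mark a directory nodatacow (chattr +C) *before* creating VM images in
# it, then check fragmentation with filefrag. The path is hypothetical; +C only
# affects files created after the attribute is set, and on BTRFS it also
# disables data checksumming for those files.
import subprocess

VM_DIR = "/srv/vm-images"    # hypothetical directory on a BTRFS filesystem

subprocess.run(["chattr", "+C", VM_DIR], check=True)

# ...later, after the VM image has seen some rewrite traffic:
result = subprocess.run(["filefrag", f"{VM_DIR}/disk0.img"],
                        capture_output=True, text=True, check=True)
print(result.stdout)         # e.g. "disk0.img: 12 extents found"
```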
So, which filesystem should you use for your storage?
- if you need a "cold storage" solution and the "experimental" label doesn't scare you, BTRFS + ECC memory is the way to go;
- if online data deduplication is a requirement, ZFS + 8/16/32 GB of ECC RAM is the best setup;
- if you want to run rewrite-intensive applications at full speed (eg: databases, virtual machines), use EXT4 or XFS (even better, if your application supports it, use an LVM volume directly);
- if you plan to use non-ECC memory, stay with EXT4 or XFS (a quick way to check whether your RAM is actually ECC is sketched below).
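As promised, one rough way to check whether the platform actually reports ECC memory is to read the DMI tables (this assumes the dmidecode utility and root privileges; some BIOSes report the field inaccurately, so treat the output as a hint only):

```python
# Rough ECC check: ask dmidecode for the memory array info and print the
# "Error Correction Type" field. Needs root and the dmidecode utility; some
# BIOSes report this field inaccurately, so treat the output as a hint only.
import subprocess

out = subprocess.run(["dmidecode", "--type", "memory"],
                     capture_output=True, text=True, check=True).stdout
for line in out.splitlines():
    if "Error Correction Type" in line:
        print(line.strip())   # e.g. "Error Correction Type: Single-bit ECC" or "None"
```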