[ 366.857814] kernel BUG at /build/linux-EO9xOi/linux-4.4.0/fs/btrfs/extent_io.c:2125!
[ 366.857875] invalid opcode: 0000 [#1] SMP
...
[ 366.860066] Call Trace:
[ 366.860125] [<ffffffffc052d37d>] clean_io_failure+0x1ad/0x1c0 [btrfs]
[ 366.860208] [<ffffffffc052dc1a>] end_bio_extent_readpage+0x2fa/0x5b0 [btrfs]
[ 366.860268] [<ffffffff813c079f>] bio_endio+0x3f/0x60
[ 366.860338] [<ffffffffc05033bc>] end_workqueue_fn+0x3c/0x40 [btrfs]
[ 366.860422] [<ffffffffc053eeba>] btrfs_scrubparity_helper+0xca/0x2f0 [btrfs]
[ 366.860509] [<ffffffffc053f1ce>] btrfs_endio_helper+0xe/0x10 [btrfs]
[ 366.860564] [<ffffffff8109a575>] process_one_work+0x165/0x480
[ 366.860613] [<ffffffff8109a8db>] worker_thread+0x4b/0x4c0
[ 366.862906] [<ffffffff8109a890>] ? process_one_work+0x480/0x480
[ 366.865187] [<ffffffff810a0c08>] kthread+0xd8/0xf0
[ 366.867465] [<ffffffff810a0b30>] ? kthread_create_on_node+0x1e0/0x1e0
[ 366.869747] [<ffffffff8183888f>] ret_from_fork+0x3f/0x70
[ 366.872019] [<ffffffff810a0b30>] ? kthread_create_on_node+0x1e0/0x1e0
[ 366.874272] Code: ff 48 8b 45 a0 48 8b 50 40 eb d3 0f 0b 41 bd fb ff ff ff e9 7c fe ff ff 4c 89 e7 41 $
[ 366.879344] RIP [<ffffffffc052d0c7>] repair_io_failure+0x217/0x240 [btrfs]
[ 366.881850] RSP <ffff880858e63bf0>
[ 366.884346] ---[ end trace 1b1807870d32b53f ]---
A plain RAID1 volume, nothing fancy. Disk 2 is failing, badly. I added a new disk 3, then started deleting the failing disk 2.
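For reference, the add/delete sequence was along these lines (device paths and mount point are just examples, not my actual ones):

```shell
# Add the new disk to the mounted RAID1 filesystem
btrfs device add /dev/sdc /mnt/data

# Remove the failing disk; this migrates its chunks onto the
# remaining devices -- the crash above hit during this step
btrfs device delete /dev/sdb /mnt/data
```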
DON'T TOUCH THE FILE SYSTEM. The above might happen!
Should I have used replace instead of add/delete? Perhaps, maybe even with -r.
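If I read the man page right, the replace path would have looked something like this (device paths again illustrative); -r tells it to only read from the failing source device when no other mirror has the data:

```shell
# Replace the failing device in place, preferring the healthy mirror
btrfs replace start -r /dev/sdb /dev/sdc /mnt/data

# Check on the running replace
btrfs replace status /mnt/data
```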
But just crash? If it can't write to the disk safely, why did it even let me mount read-write? I'm not even sure the writes were the problem; it could have just been the reads. But I figure it was the writes.
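For what it's worth, next time I'd at least rule out the writes by mounting read-only first (options per btrfs(5); paths illustrative):

```shell
# Mount read-only so nothing can issue writes through the sick disk
mount -o ro /dev/sdb /mnt/data

# If a device is already missing from the array, degraded is needed too
mount -o ro,degraded /dev/sdb /mnt/data
```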
I didn't hot-swap anything; I rebooted to add the new disk. The filesystem came up fine with the failing disk, and there was no problem adding the new disk to the array. I even rebooted again to remove the failing disk and put it into a USB 3 enclosure, so that once everything was done rebuilding overnight, I wouldn't have to open the system again just to chuck the defective disk.
Anyway, all data is fine, everything is rebuilt, and the failing drive was successfully deleted from the array. (During the rebuild I mounted the volume on some other random path, to make sure I wouldn't overlook some random thing that might touch it overnight and crash everything again.)
But still, what on earth?