2010-06-20

Does Btrfs survive silent disk data corruption in RAID1 mode?

Some of my experimenting with Btrfs in Linux 2.6.34 yielded the following results:

  • If only one disk (out of two) contains corrupt data, then Btrfs detects some checksum failures, and then recovers (overwriting the corrupt data with the corresponding good data from the other disk). This works even if the corruption happened while the filesystem was mounted.
  • If both disks contain corrupt data (but not at the same location), then Btrfs detects some checksum failures, then recovers some data, but it won't recover all of it: some files continue to have I/O errors when reading, and the syslog will contain checksum failures again and again. The explanation for this behavior may be that in Btrfs RAID 1 mode the two copies of a block of data might be at a different offset on the two disks.

Here is how I did the experiment:

  • I had a Linux 2.6.34 system.
  • I had 2 partitions of size 2000061 KB each. (1 KB == 1 << 10 bytes.)
  • # mkfs.btrfs -m raid1 -d raid1 /dev/sdc1 /dev/sdb1
  • # mount /dev/sdb1 /mnt/p
  • I copied /var/lib/dpkg (7248 small files, 36.93 MB in total) recursively to /mnt/p.
  • I copied 4 large files of 1517.98 MB in total to /mnt/p. (So the filesystem became >75% full.)
  • I created 10000 empty files.
  • I calculated the checksum of all files in /mnt/p with a userspace tool.
  • I introduced the single-disk corruption by running dd if=/dev/zero of=/dev/sdb1 bs=1M count=1600 seek=200
  • I calculated the checksum of all files again. At this point the kernel reported some block checksum mismatches in the syslog, but Btrfs eventually recovered all the data, and the file checksums matched those of the previous run.
  • I introduced the non-overlapping multi-disk corruption by running dd if=/dev/zero of=/dev/sdb1 bs=1M count=800 seek=200 && dd if=/dev/zero of=/dev/sdc1 bs=1M count=800 seek=1000
  • I calculated the checksum of all files again. At this point the kernel reported some block checksum mismatches in the syslog, and Btrfs could recover some blocks, but not all of them. Some files yielded an I/O error, but the checksums of the files that could still be read matched those of the previous run.
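
The checksum steps above can be reproduced with something like the following sketch. It is not the tool I used in the experiment; it assumes sha256sum from GNU coreutils is available, and the function names snapshot_checksums and compare_checksums are made up for this illustration. The output file must be outside the directory being scanned.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch, not the original tool).

# snapshot_checksums DIR OUTFILE
# Writes one "<sha256>  <path>" line per regular file under DIR to OUTFILE,
# sorted by path so that two runs can be compared line by line.
# Files that fail with an I/O error produce a message on stderr and no line
# in OUTFILE, so they show up as missing lines in the comparison.
snapshot_checksums() {
    find "$1" -type f -exec sha256sum {} + | LC_ALL=C sort -k 2 > "$2"
}

# compare_checksums BEFORE AFTER
# Prints the lines that differ; returns 0 only if both snapshots are identical.
compare_checksums() {
    diff "$1" "$2"
}
```

For example, run snapshot_checksums /mnt/p /tmp/before.sha256 before the corruption, snapshot_checksums /mnt/p /tmp/after.sha256 after it, and then compare_checksums /tmp/before.sha256 /tmp/after.sha256. The paths under /tmp are arbitrary choices.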
