Exploring btrfs for backups Part 1


Recently I once again came across an article about the benefits of the btrfs Linux file system. The last time I'd come across it, btrfs was still in alpha or beta, and I didn't understand why I would want to use it. However, the more I've learned about the fragility of our modern storage systems, the more I've thought about how I want to protect my data. My first step was to sign up for offsite backups, which I've done on my Windows computer via Backblaze. They are pretty awesome because the backup runs continuously, so I never have to remember to do it. The computer doesn't even need to be on at a certain time or anything. I've loved using them for the past 2+ years, but one thing that makes me consider their competition is that they don't support Linux. That's OK for now because all my photos are on my Windows computer, but it leaves me in a sub-optimal place. I know this isn't an incredibly influential blog and I'm just one person, but I'd like to think writing about this would help them realize that they could a) lose a customer and b) make more money from people with Linux computers.

The second step is local backups, because I really don't want to actually USE the offsite backups I have: getting all my data back from Backblaze would cost a few hundred dollars (for the hard drives they'd ship it back on). I don't have a good local solution on my Windows computer yet, but on Linux I'm using Back In Time. Back In Time creates incremental backups of my home directory. It works very well and has saved me a few times when settings files got corrupted. But there's an even more insidious problem that regular backups don't solve: bit rot.

If you want to see the problem with bit rot, check out this Ars Technica article (the one that got me thinking about btrfs again). Bit rot happens because magnetic storage doesn't hold its data forever: over time bits get flipped, your photos get ruined, other files won't open, and so forth. As the article shows, btrfs in a RAID 1 configuration can be self-healing against bit rot. Because btrfs checksums its data, it can tell when one copy of a block has gone bad and repair it from the good copy on the other drive. (This doesn't solve the problem on my Windows machine, but everything except my photos is on Linux.) So I want to set up a btrfs home directory in RAID 1 and have it back up to a backup drive.
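Here is a minimal sketch of creating a two-drive RAID 1 btrfs file system and then checking it for bit rot. It assumes two empty drives at /dev/sdb and /dev/sdc and a mount point of /mnt/data. All three are hypothetical; on the test VM they would be extra virtual disks. Because mkfs.btrfs wipes both drives, the script only prints the commands unless you run it with DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: DEV1, DEV2 and MNT are hypothetical; point them at your own disks.
# WARNING: mkfs.btrfs erases everything on both devices.
DEV1=${DEV1:-/dev/sdb}
DEV2=${DEV2:-/dev/sdc}
MNT=${MNT:-/mnt/data}

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mirror both data (-d) and metadata (-m) across the two devices.
run mkfs.btrfs -d raid1 -m raid1 "$DEV1" "$DEV2"
# Make sure the kernel knows about every member device before mounting.
run btrfs device scan
run mkdir -p "$MNT"
# Mounting one member device mounts the whole two-device file system.
run mount "$DEV1" "$MNT"
# Data and Metadata should both report the RAID1 profile.
run btrfs filesystem df "$MNT"
# Read every block, verify its checksum, and rewrite bad copies from the
# good mirror; -B stays in the foreground and prints a summary at the end.
run btrfs scrub start -B "$MNT"
# Per-device error counters (read, write, flush, corruption, generation).
run btrfs device stats "$MNT"
```

Running it for real would look like `sudo env DRY_RUN=0 sh raid1-setup.sh`, where the file name is also just an example. A scrub is what walks the whole file system looking for bad checksums. Outside of a scrub, btrfs only notices and repairs a bad block when something happens to read it.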

Btrfs has some even cooler tricks up its sleeve. It can create snapshots, which let you go back in time through your files in case you make a change you didn't mean to. Think of snapshots as stage 1 of the backup system I'm trying to build here: as long as your hard drive is fine, they let you recover deleted files, or older versions of files that were accidentally changed (say, your kid wrecked a document and then hit Save). Stage 2 is RAID 1, which protects against bit rot and also lets one hard drive fail without your needing to dip into your backups. You can keep working until the replacement drive arrives. Stage 3 is the backup hard drive, which protects against the failure of both RAID 1 drives and adds some protection against file corruption in case btrfs' bit rot healing fails. Finally, stage 4 is offsite backup, which I don't have on Linux right now.
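To make stage 1 concrete: a read-only snapshot is just a directory tree, so getting an old version back is an ordinary copy. Below is a small sketch that assumes the /home/.snapshots layout used later in this post and a hypothetical user alice. It copies the snapshot's version next to the live file with a .restored suffix instead of overwriting it. The suffix is my own choice.

```shell
#!/bin/sh
# Sketch only: both roots default to the layout used in this post.
SNAPSHOT_ROOT=${SNAPSHOT_ROOT:-/home/.snapshots}
LIVE_ROOT=${LIVE_ROOT:-/home}

# restore_file SNAPSHOT RELPATH
#   SNAPSHOT: snapshot name under $SNAPSHOT_ROOT (e.g. myfirstsnapshot)
#   RELPATH:  path relative to /home (e.g. alice/Documents/report.odt)
# Copies the old version to $LIVE_ROOT/RELPATH.restored, leaving the
# current file untouched.
restore_file() {
    src="$SNAPSHOT_ROOT/$1/$2"
    dest="$LIVE_ROOT/$2.restored"
    if [ ! -e "$src" ]; then
        echo "not in snapshot: $src" >&2
        return 1
    fi
    # Recreate the parent directory in case it was deleted as well.
    mkdir -p "$(dirname "$dest")"
    # -a keeps permissions and timestamps from the snapshot copy.
    cp -a "$src" "$dest"
}

# Only act when given both arguments, so the function can be sourced.
if [ $# -eq 2 ]; then
    restore_file "$1" "$2"
fi
```

For example, `sh restore-file.sh myfirstsnapshot alice/Documents/report.odt` would leave the snapshot's copy at /home/alice/Documents/report.odt.restored. The script name and file are placeholders.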

This article is the first in a series of tutorials, written for myself for when I set this up in the future and for anyone out there who wants a similar setup. I'm going to do this first on a test VM running Fedora 20, which lets me perfect things without damaging the data on my real computer. Next I'll develop my backup script on the VM, and then I'll finally be ready to do things on my main computer.

OK, first things first. Let's get set up with the first snapshot.

$ sudo btrfs sub list /
ID 256 gen 443 top level 5 path root
ID 258 gen 443 top level 5 path home

$ sudo btrfs sub create /home/.snapshots
Create subvolume '/home/.snapshots'

$ sudo btrfs sub snapshot -r /home /home/.snapshots/myfirstsnapshot
Create a readonly snapshot of '/home' in '/home/.snapshots/myfirstsnapshot'

$ sudo btrfs sub list /
ID 256 gen 448 top level 5 path root
ID 258 gen 447 top level 5 path home
ID 263 gen 447 top level 5 path home/.snapshots
ID 264 gen 447 top level 5 path home/.snapshots/myfirstsnapshot

I was able to ls into that snapshot and see that it did indeed contain a copy of my home directory. This is where I'll end things today. The next step is to start on the script that will automatically snapshot the home directory every hour, cull the snapshots intelligently, and, finally, back everything up to the backup drive. I'll be developing it on GitHub, and I'll write a blog post when I get the first part of it running.
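That script doesn't exist yet, so here is only a rough sketch of what the hourly snapshot-and-cull part could look like. It assumes timestamped snapshot names of the form home-YYYY-MM-DD_HHMM, a naming scheme I made up so that plain text sorting is also chronological. It also assumes a simple "keep the newest 48" rule rather than anything more intelligent. Snapshots with other names, such as myfirstsnapshot, are left alone.

```shell
#!/bin/sh
# Hypothetical sketch, not the script from this series. Must run as root.
SNAPDIR=${SNAPDIR:-/home/.snapshots}
KEEP=${KEEP:-48}

# to_prune KEEP: read snapshot names (one per line) on stdin and print every
# name except the KEEP newest. Relies on the names sorting chronologically.
to_prune() {
    sort -r | tail -n +"$(( $1 + 1 ))"
}

# Take a read-only snapshot of /home named after the current time.
take_snapshot() {
    btrfs subvolume snapshot -r /home "$SNAPDIR/home-$(date +%Y-%m-%d_%H%M)"
}

# Delete every home-* snapshot except the newest $KEEP.
cull() {
    for d in "$SNAPDIR"/home-*; do
        [ -d "$d" ] && basename "$d"
    done | to_prune "$KEEP" | while read -r name; do
        btrfs subvolume delete "$SNAPDIR/$name"
    done
}

# Only act when called with "run", so the functions can be sourced for testing.
if [ "${1:-}" = run ]; then
    take_snapshot && cull
fi
```

To run it hourly, a line like `0 * * * * /usr/local/sbin/snapshot-home.sh run` in root's crontab (`sudo crontab -e`) would do. The path is a placeholder. Keeping the pruning logic in its own function that only reads names makes it easy to test without touching any real snapshots.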

Oh, and thank you to Duncan on the btrfs mailing list for helping me out with this. Also, with the way this is set up, won't it end up recursively creating snapshots within snapshots, since .snapshots is inside /home?
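For what it's worth, btrfs snapshots are not recursive. A nested subvolume such as /home/.snapshots is not copied into a snapshot of /home; it shows up there as an empty directory. That is one reason to create .snapshots as its own subvolume. Below is a small sketch for checking this on the VM. The script name and function are my own.

```shell
#!/bin/sh
# check_nesting SNAPSHOT_PATH: report whether the snapshot's .snapshots
# directory holds anything (it should be empty if btrfs skipped it).
check_nesting() {
    inner="$1/.snapshots"
    if [ ! -d "$inner" ]; then
        echo "no .snapshots directory in $1"
    elif [ -z "$(ls -A "$inner")" ]; then
        echo "no nested snapshots in $1"
    else
        echo "nested content found in $inner:"
        ls -A "$inner"
    fi
}

# Only act when given a path, so the function can be sourced.
if [ -n "${1:-}" ]; then
    check_nesting "$1"
fi
```

Running `sh check-nesting.sh /home/.snapshots/myfirstsnapshot` should report that there are no nested snapshots.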