Some Notes on btrfs

2015-12-13 · 1139 words · 6 minute read

Technology

linux · disk · raid · btrfs

Over the past couple of years I’ve been playing with btrfs off and on. I converted my home server over to it last year, trusting it with my most important data (i.e. mine); this entry represents some notes around the strengths and weaknesses I’ve found with btrfs.

Some Background

I started working with ZFS professionally in 2008. This was a little after its production release, but it was still new enough to be a rich source of problems and pain (some of which have never really gone away); one thing this early exposure gave me was a good grounding in some of the concepts that btrfs would also exploit: COW, checksumming, and in particular the unification of device, volume, and filesystem management. While end-to-end checksums are often touted as one of the key benefits of the latest filesystems (ReFS, btrfs, ZFS), the biggest change in the day-to-day life of an admin is the last feature: the unification.

Unified Storage Management

In a traditional setup, we tend to manage RAID, volumes, and filesystems completely independently. We create a RAID policy with hardware or specialised RAID software; then we (typically) use a volume manager to create partitions, and then format them up with our filesystem of choice. This has the advantage of giving us complete control of each item in the chain (block device, RAID metadevice, volume metadevice, filesystem choice and tuning), but has the drawback of leaving each layer ignorant of what’s going on in the others - as an admin, you end up having to pass hints up and down from (e.g.) your RAID layer to your FS layer to get the right performance.

btrfs, like ZFS, presents a single, simplified model: you add devices into a pool of storage and treat it as a unified blob to be managed. btrfs takes this even further than ZFS, which still wants you to care about how VDEVs are configured and aggregated into zpools; btrfs collapses even those distinctions down into policies applied to a filesystem and its subvolumes. It’s quite a different model, and it can be a bit disconcerting until you’re used to it. Setting quotas is a question of creating subvolumes (which simply look like directories) and applying limits to them, and RAID levels are profiles chosen for the filesystem’s data and metadata, rather than something you get by building new RAID devices and volumes.
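As a rough sketch of what the subvolume-and-quota side looks like in practice, the snippet below builds (but does not run) the commands to create a subvolume and cap its size. The mount point `/mnt/pool`, the subvolume name `media`, and the `500G` limit are placeholders of mine; the `btrfs subvolume create`, `btrfs quota enable`, and `btrfs qgroup limit` subcommands are from btrfs-progs.

```python
import shlex


def subvolume_with_quota(mount_point, name, limit):
    """Return btrfs-progs commands to create a subvolume and cap its size.

    Nothing is executed; the commands come back as argv lists so they can
    be reviewed first. All paths and sizes here are hypothetical.
    """
    subvol = f"{mount_point.rstrip('/')}/{name}"
    return [
        ["btrfs", "subvolume", "create", subvol],
        # Quota tracking has to be switched on for the filesystem first.
        ["btrfs", "quota", "enable", mount_point],
        # Then a limit can be applied to the subvolume's quota group.
        ["btrfs", "qgroup", "limit", limit, subvol],
    ]


if __name__ == "__main__":
    for argv in subvolume_with_quota("/mnt/pool", "media", "500G"):
        print(shlex.join(argv))
```

Returning the commands instead of running them is deliberate: none of this involves creating a new block device or volume, but I would still want to read the list before letting it loose on a real filesystem.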

Advantages

The btrfs architecture differs radically from ZFS in one area in particular. The underlying VDEVs for ZFS implement a traditional RAID mechanism, expecting homogeneous underlying devices (and sometimes reacting poorly when they don’t get them - for example, trying to replace disks with 512 byte vs 4096 byte sectors may either just not work, or cripple performance, depending on the scenario). As is traditionally the case, swapping a 1 TB drive with a 2 TB drive in an array simply results in the extra 1 TB being wasted. If you add a third drive to a RAID1 mirror you can have an extra clone of your data, but otherwise the extra capacity is wasted.

btrfs, on the other hand, takes the notion of a fully integrated stack to its logical conclusion, and implements an unRAID style approach to storage: it spreads blocks around to maximise available storage. If you add a third drive to a RAID1 array, it will spread the blocks over all three drives, giving you additional capacity. If you replace a 1 TB drive with a 2 TB drive, btrfs will try to use the extra capacity.
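To put numbers on the difference, here is a simplified model of usable capacity. It assumes btrfs RAID1 keeps exactly two copies of every chunk on two different devices and always picks the devices with the most free space; it ignores metadata overhead and real chunk sizes. The function names are mine, and sizes are in whole arbitrary units (say, GB).

```python
def mirror_capacity(sizes):
    """Traditional N-way mirror: usable space is the smallest member."""
    return min(sizes)


def btrfs_raid1_capacity(sizes):
    """Estimate usable space for two-copy RAID1 over uneven devices.

    Two limits apply: half the raw total (everything is stored twice),
    and what the other devices can pair with the largest one.
    """
    total = sum(sizes)
    return min(total // 2, total - max(sizes))


def simulate_raid1(sizes):
    """Cross-check: place each pair of copies on the two emptiest devices."""
    free = list(sizes)
    stored = 0
    while True:
        free.sort(reverse=True)
        # Stop once fewer than two devices have room for another copy.
        if len(free) < 2 or free[1] == 0:
            return stored
        free[0] -= 1
        free[1] -= 1
        stored += 1


if __name__ == "__main__":
    for sizes in ([1000, 1000], [1000, 1000, 1000], [1000, 1000, 2000]):
        print(sizes, mirror_capacity(sizes), btrfs_raid1_capacity(sizes),
              simulate_raid1(sizes))
```

Under this model, 1 TB + 1 TB + 2 TB gives 2 TB usable, against 1 TB for a three-way mirror. A lone 2 TB drive paired with a single 1 TB drive still only yields 1 TB, since there is nowhere else to put the second copy.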

Mixing and matching drives this way is a huge and under-emphasised benefit of btrfs. Compared to the options available in mdraid (and ZFS) it vastly simplifies growing storage and replacing failing disks for SOHO environments.
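For a sense of how little ceremony is involved, the sketch below builds (without running) the btrfs-progs commands for the two common cases: adding a disk, and swapping one out for a bigger one. The device names, mount point, and devid are made-up examples; on a real system you would look the devid up with `btrfs filesystem show`.

```python
import shlex


def grow_pool(mount_point, new_device):
    """Commands to add a device and respread existing data across it."""
    return [
        ["btrfs", "device", "add", new_device, mount_point],
        # Existing data stays where it is until a balance moves it.
        ["btrfs", "balance", "start", mount_point],
    ]


def replace_disk(mount_point, old_device, new_device, devid):
    """Commands to swap a disk, then claim any extra capacity on the new one."""
    return [
        ["btrfs", "replace", "start", old_device, new_device, mount_point],
        # Run only once the replace has finished: grow to the new disk's size.
        ["btrfs", "filesystem", "resize", f"{devid}:max", mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical devices and mount point, for illustration only.
    for argv in grow_pool("/mnt/pool", "/dev/sdd"):
        print(shlex.join(argv))
    for argv in replace_disk("/mnt/pool", "/dev/sdb", "/dev/sde", 2):
        print(shlex.join(argv))
```

Both operations happen on a mounted, in-use filesystem, which is a large part of the appeal for a home server.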

Weaknesses of btrfs

btrfs is still very sensitive to kernel selection. After many years of development and Linus’ official blessing, btrfs is still not especially stable or reliable. Much like the 1.2/1.3 days of Linux, you need to be prepared to track specific kernel versions to work out which ones will keep your data and which ones will work as advertised. If you’re a gearhead who likes fiddling this is gratifying. If you don’t like rebuilding filesystems while people wait on them to become available… not so much.

Even if things weren’t unreliable, the tools are dreadfully immature - df has several FAQ entries because there’s no good mapping of “what is going on with storage” to “what users expect to see when they use df”; quotas are opaque and significantly less usable than traditional Linux quotas (which is saying something).

RAID5/RAID6 is still very immature; while it’s true that RAID5 has a lot of shortcomings with modern, high-capacity hard drives, not having it for bulk storage of low-criticality, replaceable data (e.g. DVD rips for a home server, on-line archives of old, backed-up jobs for a SOHO environment) is a real gap.

Documentation could use some work. It’s not obvious to the casual reader how e.g. RAID1, RAID10, and RAID5 work with multi-disk volumes; this has been improving, but slowly, and the experimental nature of btrfs tends to create a common problem, where the in-group (which includes me, I guess) who have worked with the tech don’t need to document what they’ve learned, and the newcomer finds it all pretty opaque.

Many more sophisticated management options aren’t available at all, or are awkward or buggy implementations; off the top of my head these would include:

Allocation policies - no way of forcing negative affinity for blocks across channels. This is important if you want large software defined storage systems.
No built-in SSD caching; tiering has become a more viable and useful thing (and will probably continue to do so until solid state storage options drop in price a few more multiples), but not for btrfs.
Encryption is clumsy and stupid. You need to encrypt the underlying block devices that make up a btrfs pool. You can’t, as you’d expect, apply encryption to a subvolume. You can encrypt whole pools, or nothing, and you need to decrypt every device in a pool individually.
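To make the encryption complaint concrete, this is roughly what bringing up an encrypted pool involves. I am assuming LUKS containers opened with `cryptsetup open`, which is one common way to encrypt the underlying block devices rather than anything btrfs itself mandates; the device names, mapper names, and mount point are invented. The commands are only built, not run.

```python
import shlex


def unlock_and_mount(devices, mount_point, prefix="pool"):
    """Commands to open each LUKS container, then mount the btrfs pool."""
    if not devices:
        raise ValueError("need at least one member device")
    commands = []
    mapped = []
    for index, device in enumerate(devices):
        name = f"{prefix}{index}"
        # Every member is its own container and is unlocked separately.
        commands.append(["cryptsetup", "open", device, name])
        mapped.append(f"/dev/mapper/{name}")
    # Let btrfs discover all the unlocked members before mounting.
    commands.append(["btrfs", "device", "scan"])
    # Mounting any one member brings up the whole multi-device filesystem.
    commands.append(["mount", mapped[0], mount_point])
    return commands


if __name__ == "__main__":
    # Hypothetical three-disk pool.
    for argv in unlock_and_mount(["/dev/sdb", "/dev/sdc", "/dev/sdd"], "/mnt/pool"):
        print(shlex.join(argv))
```

The number of unlock steps grows with the number of disks, and there is no equivalent step that covers just one subvolume.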

Summary

The most fundamental flaw with btrfs is very simple, though: the rate of improvement. btrfs was blessed by Linus as the ext4 replacement back around 2010 (in what is arguably one of the worst decisions he’s made about kernel direction), and by the end of 2015 it’s still terribly immature. Filesystems routinely self-corrupt over time (causing more problems than they’re supposed to solve); the tools to resolve problems are still very limited, and my general feeling is that the project is spinning its wheels.

At this point, I’d still struggle to recommend btrfs to anyone who minds spending a lot of time faffing with their systems; ReFS/StorageSpaces have overtaken it in every single area. Frankly, if the MD RAID driver had an unRAID-style target, I wouldn’t be bothering with btrfs any more.