Geek Blight - Using borg for backups

Using borg for backups

Posted on 2020-04-19T13:18Z. Updated on 2020-07-25T19:13Z.

In the last few days I’ve replaced almost all my custom backup scripts with a single one based on Borg. Borg is a fork of Attic and has been around for a few years. If you didn’t know about it, I highly recommend you look into it.

How to make proper backups

When you read a bit about how to properly create data backups you quickly learn a few things. The first one is that a mirror is not a proper backup. If you have a replica of your data somewhere else and update it frequently so both data sets are exactly the same, you’re not protecting yourself from inadvertently deleting or corrupting what you have. Obviously, if you remove something by accident and realize what you’ve done in that very moment, you may be able to get your data back from the mirror. But if you don’t notice until a few days later, when you may have already synchronized your mirror, you’re out of luck. So lesson number one is that backups need to preserve history to some degree, like a version control system does.

The second lesson is that you’ll want to keep backups in one, two or more locations, but you probably want at least one of those locations to be off-site. That is, if you back up your home directory to a second computer in your house, even if you monitor the health status of your primary and backup systems, you risk losing all your data if, say, a fire breaks out in the house.

Given those lessons, one common scheme for creating backups without taking up too much disk space is to use a system that loses granularity over time. For example, you could create a backup archive (say, a tarball) once a week (or daily, if you prefer, but I will use a weekly scheme for this example) and you could preserve the last 4 or 5 archives to cover the whole past month. In addition, you preserve one archive for each of the previous 11 months and at least one archive for each of the previous 9 years. With this scheme, using only 24 or 25 archives you can recover up to 10 years of data history if you need to do so, with recent archives providing finer-grained information deltas.
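The archive count in that scheme can be double-checked with a bit of shell arithmetic. This is only a sketch of the counting; the variable names are made up for illustration.

```shell
# Count the archives kept by the weekly scheme described above.
# Variable names are illustrative only.
weekly_min=4   # weekly archives covering the past month (4 or 5)
weekly_max=5
monthly=11     # one archive for each of the previous 11 months
yearly=9       # one archive for each of the previous 9 years

total_min=$(( weekly_min + monthly + yearly ))
total_max=$(( weekly_max + monthly + yearly ))
printf 'Archives kept: %s to %s\n' "$total_min" "$total_max"
```

This prints a range of 24 to 25 archives, matching the figure in the text.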

As a matter of fact, before moving to borg I had a relatively simple set of scripts to follow a scheme similar to the one described above.

How to use borg

Borg basically simplifies all of the above so it only takes a few lines in a script to achieve those goals. Borg itself is based on two concepts: repositories and archives. Imagine your backups are simple tarballs which are stored in a known directory somewhere, probably using the backup date as part of the tarball name. With borg, it’s just the same thing with somewhat different names: the directory is called a repository and the tarballs are called archives (tar comes from tape archive, so it’s not even that different).

The trick here is that borg gives you a couple of additional things for free. First, inside a repository, archives are encrypted. There are a few different possibilities to manage encryption in a repository, and the most common one is probably using a passphrase chosen at repository creation time. Second, data inside a repository is compressed and deduplicated. Deduplication means your data is split into chunks and borg tries to avoid storing the same chunk twice. When you create a new archive a given week, chances are it can be stored using a fraction of the total space the files take on disk because only new and modified chunks need to be stored. In my case, I’m storing 265GB worth of data using just around 20GB on disk thanks to compression and deduplication.
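To get a feeling for why deduplication saves so much space, here is a toy model. It is not how borg is implemented: borg uses content-defined chunking, while this sketch assumes fixed 4 KiB chunks and relies on sha256sum from GNU coreutils. The file names and the block helper are made up for the example.

```shell
# Toy model of chunk deduplication. Assumptions: fixed 4 KiB chunks and
# sha256sum (GNU coreutils); borg itself uses content-defined chunking.
workdir="$(mktemp -d)"

# block FILE CHAR: append 4096 copies of CHAR to FILE.
block() { head -c 4096 /dev/zero | tr '\0' "$2" >> "$1"; }

# Week 1 holds two blocks; week 2 is the same data plus one new block.
block "$workdir/week1" a
block "$workdir/week1" b
cp "$workdir/week1" "$workdir/week2"
block "$workdir/week2" c

# Split both versions into chunks, then count chunks and distinct hashes.
split -b 4096 "$workdir/week1" "$workdir/w1-"
split -b 4096 "$workdir/week2" "$workdir/w2-"
set -- "$workdir"/w1-* "$workdir"/w2-*
total_chunks=$#
unique_chunks=$(( $(sha256sum "$@" | awk '{print $1}' | sort -u | wc -l) ))

printf 'chunks referenced: %s, chunks stored: %s\n' "$total_chunks" "$unique_chunks"
rm -rf "$workdir"
```

The two versions reference five chunks in total, but only three distinct chunks would need to be stored, because the second week shares its first two chunks with the first.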

Once you have created a repository, you can add archives to it. Creating a new archive is like creating a tarball: you need to tell borg what directories to store and a name for the archive, which can easily include the current date. Much like rsync and other similar tools, borg has options to indicate exclusion patterns to avoid storing everything under a given directory. Once your new archive is created, you’re almost done.
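As a rough sketch of those two steps, assuming borg 1.x: the function names and the example paths are hypothetical, and repokey is just one of the available encryption modes.

```shell
# Sketch assuming borg 1.x. Function names and paths are hypothetical.

# Create a new repository protected by a passphrase. In repokey mode the
# passphrase-protected key is kept inside the repository itself.
init_repo() {
    local repo="$1"
    borg init --encryption=repokey "$repo"
}

# Create an archive named after the current UTC time, skipping one
# subdirectory of the source directory.
create_archive() {
    local repo="$1" source_dir="$2"
    borg create --stats \
        --exclude "$source_dir/Downloads" \
        "${repo}::{utcnow}" \
        "$source_dir"
}

# Example usage (not run here):
#   init_repo /mnt/backup/borg-repo
#   create_archive /mnt/backup/borg-repo "$HOME"
```

The {utcnow} placeholder is expanded by borg itself, not by the shell, which is why it can sit inside double quotes.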

The final and optional step is to prune archives you no longer need from the repository. Remember: you probably want to have a backup scheme with varying granularity like I described before. You can just tell borg the desired granularity and it will work out which archives are no longer worth keeping and remove them from the repo.
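Before pruning for real, it can be reassuring to preview what would be removed. The sketch below assumes borg 1.x and uses a hypothetical function name; with --dry-run borg leaves the repository untouched, and --list shows which archives would be kept or pruned.

```shell
# Preview a prune without changing the repository (borg 1.x assumed).
# The function name is hypothetical; the keep values are examples.
preview_prune() {
    local repo="$1"
    borg prune --dry-run --list \
        --keep-weekly 4 \
        --keep-monthly 12 \
        --keep-yearly 10 \
        "$repo"
}

# Example usage (not run here):
#   preview_prune /mnt/backup/borg-repo
```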

Other important details

borg can work both with local repositories and remote repositories if they are accessible using SSH, in a similar way to rsync. If the remote machine also has borg installed, borg can leverage its remote counterpart to operate more efficiently. Encryption is done on the machine creating the archives, so the remote server never sees unencrypted data and an attacker accessing the remote server disk cannot read the contents of the files stored there.
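A remote repository is addressed with an SSH-style location. The following sketch builds an ssh:// URL and lists the archives in it, assuming borg 1.x; the user, host and path are placeholders, and the function names are hypothetical.

```shell
# All names below (user, host, path) are hypothetical placeholders.

# Build an ssh:// repository URL from its parts.
remote_repo() {
    local user="$1" host="$2" path="$3"
    # Strip any leading slash so the URL contains exactly one after the host.
    printf 'ssh://%s@%s/%s\n' "$user" "$host" "${path#/}"
}

# List the archives stored in a remote repository.
list_remote_archives() {
    borg list "$(remote_repo "$@")"
}

# Example usage (not run here):
#   list_remote_archives backup example.com /srv/borg/home
```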

There are several ways to access borg archives to recover data. My personal favorite is using borg’s mount command. It will mount, under a given directory, either a repository or a particular archive, which you can then browse like any other local file system. This is similar to using sshfs.
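Mounting and unmounting an archive could look roughly like this. The sketch assumes borg 1.x with FUSE support available; the function names, archive name and mount point are hypothetical.

```shell
# Mount a single archive under a directory and unmount it afterwards.
# Assumes borg 1.x with FUSE support; all names are hypothetical.
browse_archive() {
    local repo="$1" archive="$2" mountpoint="$3"
    mkdir -p "$mountpoint"
    borg mount "${repo}::${archive}" "$mountpoint"
}

unmount_archive() {
    local mountpoint="$1"
    borg umount "$mountpoint"
}

# Example usage (not run here):
#   browse_archive /mnt/backup/borg-repo 2020-04-19T13:18:00 "$HOME/mnt/borg"
#   cp "$HOME/mnt/borg/home/user/notes.txt" /tmp/
#   unmount_archive "$HOME/mnt/borg"
```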

Also, shout out to the guys at rsync.net, who offer Unix-friendly disks in the cloud accessible using SSH. They have special, somewhat competitive pricing for borg users. It’s not as cheap as the likes of Dropbox, Google Drive or OneDrive, but none of those has such a flexible price scheme, where you can pay exactly for the storage space you want, and none of those lets you access the disk using standard Unix tools.

An example from my case

As an example, I’m pasting below my custom borg script. I use it to create backups of my home directory. In it, REPO_PATH has been edited out but it contains an SSH-like URL with the remote location of my borg repository. Feel free to use it as a reference. Note that after creating a new archive and pruning old archives I go one step further and verify the status of the borg repository as an additional measure, even though I monitor the health of all my local drives using smartd.

#!/usr/bin/env bash
read -r -s -p "borg passphrase: " BORG_PASSPHRASE
export BORG_PASSPHRASE
printf '\nCreating backup...\n'

REPO_PATH=<repo location edited out>

borg create                                         \
    --stats                                         \
    --show-rc                                       \
    --compression zstd                              \
    --exclude-caches                                \
    --exclude "$HOME"/Downloads                     \
    --exclude "$HOME"/.gvfs                         \
    --exclude "$HOME"/.mozilla                      \
    --exclude "$HOME"/.thunderbird                  \
    --exclude "$HOME"/.thumbnails                   \
    --exclude "$HOME"/.cache                        \
    --exclude "$HOME"/.local/share/Trash            \
    --exclude "$HOME"/mnt                           \
    "$REPO_PATH::{utcnow}"                          \
    "$HOME"

CREATE_RC=$?

borg prune                                          \
    --list                                          \
    --show-rc                                       \
    --keep-weekly 4                                 \
    --keep-monthly 12                               \
    --keep-yearly 10                                \
    "$REPO_PATH"

PRUNE_RC=$?

borg check                                          \
    -p                                              \
    --show-rc                                       \
    "$REPO_PATH"

CHECK_RC=$?

GLOBAL_RC="$( printf '%s\n%s\n%s\n' "$CREATE_RC" "$PRUNE_RC" "$CHECK_RC" | sort -n | tail -n 1 )"
exit "$GLOBAL_RC"
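The last two lines of the script make it exit with the highest of the three return codes. Borg 1.x documents return code 0 as success, 1 as warning and 2 as error, so the highest value is also the most severe one. The same idea in isolation, with a hypothetical helper name:

```shell
# Print the highest (most severe) of the return codes given as arguments.
# The helper name is hypothetical.
worst_rc() {
    printf '%s\n' "$@" | sort -n | tail -n 1
}

worst_rc 0 1 0   # prints 1: a warning outranks two successes
```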

To import my backup history into borg I used a custom script based on bind mounts and the --timestamp option for borg’s create command. Borg’s documentation is pretty detailed, so do not hesitate to look for answers there.
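The following is not that import script, only a minimal sketch of how the --timestamp option can be used, assuming borg 1.x. It supposes a hypothetical layout with one directory per old backup, named after its date (for example 2019-03-17), and it does not reproduce the bind mount part.

```shell
# Import one old snapshot as an archive dated after its directory name.
# Hypothetical layout: one directory per old backup, named YYYY-MM-DD.
import_snapshot() {
    local repo="$1" snapshot_dir="$2"
    local day
    day="$(basename "$snapshot_dir")"
    # --timestamp takes a UTC date in yyyy-mm-ddThh:mm:ss format.
    borg create --timestamp "${day}T00:00:00" \
        "${repo}::${day}" \
        "$snapshot_dir"
}

# Example usage (not run here), oldest snapshot first:
#   for dir in /old-backups/20??-??-??; do
#       import_snapshot /mnt/backup/borg-repo "$dir"
#   done
```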
