Automated off-site Linux Backups using Duply and Duplicity

Off-site backups are important, and even though I know this, I rarely implement them on my own servers. Lately, I’ve been setting up rsnapshot to do hourly and daily backups locally (to the same server), and I only occasionally do manual backups to remote servers. I decided to install duply on all of the servers and virtual machines I care about and have them back up to a single backup server. This backup server will also do daily encrypted backups to Amazon S3, effectively giving me three redundant layers of backups.

If you haven’t heard of Duplicity or Duply before, Duply is basically a wrapper for Duplicity that makes it easier to manage. Duplicity itself is similar to rsnapshot, except that it uses tar to efficiently store differences between backups (instead of hardlinks). Here’s the description from the man page:

Duplicity incrementally backs up files and directory by 
encrypting tar-format volumes with GnuPG and uploading 
them to a remote (or local) file server. Currently local, 
ftp, ssh/scp, rsync, WebDAV, WebDAVs, HSi and Amazon S3 backends 
are available. Because duplicity uses librsync, the incremental 
archives are space efficient and only record the parts of files 
that have changed since the last backup. Currently duplicity 
supports deleted files, full Unix permissions, directories, 
symbolic links, fifos, etc., but not hard links.

I wrote this mainly as a reference for myself when I need to set duply up on another server, but it might be useful for others as well.

CentOS 5 / 6 Instructions

Install the EPEL repo:

#Cent 6:
rpm -Uvh
#Cent 5:
rpm -Uvh

Install duplicity:

yum --enablerepo=epel install duplicity

Install duply:
Get the URL for the latest version here:
Download it to your server, extract it, copy duply to /usr/local/bin/duply, and make it executable:

tar xvzf duply_1.5.5.1.tgz
cp duply_1.5.5.1/duply /usr/local/bin/duply
chmod +x /usr/local/bin/duply

Note: Bug 675234 is a request to have duply put into the Fedora repo, but there is also a .spec and source rpm if you wish to build an rpm yourself.

Set up a basic Duply profile

mkdir /etc/duply && chmod 700 /etc/duply
duply testvm1 create

Note: If you don’t create /etc/duply, then it will use $HOME/.duply by default.
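If you want to check where a profile will end up before creating it, here is a minimal sketch of the rule from the note above. duply_conf_dir is a hypothetical helper of mine, not something duply ships, and it only mirrors the simple "/etc/duply if it exists, otherwise $HOME/.duply" rule; duply itself may apply extra conditions.

```shell
# Hypothetical helper, not part of duply: mirrors the lookup rule in the note above.
# The system directory is a parameter so the function can be tried on other paths.
duply_conf_dir() {
    system_dir="${1:-/etc/duply}"
    if [ -d "$system_dir" ]; then
        printf '%s\n' "$system_dir"
    else
        printf '%s\n' "$HOME/.duply"
    fi
}

# Show where the testvm1 profile would be expected for the current user
echo "profile dir: $(duply_conf_dir)/testvm1"
```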

Take a look at the options in /etc/duply/testvm1/conf and configure it to your liking. There are many different TARGET formats you can use, including ssh, rsync over ssh, ftp, and even Amazon S3. You can view them all here: URL Formats
This is what mine usually looks like (without encryption), using rsync over ssh:

# egrep -v '^#|^$' /etc/duply/testvm1/conf
DUPL_PARAMS="$DUPL_PARAMS --include=/etc \
        --include=/home \
        --include=/root \
        --include=/var/www \
        --include=/var/lib/mysql \
        --include=/var/log \
        --exclude=/** "

I prefer to use multiple --include= options rather than fill /etc/duply/testvm1/exclude with every directory I *don’t* want backed up. Either way will work though. Also, if you’re going to use rsync or ssh/sftp, I’d recommend setting up the backup server so that you can log in with ssh keys, and generating a separate key for each server you’re backing up from. You’ll also need to ssh into the backup server as that user at least once beforehand so that its host key is already known; otherwise the backup fails with errors about the target host key.
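For comparison, the same selection can also be kept in the profile’s exclude file. This is a sketch based on my understanding that duply passes this file to duplicity as a globbing file list, where "+ " lines include, "- " lines exclude, and the first matching line wins. PROFILE_DIR is a variable I made up for the example; it defaults to a temporary directory so the snippet is safe to try.

```shell
# Hypothetical variable: point PROFILE_DIR at /etc/duply/testvm1 on a real server.
PROFILE_DIR="${PROFILE_DIR:-$(mktemp -d)}"

# "+ " lines are included, "- " lines are excluded. The first matching line
# wins, so the catch-all exclude has to come last.
cat > "$PROFILE_DIR/exclude" <<'EOF'
+ /etc
+ /home
+ /root
+ /var/www
+ /var/lib/mysql
+ /var/log
- **
EOF

# Print the result for a quick visual check
cat "$PROFILE_DIR/exclude"
```

If you go this route, the --include/--exclude options would come out of DUPL_PARAMS so the two lists don’t fight each other.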

Duply/Duplicity supports using GPG to encrypt volumes before uploading them to the remote server, and the easiest way to enable encryption is by putting this in your conf:

#comment out #GPG_KEY from earlier

This will encrypt the volumes using the passphrase you put in GPG_PW, but you can refer to the documentation for how to set it up to use actual gpg keys.
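As a rough sketch of what that looks like in /etc/duply/testvm1/conf: I’m assuming here that the profile previously turned encryption off with GPG_KEY='disabled' (the usual way to do it in a duply conf), and the passphrase below is only a placeholder.

```shell
# Fragment of /etc/duply/testvm1/conf (a sketch; the passphrase is a placeholder)
# Assumption: encryption was previously switched off with this line, so comment it out:
#GPG_KEY='disabled'
# With no GPG_KEY set, the volumes are encrypted symmetrically with this passphrase:
GPG_PW='replace-with-a-long-random-passphrase'
```

Keep a copy of the passphrase somewhere other than the server being backed up, since the backups can’t be restored without it.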

Including MySQL Backups
If you’re running MySQL on the server, you should consider adding something similar to this to /etc/duply/testvm1/pre, which duply runs automatically before each backup, to dump all databases before backing up the server.

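mkdir -pv /root/db_backups
# Dump each database to its own file, skipping information_schema
for db in $(mysql -uroot -e 'show databases' -s --skip-column-names | grep -v 'information_schema'); do
        mysqldump -uroot "$db" > "/root/db_backups/$db.sql"
        # Pause between dumps to go easy on the server
        sleep 10
done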

Run your first backup:

duply testvm1 backup

Automate it
If all goes well (no errors), then you should be okay to set up a cron job to run duply backup.

# crontab -l
30      3       *       *       *       /usr/local/bin/duply testvm1 backup
30      5       *       *       sun       /usr/local/bin/duply testvm1 backup_verify_purge --force
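If you’d rather have a log of each run and only hear from cron when something breaks, a small wrapper is one option. This is just a sketch: the run_duply function, the log path, and the DUPLY_BIN/LOG_FILE variables are names I made up for the example, not anything duply provides.

```shell
#!/bin/sh
# Hypothetical wrapper around duply for cron; names and paths are assumptions.
DUPLY_BIN="${DUPLY_BIN:-/usr/local/bin/duply}"
LOG_FILE="${LOG_FILE:-/var/log/duply-testvm1.log}"

run_duply() {
    # $1 = profile, remaining arguments = duply command and flags
    profile="$1"; shift
    {
        echo "=== $(date) duply $profile $*"
        "$DUPLY_BIN" "$profile" "$@"
    } >> "$LOG_FILE" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        # Anything written to stderr ends up in cron's mail, if mail is set up
        echo "duply $profile $* failed with status $status, see $LOG_FILE" >&2
    fi
    return "$status"
}

# When called with arguments (e.g. from cron), run duply with them
if [ "$#" -gt 0 ]; then
    run_duply "$@"
fi
```

Saved as, say, /usr/local/bin/duply-cron (again a made-up path), the crontab lines would call it with the same arguments, for example `duply-cron testvm1 backup`.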

Purge Old Backups
If you use the above crontab, the second line will run once a week (Sundays at 05:30), taking a backup, verifying it, and then purging old backups from the remote server. If you’re worried about keeping too many backups, you might want to run this more often and also lower the retention settings (such as MAX_AGE) in the duply profile configuration.
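For reference, these are the retention-related settings I’d look at in /etc/duply/testvm1/conf. The variable names come from the duply conf template as I remember it, and the values are only examples, so check them against the template your version generates.

```shell
# Fragment of /etc/duply/testvm1/conf; values are examples, not recommendations.
MAX_AGE=2M            # purge removes backups older than two months
MAX_FULL_BACKUPS=2    # upper limit used by duply's full-backup purge command
MAX_FULLBKP_AGE=1M    # start a new full backup once the last one is a month old
# Without periodic full backups, old increments stay needed and cannot be purged
DUPL_PARAMS="$DUPL_PARAMS --full-if-older-than $MAX_FULLBKP_AGE "
```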

Verify Backups
Duply makes it easy to see what you’re currently backing up.
To see a list of backups stored on the remote server:

duply testvm1 status

To see a list of files that have changed since the last backup:

duply testvm1 verify

List all files in yesterday’s backup (leave out the 1D to list the latest backup):

duply testvm1 list 1D

Restore Backups
It’s just as easy to restore complete or partial backups.
Restore the entire latest backup to /tmp/restore:

duply testvm1 restore /tmp/restore

Restore backup from 7 days ago to /tmp/restore:

duply testvm1 restore /tmp/restore 1W

Restore single file or directory to /tmp/restore:

duply testvm1 fetch home/justyns /tmp/restore
#When using 'fetch', make sure you leave off the leading slash.
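Since it’s easy to paste an absolute path out of habit, here is a tiny sketch that strips the leading slash first. to_fetch_path is a hypothetical helper, not part of duply, and it assumes the backup source is / as in the examples above.

```shell
# Hypothetical helper, not part of duply.
to_fetch_path() {
    # Remove a single leading "/" so the path is relative to the backup source
    printf '%s\n' "${1#/}"
}

to_fetch_path /home/justyns    # prints: home/justyns

# Possible use on a server with duply installed (left commented out here):
# duply testvm1 fetch "$(to_fetch_path /home/justyns)" /tmp/restore
```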

Restore a file from a month ago:

duply testvm1 fetch home/justyns/plans_for_world_dom.txt /home/justyns/plans_for_world_dom.txt 1M
