Automated off-site Linux Backups using Duply and Duplicity
Off-site backups are important, and even though I know this, I rarely implement them on my own servers. Lately, I’ve been setting up rsnapshot to do hourly and daily backups locally (to the same server), and I only occasionally do manual backups to remote servers. I decided to install duply on all of the servers and virtual machines that I care about and have them back up to a single backup server. This backup server will also do daily encrypted backups to Amazon S3, effectively giving me 3 redundant layers of backups.
If you haven’t heard of Duplicity or Duply before, Duply is basically a wrapper for Duplicity that makes it easier to manage. Duplicity itself is similar to rsnapshot, except that it uses tar to efficiently store differences between backups (instead of hardlinks). Here’s the description from the man page:
Duplicity incrementally backs up files and directory by encrypting tar-format volumes with GnuPG and uploading them to a remote (or local) file server. Currently local, ftp, ssh/scp, rsync, WebDAV, WebDAVs, HSi and Amazon S3 backends are available. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Currently duplicity supports deleted files, full Unix permissions, directories, symbolic links, fifos, etc., but not hard links.
I wrote this mainly as a reference for myself when I need to set duply up on another server, but it might be useful for others as well.
CentOS 5 / 6 Instructions
Install the EPEL repo:
# CentOS 6:
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-5.noarch.rpm
# CentOS 5:
rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
yum --enablerepo=epel install duplicity
Get the URL for the latest version here: http://duply.net/?title=Duply-downloads
Download it to your server, extract it, copy duply to /usr/local/bin/duply, and make it executable:
# Replace VERSION with the duply version number from the download page.
wget http://dev.justynshull.com/duply_VERSION.tgz
tar xvzf duply_VERSION.tgz
cp duply_VERSION/duply /usr/local/bin/duply
chmod +x /usr/local/bin/duply
Note: Bug 675234 is a request to have duply put into the Fedora repo, but there is also a .spec and source rpm if you wish to build an rpm yourself.
Set up a basic Duply profile
mkdir /etc/duply && chmod 700 /etc/duply
duply testvm1 create
Note: If you don’t create /etc/duply, then it will use $HOME/.duply by default.
Take a look at the options in /etc/duply/testvm1/conf and configure it to your liking. There are many different TARGET formats you can use, including ssh, rsync over ssh, ftp, and even Amazon S3. You can view them all here: URL Formats
This is what mine usually looks like (without encryption), using rsync over ssh:
# egrep -v '^#|^$' /etc/duply/testvm1/conf
GPG_KEY='disabled'
TARGET='rsync://backups.justynshull.com//home/testvm1/backups'
TARGET_USER='testvm1'
SOURCE='/'
MAX_AGE=3M
MAX_FULL_BACKUPS=3
MAX_FULLBKP_AGE=30D
VOLSIZE=3500
DUPL_PARAMS="$DUPL_PARAMS --full-if-older-than $MAX_FULLBKP_AGE "
DUPL_PARAMS="$DUPL_PARAMS --volsize $VOLSIZE "
DUPL_PARAMS="$DUPL_PARAMS --include=/etc \
 --include=/home \
 --include=/root \
 --include=/var/www \
 --include=/var/lib/mysql \
 --include=/var/log \
 --exclude=/** "
I prefer to use multiple --include= options rather than fill /etc/duply/testvm1/exclude with every directory I *don’t* want backed up. Either way will work, though. Also, if you’re going to use rsync or ssh/sftp, I’d recommend setting up the backup server so that you can log in with ssh keys, and generating a separate key for each server you’re backing up from. You’ll also have to ssh into the backup server as that user at least once beforehand to avoid errors about the target host key.
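As a rough illustration of the include-based approach, the sketch below builds the option string from a list of directories. The function name build_includes is hypothetical and not part of duply or duplicity; the only thing it relies on is that duplicity applies file selection options in order, so the catch-all exclude has to come last.

```shell
#!/bin/sh
# Hypothetical helper (not part of duply): turn a list of directories
# into duplicity --include options followed by a catch-all --exclude.
build_includes() {
    for dir in "$@"; do
        # The format string starts with %s so printf never sees a
        # leading dash as an option.
        printf '%s%s ' '--include=' "$dir"
    done
    # Everything not matched by an include above is excluded.
    printf '%s\n' '--exclude=/**'
}

# Example: print the options for three directories.
build_includes /etc /home /root
```

The output can be pasted into a DUPL_PARAMS line in the profile’s conf file. Directory names containing spaces would need extra quoting that this sketch does not handle.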
Duply/Duplicity supports using GPG to encrypt volumes before uploading them to the remote server, and the easiest way to enable encryption is by putting this in your conf:
# Comment out GPG_KEY from earlier
#GPG_KEY='disabled'
GPG_PW='secret_password'
This will encrypt the volumes using the passphrase you put in GPG_PW, but you can refer to the documentation for how to set it up to use actual gpg keys.
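Since the passphrase is the only thing protecting the volumes in this mode, it is worth generating a random one instead of making one up. A minimal sketch, assuming /dev/urandom and the base64 utility are available; the function name gen_passphrase is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: print a random passphrase built from N random
# bytes (default 32), base64 encoded onto a single line.
gen_passphrase() {
    bytes="${1:-32}"
    # tr strips any line wrapping that base64 may add for long output.
    head -c "$bytes" /dev/urandom | base64 | tr -d '\n'
    echo
}

# Example: generate a passphrase from 32 random bytes.
gen_passphrase 32
```

Base64 output contains only letters, digits, +, / and =, so it can go inside the single quotes of GPG_PW without escaping. Keep a copy somewhere other than the server itself, because the backups cannot be restored without it.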
Including MySQL Backups
If you’re running MySQL on the server, consider adding something like this to /etc/duply/testvm1/pre, which duply runs automatically, to dump all databases before backing up the server.
#!/bin/sh
# Dump every database except information_schema to /root/db_backups
mkdir -pv /root/db_backups
for db in $(mysql -uroot -e 'show databases' -s --skip-column-names | grep -v 'information_schema'); do
    mysqldump -uroot "$db" > "/root/db_backups/$db.sql"
    sleep 10
done
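The grep in the script above matches substrings, so it would also skip a database whose name merely contains information_schema. A slightly stricter filter is sketched below; it matches whole names only and, as an assumption about newer MySQL versions, also skips performance_schema. The function name filter_user_dbs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read database names on stdin (one per line) and
# drop the system schemas, matching whole lines as fixed strings.
filter_user_dbs() {
    grep -v -x -F \
        -e 'information_schema' \
        -e 'performance_schema'
}

# Example with a fixed list instead of a live MySQL server:
printf '%s\n' information_schema blog performance_schema shop | filter_user_dbs
```

In the pre script it would replace the grep stage, as in mysql -uroot -e 'show databases' -s --skip-column-names | filter_user_dbs.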
Run your first backup:
duply testvm1 backup
If all goes well (no errors), you should be okay to set up a cronjob to run duply backup.
# crontab -l
30 3 * * * /usr/local/bin/duply testvm1 backup
30 5 * * sun /usr/local/bin/duply testvm1 backup_verify_purge --force
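On Sundays the second job starts two hours after the first, so in case a backup ever runs that long, it may be worth making sure the two cannot overlap. The following is only a sketch of one way to do that with a lock directory; the function name run_exclusive and the lock path are assumptions, and neither is something duply provides.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command only if no other run holds the
# lock. mkdir is used as the lock because it either creates the
# directory or fails, with nothing in between.
run_exclusive() {
    lockdir="$1"
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "lock $lockdir is held, skipping: $*" >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    rmdir "$lockdir"
    return "$status"
}

# Example use from a cron wrapper script (lock path is an assumption):
# run_exclusive /var/run/duply-testvm1.lock /usr/local/bin/duply testvm1 backup
```

If the machine is rebooted or the job is killed mid-run, the lock directory is left behind and has to be removed by hand before the next run will start.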
Purge Old Backups
If you use the above crontab, the second line runs once a week and purges old backups from the remote server. If you’re worried about keeping too many backups, you might want to run it more often and also lower the retention settings (MAX_AGE and MAX_FULL_BACKUPS) in the duply profile configuration.
Duply makes it easy to see what you’re currently backing up.
To see a list of backups stored on the remote server:
duply testvm1 status
To see a list of files that have changed since the last backup:
duply testvm1 verify
List all files in the backup from one day ago (leave out the age to list the latest backup):
duply testvm1 list 1D
It’s just as easy to restore complete or partial backups.
Restore the entire latest backup to /tmp/restore:
duply testvm1 restore /tmp/restore
Restore backup from 7 days ago to /tmp/restore:
duply testvm1 restore /tmp/restore 1W
Restore single file or directory to /tmp/restore:
# When using 'fetch', make sure you leave off the leading slash.
duply testvm1 fetch home/justyns /tmp/restore
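Because it is easy to forget the missing leading slash when copying an absolute path, a tiny helper can do the conversion. This is just a sketch; to_fetch_path is a made-up name, not a duply command.

```shell
#!/bin/sh
# Hypothetical helper: convert an absolute path into the relative form
# that 'duply <profile> fetch' expects (no leading slash).
to_fetch_path() {
    # ${1#/} removes a single leading slash if there is one.
    printf '%s\n' "${1#/}"
}

# Example: prints home/justyns
to_fetch_path /home/justyns
```

It could then be used as duply testvm1 fetch "$(to_fetch_path /home/justyns)" /tmp/restore.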
Restore a file from a month ago:
duply testvm1 fetch home/justyns/plans_for_world_dom.txt /home/justyns/plans_for_world_dom.txt 1M