Website Backup Strategies, Part IV
Introduction
For backing up the database we won’t benefit from the snapback technique
because the data will change often enough that a compressed archive will
be more efficient then hard links (see analysis in Website Backup Strategies). Because the script archives and compresses the dump files, rsync will treat them as changed even if the data in the database hasn’t changed.
Even if you don’t archive and compress the dump files, you must still take care to ensure that the dump files are not considered different by rsync due to creation time (use --size-only), and that the dump files are not trivially different due to metadata like timestamps embedded in the file mysqldump (use --skip-comments). You can use also use snapback on /var/lib/mysql, but its not guaranteed to work.
The strategy will be to set up a cron job to run a script on the local backup server that will invoke a script on the remote server. The script on the remote server will execute mysqldump. The reason for invoking a script on the remote sever is that we don’t want to allow databases connections except from localhost.
I’ve chosen to use mysqldump for the flexibility to edit SQL if needed. The downside is that we are only capturing the individual databases, not the user tables or grants. Additionally, I’m going to dump the table schema separately from the data so that I don’t waste space and bandwidth backing up tables like cache.
Remote Script
The remote script called dbbackup.sh (download) has the following usage
Usage: dbbackup.sh [OPTIONS] " Where [OPTIONS] are : <-d database name> is the name of the MySQL database <-u database user> is database user <-p password> is the database user's password destination is the local directory where backups will be stored
The general design of the remote script is as follows:
- process command line arguments
- change directories to the destination
- get a list of tables in the database
- create a list of tables for which data will be backed up
- execute mysqldump –no-data on all tables
- archive and compress the dump
- test the compressed archive
- execute mysqldump –no-create-info on selected tables
- archive and compress the dump
- test the compressed archive
Local Script
The local script named dbsnapshot.sh (download) has the following usage:
Usage: dbsnapshot.sh [OPTIONS] "
Where [OPTIONS] are :
[-t type] one of hourly, daily, weekly, monthly
[-P port] is the ssh port number
<-s server> is the remote server's as identified in known_hosts
<-d database names> database names should be comma separated
or quoted white space
<-u database user> is database user
<-p password> is the database user's password
<-H [n]> where n is the number of hourly backups,default is 6
<-D [n]> where n is the number of daily backups, default is 7
<-W [n]> where n is the number of weekly backups, default is 4
<-M [n]> where n is the number of monthly backups, default is 12"
source is the directory where backups are on the remote server
destination is the local directory where backups will be stored
The general design of the local script is as follows:
- process command line arguments
- set the max number of backups based on type, -t argument
- rotate the existing backups
- create new backup by invoking remote script
- copy the new backup to local
- delete oldest backup that has rotated off the stack
Crontab
The script is meant to be run as a cron job. Your crontab will typically look something like this:
0 */4 * * * dbsnapback.sh [OPTIONS] -t hourly src dst >> dev/null >2&1 59 3 * * * dbsnapback.sh [OPTIONS] -t daily scr dst >> dev/null >2&1 58 3 * * 0 dbsnapback.sh [OPTIONS] -t weekly scr dst >> dev/null >2&1 57 3 1 * * dbsnapback.sh [OPTIONS] -t monthly scr dst >> dev/null >2&1
To Do List
Like any script, these two scripts could use some improvements. I’ll point out the weaknesses that I see as fair warning.
- The remote script name and path is hard-coded in the local script
- The tables for which data is not backed up are Drupal specific. Probably should be a command line option to take a list of tables that are to be excluded.
- The directory that stores the backups should be mounted read only after the script runs.
- The compressed archives should be removed from the remote server after being copied to the backup server
- rsync is not preserving original permissions and ownership, need to look into –link-dest
- the script should use a configuration file because there are too many command line arguments and most are required
That wraps up the first part of the series for website backup strategies. In the next parts we’ll cover how a site is restored and how a site is upgraded.
The download links for the backup scripts aren’t working. They take me back to the home page of your blog.
Comment by Jack Johnson — August 21, 2008 @ 9:29 pm