Website Backup Strategy, Part III
Web Accessible Files
In Part II (Revision Control) I mentioned my preference for git over other revision control systems. One reason is that git is designed to give you back exactly what you put into it: every object is identified by its SHA-1 hash, so git notices problems such as disk corruption. Recall that in the section on svn I suggested mirroring your svn repository with svnsync. That only protects against corruption that occurs after a commit to the repository; it does nothing for you if the corruption occurs during the commit. The other nice aspect of git is that it is distributed, so every clone is, in a sense, a backup whose integrity can be checked by comparing hashes. For these reasons, the next part of our backup strategy will back up the entire set of web-accessible files, including those under revision control.
As I mentioned in Part I (Website Backup Strategies), the snapback technique is ideally suited to website backups because the files are not expected to change very often. The technique is well documented on Mike Rubel’s site. It uses hard links to save disk space and rsync to save bandwidth: a file that doesn’t change over the backup time horizon is stored once, shared by every snapshot, and is not transferred again. The method requires a /usr/bin/cp that can copy a directory tree as hard links, as GNU cp -al does.
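To make the hard-link idea concrete, here is a minimal sketch of one rotation step in the style described on Mike Rubel’s site, run against throwaway directories rather than a real site. It is not snapback2’s code; the file and snapshot names are made up for illustration, and it assumes GNU cp for -al. It uses rsync when available and otherwise mimics rsync’s default write-to-a-temporary-file-then-rename update with cp and mv.

```shell
#!/bin/bash
# Sketch of one snapshot rotation step (hard links + rsync), using
# throwaway directories. Assumes GNU cp for -al; uses rsync if installed.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
site="$work/site"              # stands in for the web root
s0="$work/backup/snapshot.0"   # newest snapshot
s1="$work/backup/snapshot.1"   # previous snapshot
mkdir -p "$site" "$work/backup"

echo "version 1" > "$site/index.html"
echo "logo bytes" > "$site/logo.png"

# First backup: an ordinary full copy that preserves timestamps.
cp -a "$site" "$s0"

# The site changes: index.html is rewritten, logo.png is untouched.
echo "version two, updated" > "$site/index.html"

# Rotate: snapshot.1 becomes a hard-linked copy of snapshot.0, which costs
# directory entries but no extra file data.
cp -al "$s0" "$s1"

# Refresh snapshot.0 from the site. By default rsync writes a changed file to
# a temporary name and renames it into place, breaking the hard link, so
# snapshot.1 keeps the old version. The fallback mimics that with cp + mv.
if command -v rsync >/dev/null 2>&1; then
  rsync -a --delete "$site/" "$s0/"
else
  cp -p "$site/index.html" "$s0/.index.html.tmp"
  mv "$s0/.index.html.tmp" "$s0/index.html"
fi

if [ "$s0/logo.png" -ef "$s1/logo.png" ]; then
  echo "logo.png: one copy on disk, shared by both snapshots"
fi
if [ ! "$s0/index.html" -ef "$s1/index.html" ]; then
  echo "index.html: separate copies in each snapshot"
fi
echo "snapshot.1/index.html: $(cat "$s1/index.html")"
echo "snapshot.0/index.html: $(cat "$s0/index.html")"
```

The unchanged logo.png ends up as a single file shared by both snapshots, while the changed index.html exists in two versions, which is why keeping many snapshots of a mostly static site costs little more than one copy.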
Part I also pointed to an excellent implementation of this technique, snapback2 from Perusion.
In this post, I’m going to walk through an example in which a local machine acts as the backup server and connects via ssh to the remote web server that hosts the site we want to back up. If you don’t have ssh set up, see my post Securing a new Fedora 6 Slice. The local server needs Perl, and snapback2 depends on the Config::ApacheFormat module. The snapback2 README covers installation, the Perusion site has online documentation, and the same documentation is installed as a manpage. A typical install from the source tarball looks like this:
$ cd /usr/local/src
$ /usr/bin/wget \
> http://search.cpan.org/CPAN/authors/id/M/MI/MIKEH/Snapback2-0.913.tar.gz
$ /bin/tar xvf Snapback2-0.913.tar.gz
$ cd Snapback2-0.913
$ /usr/bin/perl Makefile.PL
$ /usr/bin/make
$ /usr/bin/make test
$ /usr/bin/make install
On Fedora, you can set up Perl to fetch modules from CPAN:
$ /usr/bin/yum install perl-libwww-perl
$ /usr/bin/perl -MCPAN -e shell
cpan> install Bundle::CPAN
After that, installing a new module can be as simple as typing:
$ /usr/bin/perl -MCPAN -e 'install Config::ApacheFormat'
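Before configuring anything, it can help to confirm that the prerequisites mentioned above are in place. The following is a hypothetical pre-flight script, not part of snapback2: it checks for perl, rsync, and ssh, tries to load Config::ApacheFormat, and verifies that cp -al really produces hard links on this machine. The function names are illustrative.

```shell
#!/bin/bash
# Hypothetical pre-flight check for a snapback2 backup server. Not part of
# snapback2; it only looks for the prerequisites named in this post.

have_cmd() { command -v "$1" >/dev/null 2>&1; }

have_perl_module() { perl -M"$1" -e 1 >/dev/null 2>&1; }

# Succeeds if cp -al creates hard links (GNU cp does), tested in a temp dir.
cp_makes_hardlinks() {
  local d rc
  d=$(mktemp -d) || return 1
  mkdir "$d/a" && echo test > "$d/a/f" &&
    cp -al "$d/a" "$d/b" 2>/dev/null && [ "$d/a/f" -ef "$d/b/f" ]
  rc=$?
  rm -rf "$d"
  return "$rc"
}

# Run a check and print "ok" or "MISSING" followed by the check itself.
report() {
  if "$@"; then echo "ok       $*"; else echo "MISSING  $*"; fi
}

for cmd in perl rsync ssh; do
  report have_cmd "$cmd"
done
if have_cmd perl; then
  report have_perl_module Config::ApacheFormat
fi
report cp_makes_hardlinks
```

Any line starting with MISSING points at the piece to install before going further.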
To configure snapback2, you use an Apache-style configuration file, /etc/snapback/snapback.conf:
Hourlies 6
Dailies 7
Weeklies 4
Monthlies 12
AutoTime Yes
AdminEmail webmaster@mysite.com
LogFile /home/myuser/var/log/snapback.log
Exclude *debug
Exclude core.*
SnapbackRoot /etc/snapback
RsyncShell '/usr/bin/ssh -i /home/myuser/.ssh/id_dsa -p 2222'
Destination /home/myuser/mnt/backup1
<Backup mysite.com>
Directory /var/www/html/mysite/
</Backup>
Note the RsyncShell directive, which makes rsync connect over ssh using a specific key file (-i) and port number (-p 2222).
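To see what RsyncShell amounts to, the sketch below assembles a roughly equivalent rsync-over-ssh command line and prints it. This is an illustration under assumptions, not the command snapback2 actually runs: snapback2 builds its own arguments and destination layout, and the destination directory here is a hypothetical placeholder. The assumption is that RsyncShell is handed to rsync’s -e (--rsh) option, which is how rsync accepts an alternative remote shell.

```shell
#!/bin/bash
# Illustration only: roughly how the RsyncShell and Exclude directives could
# translate into an rsync-over-ssh command. snapback2 builds its own command
# line; dest_dir below is a hypothetical placeholder, not snapback2's layout.
build_rsync_cmd() {
  local host=$1 src_dir=$2 dest_dir=$3
  # Assumed mapping: RsyncShell -> rsync -e (--rsh); Exclude -> --exclude.
  local -a cmd=(
    /usr/bin/rsync -a --delete
    -e '/usr/bin/ssh -i /home/myuser/.ssh/id_dsa -p 2222'
    --exclude '*debug' --exclude 'core.*'
    "$host:$src_dir" "$dest_dir"
  )
  # Print the command shell-quoted so it can be pasted into a terminal.
  printf '%q ' "${cmd[@]}"
  echo
}

build_rsync_cmd mysite.com /var/www/html/mysite/ \
  /home/myuser/mnt/backup1/mysite.com/example-snapshot/
```

Running the printed command by hand with rsync’s -n (--dry-run) option added is a cheap way to confirm that the key, port, and paths work before handing the job to cron.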
Next we need to set up a cron job. See this intro if you are unfamiliar with cron. Edit your crontab with
$ /usr/bin/crontab -e
and add the line:
10 */4 * * * /usr/bin/snapback2
Note that the configuration file specifies Hourlies 6 and cron runs snapback2 every four hours, so the six hourly snapshots, taken four hours apart, together cover 24 hours (6 * 4).
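The arithmetic is easy to check with a few lines of bash. This sketch just expands the */4 hour field of the crontab line and multiplies by the Hourlies setting; the variable names are illustrative.

```shell
#!/bin/bash
# Worked example of the schedule: "10 */4 * * *" fires at minute 10 of every
# hour divisible by 4, and "Hourlies 6" keeps six of those snapshots.
interval_hours=4   # the */4 in the crontab hour field
hourlies=6         # Hourlies in snapback.conf

# List the daily run times implied by the crontab line.
run_times() {
  local h times=()
  for ((h = 0; h < 24; h += interval_hours)); do
    times+=("$(printf '%02d:10' "$h")")
  done
  echo "${times[*]}"
}

echo "runs per day:          $((24 / interval_hours))"
echo "run times:             $(run_times)"
echo "hourly snapshots span: $((hourlies * interval_hours)) hours"
```

It reports six runs a day, at 00:10, 04:10, 08:10, 12:10, 16:10, and 20:10, and a span of 24 hours, so the hourly snapshots roll over exactly once a day.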
Because cron runs the job non-interactively, outside your login session and its ssh-agent, the ssh key should not have a passphrase unless you are using something like keychain to keep the decrypted key in an agent. If you do use a passphrase, be aware that the agent will not survive a reboot: you will have to re-enter the passphrase after every reboot before the backups can resume. Obviously, keys without a passphrase must be safeguarded.
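One practical part of safeguarding a passphrase-less key is making sure only its owner can read it; OpenSSH itself refuses to use a private key whose permissions are too open. The following minimal sketch checks that the group and other permission bits of a key file are all zero. It assumes GNU stat (for -c '%a'), the function name is hypothetical, and the default path is simply the example key from the RsyncShell line above.

```shell
#!/bin/bash
# Minimal sketch: check that a passphrase-less private key is readable only by
# its owner. Assumes GNU stat (-c '%a'); the default path is just the example
# key from the RsyncShell line.
key_is_private() {
  local mode
  mode=$(stat -c '%a' "$1" 2>/dev/null) || return 1
  # %a prints octal permissions such as 600; require group/other bits = 0.
  (( (8#$mode & 8#077) == 0 ))
}

key=${1:-/home/myuser/.ssh/id_dsa}
if [ ! -e "$key" ]; then
  echo "no key found at $key"
elif key_is_private "$key"; then
  echo "ok: $key is readable only by its owner"
else
  echo "WARNING: $key is accessible to group or others; consider chmod 600 $key"
fi
```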
In the next part, I’ll detail how to set up something very similar to snapback to back up a MySQL database using bash scripts on the local and remote servers.