fluidBlog

September 24, 2007

MySQL Replication

Filed under: Backups, Fedora, MySQL — trekr @ 9:03 am

This post will walk you through setting up replication for a MySQL database which has existing data.

Why Replication?

  • Backups. As part of a backup strategy, replication avoids shutting down the master if backups are made from the slave.
  • Scale-out. Replication can allow the distribution of reads over multiple replication slaves.
  • Reliability. Replication can provide a ready backup in the event of a failure.

This post will only address the use of replication in a backup strategy, but the setup steps are valid for other uses as well.

Assumptions:

  • The master database has existing data.
  • The master and slave have different IP addresses.
  • The master database to be replicated is named exampledb.
  • The database exampledb exists on the master but not on the slave.
  • The database uses MyISAM tables.
  • The master and slave run on Fedora. This should only affect the directory structure.
  • The master and slave use port 3306. This affects firewall setup and Master configuration. To use a different port, set --master-port in the CHANGE MASTER TO command.

Preliminary steps.

Configure the firewall to allow the master and slave to communicate.

You have a firewall, right? Assuming the master server is 2xx.xx.xx.01 and the slave is 2xx.xx.xx.02 corresponding to server_id=1 for the master and server_id=2 for the slave.

On the master add the following to /etc/sysconfig/iptables

# Allow incoming and outgoing traffic on port 3306 for MySQL slave server
-A INPUT -p tcp -s 20x.xx.xx.02 --sport 1024:65535 -d 2xx.xx.xx.01 --dport 3306 -m state --state NEW,ESTABLISHED -j ACCEPT
-A OUTPUT -p tcp -s 2xx.xx.xx.01 --sport 3306 -d 2xx.xx.xx.02 --dport 1024:65535 -m state --state ESTABLISHED -j ACCEPT

On the slave add the following to /etc/sysconfig/iptables

# Allow incoming and outgoing traffic on port 3306 for MySQL slave server
-A INPUT -p tcp -s 2xx.xx.xx.01 --sport 1024:65535 -d 2xx.xx.xx.02 --dport 3306 -m state --state NEW,ESTABLISHED -j ACCEPT
-A OUTPUT -p tcp -s 2xx.xx.xx.02 --sport 3306 -d 2xx.xx.xx.01 --dport 1024:65535 -m state --state ESTABLISHED -j ACCEPT

Restart the firewall

# /sbin/service iptables restart

Create a replication user, repl on the slave server.

# /usr/sbin/useradd -r repl
# /usb/bin/passwd repl

Grant privileges to the replication user on the master server.

mysql> GRANT FILE, REPLICATION SLAVE ON *.*
    -> TO 'repl'@'%' IDENTIFIED BY 'password';

Prepare the Master

In order to prepare the master, it will have to be locked for a short time. While it’s locked we will create a dump file, and edit the configuration in /etc/my.cnf.

You must enable binary logging so we’ll start by configuring the master server binary logging and restarting the master.

Each server in the replication group must have a unique id. The master server is typically assigned an id of 1.

The master server must allow networking, so ensure that --skip-networking and --bind-address are commented out if present.

While your at it, check that the pemissions of the file my.cnf are 0400 or 0600.

Edit the master database’s configuration in the file /etc/my.conf. Add the lines under the [mysqld] section.

[mysqld]
# enable networking and listen on all IP addresses
# by commenting out the following two lines if they exist
#skip-networking
#bind-address            = 127.0.0.1
# The following line will create the log file
# /var/lib/mysql/mysql-bin.00000x
log-bin=mysql-bin
# set expire_logs_days no lower than the number of days the slave is behind
expire_logs_days=3
server-id=1
# substitue 'hostname' with the actual name of your host
relay-log='hostname'-relay-bin
# duplicate the following line for every database that needs to be replicated
# substitute 'exampledb' with actual name of the database to be replicated
binlog-do-db='exampledb'

Restart the master

# /etc/init.d/mysqld --report-host restart

The --report-host option will make checking the replication status easier.

Check the error log at /var/log/mysqld.log after restarting mysqld.

Next we will lock the master database, record the binary log file’s position, and create a database dump that we’ll use to initialize the slave database.

Lock the database

# mysql -p -u root
mysql> use exampledb;
mysql> FLUSH TABLES WITH READ LOCK;

Obtain status of the binary log on the master.

mysql > SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |       98 | exampledb    |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

Record File and Position, you’ll need it to synchronize the slave.

Leave this client running while the dump file is created.

Create a dump backup file

# mysqldump -u root -p exampledb > exampledb.sql

On master, unlock the tables

mysql > UNLOCK TABLES;

Setting up the slave

Edit the slave’s configuration in the file /etc/my.cnf

[mysqld]
log-bin=mysql-bin
master-host=master-hostname
master-user=slave-user
master-password=slave-password
server-id=2
# duplicate the following line for every database that needs to be replicated
# substitute 'exampledb' with actual name of the database to be replicated
replicate-do-db='exampledb'
report-host='slave-hostname'

Since the master.info file overides settings in my.cnf, you may prefer to set up the slave using the CHANGE MASTER TO command. As a minimum, you need to use the CHANGE MASTER TO command to set the values of MASTER_LOG_FILE and MASTER_LOG_POS

Change the option values to the actual values for your servers

mysql> CHANGE MASTER TO
    ->     MASTER_HOST='master_host_name',
    ->     MASTER_USER='replication_user_name',
    ->     MASTER_PASSWORD='replication_password',
    ->     MASTER_LOG_FILE='recorded_log_file_name',
    ->     MASTER_LOG_POS='recorded_log_position';

Start the slave

$ /etc/init.d/mysqld --skip-slave start

The --skip-slave option tells the server not to start the slave threads. We’re using this option because we aren’t quite ready to replicate.

Create the database

# mysql -u root -p
mysql> create database exampledb;
mysql> grant all on exampledb.* to 'repl'@'localhost';

Import the dump file

# mysql -u root -p exampledb < full_dump_file.sql

Start the slave threads

$ mysql> START SLAVE;

Checking the status of the master and slave

On slave

mysql> SHOW SLAVE STATUS\G;

*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: fluidrails.com
                Master_User: repl
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000004
        Read_Master_Log_Pos: 98
             Relay_Log_File: hakota-relay-bin.000073
              Relay_Log_Pos: 235
      Relay_Master_Log_File: mysql-bin.000004
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB: garden
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 98
            Relay_Log_Space: 235
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)

ERROR:
No query specified

On master

mysql> SHOW MASTER STATUS\G;
*************************** 1. row ***************************
File: mysql-bin.000004
Position: 98
Binlog_Do_DB: garden
Binlog_Ignore_DB:
1 row in set (0.00 sec)

ERROR:
No query specified
mysql> SHOW PROCESSLIST\G;
+------+------+-----------------------+------+-------------+------+----------------------------------------------------------------+------------------+
| Id   | User | Host                  | db   | Command     | Time | State                                                    | Info             |
+------+------+-----------------------+------+-------------+------+----------------------------------------------------------------+------------------+
| 6235 | repl | www.hakota.com:43033  | NULL | Binlog Dump | 3250 | Has sent all binlog to slave; waiting for binlog to be updated | NULL             |
+------+------+-----------------------+------+-------------+------+----------------------------------------------------------------+------------------+
mysql> SHOW SLAVE HOSTS;
+-----------+------------+------+-------------------+-----------+
| Server_id | Host       | Port | Rpl_recovery_rank | Master_id |
+-----------+------------+------+-------------------+-----------+
|         2 | hakota.com | 3306 |                 0 |         1 |
+-----------+------------+------+-------------------+-----------+
1 row in set (0.00 sec)

Logs and Backups

Now that we’ve enabled binary logging we need to implement a process to manage the logs. One approach is to use SHOW SLAVE STATUS (on slave) and SHOW BINARY LOGS (on master) to determine which logs are in use. Suppose the slave is using Master_Log_File: master-bin.000004 Make sure your backups contain the logs about to be deleted, those prior to the target log master-bin.000004, then

mysql> PURGE MASTER LOGS TO 'master-bin.000004';

will delete all logs prior to master-bin.000004

In addition, now that we’ve implemented replication, our backup strategy needs to include the master.info and relay-log.info status files as well as the binary logs.

Before executing mysqldump on the slave, replication should be temporarily stopped.

# mysqladmin -u root -p  stop-slave
# mysqldump -u root -p exampledb > exampledb.sql
# mysqladmin -u root -p start-slave

References

MySQL 5.0 Reference Manual, Chapter 15 Replication

August 27, 2007

Website Backup Strategies, Part IV

Filed under: Backups, Deployment, Drupal, Fedora, Linux — trekr @ 10:50 am

Introduction

For backing up the database we won’t benefit from the snapback technique because the data will change often enough that a compressed archive will be more efficient then hard links (see analysis in Website Backup Strategies). Because the script archives and compresses the dump files, rsync will treat them as changed even if the data in the database hasn’t changed. Even if you don’t archive and compress the dump files, you must still take care to ensure that the dump files are not considered different by rsync due to creation time (use --size-only), and that the dump files are not trivially different due to metadata like timestamps embedded in the file mysqldump (use --skip-comments). You can use also use snapback on /var/lib/mysql, but its not guaranteed to work.

The strategy will be to set up a cron job to run a script on the local backup server that will invoke a script on the remote server. The script on the remote server will execute mysqldump. The reason for invoking a script on the remote sever is that we don’t want to allow databases connections except from localhost.

I’ve chosen to use mysqldump for the flexibility to edit SQL if needed. The downside is that we are only capturing the individual databases, not the user tables or grants. Additionally, I’m going to dump the table schema separately from the data so that I don’t waste space and bandwidth backing up tables like cache.

Remote Script

The remote script called dbbackup.sh (download) has the following usage

Usage: dbbackup.sh [OPTIONS] "

Where [OPTIONS] are :

<-d database name> is the name of the MySQL database
<-u database user> is database user
<-p password> is the database user's password 

destination is the local directory where backups will be stored

The general design of the remote script is as follows:

  • process command line arguments
  • change directories to the destination
  • get a list of tables in the database
  • create a list of tables for which data will be backed up
  • execute mysqldump –no-data on all tables
  • archive and compress the dump
  • test the compressed archive
  • execute mysqldump –no-create-info on selected tables
  • archive and compress the dump
  • test the compressed archive

Local Script

The local script named dbsnapshot.sh (download) has the following usage:

Usage: dbsnapshot.sh [OPTIONS] "

Where [OPTIONS] are :

[-t type] one of hourly, daily, weekly, monthly
[-P port] is the ssh port number
<-s server> is the remote server's as identified in known_hosts
<-d database names> database names should be comma separated
                                     or quoted white space
<-u database user> is database user
<-p password> is the database user's password
<-H [n]> where n is the number of hourly backups,default is 6
<-D [n]> where n is the number of daily backups, default is 7
<-W [n]> where n is the number of weekly backups, default is 4
<-M [n]> where n is the number of monthly backups, default is 12"

source is the directory where backups are on the remote server
destination is the local directory where backups will be stored

The general design of the local script is as follows:

  • process command line arguments
  • set the max number of backups based on type, -t argument
  • rotate the existing backups
  • create new backup by invoking remote script
  • copy the new backup to local
  • delete oldest backup that has rotated off the stack

Crontab

The script is meant to be run as a cron job. Your crontab will typically look something like this:

0 */4 * * * dbsnapback.sh [OPTIONS] -t hourly src dst >>  dev/null >2&1
59 3 * * * dbsnapback.sh [OPTIONS] -t daily scr dst  >> dev/null >2&1
58 3 * * 0 dbsnapback.sh [OPTIONS] -t weekly scr dst  >> dev/null >2&1
57 3 1 * * dbsnapback.sh [OPTIONS] -t monthly scr dst  >> dev/null >2&1

To Do List

Like any script, these two scripts could use some improvements. I’ll point out the weaknesses that I see as fair warning.

  • The remote script name and path is hard-coded in the local script
  • The tables for which data is not backed up are Drupal specific. Probably should be a command line option to take a list of tables that are to be excluded.
  • The directory that stores the backups should be mounted read only after the script runs.
  • The compressed archives should be removed from the remote server after being copied to the backup server
  • rsync is not preserving original permissions and ownership, need to look into –link-dest
  • the script should use a configuration file because there are too many command line arguments and most are required

That wraps up the first part of the series for website backup strategies. In the next parts we’ll cover how a site is restored and how a site is upgraded.

Part I Part II Part III

August 24, 2007

Website Backup Strategy, Part III

Filed under: Backups, Deployment, Drupal, Fedora, Linux — trekr @ 7:08 am

Web Accessible Files

In Part II Revision Control I mentioned my preference for git over other revision control systems. One reason is that git is guaranteed to give you back what you put into it because it computes sha1 and notices things like disk corruption. Recall, I mentioned in the section on svn that you might want to mirror your svn repository with svnsync. That only addresses corruption that may occur after a commit to the respository, it doesn’t do anything for you if the corruption occurs during he commit. The other nice aspect of git is that it is distributed so that every repository is in a sense a backup if we can match the signature. For these reasons, the next part of our backup strategy will backup up the entire set of files accessible to the web, including those under revision control.

As I mentioned in part I Website Backup Strategies, the snapback technique is ideally suited for website backups because the files are not expected to change very often. The snapback technique is well documented on Mike Rubel’s site. It takes advantage of hard links to reduce space and rsync to reduce bandwidth for files that don’t change often over the backup time horizon. The method depends on having a /usr/bin/cp that can make hard links, like GNU cp -al does.

As I mentioned in part I, there is an excellent implementation called snapback2 from Perusion.

In this post, I’m going to walk through an example where a local machine is used as a backup server to access the remote web server via ssh that hosts the site we want to backup. If you don’t have ssh setup, see my post Securing a new Fedora 6 Slice. The local server needs to have Perl installed and snapback2 depends on Config::ApacheFormat. The snapback2 README covers installation, and the Perusion site also has online documentation. The documentation is also installed as a manpage.

$ cd /usr/local/src

$ /usr/bin/wget \
> http://search.cpan.org/CPAN/authors/id/M/MI/MIKEH/Snapback2-0.913.tar.gz

$ /bin/tar xvf Snapback2-0.913.tar.gz

$ cd Snapback2-0.913

$ /usr/bin/perl Makefile.PL

$ /usr/bin/make

$ /usr/bin/make test

$ /usr/bin/make install

On Fedora, you can set up Perl to get modules from CPAN

$ /usr/bin/yum install perl-libwww-perl
$ /usr/bin/perl -MCPAN -e shell
> get Bundle::CPAN

Then installing a new module can be as simple as typing

$ /usr/bin/perl -MCPAN -e 'install Config::ApacheFormat'

To configue snapback2, you use an apache-like configuration file, /etc/snapback/snapback.conf

    Hourlies    6
    Dailies     7
    Weeklies    4
    Monthlies  12
    AutoTime   Yes

    AdminEmail webmaster@mysite.com
    LogFile    /home/myuser/var/log/snapback.log
    Exclude *debug
    Exclude core.*
    SnapbackRoot /etc/snapback
    RsyncShell '/usr/bin/ssh -i /home/myuser/.ssh/id_dsa -p 2222'

    Destination /home/myuser/mnt/backup1

    <Backup mysite.com>
      Directory /var/www/html/mysite/
    </Backup>

Note the RsyncShell directive used to set up an ssh connection with a key and port number.

Next we need to setup a cron job. See this intro if you unfamiliar with cron.

    $ /usr/bin/crontab -e

Add the line

10 */4 * * * /usr/bin/snapback2

Note that the configuration file specified Hourlies 6 and the cron is set up to run every 4 hours, covering 24 (6 * 4) hours with six backups every four hours.

Because cron is not run under myuser, the ssh keys should not have a passphrase unless you are using something like keychain. If you use passphrase, be aware that the process will not survive a reboot automatically, you will have to reenter the passphrase after a reboot. Obviously, the keys without a passphrase must be safeguarded.

In the next part, I’ll detail how to set up something very similar to snapback to backup a MySQL database using bash scripts on the local and remote servers.

Part I Part II Part IV

August 22, 2007

Website Backup Strategies, Part II

Filed under: Backups, Deployment, Drupal, Fedora, Linux — trekr @ 2:33 pm

The second part of this series will take a look at revision control strategies for the core Drupal source.

CVS

Drupal is maintained in cvs. So cvs seems like a natural option. Pro Drupal Development covers this quite well in Chapter 21 “Development Best Practices”. There is a very short paragraph in the chapter about mixing SVN and CVS to deal with custom source.

Assuming cvs is installed, using CVS is as easy as

$ cd /var/www/html
$ /usr/bin/cvs \
> -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal checkout \
> -r DRUPAL-5 mysite

Which will create /var/www/html/drupal

And updates are as simple as

$ cd /var/www/html/drupal
$ /usr/bin/cvs update -dP -r DRUPAL-5

But if you customize anything within the core source, then you’ll start to run into some difficulty. For starters, you will need to store the custom code in a different repository. Remember the short paragraph in Pro Drupal Development about mixing CVS and SVN? If custom source is not stored in another repository, the custom changes will not survive the update.

I’m pretty sure that I’ll be customizing something, headers, logos, maybe some javascript to get transparent images to work properly in IE. So, for my purposes, I’m going to look for another approach.

SVN

Using a svn repository solves the problem of keeping core source and custom source in a single repository at the cost of a little additional work. In the case of Drupal, the source is not in SVN so we cannot simply checkout the source from the development team repository. If you are new to svn, a good resource is Mike Mason’s excellent book Pragmatic Version Control: Using Subversion(2nd Edition). Another good resource is Version Control with Subversion

The following workflow assumes everything is being done on a local machine, and the user has appropriate permissions.

Create a repository

$ mkdir /var/svn/repos
$ svnadmin create /var/svn/repos

Get the source

$ cd /usr/local/src
$ wget http://ftp.drupal.org/files/projects/drupal-5.1.tar.gz

Uncompress the archive

$ tar xvf drupal-5.1.tar.gz

Import the source into the repository

$ svn import --no-auto-props -m"initial import of drupal 5.1" \
> drupal-5.1 file:///var/svn/repos/vendorsrc/drupal/current

Create a tag

$ svn copy -m"Tag 5.1 vendor drop" \
> file:///var/svn/repos/vendorsrc/drupal/current \
> file:///var/svn/repos/vendorsrc/drupal/5.1

Create project

$ svn mkdir -m"Create project myproject" file:///var/svn/repos/myproject

Copy into main development branch

$ svn copy file:///var/svn/repos/vendorsrc/drupal/5.1 \
> file:///var/svn/repos/myproject/trunk

Check out the project to create a working directory

$ svn co file:///var/svn/svnrepos/myproject/trunk \
> /var/www/html/myproject

Ignore files/ directory

$ svn mkdir files files/images files/images/temp files/css files/color

$ svn propset svn:ignore "*" files/ files/images \
> files/images/temp files/css files/color

$ svn commit -m "ignore files/ and subdirectories content from now"

If you don’t set ignore on the files/ directory and its subdirectories then svn status will get pretty busy. You need to set the ignore property on files/ and each subdiretory in files/, for example, files/css, files/images, files/images/temp, etc.

Any custom changes we make can go into the same project respository.

When it’s time to upgrade Drupal to a new version, the basic update workflow is as follows:

  • checkout the current version of the core source in a working directory

  • make the working directory look like a pristine copy of the new version by copying, and svn adding or deleting as required. To make your life easier, use svn.load.dirs.pl

  • commit the new version and add a tag

  • merge the old and new versions in your project’s working directory make sure you use the tag because the branch moves, the tag is fixed

  • resolve any conflicts and commit the changes to your project

More details can be found at Vendor Branches

Finally, you may want to consider using svnsync to mirror your repository.

GIT

In part one, I mentioned the excellent article on Verion Control Blog. This is my preferred process for source control of Drupal powered sites. The article is a very comprehensive how-to, so I don’t need to duplicate it here.

The basic workflow involves creating lines of development for

  • drupal - contains the core source distribution
  • drupal-and-modules is a clone of drupal plus contributed modules and themes
  • drupal-production is a clone of drupal-and-modules with project customizations

The lines of development are chained together with cloning and changes propogate from drupal/ through drupal-production/.

I will point out one area of concern with the approach as detailed in the article. Exploding the latest version of Drupal in the drupal/ line of development does not deal with files or directories that may have been deleted in the latest version.

For more info on git, you should start with the documentation.

So there you have it, three approaches to keeping the Drupal source under revision control. I’ve stated my preference but everyone’s situation is different and you can choose.

In the next part, I’ll detail how to backup the files that are not under revision control, for example, the files/ directory. The final segment will address backing up the MySQL database.

Part I Part III Part IV

August 20, 2007

Website Backup Strategy

Filed under: Backups, Deployment, Drupal, Fedora, Linux — trekr @ 1:46 pm

Introduction

In previous posts, I’ve covered installation of Rails and Drupal, and also some rudimentary security. But running a successful site requires a strategy for upgrades and backups. In this post I’m going to introduce some initial concepts for a web site backup strategy that will also touch on upgrades. I’ll be using a Drupal CMS powered site as an example. The principles should apply for other LAMP based web sites as well. In subsequent posts, I’ll be getting into the how-to, to implement the strategy. I’ll start by stating the desired goals for the backup strategy.

Goals

  • Simple; Use existing tools
  • Automatic; Set up as a cron job
  • Secure; Don’t introduce new security risks
  • Efficient; Minimize space, bandwidth, and restoration time

I find that it’s helpful to categorize the types of files that make up a typical Drupal web site.

  • Drupal source files
  • Contributed modules and themes
  • Custom versions of the above
  • Files uploaded to the site
  • The database

Source, Contributed source, and custom source

Drupal source files, contributed modules and themes, and custom versions of the latter will not change often and it makes sense to maintain them in a revision control repository.

You have lots of choices for a revision control system. I’ve decided to use git based on an excellent article on Version Control Blog. I like git because the separation of core Drupal from the contributed modules and the custom code seems more intuitive then branching in other systems.

Uploaded Files

Files uploaded to the site, also will not often change once uploaded. However, they may be deleted and there will always be new files to deal with. Uploaded files will be added and deleted by users using the CMS, so they will be outside of revision control system. Because of the administrative burden of adding and deleting uploaded content to revision control, it doesn’t make sense to try to keep them in a repository. If you have a novel solution that solves this problem I’d love to hear from you.

In Drupal, uploaded files are stored in a directory /files making it easy to backup just that directory. The /files directory is an excellent candidate for a novel rotataing backup system sometimes known as snapback. Snapback is based on the work of Mike Rubel and others. The basic idea is that for files that haven’t changed over the backup horizon, hard links instead of copies are maintained. The hardl links significantly reduce the space and bandwidth required for the backups.

Luckily, there is an excellent Perl implementation of the snapaback strategy, called snapback2 from Perusian

Database

The database consists of structure and data. The structure of the database may not change very frequently however the data in several tables probably will. So it makes sense to backup up the database structure and data separately. By doing so, we can also avoid backing up data in cache, sessions, and watchdog tables. Like uploaded files, the database will be backed up periodically in a rotating system.

MySQL is a popular choice for LAMP based systems and Drupal, so we’ll assume a MySQL database and use the mysqldump program to create the backups.

Unfortunately the database backups will probably not benefit from the “snapback” technique. To see why, lets do some back-of-the-envelope calculations.

If g is the tar gzip compression factor, and p is the percentage of the change in the backup files, s is the total size of the files, k is the number of backups, then for snapback to use less space than compressed archives, the following must hold

sgk > spk + s(1-p)

Where s(1-p) is the size of the unchanged files of which there is only one copy due to hard links

Simplifying by dividing by s,

gk > pk + (1-p)

If we ignore (1-p) which is small when k is large, then it is clear that for the method to have benefit, the percent of changed files must be less then the compression factor.

g > p

In my testing, I’ve been seeing compression factors between .15 and .20 for mysqldump files. Its hard to imagine a database that changes less than 15-20%

On a directory where the files don’t change much, and g > p holds, then the break even number of backups, k is given by

k = ceil((1-p)/(g-p))

For example, assume the percent of change, p, is 5% and the compression factor, g, is 15%, then the break even number of backup copies, k, is 10. A typical backup scheme has 6 hourly, 7 daily, 4 weekly, and 12 monthly = 29 copies, call it thirty, so in this example there is about a 3X savings over compressed archives. This is why I like the snapback technique for the uploaded files.

Therefore, to implement the database backup, we’ll being using some simple bash scripts on the local backup server and the remote web server that execute mysqldump, archive and compress the output, and move it to the remote server securely.

Well, that wraps up the outline of our strategy. In my next series of posts, I’ll cover in detail what it takes to implement each part of our three-tiered backup strategy.

Part II Part III Part IV

Next Page »

Hakota Design LLC