fluidBlog

September 24, 2007

MySQL Replication

Filed under: Backups, Fedora, MySQL — trekr @ 9:03 am

This post will walk you through setting up replication for a MySQL database which has existing data.

Why Replication?

  • Backups. As part of a backup strategy, replication avoids shutting down the master if backups are made from the slave.
  • Scale-out. Replication can allow the distribution of reads over multiple replication slaves.
  • Reliability. Replication can provide a ready backup in the event of a failure.

This post will only address the use of replication in a backup strategy, but the setup steps are valid for other uses as well.

Assumptions:

  • The master database has existing data.
  • The master and slave have different IP addresses.
  • The master database to be replicated is named exampledb.
  • The database exampledb exists on the master but not on the slave.
  • The database uses MyISAM tables.
  • The master and slave run on Fedora. This should only affect the directory structure.
  • The master and slave use port 3306. This affects firewall setup and Master configuration. To use a different port, set --master-port in the CHANGE MASTER TO command.

Preliminary steps.

Configure the firewall to allow the master and slave to communicate.

You have a firewall, right? Assuming the master server is 2xx.xx.xx.01 and the slave is 2xx.xx.xx.02 corresponding to server_id=1 for the master and server_id=2 for the slave.

On the master add the following to /etc/sysconfig/iptables

# Allow incoming and outgoing traffic on port 3306 for MySQL slave server
-A INPUT -p tcp -s 20x.xx.xx.02 --sport 1024:65535 -d 2xx.xx.xx.01 --dport 3306 -m state --state NEW,ESTABLISHED -j ACCEPT
-A OUTPUT -p tcp -s 2xx.xx.xx.01 --sport 3306 -d 2xx.xx.xx.02 --dport 1024:65535 -m state --state ESTABLISHED -j ACCEPT

On the slave add the following to /etc/sysconfig/iptables

# Allow incoming and outgoing traffic on port 3306 for MySQL slave server
-A INPUT -p tcp -s 2xx.xx.xx.01 --sport 1024:65535 -d 2xx.xx.xx.02 --dport 3306 -m state --state NEW,ESTABLISHED -j ACCEPT
-A OUTPUT -p tcp -s 2xx.xx.xx.02 --sport 3306 -d 2xx.xx.xx.01 --dport 1024:65535 -m state --state ESTABLISHED -j ACCEPT

Restart the firewall

# /sbin/service iptables restart

Create a replication user, repl on the slave server.

# /usr/sbin/useradd -r repl
# /usb/bin/passwd repl

Grant privileges to the replication user on the master server.

mysql> GRANT FILE, REPLICATION SLAVE ON *.*
    -> TO 'repl'@'%' IDENTIFIED BY 'password';

Prepare the Master

In order to prepare the master, it will have to be locked for a short time. While it’s locked we will create a dump file, and edit the configuration in /etc/my.cnf.

You must enable binary logging so we’ll start by configuring the master server binary logging and restarting the master.

Each server in the replication group must have a unique id. The master server is typically assigned an id of 1.

The master server must allow networking, so ensure that --skip-networking and --bind-address are commented out if present.

While your at it, check that the pemissions of the file my.cnf are 0400 or 0600.

Edit the master database’s configuration in the file /etc/my.conf. Add the lines under the [mysqld] section.

[mysqld]
# enable networking and listen on all IP addresses
# by commenting out the following two lines if they exist
#skip-networking
#bind-address            = 127.0.0.1
# The following line will create the log file
# /var/lib/mysql/mysql-bin.00000x
log-bin=mysql-bin
# set expire_logs_days no lower than the number of days the slave is behind
expire_logs_days=3
server-id=1
# substitue 'hostname' with the actual name of your host
relay-log='hostname'-relay-bin
# duplicate the following line for every database that needs to be replicated
# substitute 'exampledb' with actual name of the database to be replicated
binlog-do-db='exampledb'

Restart the master

# /etc/init.d/mysqld --report-host restart

The --report-host option will make checking the replication status easier.

Check the error log at /var/log/mysqld.log after restarting mysqld.

Next we will lock the master database, record the binary log file’s position, and create a database dump that we’ll use to initialize the slave database.

Lock the database

# mysql -p -u root
mysql> use exampledb;
mysql> FLUSH TABLES WITH READ LOCK;

Obtain status of the binary log on the master.

mysql > SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |       98 | exampledb    |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

Record File and Position, you’ll need it to synchronize the slave.

Leave this client running while the dump file is created.

Create a dump backup file

# mysqldump -u root -p exampledb > exampledb.sql

On master, unlock the tables

mysql > UNLOCK TABLES;

Setting up the slave

Edit the slave’s configuration in the file /etc/my.cnf

[mysqld]
log-bin=mysql-bin
master-host=master-hostname
master-user=slave-user
master-password=slave-password
server-id=2
# duplicate the following line for every database that needs to be replicated
# substitute 'exampledb' with actual name of the database to be replicated
replicate-do-db='exampledb'
report-host='slave-hostname'

Since the master.info file overides settings in my.cnf, you may prefer to set up the slave using the CHANGE MASTER TO command. As a minimum, you need to use the CHANGE MASTER TO command to set the values of MASTER_LOG_FILE and MASTER_LOG_POS

Change the option values to the actual values for your servers

mysql> CHANGE MASTER TO
    ->     MASTER_HOST='master_host_name',
    ->     MASTER_USER='replication_user_name',
    ->     MASTER_PASSWORD='replication_password',
    ->     MASTER_LOG_FILE='recorded_log_file_name',
    ->     MASTER_LOG_POS='recorded_log_position';

Start the slave

$ /etc/init.d/mysqld --skip-slave start

The --skip-slave option tells the server not to start the slave threads. We’re using this option because we aren’t quite ready to replicate.

Create the database

# mysql -u root -p
mysql> create database exampledb;
mysql> grant all on exampledb.* to 'repl'@'localhost';

Import the dump file

# mysql -u root -p exampledb < full_dump_file.sql

Start the slave threads

$ mysql> START SLAVE;

Checking the status of the master and slave

On slave

mysql> SHOW SLAVE STATUS\G;

*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: fluidrails.com
                Master_User: repl
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000004
        Read_Master_Log_Pos: 98
             Relay_Log_File: hakota-relay-bin.000073
              Relay_Log_Pos: 235
      Relay_Master_Log_File: mysql-bin.000004
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB: garden
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 98
            Relay_Log_Space: 235
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)

ERROR:
No query specified

On master

mysql> SHOW MASTER STATUS\G;
*************************** 1. row ***************************
File: mysql-bin.000004
Position: 98
Binlog_Do_DB: garden
Binlog_Ignore_DB:
1 row in set (0.00 sec)

ERROR:
No query specified
mysql> SHOW PROCESSLIST\G;
+------+------+-----------------------+------+-------------+------+----------------------------------------------------------------+------------------+
| Id   | User | Host                  | db   | Command     | Time | State                                                    | Info             |
+------+------+-----------------------+------+-------------+------+----------------------------------------------------------------+------------------+
| 6235 | repl | www.hakota.com:43033  | NULL | Binlog Dump | 3250 | Has sent all binlog to slave; waiting for binlog to be updated | NULL             |
+------+------+-----------------------+------+-------------+------+----------------------------------------------------------------+------------------+
mysql> SHOW SLAVE HOSTS;
+-----------+------------+------+-------------------+-----------+
| Server_id | Host       | Port | Rpl_recovery_rank | Master_id |
+-----------+------------+------+-------------------+-----------+
|         2 | hakota.com | 3306 |                 0 |         1 |
+-----------+------------+------+-------------------+-----------+
1 row in set (0.00 sec)

Logs and Backups

Now that we’ve enabled binary logging we need to implement a process to manage the logs. One approach is to use SHOW SLAVE STATUS (on slave) and SHOW BINARY LOGS (on master) to determine which logs are in use. Suppose the slave is using Master_Log_File: master-bin.000004 Make sure your backups contain the logs about to be deleted, those prior to the target log master-bin.000004, then

mysql> PURGE MASTER LOGS TO 'master-bin.000004';

will delete all logs prior to master-bin.000004

In addition, now that we’ve implemented replication, our backup strategy needs to include the master.info and relay-log.info status files as well as the binary logs.

Before executing mysqldump on the slave, replication should be temporarily stopped.

# mysqladmin -u root -p  stop-slave
# mysqldump -u root -p exampledb > exampledb.sql
# mysqladmin -u root -p start-slave

References

MySQL 5.0 Reference Manual, Chapter 15 Replication

September 11, 2007

Syntax Highlighting for the Web using Vim

Filed under: Blogging, Web Development — trekr @ 12:10 pm

Brian Reindel wrote a review of syntax highlighters for blog code samples. One of the comments on his post caught my attention. Dan wrote that he uses vim and vividchalk for his blog. I love it when I learn about an existing solution using familiar tools. I gave Dan’s method a try and was very pleased with the result. I know there are many WordPress Plugins for syntax highlighting but this is another tool that comes in handy when source is presented on the web as a separate page, for example, dbbackup.sh.

You will need to use a version of vim with GUI support, gvim, on Fedora, to ensure the colors are correct. You can install gvim on Fedora with the following command

$ sudu yum install gvim*

On Fedora, syntax scripts are in the directory /usr/share/vim/vim70/syntax/ and color schemes are in the directory /usr/share/vim/vim70/colors/. You can get other scripts from vim online.

Launch gvim from a terminal or the menu Applications->Programming->Vi Improved.

If you want embedded style in the HTML output then issue the colon command

~
:let html_use_css = 1

If you like the vividchalk color scheme, issue the command

~
:colorscheme vividchalk

To create an HTML version of the file, issue the command

~
:TOhtml

and save the HTML file by issuing the “ZZ” command, and quit the session to leave the original file unchanged with the command

~
:q!

The :TOhtml command produces a complete HTML page. To embed the highlighted syntax within a post, simply change the <body> tag to an appropriate CSS tag for your theme and delete the <html> and <head> sections. Here is a snippet as an example of embedding the highlighted source within a blog post. The HTML was created with the commands

~
:let html_use_css = 0
~
:TOhtml

Without CSS, the generated HTML needs to be enclosed in a tag that can enclose the <font> tag, for example, the <code> tag.


class Zipcode < ActiveRecord::Base
  belongs_to :postoffice
  has_many :locations

  def self.find_by_postoffice_name(postoffice_name)
    postoffice_id = Postoffice.find_by_name(postoffice_name).id
    find(:all, :conditions => [postoffice_id = ?, postoffice_id])
  end

  def self.find_by_postoffice_id(postoffice_id)
      find(:all, :conditions => [postoffice_id = ?, postoffice_id])
  end
end

As an added bonus, the output from vim using css validates to HTML 4.01 Strict.

See the vim documentation for a complete rundown on syntax highlighting in vim.

September 2, 2007

Wordpress vs. Typo in one sentence.

Filed under: Blogging — admin @ 9:59 am

After struggling with Typo for several months, I’ve switched this blog to WordPress. In one sentence, here’s why.

You won’t be seeing a “How-to install WordPress” post on this blog in the style of Installing Typo on Fedora 6 slice from Slicehost because installing WordPress is just plain simple, and it works.

And there you have it, WordPress vs. Typo in one sentence. There are other reasons of course, but I should have seen the other difficulties to come right from the start. More often than we want to admit, first impressions do matter.

August 27, 2007

Website Backup Strategies, Part IV

Filed under: Backups, Deployment, Drupal, Fedora, Linux — trekr @ 10:50 am

Introduction

For backing up the database we won’t benefit from the snapback technique because the data will change often enough that a compressed archive will be more efficient then hard links (see analysis in Website Backup Strategies). Because the script archives and compresses the dump files, rsync will treat them as changed even if the data in the database hasn’t changed. Even if you don’t archive and compress the dump files, you must still take care to ensure that the dump files are not considered different by rsync due to creation time (use --size-only), and that the dump files are not trivially different due to metadata like timestamps embedded in the file mysqldump (use --skip-comments). You can use also use snapback on /var/lib/mysql, but its not guaranteed to work.

The strategy will be to set up a cron job to run a script on the local backup server that will invoke a script on the remote server. The script on the remote server will execute mysqldump. The reason for invoking a script on the remote sever is that we don’t want to allow databases connections except from localhost.

I’ve chosen to use mysqldump for the flexibility to edit SQL if needed. The downside is that we are only capturing the individual databases, not the user tables or grants. Additionally, I’m going to dump the table schema separately from the data so that I don’t waste space and bandwidth backing up tables like cache.

Remote Script

The remote script called dbbackup.sh (download) has the following usage

Usage: dbbackup.sh [OPTIONS] "

Where [OPTIONS] are :

<-d database name> is the name of the MySQL database
<-u database user> is database user
<-p password> is the database user's password 

destination is the local directory where backups will be stored

The general design of the remote script is as follows:

  • process command line arguments
  • change directories to the destination
  • get a list of tables in the database
  • create a list of tables for which data will be backed up
  • execute mysqldump –no-data on all tables
  • archive and compress the dump
  • test the compressed archive
  • execute mysqldump –no-create-info on selected tables
  • archive and compress the dump
  • test the compressed archive

Local Script

The local script named dbsnapshot.sh (download) has the following usage:

Usage: dbsnapshot.sh [OPTIONS] "

Where [OPTIONS] are :

[-t type] one of hourly, daily, weekly, monthly
[-P port] is the ssh port number
<-s server> is the remote server's as identified in known_hosts
<-d database names> database names should be comma separated
                                     or quoted white space
<-u database user> is database user
<-p password> is the database user's password
<-H [n]> where n is the number of hourly backups,default is 6
<-D [n]> where n is the number of daily backups, default is 7
<-W [n]> where n is the number of weekly backups, default is 4
<-M [n]> where n is the number of monthly backups, default is 12"

source is the directory where backups are on the remote server
destination is the local directory where backups will be stored

The general design of the local script is as follows:

  • process command line arguments
  • set the max number of backups based on type, -t argument
  • rotate the existing backups
  • create new backup by invoking remote script
  • copy the new backup to local
  • delete oldest backup that has rotated off the stack

Crontab

The script is meant to be run as a cron job. Your crontab will typically look something like this:

0 */4 * * * dbsnapback.sh [OPTIONS] -t hourly src dst >>  dev/null >2&1
59 3 * * * dbsnapback.sh [OPTIONS] -t daily scr dst  >> dev/null >2&1
58 3 * * 0 dbsnapback.sh [OPTIONS] -t weekly scr dst  >> dev/null >2&1
57 3 1 * * dbsnapback.sh [OPTIONS] -t monthly scr dst  >> dev/null >2&1

To Do List

Like any script, these two scripts could use some improvements. I’ll point out the weaknesses that I see as fair warning.

  • The remote script name and path is hard-coded in the local script
  • The tables for which data is not backed up are Drupal specific. Probably should be a command line option to take a list of tables that are to be excluded.
  • The directory that stores the backups should be mounted read only after the script runs.
  • The compressed archives should be removed from the remote server after being copied to the backup server
  • rsync is not preserving original permissions and ownership, need to look into –link-dest
  • the script should use a configuration file because there are too many command line arguments and most are required

That wraps up the first part of the series for website backup strategies. In the next parts we’ll cover how a site is restored and how a site is upgraded.

Part I Part II Part III

August 24, 2007

Website Backup Strategy, Part III

Filed under: Backups, Deployment, Drupal, Fedora, Linux — trekr @ 7:08 am

Web Accessible Files

In Part II Revision Control I mentioned my preference for git over other revision control systems. One reason is that git is guaranteed to give you back what you put into it because it computes sha1 and notices things like disk corruption. Recall, I mentioned in the section on svn that you might want to mirror your svn repository with svnsync. That only addresses corruption that may occur after a commit to the respository, it doesn’t do anything for you if the corruption occurs during he commit. The other nice aspect of git is that it is distributed so that every repository is in a sense a backup if we can match the signature. For these reasons, the next part of our backup strategy will backup up the entire set of files accessible to the web, including those under revision control.

As I mentioned in part I Website Backup Strategies, the snapback technique is ideally suited for website backups because the files are not expected to change very often. The snapback technique is well documented on Mike Rubel’s site. It takes advantage of hard links to reduce space and rsync to reduce bandwidth for files that don’t change often over the backup time horizon. The method depends on having a /usr/bin/cp that can make hard links, like GNU cp -al does.

As I mentioned in part I, there is an excellent implementation called snapback2 from Perusion.

In this post, I’m going to walk through an example where a local machine is used as a backup server to access the remote web server via ssh that hosts the site we want to backup. If you don’t have ssh setup, see my post Securing a new Fedora 6 Slice. The local server needs to have Perl installed and snapback2 depends on Config::ApacheFormat. The snapback2 README covers installation, and the Perusion site also has online documentation. The documentation is also installed as a manpage.

$ cd /usr/local/src

$ /usr/bin/wget \
> http://search.cpan.org/CPAN/authors/id/M/MI/MIKEH/Snapback2-0.913.tar.gz

$ /bin/tar xvf Snapback2-0.913.tar.gz

$ cd Snapback2-0.913

$ /usr/bin/perl Makefile.PL

$ /usr/bin/make

$ /usr/bin/make test

$ /usr/bin/make install

On Fedora, you can set up Perl to get modules from CPAN

$ /usr/bin/yum install perl-libwww-perl
$ /usr/bin/perl -MCPAN -e shell
> get Bundle::CPAN

Then installing a new module can be as simple as typing

$ /usr/bin/perl -MCPAN -e 'install Config::ApacheFormat'

To configue snapback2, you use an apache-like configuration file, /etc/snapback/snapback.conf

    Hourlies    6
    Dailies     7
    Weeklies    4
    Monthlies  12
    AutoTime   Yes

    AdminEmail webmaster@mysite.com
    LogFile    /home/myuser/var/log/snapback.log
    Exclude *debug
    Exclude core.*
    SnapbackRoot /etc/snapback
    RsyncShell '/usr/bin/ssh -i /home/myuser/.ssh/id_dsa -p 2222'

    Destination /home/myuser/mnt/backup1

    <Backup mysite.com>
      Directory /var/www/html/mysite/
    </Backup>

Note the RsyncShell directive used to set up an ssh connection with a key and port number.

Next we need to setup a cron job. See this intro if you unfamiliar with cron.

    $ /usr/bin/crontab -e

Add the line

10 */4 * * * /usr/bin/snapback2

Note that the configuration file specified Hourlies 6 and the cron is set up to run every 4 hours, covering 24 (6 * 4) hours with six backups every four hours.

Because cron is not run under myuser, the ssh keys should not have a passphrase unless you are using something like keychain. If you use passphrase, be aware that the process will not survive a reboot automatically, you will have to reenter the passphrase after a reboot. Obviously, the keys without a passphrase must be safeguarded.

In the next part, I’ll detail how to set up something very similar to snapback to backup a MySQL database using bash scripts on the local and remote servers.

Part I Part II Part IV

Next Page »

Hakota Design LLC