<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>fluidBlog</title>
	<link>http://www.fluidblog.com</link>
	<description>Just another WordPress weblog</description>
	<pubDate>Fri, 02 Nov 2007 23:23:11 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3</generator>
	<language>en</language>
			<item>
		<title>MySQL Replication</title>
		<link>http://www.fluidblog.com/archives/18</link>
		<comments>http://www.fluidblog.com/archives/18#comments</comments>
		<pubDate>Mon, 24 Sep 2007 13:03:57 +0000</pubDate>
		<dc:creator>trekr</dc:creator>
		
		<category><![CDATA[Backups]]></category>

		<category><![CDATA[Fedora]]></category>

		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.fluidblog.com/archives/18</guid>
		<description><![CDATA[This post will walk you through setting up replication for a MySQL database which has existing data.

Why Replication?


Backups. As part of a backup strategy, replication avoids shutting down the master if backups are made from the slave.
Scale-out. Replication can allow the distribution of reads over multiple replication slaves.
Reliability. Replication can provide a ready backup in [...]]]></description>
			<content:encoded><![CDATA[<p>This post will walk you through setting up replication for a MySQL database which has existing data.</p>

<p>Why Replication?</p>

<ul>
<li><strong>Backups.</strong> As part of a backup strategy, replication avoids shutting down the master if backups are made from the slave.</li>
<li><strong>Scale-out.</strong> Replication can allow the distribution of reads over multiple replication slaves.</li>
<li><strong>Reliability.</strong> Replication can provide a ready backup in the event of a failure.</li>
</ul>

<p>This post will only address the use of replication in a backup strategy, but the setup steps are valid for other uses as well.</p>

<h2>Assumptions:</h2>

<ul>
<li>The master database has existing data.</li>
<li>The master and slave have different IP addresses.</li>
<li>The master database to be replicated is named <code>exampledb</code>.</li>
<li>The database <code>exampledb</code> exists on the master but not on the slave.</li>
<li>The database uses MyISAM tables.</li>
<li>The master and slave run on Fedora. This should only affect the directory structure.</li>
<li>The master and slave use port 3306. This affects the firewall setup and the master configuration. To use a different port, set <code>MASTER_PORT</code> in the <code>CHANGE MASTER TO</code> statement.</li>
</ul>

<h2>Preliminary steps.</h2>

<p>Configure the firewall to allow the master and slave to communicate.</p>

<p>You have a firewall, right? The rules below assume the master server is <code>2xx.xx.xx.01</code> and the slave is <code>2xx.xx.xx.02</code>, corresponding to <code>server_id=1</code> for the master and <code>server_id=2</code> for the slave.</p>

<p>On the master add the following to <code>/etc/sysconfig/iptables</code></p>

<pre class="programlisting">
# Allow incoming and outgoing traffic on port 3306 for MySQL slave server
-A INPUT -p tcp -s 2xx.xx.xx.02 --sport 1024:65535 -d 2xx.xx.xx.01 --dport 3306 -m state --state NEW,ESTABLISHED -j ACCEPT
-A OUTPUT -p tcp -s 2xx.xx.xx.01 --sport 3306 -d 2xx.xx.xx.02 --dport 1024:65535 -m state --state ESTABLISHED -j ACCEPT</pre>

<p>On the slave add the following to <code>/etc/sysconfig/iptables</code></p>

<pre class="programlisting">
# Allow outgoing connections to port 3306 on the MySQL master server, and the replies
-A OUTPUT -p tcp -s 2xx.xx.xx.02 --sport 1024:65535 -d 2xx.xx.xx.01 --dport 3306 -m state --state NEW,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -s 2xx.xx.xx.01 --sport 3306 -d 2xx.xx.xx.02 --dport 1024:65535 -m state --state ESTABLISHED -j ACCEPT</pre>

<p>Restart the firewall</p>

<pre class="screen">
# /sbin/service iptables restart</pre>

<p>Create a replication user, <code>repl</code>, on the slave server.</p>

<pre class="screen">
# /usr/sbin/useradd -r repl
# /usr/bin/passwd repl</pre>

<p>Grant privileges to the replication user on the master server.</p>

<pre class="screen">
mysql&gt; GRANT FILE, REPLICATION SLAVE ON *.*
    -&gt; TO 'repl'@'%' IDENTIFIED BY 'password';</pre>

<h2>Prepare the Master</h2>

<p>In order to prepare the master, it will have to be locked for a short time. While it&#8217;s locked we will create a dump file, and edit the configuration in <code>/etc/my.cnf</code>.</p>

<p>You must enable binary logging, so we&#8217;ll start by configuring binary logging on the master server and restarting it.</p>

<p>Each server in the replication group must have a unique id. The master server is typically assigned an id of 1.</p>

<p>The master server must allow networking, so ensure that <code>--skip-networking</code> and <code>--bind-address</code> are commented out if present.</p>

<p>While you&#8217;re at it, check that the permissions of the file <code>my.cnf</code> are <code>0400</code> or <code>0600</code>.</p>

<p>Edit the master database&#8217;s configuration in the file <code>/etc/my.cnf</code>. Add the following lines under the <code>[mysqld]</code> section.</p>

<pre class="programlisting">
[mysqld]
# enable networking and listen on all IP addresses
# by commenting out the following two lines if they exist
#skip-networking
#bind-address            = 127.0.0.1
# The following line will create the log file
# /var/lib/mysql/mysql-bin.00000x
log-bin=mysql-bin
# set expire_logs_days no lower than the number of days the slave is behind
expire_logs_days=3
server-id=1
# substitute 'hostname' with the actual name of your host
relay-log='hostname'-relay-bin
# duplicate the following line for every database that needs to be replicated
# substitute 'exampledb' with actual name of the database to be replicated
binlog-do-db='exampledb'</pre>

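<p>Restart the master</p>

<pre class="screen">
# /etc/init.d/mysqld restart</pre>

<p>The init script takes only an action such as <code>restart</code>, so server options belong in <code>my.cnf</code>. Setting <code>report-host</code> in the slave&#8217;s configuration (see the slave setup below) will make checking the replication status easier.</p>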

<p>Check the error log at <code>/var/log/mysqld.log</code> after restarting mysqld.</p>

<p>Next we will lock the master database, record the binary log file&#8217;s position, and create a database dump that we&#8217;ll use to initialize the slave database.</p>

<p>Lock the database</p>

<pre class="screen">
# mysql -p -u root
mysql&gt; use exampledb;
mysql&gt; FLUSH TABLES WITH READ LOCK;</pre>

<p>Obtain status of the binary log on the master.</p>

<pre class="screen">
mysql&gt; SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |       98 | exampledb    |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)</pre>

<p>Record the File and Position values; you&#8217;ll need them to synchronize the slave.</p>

<p>Leave this client running while the dump file is created, because the read lock is released as soon as the session ends.</p>

<p>Create a dump backup file</p>

<pre class="screen">
# mysqldump -u root -p exampledb &gt; exampledb.sql</pre>

<p>On master, unlock the tables</p>

<pre class="screen">
mysql&gt; UNLOCK TABLES;</pre>

<h2>Setting up the slave</h2>

<p>Edit the slave&#8217;s configuration in the file <code>/etc/my.cnf</code></p>

<pre class="programlisting">
[mysqld]
log-bin=mysql-bin
master-host=master-hostname
master-user=slave-user
master-password=slave-password
server-id=2
# duplicate the following line for every database that needs to be replicated
# substitute 'exampledb' with actual name of the database to be replicated
replicate-do-db='exampledb'
report-host='slave-hostname'</pre>

<p>Since the <code>master.info</code> file overrides settings in <code>my.cnf</code>, you may prefer to set up the slave using the <code>CHANGE MASTER TO</code> command.  At a minimum, you need to use the <code>CHANGE MASTER TO</code> command to set the values of <code>MASTER_LOG_FILE</code> and <code>MASTER_LOG_POS</code>.</p>

<p>Change the option values to the actual values for your servers</p>

<pre class="screen">
mysql&gt; CHANGE MASTER TO
    -&gt;     MASTER_HOST='master_host_name',
    -&gt;     MASTER_USER='replication_user_name',
    -&gt;     MASTER_PASSWORD='replication_password',
    -&gt;     MASTER_LOG_FILE='recorded_log_file_name',
    -&gt;     MASTER_LOG_POS=recorded_log_position;</pre>
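
<p>The File and Position values recorded on the master can be turned into this statement with a small helper. The following is a minimal sketch and not part of the original setup: the function name <code>build_change_master</code> is hypothetical, and the host, user and password in the example call are the placeholder values used in this post.</p>

```shell
# Build a CHANGE MASTER TO statement from the values recorded on the master.
# Arguments: host, user, password, log file, log position.
# build_change_master is a hypothetical helper name, not a MySQL tool.
build_change_master() {
    host="$1"; user="$2"; pass="$3"; log_file="$4"; log_pos="$5"
    # MASTER_LOG_POS is numeric and is written without quotes.
    case $log_pos in
        ''|*[!0-9]*)
            echo "log position must be a number: $log_pos" >&2
            return 1
            ;;
    esac
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" "$host" "$user" "$pass" "$log_file" "$log_pos"
}

# Example call using the File and Position values from SHOW MASTER STATUS.
build_change_master 2xx.xx.xx.01 repl password mysql-bin.000001 98
```

<p>The printed statement can be pasted into the <code>mysql</code> client on the slave once the slave server is running.</p>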

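<p>Start the slave</p>

<pre class="screen">
# /usr/bin/mysqld_safe --skip-slave-start &amp;</pre>

<p>The <code>--skip-slave-start</code> option tells the server not to start the slave threads.  We&#8217;re using this option because we aren&#8217;t quite ready to replicate. The init script takes only an action such as <code>start</code>, so either start the server with <code>mysqld_safe</code> as shown, or temporarily add <code>skip-slave-start</code> to the <code>[mysqld]</code> section of <code>my.cnf</code>.</p>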

<p>Create the database</p>

<pre class="screen">
# mysql -u root -p
mysql&gt; create database exampledb;
mysql&gt; grant all on exampledb.* to 'repl'@'localhost';</pre>

<p>Copy the dump file created on the master to the slave, then import it</p>

<pre class="screen">
# mysql -u root -p exampledb &lt; exampledb.sql</pre>

<p>Start the slave threads</p>

<pre class="screen">
mysql&gt; START SLAVE;</pre>

<h2>Checking the status of the master and slave</h2>

<p>On slave</p>

<pre class="screen">
mysql&gt; SHOW SLAVE STATUS\G;

*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: fluidrails.com
                Master_User: repl
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000004
        Read_Master_Log_Pos: 98
             Relay_Log_File: hakota-relay-bin.000073
              Relay_Log_Pos: 235
      Relay_Master_Log_File: mysql-bin.000004
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB: garden
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 98
            Relay_Log_Space: 235
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)

ERROR:
No query specified</pre>

<p>On master</p>

<pre class="screen">
mysql&gt; SHOW MASTER STATUS\G;
*************************** 1. row ***************************
File: mysql-bin.000004
Position: 98
Binlog_Do_DB: garden
Binlog_Ignore_DB:
1 row in set (0.00 sec)

ERROR:
No query specified</pre>

<pre class="screen">
mysql&gt; SHOW PROCESSLIST;
+------+------+-----------------------+------+-------------+------+----------------------------------------------------------------+------------------+
| Id   | User | Host                  | db   | Command     | Time | State                                                          | Info             |
+------+------+-----------------------+------+-------------+------+----------------------------------------------------------------+------------------+
| 6235 | repl | www.hakota.com:43033  | NULL | Binlog Dump | 3250 | Has sent all binlog to slave; waiting for binlog to be updated | NULL             |
+------+------+-----------------------+------+-------------+------+----------------------------------------------------------------+------------------+</pre>

<pre class="screen">
mysql&gt; SHOW SLAVE HOSTS;
+-----------+------------+------+-------------------+-----------+
| Server_id | Host       | Port | Rpl_recovery_rank | Master_id |
+-----------+------------+------+-------------------+-----------+
|         2 | hakota.com | 3306 |                 0 |         1 |
+-----------+------------+------+-------------------+-----------+
1 row in set (0.00 sec)</pre>

<h2>Logs and Backups</h2>

<p>Now that we&#8217;ve enabled binary logging we need to implement a process to manage the logs.
One approach is to use <code>SHOW SLAVE STATUS</code> (on slave) and <code>SHOW BINARY LOGS</code> (on master) to determine which logs are in use.  Suppose the slave is using <code>Master_Log_File: mysql-bin.000004</code>.  Make sure your backups contain the logs about to be deleted, those prior to the target log <code>mysql-bin.000004</code>, then</p>

<pre class="screen">mysql&gt; PURGE MASTER LOGS TO 'mysql-bin.000004';</pre>

<p>will delete all logs prior to <code>mysql-bin.000004</code>.</p>
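
<p>Reading the log name off the slave by hand is error prone, so the lookup can be scripted. This is a rough sketch under assumptions: the function name <code>purge_statement_from_slave_status</code> is hypothetical, and it expects the vertical output of <code>SHOW SLAVE STATUS\G</code> on standard input. It only prints the statement; back up the older logs before running it on the master.</p>

```shell
# Read `SHOW SLAVE STATUS\G` output on stdin and print a PURGE statement
# naming the master log the slave I/O thread is currently reading.
# The pattern is anchored so that Relay_Master_Log_File does not match.
purge_statement_from_slave_status() {
    awk -F': ' '$1 ~ /^ *Master_Log_File$/ {
        printf "PURGE MASTER LOGS TO \047%s\047;\n", $2
    }'
}

# Example with two lines taken from a status listing.
printf '%s\n' \
    '            Master_Log_File: mysql-bin.000004' \
    '      Relay_Master_Log_File: mysql-bin.000004' |
    purge_statement_from_slave_status
```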

<p>In addition, now that we&#8217;ve implemented replication, our backup strategy needs to include the <code>master.info</code> and <code>relay-log.info</code> status files as well as the binary logs.</p>

<p>Before executing <code>mysqldump</code> on the slave, replication should be temporarily stopped.</p>

<pre class="screen">
# mysqladmin -u root -p  stop-slave
# mysqldump -u root -p exampledb &gt; exampledb.sql
# mysqladmin -u root -p start-slave</pre>
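
<p>If the dump fails halfway, the slave should still be restarted. The three commands above could be wrapped roughly as follows. This is a sketch, not a tested backup script: the function name <code>backup_from_slave</code> is hypothetical, and it assumes the MySQL credentials come from an option file such as <code>~/.my.cnf</code> rather than the interactive <code>-u root -p</code> used above.</p>

```shell
# Dump one database from the slave and restart replication afterwards,
# even when mysqldump fails. Arguments: database name, output file.
# Assumes credentials are read from an option file (an assumption).
backup_from_slave() {
    db="$1"
    out="$2"
    mysqladmin stop-slave || return 1
    mysqldump "$db" > "$out"
    status=$?
    # Always restart the slave threads, whatever mysqldump returned.
    mysqladmin start-slave || return 1
    return "$status"
}

# Usage (not executed here):
#   backup_from_slave exampledb exampledb.sql
```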

<h2>References</h2>

<p><a href="http://dev.mysql.com/doc/refman/5.0/en/replication.html">MySQL 5.0 Reference Manual, Chapter 15 Replication</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.fluidblog.com/archives/18/feed</wfw:commentRss>
		</item>
		<item>
		<title>Syntax Highlighting for the Web using Vim</title>
		<link>http://www.fluidblog.com/archives/15</link>
		<comments>http://www.fluidblog.com/archives/15#comments</comments>
		<pubDate>Tue, 11 Sep 2007 16:10:22 +0000</pubDate>
		<dc:creator>trekr</dc:creator>
		
		<category><![CDATA[Blogging]]></category>

		<category><![CDATA[Web Development]]></category>

		<guid isPermaLink="false">http://www.fluidblog.com/archives/15</guid>
		<description><![CDATA[Brian Reindel wrote a review of syntax highlighters for blog code samples.  One of the comments on his post caught my attention.  Dan wrote that he uses vim and vividchalk for his blog. I love it when I learn about an existing solution using familiar tools. I gave Dan&#8217;s method a try and [...]]]></description>
			<content:encoded><![CDATA[<p>Brian Reindel wrote a <a href="http://blog.reindel.com/2007/09/06/beautify-your-blogs-code-samples-with-these-syntax-highlighters/">review</a> of syntax highlighters for blog code samples.  One of the comments on his post caught my attention.  <a href="http://danintouch.blogspot.com/">Dan</a> wrote that he uses <a href="http://www.vim.org/">vim</a> and <a href="http://www.vim.org/scripts/script.php?script_id=1891">vividchalk</a> for his blog. I love it when I learn about an existing solution using familiar tools. I gave Dan&#8217;s method a try and was very pleased with the result.  I know there are many WordPress <a href="http://codex.wordpress.org/Plugins/Syntax_Highlighting">Plugins</a> for syntax highlighting but this is another tool that comes in handy when source is presented on the web as a separate page, for example, <a href="http://www.fluidblog.com/files/dbbackup.sh.html">dbbackup.sh</a>.</p>

<p>To ensure the colors are correct, you will need a version of vim with GUI support, which on Fedora is <code>gvim</code>.  You can install <code>gvim</code> on Fedora with the following command</p>

<pre class="screen">
$ sudo yum install gvim*</pre>

<p>On Fedora, syntax scripts are in the directory <code>/usr/share/vim/vim70/syntax/</code> and color schemes are in the directory <code>/usr/share/vim/vim70/colors/</code>.  You can get other scripts from <a href="http://www.vim.org/scripts">vim online</a>.</p>

<p>Launch <code>gvim</code> from a terminal or the menu Applications-&gt;Programming-&gt;Vi Improved.</p>

<p>If you want embedded style in the HTML output then issue the colon command</p>

<pre class="screen">
~
:let html_use_css = 1</pre>

<p>If you like the <code>vividchalk</code> color scheme, issue the command</p>

<pre class="screen">
~
:colorscheme vividchalk</pre>

<p>To create an HTML version of the file, issue the command</p>

<pre class="screen">
~
:TOhtml</pre>

<p>Save the HTML file by issuing the &#8220;ZZ&#8221; command, then quit the session, leaving the original file unchanged, with the command</p>

<pre class="screen">
~
:q!</pre>

<p>The <code>:TOhtml</code> command produces a complete HTML page.  To embed the highlighted syntax within a post, simply change the &lt;body&gt; tag to an appropriate CSS tag for your theme and delete the &lt;html&gt; and &lt;head&gt; sections. Here is a snippet as an example of embedding the highlighted source within a blog post.  The HTML was created with the commands</p>

<pre class="screen">
~
:let html_use_css = 0</pre>

<pre class="screen">
~
:TOhtml</pre>

<p>Without CSS, the generated HTML needs to be enclosed in a tag that can enclose the &lt;font&gt; tag, for example, the &lt;code&gt; tag.</p>

<pre class="vividchalk">
<code>
<font color="#ff6600">class </font><font color="#aaaa77">Zipcode</font> &lt; <font color="#ffcc00">ActiveRecord</font>::<font color="#ffcc00">Base</font>
  belongs_to <font color="#339999">:postoffice</font>
  has_many <font color="#339999">:locations</font>

  <font color="#ff6600">def </font><font color="#ffcc00">self.find_by_postoffice_name</font>(postoffice_name)
    postoffice_id = <font color="#ffcc00">Postoffice</font>.find_by_name(postoffice_name).id
    find(<font color="#339999">:all</font>, <font color="#339999">:conditions</font> =&gt; [<font color="#66ff00">&quot;</font><font color="#66ff00">postoffice_id = ?</font><font color="#66ff00">&quot;</font>, postoffice_id])
  <font color="#ff6600">end</font>

  <font color="#ff6600">def </font><font color="#ffcc00">self.find_by_postoffice_id</font>(postoffice_id)
    find(<font color="#339999">:all</font>, <font color="#339999">:conditions</font> =&gt; [<font color="#66ff00">&quot;</font><font color="#66ff00">postoffice_id = ?</font><font color="#66ff00">&quot;</font>, postoffice_id])
  <font color="#ff6600">end</font>
<font color="#ff6600">end</font></code></pre>

<p>As an added bonus, the output from vim using CSS <a href="http://validator.w3.org/">validates</a> to HTML 4.01 Strict.</p>

<p>See the vim documentation for a complete rundown on <a href="http://www.vim.org/htmldoc/syntax.html">syntax highlighting in vim</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.fluidblog.com/archives/15/feed</wfw:commentRss>
		</item>
		<item>
		<title>Wordpress vs. Typo in one sentence.</title>
		<link>http://www.fluidblog.com/archives/14</link>
		<comments>http://www.fluidblog.com/archives/14#comments</comments>
		<pubDate>Sun, 02 Sep 2007 13:59:21 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Blogging]]></category>

		<guid isPermaLink="false">http://www.fluidblog.com/archives/14</guid>
		<description><![CDATA[After struggling with Typo for several months, I&#8217;ve switched this blog to WordPress.  In one sentence, here&#8217;s why.

You won&#8217;t be seeing a &#8220;How-to install WordPress&#8221; post on this blog in the style of Installing Typo on Fedora 6 slice from Slicehost because installing WordPress is just plain simple, and it works.

And there you have [...]]]></description>
			<content:encoded><![CDATA[<p>After struggling with <a href="http://typosphere.org/">Typo</a> for several months, I&#8217;ve switched this blog to <a href="http://wordpress.org/">WordPress</a>.  In one sentence, here&#8217;s why.</p>

<p>You won&#8217;t be seeing a &#8220;How-to install WordPress&#8221; post on this blog in the style of <a href="http://www.fluidblog.com/archives/5">Installing Typo on Fedora 6 slice from Slicehost</a> because <a href="http://codex.wordpress.org/Installing_WordPress#Famous_5-Minute_Install">installing</a> WordPress is just plain simple, and it works.</p>

<p>And there you have it, WordPress vs. Typo in one sentence.   There are other reasons of course, but I should have seen the other difficulties to come right from the start.  More often than we want to admit, first impressions do matter.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.fluidblog.com/archives/14/feed</wfw:commentRss>
		</item>
		<item>
		<title>Website Backup Strategies, Part IV</title>
		<link>http://www.fluidblog.com/archives/13</link>
		<comments>http://www.fluidblog.com/archives/13#comments</comments>
		<pubDate>Mon, 27 Aug 2007 14:50:32 +0000</pubDate>
		<dc:creator>trekr</dc:creator>
		
		<category><![CDATA[Backups]]></category>

		<category><![CDATA[Deployment]]></category>

		<category><![CDATA[Drupal]]></category>

		<category><![CDATA[Fedora]]></category>

		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://mike.fluidblog.com/archives/13</guid>
		<description><![CDATA[Introduction

For backing up the database we won&#8217;t benefit from the snapback technique
because the data will change often enough that a compressed archive will
be more efficient than hard links (see analysis in Website Backup Strategies). Because the script archives and compresses the dump files, rsync will treat them as changed even if the data in the [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>

<p>For backing up the database we won&#8217;t benefit from the snapback technique
because the data will change often enough that a compressed archive will
be more efficient than hard links (see the analysis in <a href="http://www.fluidblog.com/articles/2007/08/20/website-backup-strategies">Website Backup Strategies</a>). Because the script archives and compresses the dump files, rsync will treat them as changed even if the data in the database hasn&#8217;t changed.
Even if you don&#8217;t archive and compress the dump files, you must still take care that rsync does not consider the dump files different because of their creation time (use <code>--size-only</code>), and that the dump files are not trivially different because of metadata such as the timestamps that mysqldump embeds in the file (use <code>--skip-comments</code>). You can also use snapback on <code>/var/lib/mysql</code>, but it&#8217;s not guaranteed to work.</p>

<p>The strategy will be to set up a cron job to run a script on the local backup server that will invoke a script on the remote server. The script on the remote server will execute mysqldump. The reason for invoking a script on the remote server is that we don&#8217;t want to allow database connections except from localhost.</p>

<p>I&#8217;ve chosen to use mysqldump for the flexibility to edit SQL if needed.
The downside is that we are only capturing the individual databases, not the user tables or grants.
Additionally, I&#8217;m going to dump the table schema separately from the data so that I don&#8217;t waste space and bandwidth backing up tables like cache.</p>

<h2>Remote Script</h2>

<p>The remote script called <a href="http://fluidblog.com/files/dbbackup.sh.html">dbbackup.sh</a> (<a href="http://fluidblog.com/files/dbbackup.sh.gz">download</a>) has the following usage</p>

<pre class="programlisting">
Usage: dbbackup.sh [OPTIONS] destination

Where [OPTIONS] are:

&lt;-d database name&gt; is the name of the MySQL database
&lt;-u database user&gt; is database user
&lt;-p password&gt; is the database user's password 

destination is the local directory where backups will be stored</pre>

<p>The general design of the remote script is as follows:</p>

<ul>
<li>process command line arguments</li>
<li>change directories to the destination</li>
<li>get a list of tables in the database</li>
<li>create a list of tables for which data will be backed up</li>
<li>execute mysqldump <code>--no-data</code> on all tables</li>
<li>archive and compress the dump</li>
<li>test the compressed archive</li>
<li>execute mysqldump <code>--no-create-info</code> on selected tables</li>
<li>archive and compress the dump</li>
<li>test the compressed archive</li>
</ul>
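
<p>The table-selection step in the list above can be sketched as a small filter. This is an illustration only and is not taken from <code>dbbackup.sh</code>: the function name <code>select_data_tables</code>, the exclude pattern <code>cache*</code> and the file names in the usage comments are assumptions.</p>

```shell
# Print the tables whose data should be dumped: every name read from stdin
# except those matching one of the shell patterns given as arguments.
# select_data_tables is a hypothetical helper name.
select_data_tables() {
    while read -r table; do
        keep=yes
        for pattern in "$@"; do
            case $table in
                $pattern) keep=no ;;
            esac
        done
        if [ "$keep" = yes ]; then
            echo "$table"
        fi
    done
}

# Example: drop the cache tables from a table list.
printf '%s\n' node cache cache_page users | select_data_tables 'cache*'

# How it could feed the two dumps (not executed here):
#   mysql -N -B -e 'SHOW TABLES' exampledb | select_data_tables 'cache*' > tables.txt
#   mysqldump --no-data exampledb > schema.sql
#   mysqldump --no-create-info exampledb $(cat tables.txt) > data.sql
```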

<h2>Local Script</h2>

<p>The local script named <a href="http://fluidblog.com/files/dbsnapshot.sh.html">dbsnapshot.sh</a> (<a href="http://fluidblog.com/files/dbsnapshot.sh.gz">download</a>) has the following usage:</p>

<pre class="programlisting">
Usage: dbsnapshot.sh [OPTIONS] source destination

Where [OPTIONS] are:

[-t type] one of hourly, daily, weekly, monthly
[-P port] is the ssh port number
&lt;-s server&gt; is the remote server's name as identified in known_hosts
&lt;-d database names&gt; database names should be comma separated
                    or quoted white space
&lt;-u database user&gt; is database user
&lt;-p password&gt; is the database user's password
&lt;-H [n]&gt; where n is the number of hourly backups, default is 6
&lt;-D [n]&gt; where n is the number of daily backups, default is 7
&lt;-W [n]&gt; where n is the number of weekly backups, default is 4
&lt;-M [n]&gt; where n is the number of monthly backups, default is 12

source is the directory where backups are on the remote server
destination is the local directory where backups will be stored</pre>

<p>The general design of the local script is as follows:</p>

<ul>
<li>process command line arguments</li>
<li>set the max number of backups based on type, -t argument</li>
<li>rotate the existing backups</li>
<li>create new backup by invoking remote script</li>
<li>copy the new backup to local</li>
<li>delete oldest backup that has rotated off the stack</li>
</ul>
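
<p>The rotate and delete steps in the list above can be sketched as follows. This is not the code from <code>dbsnapshot.sh</code>: the function names and the <code>type.N</code> directory layout (for example <code>hourly.0</code> as the newest backup) are assumptions made for the illustration.</p>

```shell
# Shift snapshot directories up by one: prefix.(max-1) becomes prefix.max,
# and so on down to prefix.0, which becomes prefix.1. This leaves prefix.0
# free for the new backup. Arguments: directory, prefix, max backups.
rotate_backups() {
    dir="$1"; prefix="$2"; max="$3"
    i=$((max - 1))
    while [ "$i" -ge 0 ]; do
        if [ -d "$dir/$prefix.$i" ]; then
            mv "$dir/$prefix.$i" "$dir/$prefix.$((i + 1))"
        fi
        i=$((i - 1))
    done
}

# Delete the snapshot that has rotated off the stack (prefix.max).
prune_oldest() {
    rm -rf "$1/$2.$3"
}

# Usage (not executed here), keeping six hourly backups:
#   rotate_backups /home/myuser/mnt/backup1 hourly 6
#   ... create and copy the new backup into hourly.0 ...
#   prune_oldest /home/myuser/mnt/backup1 hourly 6
```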

<h2>Crontab</h2>

<p>The script is meant to be run as a cron job. Your crontab will typically look something like this:</p>

<pre class="programlisting">
0 */4 * * * dbsnapshot.sh [OPTIONS] -t hourly src dst &gt;&gt; /dev/null 2&gt;&amp;1
59 3 * * * dbsnapshot.sh [OPTIONS] -t daily src dst &gt;&gt; /dev/null 2&gt;&amp;1
58 3 * * 0 dbsnapshot.sh [OPTIONS] -t weekly src dst &gt;&gt; /dev/null 2&gt;&amp;1
57 3 1 * * dbsnapshot.sh [OPTIONS] -t monthly src dst &gt;&gt; /dev/null 2&gt;&amp;1</pre>

<h2>To Do List</h2>

<p>Like any script, these two scripts could use some improvements. I&#8217;ll point out the weaknesses that I see as fair warning.</p>

<ul>
<li>The remote script name and path are hard-coded in the local script.</li>
<li>The tables for which data is not backed up are Drupal specific. There should probably be a command line option that takes a list of tables to exclude.</li>
<li>The directory that stores the backups should be mounted read-only after the script runs.</li>
<li>The compressed archives should be removed from the remote server after being copied to the backup server.</li>
<li>rsync is not preserving the original permissions and ownership; I need to look into <code>--link-dest</code>.</li>
<li>The script should use a configuration file because there are too many command line arguments and most are required.</li>
</ul>

<p>That wraps up the first part of the series for website backup strategies. In the next parts we&#8217;ll cover how a site is restored and how a site is upgraded.</p>

<p><a href="http://www.fluidblog.com/articles/2007/08/20/website-backup-strategies">Part I</a>
<a href="http://www.fluidblog.com/articles/2007/08/22/website-backup-strategies-part-ii">Part II</a>
<a href="http://www.fluidblog.com/articles/2007/08/24/website-backup-strategy-part-iii">Part III</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.fluidblog.com/archives/13/feed</wfw:commentRss>
		</item>
		<item>
		<title>Website Backup Strategy, Part III</title>
		<link>http://www.fluidblog.com/archives/12</link>
		<comments>http://www.fluidblog.com/archives/12#comments</comments>
		<pubDate>Fri, 24 Aug 2007 11:08:35 +0000</pubDate>
		<dc:creator>trekr</dc:creator>
		
		<category><![CDATA[Backups]]></category>

		<category><![CDATA[Deployment]]></category>

		<category><![CDATA[Drupal]]></category>

		<category><![CDATA[Fedora]]></category>

		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://mike.fluidblog.com/archives/12</guid>
		<description><![CDATA[Web Accessible Files

In Part II Revision Control I mentioned my preference for git over other revision control systems.  One reason is that git is guaranteed to give you back what you put into it because it computes sha1 and notices things like disk corruption.  Recall, I mentioned in the section on svn that [...]]]></description>
			<content:encoded><![CDATA[<h2>Web Accessible Files</h2>

<p>In <a href="http://www.fluidblog.com/articles/2007/08/22/website-backup-strategies-part-ii">Part II Revision Control</a> I mentioned my preference for git over other revision control systems.  One reason is that git is guaranteed to give you back what you put into it because it computes a SHA-1 hash of what it stores and so notices things like disk corruption.  Recall that in the section on svn I mentioned that you might want to mirror your svn repository with svnsync.  That only addresses corruption that occurs after a commit to the repository; it doesn&#8217;t do anything for you if the corruption occurs during the commit.  The other nice aspect of git is that it is distributed, so every repository is in a sense a backup if we can match the signature.
For these reasons, the next part of our backup strategy will back up the entire set of files accessible to the web, including those under revision control.</p>

<p>As I mentioned in <a href="http://www.fluidblog.com/articles/2007/08/20/website-backup-strategies">part I Website Backup Strategies</a>, the snapback technique is ideally suited for website backups because the files are not expected to change very often.  The snapback technique is well documented on <a href="http://mikerubel.org/computers/rsync_snapshots/index.html">Mike Rubel&#8217;s</a> site.  It takes advantage of hard links to reduce space and rsync to reduce bandwidth for files that don&#8217;t change often over the backup time horizon.  The method depends on having a <code>/usr/bin/cp</code> that can make hard links, like GNU <code>cp -al</code> does.</p>
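
<p>The hard-link step of the technique can be sketched in a few lines. This is only an illustration of the idea, not how snapback2 is implemented: the function name <code>link_snapshot</code>, the <code>hourly.N</code> directory names and the <code>user@host</code> address in the comment are assumptions, and the sketch requires GNU <code>cp</code>.</p>

```shell
# Hard-link an existing snapshot into a new directory, so that files which
# have not changed take no additional space. Requires GNU cp
# (-a archive mode, -l make hard links instead of copying file data).
# Arguments: existing snapshot directory, new snapshot directory.
link_snapshot() {
    cp -al "$1" "$2"
}

# rsync would then bring the newest snapshot up to date. By default rsync
# writes a changed file to a new inode, so the linked copy keeps the old
# contents (not executed here):
#   link_snapshot snapshots/hourly.0 snapshots/hourly.1
#   rsync -a --delete -e ssh user@host:/var/www/html/mysite/ snapshots/hourly.0/
```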

<p>As I mentioned in part I, there is an excellent implementation called <a href="http://search.cpan.org/CPAN/authors/id/M/MI/MIKEH/Snapback2-0.913.tar.gz">snapback2</a> from <a href="http://www.perusion.com/misc/Snapback2/">Perusion</a>.</p>

<p>In this post, I&#8217;m going to walk through an example where a local machine acts as the backup server and uses ssh to reach the remote web server that hosts the site we want to back up.  If you don&#8217;t have ssh set up, see my post <a href="http://www.fluidblog.com/articles/2007/04/14/securing-ssh-on-fedora-6-slice">Securing a new Fedora 6 Slice</a>.  The local server needs to have Perl installed, and snapback2 depends on the Config::ApacheFormat module.  The snapback2 <a href="http://www.perusion.com/misc/Snapback2/README">README</a> covers installation, and the Perusion site also has online <a href="http://www.perusion.com/misc/Snapback2/snapback2.html">documentation</a>.  The documentation is also installed as a manpage.</p>

<pre class="screen">
$ cd /usr/local/src

$ /usr/bin/wget \
&gt; http://search.cpan.org/CPAN/authors/id/M/MI/MIKEH/Snapback2-0.913.tar.gz

$ /bin/tar xvf Snapback2-0.913.tar.gz

$ cd Snapback2-0.913

$ /usr/bin/perl Makefile.PL

$ /usr/bin/make

$ /usr/bin/make test

$ /usr/bin/make install

</pre>

<p>On Fedora, you can set up Perl to get modules from CPAN</p>

<pre class="screen">
$ /usr/bin/yum install perl-libwww-perl
$ /usr/bin/perl -MCPAN -e shell
&gt; get Bundle::CPAN
</pre>

<p>Then installing a new module can be as simple as typing</p>

<pre class="screen">
$ /usr/bin/perl -MCPAN -e 'install Config::ApacheFormat'
</pre>

<p>To configure snapback2, you use an apache-like configuration file, <code>/etc/snapback/snapback.conf</code></p>

<pre class="programlisting">
    Hourlies    6
    Dailies     7
    Weeklies    4
    Monthlies  12
    AutoTime   Yes

    AdminEmail webmaster@mysite.com
    LogFile    /home/myuser/var/log/snapback.log
    Exclude *debug
    Exclude core.*
    SnapbackRoot /etc/snapback
    RsyncShell '/usr/bin/ssh -i /home/myuser/.ssh/id_dsa -p 2222'

    Destination /home/myuser/mnt/backup1

    &lt;Backup mysite.com&gt;
      Directory /var/www/html/mysite/
    &lt;/Backup&gt;
</pre>

<p>Note the RsyncShell directive used to set up an ssh connection with a key and port number.</p>

<p>Next we need to set up a cron job.  See this <a href="http://www.unixgeeks.org/security/newbie/unix/cron-1.html">intro</a> if you are unfamiliar with cron.</p>

<pre class="screen">
    $ /usr/bin/crontab -e
</pre>

<p>Add the line</p>

<pre class="programlisting">
10 */4 * * * /usr/bin/snapback2
</pre>

<p>Note that the configuration file specifies <code>Hourlies 6</code> and the cron job runs every 4 hours, so the six hourly backups together cover 24 (6 * 4) hours.</p>

<p>Because cron runs the job unattended, with nobody there to type a passphrase, the ssh keys should not have a passphrase unless you are using something like <a href="http://www.gentoo.org/proj/en/keychain/">keychain</a>.  If you do use a passphrase, be aware that the process will not survive a reboot automatically; you will have to reenter the passphrase after a reboot.  Obviously, keys without a passphrase must be safeguarded.</p>

<p>In the next part, I&#8217;ll detail how to set up something very similar to snapback to back up a MySQL database using bash scripts on the local and remote servers.</p>

<p><a href="http://www.fluidblog.com/articles/2007/08/20/website-backup-strategies">Part I </a>
<a href="http://www.fluidblog.com/articles/2007/08/22/website-backup-strategies-part-ii">Part II</a>
<a href="http://www.fluidblog.com/articles/2007/08/27/website-backup-strategies-part-iv">Part IV</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.fluidblog.com/archives/12/feed</wfw:commentRss>
		</item>
		<item>
		<title>Website Backup Strategies, Part II</title>
		<link>http://www.fluidblog.com/archives/11</link>
		<comments>http://www.fluidblog.com/archives/11#comments</comments>
		<pubDate>Wed, 22 Aug 2007 18:33:53 +0000</pubDate>
		<dc:creator>trekr</dc:creator>
		
		<category><![CDATA[Backups]]></category>

		<category><![CDATA[Deployment]]></category>

		<category><![CDATA[Drupal]]></category>

		<category><![CDATA[Fedora]]></category>

		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://mike.fluidblog.com/archives/11</guid>
		<description><![CDATA[The second part of this series will take a look at revision control strategies for the core Drupal source.

CVS

Drupal is maintained in cvs. So cvs seems like a natural option. Pro Drupal Development covers this quite well in Chapter 21 &#8220;Development Best Practices&#8221;. There is a very short paragraph in the chapter about mixing SVN [...]]]></description>
			<content:encoded><![CDATA[<p>The second part of this series will take a look at revision control strategies for the core Drupal source.</p>

<h2>CVS</h2>

<p>Drupal is maintained in <a href="http://www.nongnu.org/cvs/">cvs</a>. So cvs seems like a natural option. Pro Drupal Development covers this quite well in Chapter 21 &#8220;Development Best Practices&#8221;. There is a very short paragraph in the chapter about mixing SVN and CVS to deal with custom source.</p>

<p>Assuming cvs is installed, using CVS is as easy as</p>

<pre class="screen">
$ cd /var/www/html
$ /usr/bin/cvs \
&gt; -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal checkout \
&gt; -r DRUPAL-5 drupal
</pre>

<p>This creates <code>/var/www/html/drupal</code>.</p>

<p>And updates are as simple as</p>

<pre class="screen">
$ cd /var/www/html/drupal
$ /usr/bin/cvs update -dP -r DRUPAL-5
</pre>

<p>But if you customize anything within the core source, then you&#8217;ll start to run into some difficulty.  For starters, you will need to store the custom code in a different repository. Remember the short paragraph in Pro Drupal Development about mixing CVS and SVN?
If custom source is not stored in another repository, the custom changes will not survive the update.</p>

<p>I&#8217;m pretty sure that I&#8217;ll be customizing something: headers, logos, maybe some javascript to get transparent images to work properly in IE.  So, for my purposes, I&#8217;m going to look for another approach.</p>

<h2>SVN</h2>

<p>Using an svn repository solves the problem of keeping core source and custom source in a single repository at the cost of a little additional work.  In the case of Drupal, the source is not in SVN, so we cannot simply check out the source from the development team&#8217;s repository. If you are new to svn, a good resource is Mike Mason&#8217;s excellent book Pragmatic Version Control: Using Subversion (2nd Edition). Another good resource is <a href="http://svnbook.red-bean.com/">Version Control with Subversion</a>.</p>

<p>The following workflow assumes everything is being done on a local machine, and the user has appropriate permissions.</p>

<p>Create a repository</p>

<pre class="screen">
$ mkdir -p /var/svn/repos
$ svnadmin create /var/svn/repos
</pre>

<p>Get the source</p>

<pre class="screen">
$ cd /usr/local/src
$ wget http://ftp.drupal.org/files/projects/drupal-5.1.tar.gz
</pre>

<p>Uncompress the archive</p>

<pre class="screen">
$ tar xzvf drupal-5.1.tar.gz
</pre>

<p>Import the source into the repository</p>

<pre class="screen">
$ svn import --no-auto-props -m"initial import of drupal 5.1" \
&gt; drupal-5.1 file:///var/svn/repos/vendorsrc/drupal/current
</pre>

<p>Create a tag</p>

<pre class="screen">
$ svn copy -m"Tag 5.1 vendor drop" \
&gt; file:///var/svn/repos/vendorsrc/drupal/current \
&gt; file:///var/svn/repos/vendorsrc/drupal/5.1
</pre>

<p>Create project</p>

<pre class="screen">
$ svn mkdir -m"Create project myproject" file:///var/svn/repos/myproject
</pre>

<p>Copy into main development branch</p>

<pre class="screen">
$ svn copy -m"Create trunk from the 5.1 vendor drop" \
&gt; file:///var/svn/repos/vendorsrc/drupal/5.1 \
&gt; file:///var/svn/repos/myproject/trunk
</pre>

<p>Check out the project to create a working directory</p>

<pre class="screen">
$ svn co file:///var/svn/repos/myproject/trunk \
&gt; /var/www/html/myproject
</pre>

<p>Ignore files/ directory</p>

<pre class="screen">
$ cd /var/www/html/myproject

$ svn mkdir files files/images files/images/temp files/css files/color

$ svn propset svn:ignore "*" files/ files/images \
&gt; files/images/temp files/css files/color

$ svn commit -m "ignore files/ and subdirectories content from now"
</pre>

<p>If you don&#8217;t set ignore on the <code>files/</code> directory and its subdirectories then <code>svn status</code> will get pretty busy. You need to set the ignore property on <code>files/</code> and on each subdirectory in <code>files/</code>, for example, <code>files/css, files/images, files/images/temp</code>, etc.</p>

<p>Any custom changes we make can go into the same project repository.</p>

<p>When it&#8217;s time to upgrade Drupal to a new version, the basic update workflow is as follows:</p>

<ul>
<li><p>check out the current version of the core source into a working directory</p></li>
<li><p>make the working directory look like a pristine copy of the new version
by copying, and svn adding or deleting as required.
To make your life easier, use <a href="http://svn.collab.net/repos/svn/trunk/contrib/client-side/svn_load_dirs/">svn_load_dirs.pl</a></p></li>
<li><p>commit the new version and add a tag</p></li>
<li><p>merge the differences between the old and new versions into your project&#8217;s working directory.
Make sure you merge between tags: the <code>current</code> branch moves, but a tag is fixed</p></li>
<li><p>resolve any conflicts and commit the changes to your project</p></li>
</ul>
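
<p>As a rough sketch of that upgrade workflow, assuming the repository layout used above, a new release unpacked in <code>/usr/local/src/drupal-5.2</code>, and <code>svn_load_dirs.pl</code> on your path (the 5.2 version number and the commit message are only examples), the session might look something like this. With the <code>-t</code> option, svn_load_dirs.pl loads the new tree into <code>current</code>, takes care of the adds and deletes, and tags the result in one step.</p>

<pre class="screen">
$ svn_load_dirs.pl -t 5.2 \
&gt; file:///var/svn/repos/vendorsrc/drupal current \
&gt; /usr/local/src/drupal-5.2

$ cd /var/www/html/myproject

$ svn merge file:///var/svn/repos/vendorsrc/drupal/5.1 \
&gt; file:///var/svn/repos/vendorsrc/drupal/5.2 .

$ svn status

$ svn commit -m"Merge Drupal 5.2 vendor drop into myproject"
</pre>

<p>Check the output of <code>svn status</code> for conflicts (marked with a C) and resolve them before committing.</p>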

<p>More details can be found at <a href="http://svnbook.red-bean.com/nightly/en/svn.advanced.vendorbr.html">Vendor Branches</a></p>

<p>Finally, you may want to consider using <a href="http://svn.collab.net/repos/svn/trunk/notes/svnsync.txt">svnsync</a> to mirror your repository.</p>

<h2>GIT</h2>

<p>In <a href="http://www.fluidblog.com/articles/2007/08/20/website-backup-strategies">part one</a>, I mentioned the excellent article on <a href="http://versioncontrolblog.com/2007/08/02/upgrading-drupal-52-with-git/">Version Control Blog</a>.  This is my preferred process for source control of Drupal powered sites.  The article is a very comprehensive how-to, so I don&#8217;t need to duplicate it here.</p>

<p>The basic workflow involves creating lines of development for</p>

<ul>
<li><code>drupal</code> contains the core source distribution</li>
<li><code>drupal-and-modules</code> is a clone of drupal plus contributed modules and themes</li>
<li><code>drupal-production</code> is a clone of drupal-and-modules with project customizations</li>
</ul>

<p>The lines of development are chained together by cloning, and changes propagate from <code>drupal/</code> through to <code>drupal-production/</code>.</p>

<p>I will point out one area of concern with the approach as detailed in the article.  Exploding the latest version of Drupal in the <code>drupal/</code> line of development does not deal with files or directories that may have been deleted in the latest version.</p>

<p>For more info on git, you should start with the <a href="http://git.or.cz/gitwiki/GitDocumentation">documentation</a>.</p>

<p>So there you have it, three approaches to keeping the Drupal source under revision control.  I&#8217;ve stated my preference but everyone&#8217;s situation is different and you can choose.</p>

<p>In the next part, I&#8217;ll detail how to back up the files that are not under revision control, for example, the <code>files/</code> directory. The final segment will address backing up the MySQL database.</p>

<p><a href="http://www.fluidblog.com/articles/2007/08/20/website-backup-strategies">Part I</a>
<a href="http://www.fluidblog.com/articles/2007/08/24/website-backup-strategy-part-iii">Part III</a>
<a href="http://www.fluidblog.com/articles/2007/08/27/website-backup-strategies-part-iv">Part IV</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.fluidblog.com/archives/11/feed</wfw:commentRss>
		</item>
		<item>
		<title>Website Backup Strategy</title>
		<link>http://www.fluidblog.com/archives/10</link>
		<comments>http://www.fluidblog.com/archives/10#comments</comments>
		<pubDate>Mon, 20 Aug 2007 17:46:06 +0000</pubDate>
		<dc:creator>trekr</dc:creator>
		
		<category><![CDATA[Backups]]></category>

		<category><![CDATA[Deployment]]></category>

		<category><![CDATA[Drupal]]></category>

		<category><![CDATA[Fedora]]></category>

		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://mike.fluidblog.com/archives/10</guid>
		<description><![CDATA[Introduction

In previous posts, I&#8217;ve covered installation of Rails and Drupal, and also some rudimentary security.  But running a successful site requires a strategy for upgrades and backups.  In this post I&#8217;m going to introduce some initial concepts for a web site
backup strategy that will also touch on upgrades.  I&#8217;ll be using a [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>

<p>In previous posts, I&#8217;ve covered installation of Rails and Drupal, and also some rudimentary security.  But running a successful site requires a strategy for upgrades and backups.  In this post I&#8217;m going to introduce some initial concepts for a web site
backup strategy that will also touch on upgrades.  I&#8217;ll be using a Drupal CMS powered site as an example.
The principles should apply to other LAMP-based web sites as well.
In subsequent posts, I&#8217;ll get into how to implement the strategy.
I&#8217;ll start by stating the desired goals for the backup strategy.</p>

<h2>Goals</h2>

<ul>
<li>Simple; Use existing tools</li>
<li>Automatic; Set up as a cron job</li>
<li>Secure; Don&#8217;t introduce new security risks</li>
<li>Efficient; Minimize space, bandwidth, and restoration time</li>
</ul>

<p>I find that it&#8217;s helpful to categorize the types of files that
make up a typical Drupal web site.</p>

<ul>
<li>Drupal source files</li>
<li>Contributed modules and themes</li>
<li>Custom versions of the above</li>
<li>Files uploaded to the site</li>
<li>The database</li>
</ul>

<h2>Source, Contributed source, and custom source</h2>

<p>Drupal source files, contributed modules and themes, and custom versions
of the latter will not change often and it makes sense to maintain them
in a revision control repository.</p>

<p>You have lots of choices for a revision control system. I&#8217;ve decided to use <a href="http://git.or.cz/">git</a> based on an excellent article on <a href="http://versioncontrolblog.com/2007/08/02/upgrading-drupal-52-with-git/">Version Control Blog</a>.
I like git because the separation of core Drupal from the contributed modules and the custom code seems more intuitive than branching in other systems.</p>

<h2>Uploaded Files</h2>

<p>Files uploaded to the site also will not often change once uploaded.
However, they may be deleted and there will always be new files to deal with.
Uploaded files will be added and deleted by users using the CMS, so they will
be outside of the revision control system. Because of the administrative
burden of adding uploaded content to revision control and deleting it again, it
doesn&#8217;t make sense to try to keep them in a repository. If you have a novel solution that solves this problem I&#8217;d love to hear from you.</p>

<p>In Drupal, uploaded files are stored in a directory /files making it easy to
back up just that directory.  The /files directory is an excellent candidate
for a novel rotating backup system sometimes known as snapback.  Snapback is
based on the work of <a href="http://mikerubel.org/computers/rsync_snapshots/index.html">Mike Rubel</a> and others.
The basic idea is that for files that haven&#8217;t changed
over the backup horizon, hard links instead of copies are maintained.
The hard links significantly reduce the space and bandwidth required for
the backups.</p>
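
<p>To make the hard link idea concrete, here is a minimal sketch in shell. It is not how snapback2 itself works internally, the function and directory names are made up for illustration, and it only handles plain files in a single directory. Each file that is identical to the copy in the previous snapshot is hard linked, and anything new or changed is copied.</p>

<pre class="programlisting">
#!/bin/sh
# Illustration only: build a new snapshot of a flat directory, hard linking
# files that are unchanged since the previous snapshot.
# Usage: snapshot SOURCE_DIR PREVIOUS_SNAPSHOT NEW_SNAPSHOT  (hypothetical names)
# NEW_SNAPSHOT is assumed to be empty or not to exist yet.
snapshot() {
    src="$1"
    prev="$2"
    new="$3"
    mkdir -p "$new"
    for f in "$src"/*; do
        # skip subdirectories and the unexpanded pattern of an empty directory
        [ -f "$f" ] || continue
        name=$(basename "$f")
        changed=yes
        if [ -f "$prev/$name" ]; then
            if cmp -s "$f" "$prev/$name"; then
                changed=no
            fi
        fi
        if [ "$changed" = no ]; then
            # unchanged: a second name for the same data, no extra space used
            ln "$prev/$name" "$new/$name"
        else
            # new or changed: store a real copy
            cp -p "$f" "$new/$name"
        fi
    done
}
</pre>

<p>Running <code>ls -li</code> on two snapshots made this way shows the same inode number for a file that did not change between them.</p>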

<p>Luckily, there is an excellent Perl implementation of the snapback strategy, called <a href="http://search.cpan.org/CPAN/authors/id/M/MI/MIKEH/Snapback2-0.913.tar.gz">snapback2</a>
from <a href="http://www.perusion.com/misc/Snapback2/">Perusion</a>.</p>

<h2>Database</h2>

<p>The database consists of structure and data.  The structure of the database
may not change very frequently; however, the data in several tables probably
will.  So it makes sense to back up the database structure and data
separately.  By doing so, we can also avoid backing up data in cache,
sessions, and watchdog tables. Like uploaded files, the database will be
backed up periodically in a rotating system.</p>

<p>MySQL is a popular choice for LAMP based systems and Drupal, so we&#8217;ll assume
a MySQL database and use the <a href="http://dev.mysql.com/doc/refman/5.0/en/mysqldump.html">mysqldump</a> program to create the backups.</p>
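
<p>As a rough preview, and not necessarily the exact commands the later scripts will use, splitting the dump might look like the sketch below. The database name <code>mysite</code>, the MySQL user <code>myuser</code>, and the output file names are assumptions for illustration. <code>--no-data</code> dumps only the structure, <code>--no-create-info</code> dumps only the data, and <code>--ignore-table</code> skips a table you don&#8217;t want.</p>

<pre class="screen">
$ mysqldump -u myuser -p --no-data \
&gt; --result-file=mysite-structure.sql mysite

$ mysqldump -u myuser -p --no-create-info \
&gt; --ignore-table=mysite.cache \
&gt; --ignore-table=mysite.sessions \
&gt; --ignore-table=mysite.watchdog \
&gt; --result-file=mysite-data.sql mysite
</pre>

<p>Drupal may have more than one cache table, so check your own table list. Because the first dump covers the structure of every table, the skipped tables can still be recreated empty on restore.</p>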

<p>Unfortunately the database backups will probably not benefit from the &#8220;snapback&#8221; technique.  To see why, let&#8217;s do some back-of-the-envelope calculations.</p>

<p>If g is the tar gzip compression factor (compressed size divided by original size), p is the fraction of the files that changes between backups, s is the total size of the files, and k is the number of backups, then for snapback to use less space than compressed archives, the following must hold</p>

<p>sgk &gt; spk + s(1-p)</p>

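<p>Here sgk is the space taken by k compressed archives, spk is the space taken by the changed files in k snapshots, and s(1-p) is the size of the unchanged files, of which there is only one copy due to hard links.</p>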

<p>Simplifying by dividing by s,</p>

<p>gk &gt; pk + (1-p)</p>

<p>Dividing by k gives g &gt; p + (1-p)/k.  If we ignore (1-p)/k, which is small when k is large, then
it is clear that for the method to have benefit, the fraction of changed files must be less than the compression factor.</p>

<p>g &gt; p</p>

<p>In my testing, I&#8217;ve been seeing compression factors between .15 and .20 for
mysqldump files.  It&#8217;s hard to imagine a database that changes less than 15-20%.</p>

<p>On a directory where the files don&#8217;t change much, and g &gt; p holds, then the break even number of backups, k is given by</p>

<p>k = ceil((1-p)/(g-p))</p>

<p>For example, assume the fraction of change, p, is 5% and the compression factor, g, is 15%; then the break-even number of backup copies, k, is 10.   A typical backup scheme has 6 hourly, 7 daily, 4 weekly, and 12 monthly = 29 copies, call it thirty.
At k = 30, compressed archives take 0.15 * 30 = 4.5 times s, while snapback takes 0.05 * 30 + 0.95 = 2.45 times s, so in this example there is a saving of a little under 2X over compressed archives. This is why I like the snapback technique for the uploaded files.</p>
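
<p>The break-even arithmetic is easy to check from the shell. The following is a small sketch (the function name is mine, and it assumes g is greater than p, since otherwise there is no break-even point) that hands the formula k = ceil((1-p)/(g-p)) to awk:</p>

<pre class="programlisting">
# break_even_backups P G
#   P = fraction of the data that changes between backups
#   G = compression factor (compressed size / original size)
# Assumes G is greater than P. Prints the smallest whole number of
# backups at which hard-linked snapshots stop costing more than archives.
break_even_backups() {
    awk -v p="$1" -v g="$2" 'BEGIN {
        x = (1 - p) / (g - p)
        k = int(x)          # round down ...
        if (k != x) k++     # ... then up again unless x was already whole
        print k
    }'
}
</pre>

<p>With p = 5% and g = 15%, <code>break_even_backups 0.05 0.15</code> prints 10, the break-even figure used in the example.</p>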

<p>Therefore, to implement the database backup, we&#8217;ll be using some simple bash scripts on the local backup server and the remote web server that execute mysqldump, archive and compress the output, and move it to the remote server securely.</p>

<p>Well, that wraps up the outline of our strategy.  In my next series of posts, I&#8217;ll cover in detail what it takes to implement each part of our three-tiered backup strategy.</p>

<p><a href="http://www.fluidblog.com/articles/2007/08/22/website-backup-strategies-part-ii">Part II</a>
<a href="http://www.fluidblog.com/articles/2007/08/24/website-backup-strategy-part-iii">Part III</a>
<a href="http://www.fluidblog.com/articles/2007/08/27/website-backup-strategies-part-iv">Part IV</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.fluidblog.com/archives/10/feed</wfw:commentRss>
		</item>
		<item>
		<title>Installing Drupal on Fedora 6</title>
		<link>http://www.fluidblog.com/archives/9</link>
		<comments>http://www.fluidblog.com/archives/9#comments</comments>
		<pubDate>Wed, 08 Aug 2007 15:55:15 +0000</pubDate>
		<dc:creator>trekr</dc:creator>
		
		<category><![CDATA[Deployment]]></category>

		<category><![CDATA[Drupal]]></category>

		<category><![CDATA[Fedora]]></category>

		<category><![CDATA[Installation]]></category>

		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://mike.fluidblog.com/archives/9</guid>
		<description><![CDATA[In this post I&#8217;m going to walk you through the steps to create a Drupal based web site from the ground up starting with a newly minted Fedora 6 slice from Slicehost.

Secure the Slice

The first steps are exactly the same as my previous post,  Securing a new Fedora 6 Slice.  If you followed [...]]]></description>
			<content:encoded><![CDATA[<p>In this post I&#8217;m going to walk you through the steps to create a Drupal based web site from the ground up starting with a newly minted Fedora 6 slice from Slicehost.</p>

<h2>Secure the Slice</h2>

<p>The first steps are exactly the same as my previous post,  <a href="http://www.fluidblog.com/articles/2007/04/14/securing-ssh-on-fedora-6-slice">Securing a new Fedora 6 Slice</a>.  If you followed the instructions in that post you have accomplished the following</p>

<ul>
<li>changed root&#8217;s password</li>
<li>yum updated the base installation</li>
<li>yum installed sudo</li>
<li>created a user with sudo privileges</li>
<li>created a public key on your local machine and copied it to your slice</li>
<li>disabled password authentication via ssh</li>
<li>disabled challenge response authentication via ssh</li>
<li>changed the ssh port</li>
<li>disabled root login via ssh</li>
<li>installed denyhosts (optional)</li>
<li>installed and configured a firewall using iptables</li>
</ul>

<h2>Install Required Software</h2>

<p>Now that the slice is more secure, we can install the software required by Drupal.</p>

<ul>
<li>Apache Web Server, httpd</li>
<li>MySQL Server</li>
<li>PHP</li>
<li>GD</li>
<li>Sendmail</li>
</ul>

<p>Log in to the slice and yum install the following packages</p>

<pre class="screen">
$ /usr/bin/sudo /usr/bin/yum -y install \
&gt; wget \
&gt; tar \
&gt; gzip \
&gt; make \
&gt; gcc \
&gt; openssh-clients \
&gt; mysql \
&gt; mysql-server \
&gt; php \
&gt; php-mysql \
&gt; php-devel \
&gt; php-gd \
&gt; gd \
&gt; gd-devel \
&gt; httpd  \
&gt; sendmail \
&gt; sendmail-mc \
&gt; sendmail-cf</pre>

<p>Start the mysqld server</p>

<pre class="screen">
$ /usr/bin/sudo /etc/init.d/mysqld restart</pre>

<p>Ensure MySQL starts at boot</p>

<pre class="screen">
$ /usr/bin/sudo /sbin/chkconfig --add mysqld
$ /usr/bin/sudo /sbin/chkconfig --level 345 mysqld on</pre>

<p>Secure Initial MySQL Accounts</p>

<p>see <a href="http://dev.mysql.com/doc/refman/5.0/en/default-privileges.html">Securing the initial MySQL accounts</a></p>

<pre class="screen">
$  /bin/su
# /usr/bin/mysql -u root
mysql&gt; SET PASSWORD FOR 'root'@'localhost' = PASSWORD('newpwd');
mysql&gt; exit
# exit
$</pre>

<p>Create a Drupal system user</p>

<pre class="screen">
$ /usr/bin/sudo /usr/sbin/useradd -r drupal</pre>

<p>Create MySQL user account</p>

<p>see <a href="http://dev.mysql.com/doc/refman/5.1/en/create-user.html">CREATE USER Syntax</a></p>

<pre class="screen">
$ /bin/su
# mysql -p -u root
mysql&gt; create user 'drupal'@'localhost' identified by 'drupalpwd';</pre>

<p>Create the database for your Drupal site</p>

<pre class="screen">
mysql&gt; create database mysite;
mysql&gt; grant all on mysite.* to 'drupal'@'localhost';
mysql&gt; exit;
# exit;
$</pre>

<p>Download the Drupal software</p>

<p>see <a href="http://drupal.org/drupal-5.2">Download</a></p>

<p>Change directories to <code>/usr/local/src</code></p>

<pre class="screen">
$ cd /usr/local/src

$ /usr/bin/sudo /usr/bin/wget \
&gt; http://ftp.drupal.org/files/projects/drupal-5.2.tar.gz

$ /usr/bin/sudo /bin/tar xzvf drupal-5.2.tar.gz

$ /usr/bin/sudo /bin/cp -r drupal-5.2 /var/www/html/mysite</pre>

<p>Make <code>settings.php</code> writeable by the web server user</p>

<pre class="screen">
$ cd /var/www/html/mysite

$ /usr/bin/sudo /bin/chown root.apache \
&gt; /var/www/html/mysite/sites/default/settings.php

$ /usr/bin/sudo /bin/chmod g+w \
&gt; /var/www/html/mysite/sites/default/settings.php</pre>

<p>Make the <code>files/</code> directory and its subdirectories writable by the web server user</p>

<pre class="screen">
$ /usr/bin/sudo /bin/mkdir files files/color files/css \
&gt; files/images files/images/temp

$ /usr/bin/sudo /bin/chown root.apache files files/color files/css \
&gt; files/images files/images/temp

$ /usr/bin/sudo /bin/chmod g+w files files/color files/css \
&gt; files/images files/images/temp</pre>

<p>Set up cron</p>

<p>Add a shell script to <code>/etc/cron.hourly</code></p>

<pre class="programlisting">
#!/bin/sh
# $Id: cron-curl.sh,v 1.3 2006/08/22 07:38:24 dries Exp $
curl  --silent --compressed http://mysite.com/cron.php</pre>

<p>Get optional <a href="http://drupal.org/project/Modules">modules</a> and <a href="http://drupal.org/project/Modules">themes</a></p>

<p>Setup Apache Web Server</p>

<p>Edit <code>/etc/httpd/conf/httpd.conf</code></p>

<pre class="programlisting">
#ServerName :80
ServerName www.mysite.com:80
#Listen 12.34.56.78:80
Listen your.slice.ip.addr:80</pre>

<p>Add</p>

<pre class="programlisting">
&lt;Files *.inc&gt;
    Deny From All
&lt;/Files&gt;
&lt;Files *.class&gt;
    Deny From All
&lt;/Files&gt;
&lt;Files MANIFEST&gt;
    Deny From All
&lt;/Files&gt;</pre>

<p>See this article for tuning Apache for <a href="http://hostlibrary.com/Configuring-Apache-for-Maximum-Performance">performance</a></p>

<p>Create a virtual host</p>

<p>Edit <code>/etc/httpd/conf/httpd.conf</code></p>

<pre class="programlisting">
&lt;VirtualHost hostname:80&gt;
ServerAdmin webmaster@mysite.com
DocumentRoot /var/www/html/mysite
ServerName www.mysite.com

Options -Indexes +FollowSymLinks
ErrorLog logs/mysite-error_log
CustomLog logs/mysite-access_log combined
DirectoryIndex index.html index.html.var index.php

&lt;Directory "/var/www/html/mysite"&gt;
  AllowOverride all
&lt;/Directory&gt;
&lt;/VirtualHost&gt;</pre>

<p>Start the Web Server</p>

<pre class="screen">
$ /usr/bin/sudo /usr/sbin/apachectl configtest
$ /usr/bin/sudo /sbin/chkconfig --add httpd
$ /usr/bin/sudo /sbin/chkconfig --level 345 httpd on
$ /usr/bin/sudo /usr/sbin/apachectl start</pre>

<p>Configure PHP</p>

<p>See <a href="http://www.php.net/manual/en/ini.core.php#ini.sect.resource-limits">Description of core php.ini directives</a></p>

<p>You may need to adjust
<code>
upload_max_filesize
post_max_size
</code>
There is a good security <a href="http://codex.gallery2.org/Gallery2:Security">article</a> on the Gallery2 site that is worth reading.</p>

<p>Install Drupal</p>

<p>Navigate to <code>http://mysite.com/install.php</code> and follow along with the online install.  When finished, remove the web server&#8217;s write permission on <code>/var/www/html/mysite/sites/default/settings.php</code></p>

<pre class="screen">
$ /usr/bin/sudo /bin/chmod g-w /var/www/html/mysite/sites/default/settings.php</pre>

<h2>Setup Sendmail</h2>

<p><a href="http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch21_:_Configuring_Linux_Mail_Servers">Configure Linux Mail Servers</a>
is a comprehensive article, so I&#8217;ll just hit the highlights.</p>

<p>Configure DNS correctly</p>

<p>Add the following records</p>

<ul>
<li>mail pointing to your slice&#8217;s IP,</li>
<li>MX pointing to mail</li>
<li>TXT pointing to v=spf1 a mx -all</li>
</ul>

<p>Configure /etc/resolv.conf</p>

<p>Add the following line above the line <code>nameserver</code></p>

<pre class="programlisting">
    domain mysite.com</pre>

<p>Configure /etc/hosts</p>

<pre class="programlisting">
127.0.0.1       mysite.com localhost.localdomain localhost</pre>

<p>Configure /etc/mail/sendmail.mc</p>

<p>Make sure sendmail is listening on all interfaces (0.0.0.0)</p>

<pre class="programlisting">
$ /bin/netstat -an | grep :25 | grep tcp
tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN</pre>

<p>Comment out <code>DAEMON_OPTIONS</code> in <code>/etc/mail/sendmail.mc</code> if it is only listening on loopback</p>

<pre class="programlisting">
dnl DAEMON_OPTIONS(`Port=smtp,Addr=127.0.0.1, Name=MTA')dnl</pre>

<p>Make sure these lines are commented out to avoid having your server used to forward spam</p>

<pre class="programlisting">
dnl FEATURE(`accept_unresolvable_domains')dnl
dnl FEATURE(`relay_based_on_MX')dnl</pre>

<p>Configure /etc/mail/access</p>

<p>Add your domain</p>

<pre class="programlisting">
# by default we allow relaying from localhost...
Connect:localhost.localdomain           RELAY
Connect:localhost                       RELAY
Connect:127.0.0.1                       RELAY
Connect:mysite.com                      RELAY</pre>

<p>Configure /etc/mail/local-host-names</p>

<p>Add all aliases for your server</p>

<pre class="programlisting">
    mysite.com</pre>

<p>Configure /etc/mail/virtusertable</p>

<p>Add email address/user pairs</p>

<pre class="programlisting">
    root@mysite.com myuser
    webmaster@mysite.com myuser
    postmaster@mysite.com myuser
    info@mysite.com myuser
    abuse@mysite.com myuser
    apache@mysite.com myuser</pre>

<p>Configure /etc/aliases</p>

<p>Edit user that receives root&#8217;s email</p>

<pre class="programlisting">
    # Person who should get root's mail
    #root:           marc
    root:            webmaster@mysite.com</pre>
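
<p>None of these edits take effect until sendmail&#8217;s generated files are rebuilt. On Fedora something along these lines should do it, assuming the stock <code>/etc/mail/Makefile</code> that ships with the sendmail packages is in place:</p>

<pre class="screen">
$ /usr/bin/sudo /usr/bin/make -C /etc/mail
$ /usr/bin/sudo /usr/bin/newaliases
$ /usr/bin/sudo /etc/init.d/sendmail restart</pre>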

<p>Update /etc/sysconfig/iptables by adding</p>

<pre class="programlisting">
#Allow mail
-A INPUT -p tcp --dport 25 -j ACCEPT
-A OUTPUT -p tcp --dport 25 -j ACCEPT
-A INPUT -p tcp --dport 110 -j ACCEPT
-A OUTPUT -p tcp --dport 110 -j ACCEPT
</pre>

<h2>Optionally Configure spam tools</h2>

<p>see <a href="http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch21_:_Configuring_Linux_Mail_Servers">Configure Linux Mail Servers</a></p>

<h2>Optionally set up POP3</h2>

<p>If you want to read your mail using a client on your PC, you need to set up POP3.</p>

<pre class="screen">
$ /usr/bin/sudo /usr/bin/yum -y install dovecot
$ /usr/bin/sudo /sbin/chkconfig --add dovecot
$ /usr/bin/sudo /sbin/chkconfig --level 345 dovecot on
$ /usr/bin/sudo /etc/init.d/dovecot start</pre>

<p>Edit /etc/dovecot.conf</p>

<pre class="programlisting">
    #protocols = imap imaps pop3 pop3s
    protocols = pop3</pre>

<p>Configure your client to receive mail from <code>mail.mysite.com</code></p>

<p>I don&#8217;t recommend that you try to configure your slice to relay mail from your PC&#8217;s client software. Just use your ISP&#8217;s SMTP server to send mail.  But if you insist, read this <a href="http://rimuhosting.com/support/settingupemail.jsp?mta=sendmail&amp;t=test">guide</a> first.</p>

<h2>Optionally install a mail client</h2>

<p>Pine is a simple lightweight mail reader that you can use to read mail from a terminal session when you are logged on to your slice.</p>

<pre class="screen">
$ /usr/bin/sudo /bin/rpm -ivh http://rpm.livna.org/livna-release-6.rpm</pre>

<p>Ensure enabled=1 is changed to enabled=0 in the following files</p>

<pre class="programlisting">
    /etc/yum.repos.d/livna.repo
    /etc/yum.repos.d/livna-devel.repo
    /etc/yum.repos.d/livna-testing.repo</pre>

<p>This will disable the livna repository for regular yum updates</p>

<p>Then, you can install Pine with:</p>

<pre class="screen">
$ /usr/bin/sudo /usr/bin/yum --enablerepo=livna install pine</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.fluidblog.com/archives/9/feed</wfw:commentRss>
		</item>
		<item>
		<title>The Hiring Process, Bayes Theorem and American Idol</title>
		<link>http://www.fluidblog.com/archives/8</link>
		<comments>http://www.fluidblog.com/archives/8#comments</comments>
		<pubDate>Wed, 25 Jul 2007 12:00:37 +0000</pubDate>
		<dc:creator>trekr</dc:creator>
		
		<category><![CDATA[Business]]></category>

		<category><![CDATA[Hiring]]></category>

		<category><![CDATA[Management]]></category>

		<guid isPermaLink="false">http://mike.fluidblog.com/archives/8</guid>
		<description><![CDATA[It seems there is no shortage of blog posts about how to hire great workers.  Most focus on the interview technique. If you are looking for a job, you should read as many of these blog posts as you can.

Here are two of the best ones &#8230;


Marc Andreessen
Joel Spolsky


In this post, I suggest that [...]]]></description>
			<content:encoded><![CDATA[<p>It seems there is no shortage of blog posts about how to hire great workers.  Most focus on the interview technique. If you are looking for a job, you should read as many of these blog posts as you can.</p>

<p>Here are two of the best ones &#8230;</p>

<ul>
<li><p><a href="http://blog.pmarca.com/2007/06/how_to_hire_the.html" title="How to hire the best people you've ever worked with">Marc Andreessen</a></p></li>
<li><p><a href="http://www.joelonsoftware.com/articles/GuerrillaInterviewing3.html" title="The Guerrilla Guide to Interviewing">Joel Spolsky</a></p></li>
</ul>

<p>In this post, I suggest that if you are trying to hire great workers, the <strong>process</strong> may be more critical to your success than your interviewing technique.</p>

<p>Your success in hiring great workers depends on how many great workers you have the opportunity to interview and your ability to interview.</p>

<p>Suppose you can identify the top 10% of any group 90% of the time.  You get it wrong only 10% of the time.  After you interview ten candidates, you&#8217;ve narrowed the field to one from the top 10%, and one from the bottom nine.  If you pick one now, you&#8217;re looking at a 50% chance that you pick correctly.  You don&#8217;t like symmetry?  Ok, you only get it wrong 5% of the time. Your odds of picking correctly have only improved to 2/3.  Clearly, it&#8217;s best not to choose after a single evaluation.</p>

<p>What can be done to improve the odds of choosing correctly? You can either increase the number of great hires within the population of candidates you interview, or use a process of sequential multiple evaluations. This post will focus on the latter approach, improving the process.  Unfortunately, everything we do to make our company attractive to great workers will make it attractive to everyone.</p>

<p>One way to improve your results is to make a serial sequence of decisions that narrows the field such that the number of qualified candidates remaining after each decision increases as a percentage of the total remaining candidates. In other words, the process has the effect of increasing the probability that any of the remaining candidates meets the criteria. A single elimination process is the easiest to illustrate.</p>

<p>In this post, I won&#8217;t address the merits of any particular evaluation technique. Pick any you like that is more effective than a coin toss.  That&#8217;s not a flippant comment; you have to do better than 50/50 for the following process to work. <a href="http://www.adlerconcepts.com/resources/2004/04/using_the_onequestion_intervie.php">Lou Adler</a>
has some interesting things to say about how to improve your interview techniques.</p>

<p>What does this look like in practice? It looks like American Idol. The candidate pool is evaluated and only candidates that pass the first evaluation continue. In successive rounds the evaluation focuses on different criteria that are increasingly challenging and relevant.</p>

<p>To illustrate, I&#8217;ll use another numerical example. Suppose the goal is to hire someone who is in the top 20% of candidates. For the sake of argument, assume every interviewer passes a top 20% candidate 80% of the time and mistakenly passes anyone else only 20% of the time.</p>

<p>Because I like easy numbers, suppose 100 candidates are to be evaluated, and we need to pick one from the top 20.</p>

<p>After the first interview, 16 from the top 20% make it to the next round and 16 from the bottom 80% go forward as well. Sixty-eight are sent home. If a choice is made after the first interview, again, we only have a 50/50 chance of getting it right. After the second interview, about 13 from the top 20% and 3 from the bottom 80% survive, so a choice made now would be right about 80% of the time. A third round leaves about 10 from the top 20% and, on average, fewer than one from the bottom 80%, and we&#8217;re done with over 94% probability that we pick a top 20% candidate. Pick any numbers you&#8217;d like and apply Bayes&#8217; theorem; the results will point to the same process. In each iteration we are increasing the prior probability that a surviving candidate meets our criteria.</p>
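
<p>To check the round-by-round numbers, here is a small shell sketch. The <code>survivors</code> function is a hypothetical helper of my own, and it works with expected (fractional) head counts rather than whole people.</p>

```shell
#!/bin/sh
# survivors QUALIFIED UNQUALIFIED HIT_RATE FALSE_POSITIVE_RATE ROUNDS
# Prints the expected number of survivors in each group after every
# round, and the chance that a randomly chosen survivor is qualified.
survivors() {
    awk -v good="$1" -v bad="$2" -v hit="$3" -v fp="$4" -v rounds="$5" 'BEGIN {
        r = 0
        while (r != rounds) {
            r++
            good *= hit   # qualified candidates who pass this round
            bad *= fp     # unqualified candidates who slip through
            printf "round %d: %.2f qualified, %.2f unqualified, P(qualified) = %.3f\n", r, good, bad, good / (good + bad)
        }
    }'
}

survivors 20 80 0.8 0.2 3
```

<p>With the numbers from this example, the last line of output is <code>round 3: 10.24 qualified, 0.64 unqualified, P(qualified) = 0.941</code>.</p>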

<p>Most interviewing processes don&#8217;t work this way.  Usually, a group interviews every candidate and a group decision is made. Invariably, the group is dominated by one or more influential members. Effectively, there is a single decision maker. But we&#8217;ve seen that even if an interviewer has a very good technique for selecting qualified candidates, she doesn&#8217;t have a very good chance of getting it right when the prior probability of a candidate meeting the criteria is low.</p>

<p>In the single elimination process I&#8217;ve described, it&#8217;s important that the interviewers do not discuss a candidate before interviewing that candidate or before making their decision. It&#8217;s equally important that they understand the math and don&#8217;t rely on the fact that the candidates have survived previous interviews.</p>

<p>What are the downsides to a single elimination process? No matter how you go about it, some qualified candidates will not be selected. Therefore, it&#8217;s important that candidates understand the process and are treated respectfully.</p>

<p><strong>Thanks</strong> to Greg Yut of Supply Beyond for reading and commenting on a draft of this post.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.fluidblog.com/archives/8/feed</wfw:commentRss>
		</item>
		<item>
		<title>Changing a Drupal Site&#8217;s Domain</title>
		<link>http://www.fluidblog.com/archives/7</link>
		<comments>http://www.fluidblog.com/archives/7#comments</comments>
		<pubDate>Fri, 11 May 2007 23:14:53 +0000</pubDate>
		<dc:creator>trekr</dc:creator>
		
		<category><![CDATA[Deployment]]></category>

		<category><![CDATA[Drupal]]></category>

		<category><![CDATA[Fedora]]></category>

		<category><![CDATA[Installation]]></category>

		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://mike.fluidblog.com/archives/7</guid>
		<description><![CDATA[I recently needed to change a Drupal installation from www.example.com to subdomain.example.com.  Here&#8217;s how I did it.  There is probably a shorter way, but these steps leave the current site untouched until you are sure the new one works.


Get ready
Clear Drupal cache
Dump current Drupal database to a backup file
Create new database and grant [...]]]></description>
			<content:encoded><![CDATA[<p>I recently needed to change a Drupal installation from <code>www.example.com</code> to <code>subdomain.example.com</code>.  Here&#8217;s how I did it.  There is probably a shorter way, but these steps leave the current site untouched until you are sure the new one works.</p>

<ul>
<li>Get ready</li>
<li>Clear Drupal cache</li>
<li>Dump current Drupal database to a backup file</li>
<li>Create new database and grant privileges</li>
<li>Run stream editor against db backup file to fix paths</li>
<li>Load new database</li>
<li>Ensure current Drupal site directory is updated in svn</li>
<li>Export current site from repository to new directory</li>
<li>Edit sites/default/settings.php to point to new db</li>
<li>Configure Apache to use new directory</li>
<li>Enter a new A record in DNS</li>
<li>Test</li>
</ul>

<p>To get ready, turn off the CSS cache and make sure <code>$base_url</code> is commented out in your <code>sites/default/settings.php</code> file. I also disabled clean URLs, though I&#8217;m not sure that&#8217;s necessary. Your Drupal directory should already be under subversion control. Put up your maintenance page, since we are going to clear the cache.</p>

<p>The next step is to clear Drupal cache and dump the database to a backup file.</p>

<p>Put the following code in a page, and set up a menu to access it.  Make sure you restrict access to admin for this menu. Don&#8217;t be lazy and do it from the command line - it&#8217;ll be faster this time and slower next.</p>

<pre class="programlisting">
db_query("DELETE FROM {cache} WHERE 1");
db_query("DELETE FROM {cache_filter} WHERE 1");
db_query("DELETE FROM {cache_menu} WHERE 1");
db_query("DELETE FROM {cache_page} WHERE 1");</pre>

<p>Ok, for the command line purists, it&#8217;s like this</p>

<pre class="screen">
mysql&gt; DELETE FROM cache WHERE 1;</pre>
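
<p>The one-liner above only empties the <code>cache</code> table. Here is a sketch that covers the same four tables as the PHP snippet, assuming your tables have no prefix. <code>clear_cache_sql</code> is a hypothetical helper name, and <code>dbname</code> is the placeholder database name used throughout this post.</p>

```shell
#!/bin/sh
# Print one DELETE statement per Drupal cache table.
# Assumes the default table names with no table prefix.
clear_cache_sql() {
    for table in cache cache_filter cache_menu cache_page; do
        echo "DELETE FROM $table;"
    done
}

# Review the statements first, then pipe them to the mysql client:
#   clear_cache_sql | mysql dbname
clear_cache_sql
```

<p>Printing the statements before running them makes it easy to confirm the table names match your install.</p>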

<p>Then dump the database to a backup file</p>

<pre class="screen">
$ /usr/bin/mysqldump dbname &gt; mysite-backup.sql</pre>

<p>Make the current site accessible again by taking down your maintenance page.</p>

<p>Create a new database and grant privileges</p>

<pre class="screen">
mysql&gt; create database newdbname;
mysql&gt; grant all on newdbname.* to 'drupaluser'@'localhost';</pre>

<p>Before loading the new database we need to run a stream editor against the backup file and change any hard coded paths.</p>

<p>Check the backup file by grep&#8217;ing for the previous subdomain and path of the current install directory</p>

<pre class="screen">
$ /bin/grep -r mysite *

$ /bin/grep -r www *</pre>

<p>Then edit the backup file with something like this</p>

<pre class="screen">
$ /usr/bin/perl -pi.bak -e's#http://www\.example#http://subdomain.example#g;
&gt; s#/html/mysite#/html/mynewsite#g;' mysite-backup.sql</pre>

<p>And load the new database</p>

<pre class="screen">
$ /usr/bin/mysql newdbname &lt; mysite-backup.sql</pre>

<p>My Drupal installation was in <code>/var/www/html/mysite</code> under subversion control. After cleaning up the working directory and checking in all changes, I exported mysite into a new directory.</p>

<pre class="screen">
$ cd /var/www/html
$ /usr/bin/svn export file:///var/svn/repos/mysite mynewsite</pre>

<p>Make sure the <code>files/</code> directory is writable by the web server process owner.</p>
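
<p>One way to do that, as a sketch. It assumes Apache runs as the user <code>apache</code> (the Fedora default); <code>make_files_writable</code> is a hypothetical helper, and the path in the usage comment is the export directory from this post.</p>

```shell
#!/bin/sh
# make_files_writable DIR [WEB_USER]
# Hands DIR to the web server user and makes sure that user can
# read, write, and traverse it. WEB_USER defaults to "apache",
# which is an assumption; check the User directive in httpd.conf.
make_files_writable() {
    files_dir="$1"
    web_user="${2:-apache}"
    chown -R "$web_user" "$files_dir"
    chmod -R u+rwX "$files_dir"
}

# Usage (run as root):
#   make_files_writable /var/www/html/mynewsite/files
```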

<p>Edit <code>sites/default/settings.php</code>, change the name of the database, and make sure <code>$base_url</code> is commented out.</p>
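
<p>If you would rather script the database name change, here is a sketch. It assumes your settings file holds a single-quoted <code>$db_url</code> line in the usual <code>mysql://user:password@host/dbname</code> form; <code>set_db_name</code> is a hypothetical helper, not something Drupal provides.</p>

```shell
#!/bin/sh
# set_db_name SETTINGS_FILE OLD_DB NEW_DB
# Rewrites the database name at the end of a line such as
#   $db_url = 'mysql://drupaluser:password@localhost/dbname';
# and keeps the original file as SETTINGS_FILE.bak.
set_db_name() {
    settings_file="$1"
    old_db="$2"
    new_db="$3"
    sed -i.bak "s#^\(\$db_url = '.*/\)$old_db';#\1$new_db';#" "$settings_file"
}

# Usage:
#   set_db_name path/to/settings.php dbname newdbname
```

<p>Look at the file afterwards; if your <code>$db_url</code> line is formatted differently, the substitution will silently do nothing.</p>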

<p>Configure a new virtual host in <code>/etc/httpd/conf/httpd.conf</code> for subdomain.example.com. Copy the <code>&lt;VirtualHost&gt;</code> entry for <code>www.example.com</code> and change the <code>ServerName</code>, <code>DocumentRoot</code>, and log file name. If you have any Directory entries, change the path to your new directory. Check your configuration changes</p>

<pre class="screen">
$ /usr/sbin/apachectl configtest</pre>

<p>And restart</p>

<pre class="screen">
$ /usr/sbin/apachectl graceful</pre>

<p>Make sure you have an A record in DNS for subdomain.example.com</p>

<p>Finally, test every page.</p>
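
<p>A quick smoke test can catch the obvious breakage before you click through by hand. This is only a sketch: <code>check_pages</code> is a hypothetical helper, the paths in the usage comment are made-up examples, and the <code>CURL</code> override exists just so the function can be exercised without a network.</p>

```shell
#!/bin/sh
# check_pages BASE_URL PATH...
# Requests each path and reports any response that is not HTTP 200.
# Succeeds only if every page came back with 200.
check_pages() {
    base_url="$1"
    shift
    failures=0
    for path in "$@"; do
        status=$(${CURL:-curl} -s -o /dev/null -w '%{http_code}' "$base_url$path")
        if [ "$status" != "200" ]; then
            echo "FAIL $status $base_url$path"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}

# Usage (example paths, with clean URLs disabled):
#   check_pages http://subdomain.example.com / "/?q=node/1" "/?q=user"
```

<p>A 200 only tells you the page loaded, so still look for links and images that point at the old domain.</p>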
]]></content:encoded>
			<wfw:commentRss>http://www.fluidblog.com/archives/7/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
