08 May 2007

Silhouette Clone: Part 3

Impromptu

Previously, we created a Subversion repository and automatically populated it with data from a directory which was shared over the network. This allowed users to work on the share while automated processes on the server stored versioned copies of the directory for later access. Now we will provide point-in-time recovery options for files in our repository without making users install and use a Subversion client.

Subversion repositories can be accessed over the network via WebDAV by means of special Apache HTTPD modules (mod_dav and mod_dav_svn). The problem with this approach is that only the latest revision is presented as an ordinary directory tree; old versions aren't readily accessible this way. Even after employing simple workarounds to present users with prior versions, users would still have two places to look for data: the actual network share and the URL of the WebDAV interface to the repository.

We can get around this by using WebDAV to serve up the Subversion repository and then re-sharing the WebDAV resource over the same protocols we used for the original network share. Confused? Follow along.

The first thing we need to do is implement point-in-time tags in our Subversion repository. Subversion can copy files from one location in the repository to another without duplicating the file data. This means we can implement tags by simply copying files! For minimal impact, we will perform this copy entirely within the repository (URL to URL), without involving a working copy.

Commonly, Subversion repositories have a trunk/ or head/ directory where most of the work happens. We will also add a point-in-time/ directory to store our point-in-time copies. An easy way to create these directories is to use the svn mkdir command on our repository.

$ svn mkdir file:///path/to/repository/head -m "Created head/"
$ svn mkdir file:///path/to/repository/point-in-time -m "Created point-in-time/"


Note the use of the -m option to specify a commit message. Because we are operating directly on the repository (by URL rather than through a working copy), each command immediately creates a new revision, and Subversion requires a log message (even an empty one) for every new revision.

Now that we are using a head/ directory, all of our day-to-day work should be performed there. Users do not need to be aware of the head/ directory at this point, so we will simply treat head/ as the root of our repository by checking out only that directory.

$ svn checkout file:///path/to/repository/head /path/to/working/copy


Any changes to the working copy will be committed to the head/ directory in our repository transparently.

Now that we have a repository and working copy set up to utilize a head/ directory, we can continue as we did in parts 1 and 2 to enable automatic commits. From this point, implementing point-in-time copies is quick and easy. A simple shell script to copy head/ to a subdirectory of point-in-time/ at regular intervals will get the job done.

#!/bin/bash
# Copy head/ into point-in-time/, naming the copy after the current date and time.
svn copy file:///path/to/repository/head "file:///path/to/repository/point-in-time/$(date '+%F %T')" -m "Point-in-time marker added"


Note that while we could put the above command in the same script as our earlier commands that commit changes to the Subversion repository, we don't have to. Separate scripts let us run our actual backups and our point-in-time tags on separate schedules. This allows us to back up data frequently without overwhelming a user who is recovering a file on his or her own with too many file versions to wade through.

Once our backup commands and point-in-time tag commands are running at regular intervals, we will have a repository layout that is fairly self-explanatory.

$ svn list file:///path/to/repository
head/
point-in-time/

$ svn list file:///path/to/repository/point-in-time
2007-05-08 17:00:00/
2007-05-08 17:15:00/
2007-05-08 17:30:00/


Note that by modifying the date format string in our point-in-time tag command we can change how subdirectories of our point-in-time/ directory are named. The year-month-day order and 24-hour time (instead of 12-hour time) ensure that sorting the directory names alphabetically also sorts them chronologically.
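To see why this naming matters, the sketch below formats the same three made-up sample times with the `%F %T` format used by our point-in-time script and with a 12-hour `%I:%M %p` format, then sorts both lists as plain strings. It assumes GNU `date` (for `-d`) and the C locale. It also shows that, because the names sort chronologically, finding the newest snapshot at or before a given moment is a simple string comparison; `latest_snapshot_before` is a hypothetical helper, not part of Subversion.

```shell
#!/bin/bash
# Assumes GNU date (for -d); the C locale gives byte-order sorting and AM/PM for %p.
export LC_ALL=C TZ=UTC

# Made-up sample times, deliberately listed out of order.
times=("2007-05-08 17:30:00" "2007-05-08 09:45:00" "2007-05-08 13:00:00")

# Print every sample time using the given date format string, one per line.
format_all() {
    local fmt=$1 t
    for t in "${times[@]}"; do
        date -d "$t" +"$fmt"
    done
}

# 24-hour names (as produced by `date '+%F %T'`) sort chronologically...
sorted_24h=$(format_all '%F %T' | sort)
# ...while 12-hour names put "01:00 PM" ahead of "09:45 AM".
sorted_12h=$(format_all '%F %I:%M %p' | sort)

echo "24-hour order:"; echo "$sorted_24h"
echo "12-hour order:"; echo "$sorted_12h"

# Hypothetical helper: read sorted snapshot names on stdin and print the
# newest one at or before the target time. Relies on string order == time order.
latest_snapshot_before() {
    local target=$1 name best=""
    while IFS= read -r name; do
        [[ "$name" > "$target" ]] && break
        best=$name
    done
    printf '%s\n' "$best"
}

echo "Restore from: $(printf '%s\n' "$sorted_24h" | latest_snapshot_before '2007-05-08 14:00:00')"
```

With these sample values the 24-hour list comes out in chronological order, the 12-hour list does not, and the snapshot chosen for 14:00 is the one taken at 13:00.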

Now that we have our data efficiently stored and laid out, we have to provide access to it. Most file sharing packages do not understand Subversion repositories, so we will have to build a bridge. We can start by using Apache HTTPD to provide WebDAV access to the repository. After installing and enabling the mod_dav_svn module, a few lines in an Apache configuration file will do.

<Location /repository>
DAV svn
SVNPath /path/to/repository
</Location>


Ideally, you will want to lock down this location using a combination of users, passwords, and IP addresses. See the Apache documentation for more information.

Once Apache is reloaded and serving the repository, install davfs2. When installed properly, it lets you mount the Subversion repository via WebDAV as you would any other file system. A simple setup is to create a directory for the network share and place your data in a subdirectory. This will let you mount the Subversion repository alongside your data.

$ mkdir /path/to/network/share
$ mkdir /path/to/network/share/data
$ mkdir /path/to/network/share/backups

$ mount -t davfs -o ro,noaskauth http://localhost/repository/point-in-time/ /path/to/network/share/backups


Note the ro option: the mounted backup directory is read-only, because users must not be able to modify the point-in-time copies.

Mirroring our Subversion repository layout in our local file system seems silly, but we used this setup for a reason. Providing users direct access to the live Subversion repository for use as the live network share would generate up to 3 new revisions in our repository each time a file is saved! This is due to the rename-write-delete method commonly used to avoid data loss when saving files. Multiple revisions aren't too bad, but the way in which these revisions come about prevents Subversion from storing data efficiently: the saved file shows up as a new file rather than a modified one, so Subversion can't store the change as a compact difference against the previous version.
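The save sequence described above can be sketched as follows. This is a minimal simulation of one common rename-write-delete pattern, run in a throwaway temporary directory; real applications differ in the exact order and temporary file names, and `report.txt` and the `.bak` suffix are made up for illustration. Each step is a separate file-system operation, which is why a share backed directly by an auto-committing repository could turn one save into several revisions.

```shell
#!/bin/bash
# Simulates a typical "rename-write-delete" save in a scratch directory.
# File names are illustrative; real applications vary in the details.
set -e

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

ops=()                             # records each file-system operation performed
printf 'draft 1\n' > report.txt    # the file as it existed before saving

# 1. Rename: set the old version aside so a crash mid-save can't lose it.
mv report.txt report.txt.bak
ops+=("rename report.txt -> report.txt.bak")

# 2. Write: create the new version under the original name.
printf 'draft 2\n' > report.txt
ops+=("write report.txt")

# 3. Delete: the save succeeded, so discard the old version.
rm report.txt.bak
ops+=("delete report.txt.bak")

printf '%s\n' "${ops[@]}"
echo "operations for one save: ${#ops[@]}"
final_content=$(cat report.txt)
```

Only the second step actually carries new data; the rename and delete are bookkeeping that would still be recorded as changes.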

Using Subversion as a (mostly) transparent replacement for Microsoft Windows Server's Shadow Copies for Shared Folders can be quite cumbersome, but it's a viable alternative. Current developments in Linux file systems will likely render this workaround obsolete in the near future. Until then, take it one step at a time and enjoy automatic, versioned backups of your network shares.

2 comments:

Christopher Ingram said...

Ext3cow is an open source versioning file system based on Ext3. If you don't need the branches and tags of an RCS, this is a great alternative for point-in-time recovery. An RCS still has some advantages over a solution like this, but it will likely be a great option for many.

Maarty said...

Hi, you will get a similar result if you use the rsync tool with backup directory settings similar to those described here before (--backup --backup-dir=`date +%F...`). You can share the backup directory through Samba or Apache; the result will be the same.
I recommend using this tool instead of svn + WebDAV, but only when you don't need WebDAV.