PostgreSQL table synchroniser

Up: Martijn's Homepage
Prev: Puzzle solving by computer	Next: Australia Post Barcode Perl Module

The purpose is to copy one table in one database to another database efficiently. In this sense it is similar to the popular rsync program. It uses the data already there to reduce the amount of data that needs to be sent.

However, more than plain copying, this program allows you to control exactly which data is copied out. For example, you can limit both which rows are copied out as well as which columns. For data that needs to be partially obscured you can apply an expression to modify the data copied.

There are several potential uses for this. They include:

Offsite backups
Instead of copying out a dump of the database every single day you can run a PostgreSQL server and use this to efficiently bring it up to date at the end of the day. Because it uses ssh to carry the data, there are no problems with security.
Even on-site backups are possible. You can keep your development server moderatly up-to-date without all the trouble involved with replication.
Customer data protection
In situations where customers are able to view information from a website, generally there is much more information that they are *not* permitted to see. Information that may be commercially sensetive. By allowing you to be specific about the information copied, you can easily keep the public site up-to-date without worrying about copying confidential data.

USAGE

pgrsync -s <dbspec> -d <dbspec> -t tablelist [-o option] [-e ssh] [-p prog]

    -s <dbspec>        Database to copy from
    -d <dbspec>        Database to copy to
    -t tablelist       Comma seperated list of tables, may be specified
                       multiple times.
    -o <option>        Option to control what is copied

    -e ssh             Alternative to ssh to use. Mayb behave the same
    -p prog            pgrsync executable on remote machine

  <dbspec> := <login>@<host>:<user>@<dbname>/<port>

    The colon and <dbname> are the only required parts. The usernames are
    optional, in which case the @ symbols must be omitted. The port number
    defaults to UNIX domain.

  <option> is one of:

      map:table=table2             Map tablename for remote machine
      map:table.field=field2       Map fieldname in table for remote machine
      expr:table.field=expr        Replace field with expression
      cond:table=<expr>            Condition for WHERE clause to restrict copying
      fields:table=<tablelist>     List of fields to copy. ! excludes. *=all. Last match

The database specification is very flexible. Examples include:

:test - test database on local machine
swizzle:test - test database on host swizzle
wally@swizzle:james@test - Log into host swizzle with username wally and then connect to the database test with user james.

INSTALLATION

At the moment the installation is a bit awkward. The single external module needs to be found by the main program. The easiest way to acheive this is to place PGDb.pm somewhere where the ssh login will find it. /usr/local/lib/site_perl is usually a good choice. The actual main script needs to be in your path, unless you use the -p switch.

LIMITATIONS

The optimisation in this program relies somewhat on being able to determine what order rows are supposed to be in so that you can easily determine which are missing. The simplest distinction is between numeric and string ordering. This must be detected to work properly.

A more serious problem is that of encodings, as that determines the default order of string output, in particular whether it is case-sensetive or not. Because Perl uses case sensetive comprisons it may sometimes disagree with the database and thus produce a non-optimal change set. However, we can still see that the databases have the rows in the same order. So in most cases this will not be a problem.

Even worse, if the source and destination use different encodings, the order will be different and the program will *never* see them as the same and send the same data each time. It is not clear what can be done about this. Hopefully tables where the primary key is a string whose case varies are not common. At least the end result still has the right data arriving.

Also in the interests of efficiency, this program requires the table to have a single column primary key. This is to avoid a possibly quite expensive sort on a large table. While using the index may be less efficient time-wise, it is much more efficient than an explicit sort. Note this restriction may be lifted in the future.

Similarly, this program could copy the structure of the table also, automatically creating and dropping columns as required. Creating and dropping triggers, functions and indexes would be an obviously useful feature. This program does not attempt to check datatypes. There was an attempt but assessing compatability proved too hard. Instead the transfer is cleanly aborted if any of the updates fail.

Finally, this program only copies between PostgreSQL databases. In principle it could work between any two databases. Only a few basic features are required: (1) support for cursors (DECLARE/FETCH) (sorry mysql) and (2) to be able to extract information about the structure of the database (tables/fields/indexes). If someone wants to extend the PGDb module to another database it would be appreciated.

EFFICIENCY

Since one the points of this program is to copy efficiently obviously this need to be explained. Currently it involves the destination to send about 8-12 bytes per row in the table. The source will then send the data to the destination. No changed data means the source sends nothing at all. This should be addressed soon as it could be improved for very large tables (>1,000,000 rows).

REVISION HISTORY

Version 0.01 - Initial release (01 Oct 2002)

DOWNLOAD

You can download it here (latest version 0.01).

BUGS / LIMITATIONS

Currently the following features are not supported. See above for more details.

Requires single column non-null primary key.
Has to guess whether primary key is numeric or string.
Cannot handle locale differences (but works well enough).

AUTHOR

Martijn van Oosterhout October 2002
http://svana.org/kleptog/

LICENCE

This program is distributed under the GPL.

Up: Martijn's Homepage
Prev: Puzzle solving by computer	Next: Australia Post Barcode Perl Module

By Martijn van Oosterhout (kleptog (at) svana.org)
Copyright © 2000-2006 - Last modified 3/10/2002