|Up: Martijn's Homepage|
|Prev: Puzzle solving by computer||Next: Australia Post Barcode Perl Module|
The purpose is to copy one table in one database to another database efficiently. In this sense it is similar to the popular rsync program. It uses the data already there to reduce the amount of data that needs to be sent.
However, more than plain copying, this program allows you to control exactly which data is copied out. For example, you can limit both which rows are copied out as well as which columns. For data that needs to be partially obscured you can apply an expression to modify the data copied.
There are several potential uses for this. They include:
Instead of copying out a dump of the database every single day you can run a PostgreSQL server and use this to efficiently bring it up to date at the end of the day. Because it uses ssh to carry the data, there are no problems with security.
Even on-site backups are possible. You can keep your development server moderatly up-to-date without all the trouble involved with replication.
In situations where customers are able to view information from a website, generally there is much more information that they are *not* permitted to see. Information that may be commercially sensetive. By allowing you to be specific about the information copied, you can easily keep the public site up-to-date without worrying about copying confidential data.
pgrsync -s <dbspec> -d <dbspec> -t tablelist [-o option] [-e ssh] [-p prog] -s <dbspec> Database to copy from -d <dbspec> Database to copy to -t tablelist Comma seperated list of tables, may be specified multiple times. -o <option> Option to control what is copied -e ssh Alternative to ssh to use. Mayb behave the same -p prog pgrsync executable on remote machine <dbspec> := <login>@<host>:<user>@<dbname>/<port> The colon and <dbname> are the only required parts. The usernames are optional, in which case the @ symbols must be omitted. The port number defaults to UNIX domain. <option> is one of: map:table=table2 Map tablename for remote machine map:table.field=field2 Map fieldname in table for remote machine expr:table.field=expr Replace field with expression cond:table=<expr> Condition for WHERE clause to restrict copying fields:table=<tablelist> List of fields to copy. ! excludes. *=all. Last matchThe database specification is very flexible. Examples include:
A more serious problem is that of encodings, as that determines the default order of string output, in particular whether it is case-sensetive or not. Because Perl uses case sensetive comprisons it may sometimes disagree with the database and thus produce a non-optimal change set. However, we can still see that the databases have the rows in the same order. So in most cases this will not be a problem.
Even worse, if the source and destination use different encodings, the order will be different and the program will *never* see them as the same and send the same data each time. It is not clear what can be done about this. Hopefully tables where the primary key is a string whose case varies are not common. At least the end result still has the right data arriving.
Also in the interests of efficiency, this program requires the table to have a single column primary key. This is to avoid a possibly quite expensive sort on a large table. While using the index may be less efficient time-wise, it is much more efficient than an explicit sort. Note this restriction may be lifted in the future.
Similarly, this program could copy the structure of the table also, automatically creating and dropping columns as required. Creating and dropping triggers, functions and indexes would be an obviously useful feature. This program does not attempt to check datatypes. There was an attempt but assessing compatability proved too hard. Instead the transfer is cleanly aborted if any of the updates fail.
Finally, this program only copies between PostgreSQL databases. In principle it could work between any two databases. Only a few basic features are required: (1) support for cursors (DECLARE/FETCH) (sorry mysql) and (2) to be able to extract information about the structure of the database (tables/fields/indexes). If someone wants to extend the PGDb module to another database it would be appreciated.
Version 0.01 - Initial release (01 Oct 2002)
You can download it here (latest version 0.01).
Currently the following features are not supported. See above for more details.
Martijn van Oosterhout
This program is distributed under the GPL.
Up: Martijn's Homepage Prev: Puzzle solving by computer Next: Australia Post Barcode Perl Module
Copyright © 2000-2006 - Last modified 3/10/2002
This program is distributed under the GPL.