Introducing Tagged Types

Up: PostgreSQL stuff
Prev: PostgreSQL Patches	Next: The cube puzzle

Tagged Types are a pet project of mine that I came up with a while ago that just seemed like a really nice idea. The idea is that you take an ordinary datatype and add a tag to it and preserve that tag across various operators.

The obvious applications would be:

A real money type: a numeric with a currency tag
A timestamp that remembers the timezone
A text type with collation order (new as of 0.2)

Anyway, I actually implemented it, you can download it below.

Examples
How it works
Operators
Auxiliary functions
FAQ
Downloading

1. Examples

Here are some examples of what you can do:

test=# select c1, print_currency(c1) from c;
     c1     | print_currency 
------------+----------------
 232.44 USD | US$    232.44
 21.20 EUR  | €       21.20
 -13.44 AUD | AU$    -13.44
 0.01 USD   | US$      0.01
 14.00 AUD  | AU$     14.00
(5 rows)
test=# select c1 * 5 from c where tag(c1) = 'USD';
  ?column?   
-------------
 1162.20 USD
 0.05 USD
(2 rows)
test=# select t,"timestamp"(t) from c;
                    t                    |      timestamp      
-----------------------------------------+---------------------
 2005-08-14 02:00:00+02 Europe/Amsterdam | 2005-08-14 02:00:00
 2005-08-14 02:00:00+02 Australia/Sydney | 2005-08-14 10:00:00
 2005-08-14 02:00:00+02 Asia/Hong_Kong   | 2005-08-14 08:00:00
 2005-08-14 02:00:00+02 America/New_York | 2005-08-13 20:00:00
 2005-08-14 02:00:00+02 Asia/Kuwait      | 2005-08-14 03:00:00
(5 rows)

As you can see there are some rough edges, especially with the timestamp code. That last example won't even work unless you have a recent version of PostgreSQL as it won't recognise the time zones. After all, this only defines the types, not the code to produce the answers. The tarball you can download below contains example SQL scripts for currencies, timestamps that remember timezones and locale specific text sorting.

2. How it works

This code merely exercises the type system PostgreSQL already provides. In the beginning you provide a table (WITH OIDS) that lists all the tags you permit. Then you execute the following:

  SELECT create_tagged_type( 'currency', 'numeric(8,2)', 'currencies' );

The first argument currency is the name of the type you want to create. The second argument is the base type. Note that you can specify the so called typmod of the type to specify things like precision. The third argument currencies is the table of tags to use.

What happens is that the module looks up the details for the base type and creates the new type with the correct parameters for storage. The storage is essentially what is used by the base type plus 4 bytes for the OID. For details see the FAQ.

When you insert a value it extracts the last word from the string (space delimited) as the tag. Hence you cannot have spaces in your tags. This tag is looked up in the tag table to check it is allowed. The remainder of the string is passed to the normal input function for the base type. These are then combined and stored as a unit.

3. Operators

Of course, a type by itself is not useful, you need to be able to do operations on it. After all, a currency is a kind of number so you should be able to number type things with it. This module does nothing automatically, since it may not make sense. However it does provide a function to make it easier:

# select create_tagged_operator( 'currency','*','numeric' ,'currency');
NOTICE:  *(currency,numeric) => currency maps to *(numeric,numeric) => numeric
# select create_tagged_operator( 'numeric', '*','currency','currency');
NOTICE:  *(numeric,currency) => currency maps to *(numeric,numeric) => numeric
# select create_tagged_operator( 'currency','/','currency','numeric' );
NOTICE:  /(currency,currency) => numeric maps to /(numeric,numeric) => numeric
# select create_tagged_operator( 'currency','/','numeric' ,'currency' );
NOTICE:  /(currency,currency) => numeric maps to /(numeric,numeric) => numeric
# select create_tagged_operator( 'currency','+','currency','currency');
NOTICE:  +(currency,currency) => currency maps to +(numeric,numeric) => numeric
# select create_tagged_operator( 'currency','-','currency','currency');
NOTICE:  -(currency,currency) => currency maps to -(numeric,numeric) => numeric

As you can see, it lets you create operators and automatically determines what base operator to use. This only produces simple operators. If the two arguments are both tagged, it requires them to have the same tag or it will error. If you want something more sophisticated, the auxiliary functions below will help.

Creating the operators above means you can now add two currency values or multiply a currency value by a plain number, but you can't add a currency type to a plain number or multiply two currency values. For simple examples like this this is enough.

4. Auxiliary functions

Above I demonstrated ways to create simple operations on the new types. However in real-life you're going to want to be able to do a lot more. Hence the module defines several auxiliary functions to help you on your way.

The auxiliary type

Before getting too far, you need to know about the auxiliary type that is created alongside each tagged type. It is created with the same name but with _t appended. The reason it exists is because PostgreSQL doesn't allow you to lookup fields in base types, only composite types. Hence for the type currency above, the auxiliary type is defined as:

  CREATE TYPE currency_t ( value numeric(8,2), tag text );

There are implicit casts defined between the auxiliary type and the tagged type so you can choose which to use in your functions. However, the auxiliary type allows you to access the value and tag directly, which may help.

Defined functions

tag( taggedtype )
Returns the tag in textual form associated with the tagged value
value( taggedtype )
Returns the value associated with the tagged value
tagdata( taggedtype )
Returns the tuple associated with the tag
This requires a little more explanation. Above I indicated that you needed to create a table with all valid tags. However, you are allowed to have more columns, providing meta-data for your functions. You may have been wondering how the print_currency in the example above knew how to format the value. Well, here is the definition of that function:
```
  CREATE FUNCTION print_currency( currency ) RETURNS text 
  AS 'SELECT to_char( value($1), (tagdata($1)).format )' 
  LANGUAGE 'sql' STRICT STABLE;
```
There is a second column in the currency table (called format) which describes the format to pass to to_char() for this value. This allows you to abstract away various details from your code.
taggedtype( basetype, tag )
Creates the tagged type directly from its components.
taggedtype_t( taggedtype )
taggedtype( taggedtype_t )
Explicit conversions between the tagged and auxiliary types.

5. FAQ

5.1. Can you index these columns?

I've given some thought as to creating a default index class for tagged types. However, it's not clear what order would be useful. Do you ignore the tag while ordering or order by tag first. I can't think of any use-cases that couldn't be handled by a functional index.

5.2. Why a new type, why not just use a composite type?

Three reasons actually:

Composite types are inefficient, space-wise. They have gotten more efficient as the versions progress, but they still store a whole tuple-header on disk for each value.
Composite types cannot have custom input/output functions. In recent versions of PostgreSQL there is actually a default output for composite types but it's still not nice enough to use normally.
Using a composite type you would either have to store the actual value of the tag, which is incredibly wasteful of disk space (those time zones names can get quite long), or store the OID and leave users with unreadable values (which wouldn't survive a dump/restore). A base type allows you to completely hide these implementation details.

5.3. What are the storage requirements

Tagged types take only the size of an OID (usually four bytes) extra on top of the base value. So for variable length types the length gets four bytes longer. For fixed types the value is stuck on the front. The only exception is really short types like shortint and "char" that are passed by value within a single datum. These become a fixed size of a datum plus an OID, usually around 8 bytes total.

5.4. Any limitations I should know about?

Well,

Tags can't have spaces, quotes or backslashes in them.
Tags can't be longer than 32 characters.
When you delete a tagged type some cruft is left behind (not in the system catalog though). PostgreSQL doesn't provide a way to tell the module that its type just got deleted. It won't hurt though unless you start doing it a lot.

5.5. What about performance?

I haven't tested huge tables so I can't really comment. There are a few caches internally to avoid looking up too many things at once but looking up tags can be expensive. In short, unknown at this stage.

5.6. What if I delete stuff from my table of tags?

Don't do that! Seriously, this is like deleting a row out of the a system catalog and wondering why something doesn't work anymore.

Strictly speaking, if a value is printed with a tag that no longer there, it'll print the OID instead (and display a NOTICE). This should be enough for you to find them (ie WHERE tag(x) = '#94823') and either delete them or construct a new value with the correct tag.

I've given thought to having a kind of foreign key reference to give a error when you delete a tag in use, but that would require an index on every column using the type to be efficient. Also, a trigger on the tagtable to stop people updating entries may be useful. However, I decided that in the end it's the admin's responsibility. They should revoke all privileges from the tag table if they feel it's a risk.

5.7. What about other functions, unary operators or aggregates?

I haven't gone to the trouble of creating helper functions for these things. it's a case of diminishing returns. Operators are useful and slightly more difficult to create. Function are hard in the general case, but trivial if you know what you want. For example, say you wanted to support abs() for the currency type. You would create the function as follows:

  CREATE FUNCTION abs(currency) RETURNS currency 
  AS 'SELECT currency( abs( value($1) ), tag($1) )' 
  LANGUAGE SQL STRICT IMMUTABLE;

It's easy doing it by hand, you can just write it down. To write a function that determines the right types and base function and calling everything at the right time is anything but trivial.

Note, writing it as an SQL language function as above means it will be in-lined directly into your queries.

5.8. Why use OIDs?

Well, because they're reasonably efficient to search for and they're non-updateable. They're automatically created (when you ask for them in the CREATE TABLE). For the code to check you've created a serial of the required type in the right place and that you've created a unique index and etc... It's easier just to use OIDs. Yes, they could wrap, but with user tables no longer having OIDs by default this should become less of a problem. Besides, the list of tags shouldn't change.

5.9. Bugs?

Yes, probably.

5.10. What if I want a different precision?

The above currency type was defined with two decimal places. This is usually fine but if you want more precision you'll need to redefine the type, or create a different version of the type for each precision you want. PostgreSQL doesn't allow user defined types to have parameters, though this is on the TODO list.

One side-effect of this is that if you do calculations with the currency type, the precision is enforced at each step. So if you divide by a number, the result will immediatly be rounded to 2 decimal places. If you have intermediate steps that require higher precision it may be worthwhile to extract the numeric value and operate on it directly, coercing it back to currency only at the end.

5.11. What versions does this work on?

I've tested it on 7.4 and on 8.1devel. There are a few #ifs in there to handle the changes made in the server code. The changes have been keyed to the CATALOG_VERSION_NO where the change happened, so it should work for all in between versions also, but I havn't tested that. There is a test file included. If you can successfully run it through psql without any strange errors, you're probably fine. The only expected one is older versions not recognising the timezones for conversion.

5.12. How do I (un)install it?

See the included INSTALL file.

5.13. What now?

Well, user defined types are still somewhat second-class citizens in PostgreSQL. For example, they can't take parameters like char and varchar. Imagine the currency type above, but with configurable precision. Another interesting avenue would tags be controlled by functions. The example would be to be able have a physics_unit type that behaved like:

  SELECT '2.5 ms-1'::physics_unit * '4 s'::physics_unit;
returns:
  10 m

Maybe not exactly something you want in a database but it would be a great example of what user-defined types can do.

This is however just a hobby thing, I'm not using it myself for anything major (yet) but I did learn quite a bit about PostgreSQL internals while writing it.

6. Downloading

TaggedTypes is upto release 0.2.

Up: PostgreSQL stuff
Prev: PostgreSQL Patches	Next: The cube puzzle

By Martijn van Oosterhout (kleptog (at) svana.org)
Copyright © 2000-2006 - Last modified 1/09/2005