Page Files

A description of the default page format used by database files.

This section provides an overview of the page format used by PostgreSQL tables. User-defined access methods need not use this page format.

In the following explanation, a byte is assumed to contain 8 bits. In addition, the term item refers to data that is stored in PostgreSQL tables.

The Sample Page Layout table below shows how pages in both normal PostgreSQL tables and PostgreSQL indexes (e.g., a B-tree index) are structured. This structure is also used for TOAST tables and sequences. Each page has five parts.

Sample Page Layout

Item              Description
----------------  -----------
PageHeaderData    20 bytes long. Contains general information about the page that is needed to access it.
ItemIdData        List of (offset, length) pairs pointing to the actual items.
Free space        The unallocated space. All new tuples are allocated from here, generally from the end.
Items             The actual items themselves. Different access methods store different data here.
Special space     Access-method-specific data. Different methods store different data. Unused by normal tables.

The first 20 bytes of each page consist of a page header (PageHeaderData). Its format is detailed in the PageHeaderData Layout table below. The first two fields (pd_lsn and pd_sui) support write-ahead logging (WAL). They are followed by three 2-byte integer fields (pd_lower, pd_upper, and pd_special). These hold byte offsets to the start of unallocated space, to the end of unallocated space, and to the start of the special space, respectively.

PageHeaderData Layout

Field       Type           Length   Description
----------  -------------  -------  -----------
pd_lsn      XLogRecPtr     8 bytes  LSN: next byte after last byte of xlog
pd_sui      StartUpID      4 bytes  SUI of last changes (currently used by the heap AM only)
pd_lower    LocationIndex  2 bytes  Offset to start of free space.
pd_upper    LocationIndex  2 bytes  Offset to end of free space.
pd_special  LocationIndex  2 bytes  Offset to start of special space.
pd_opaque   OpaqueData     2 bytes  AM-generic information. Currently just stores the page size.
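The relationship between the three offset fields can be illustrated with a small C sketch. This is not the definition from bufpage.h: the struct name, the helper functions, and the choice to model pd_lsn as two 32-bit words (so that the fields add up to the stated 20 bytes) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 20-byte page header described above.
 * Assumption: pd_lsn is modeled as two 32-bit words. */
typedef struct
{
    uint32_t lsn_hi;      /* pd_lsn, first word (assumed layout) */
    uint32_t lsn_lo;      /* pd_lsn, second word (assumed layout) */
    uint32_t pd_sui;      /* startup ID of last change */
    uint16_t pd_lower;    /* offset to start of free space */
    uint16_t pd_upper;    /* offset to end of free space */
    uint16_t pd_special;  /* offset to start of special space */
    uint16_t pd_opaque;   /* currently just the page size */
} SketchPageHeader;

#define SKETCH_HEADER_SIZE 20u

/* Initialize an empty page: free space runs from the end of the header
 * to the start of the special space. A special_size of 0 leaves
 * pd_special equal to the page size, as for ordinary tables. */
static void
sketch_page_init(SketchPageHeader *h, uint16_t page_size, uint16_t special_size)
{
    h->lsn_hi = 0;
    h->lsn_lo = 0;
    h->pd_sui = 0;
    h->pd_lower = (uint16_t) SKETCH_HEADER_SIZE;
    h->pd_upper = (uint16_t) (page_size - special_size);
    h->pd_special = (uint16_t) (page_size - special_size);
    h->pd_opaque = page_size;
}

/* Bytes of unallocated space between pd_lower and pd_upper. */
static unsigned
sketch_free_space(const SketchPageHeader *h)
{
    if (h->pd_upper < h->pd_lower)
        return 0;               /* inconsistent header: report no space */
    return (unsigned) (h->pd_upper - h->pd_lower);
}
```

Under these assumptions, an 8192-byte page with no special space starts with pd_lower at 20 and both pd_upper and pd_special at 8192.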

Special space is a region at the end of the page that is allocated at page initialization time and contains information specific to an access method. The last 2 bytes of the page header, pd_opaque, currently store only the page size. Page size is stored in each page because frames in the buffer pool may be subdivided into equal-sized pages on a frame-by-frame basis within a table (this rationale has not been verified).

Following the page header are item identifiers (ItemIdData). New item identifiers are allocated from the first four bytes of unallocated space. Because an item identifier is never moved until it is freed, its index may be used to indicate the location of an item on a page. In fact, every pointer to an item (ItemPointer, also known as CTID) created by PostgreSQL consists of a frame number and an index of an item identifier. An item identifier contains a byte offset to the start of an item, its length in bytes, and a set of attribute bits that affect its interpretation.

The items themselves are stored in space allocated backwards from the end of unallocated space. The exact structure varies depending on what the table is to contain. Sequences and tables both use a structure named HeapTupleHeaderData, described below.

The final section is the "special section", which may contain anything the access method wishes to store. Ordinary tables do not use this at all (indicated by setting the offset to the page size).

All tuples are structured the same way: a header of around 31 bytes, followed by an optional null bitmask and then the data. The header is detailed in the HeapTupleHeaderData Layout table below. The null bitmask is present only if the HEAP_HASNULL bit is set in t_infomask. If it is present, it occupies the space between the end of the header and the beginning of the data, as indicated by the t_hoff field. In this list of bits, a 1 bit indicates not-null and a 0 bit indicates null.
HeapTupleHeaderData Layout

Field       Type             Length   Description
----------  ---------------  -------  -----------
t_oid       Oid              4 bytes  OID of this tuple
t_cmin      CommandId        4 bytes  insert CID stamp
t_cmax      CommandId        4 bytes  delete CID stamp
t_xmin      TransactionId    4 bytes  insert XID stamp
t_xmax      TransactionId    4 bytes  delete XID stamp
t_ctid      ItemPointerData  6 bytes  current TID of this or newer tuple
t_natts     int16            2 bytes  number of attributes
t_infomask  uint16           2 bytes  Various flags
t_hoff      uint8            1 byte   length of tuple header. Also offset of data.
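The null bitmask test described above can be sketched in C as follows. The function and macro names are hypothetical, and two details are assumptions not stated in this section: the numeric value used for the HEAP_HASNULL flag, and the bit order within each bitmask byte (least significant bit first).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag value, for illustration only. */
#define SKETCH_HEAP_HASNULL 0x0001

/* Number of bytes needed for a null bitmask covering natts attributes. */
static size_t
sketch_bitmap_len(int natts)
{
    return (size_t) ((natts + 7) / 8);
}

/* Return true if attribute attnum (0-based) is NULL.
 * A 1 bit means not-null, a 0 bit means null. Without the HASNULL flag
 * there is no bitmask at all, so no attribute can be NULL.
 * Assumption: bit 0 of byte 0 belongs to attribute 0. */
static bool
sketch_att_is_null(uint16_t infomask, const uint8_t *bits, int attnum)
{
    if (!(infomask & SKETCH_HEAP_HASNULL))
        return false;
    return (bits[attnum >> 3] & (1u << (attnum & 7))) == 0;
}

/* The data area starts t_hoff bytes after the start of the tuple. */
static const uint8_t *
sketch_tuple_data(const uint8_t *tuple, uint8_t t_hoff)
{
    return tuple + t_hoff;
}
```

For example, with a bitmask byte of 0x05 and the flag set, attributes 0 and 2 are not null and attribute 1 is null under the assumed bit order.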

All the details may be found in src/include/storage/bufpage.h.

Interpreting the actual data can only be done with information obtained from other tables, mostly pg_attribute. The relevant fields are attlen and attalign. There is no way to locate a particular attribute directly, except when there are only fixed-width fields and no NULLs. All this trickery is wrapped up in the functions heap_getattr, fastgetattr and heap_getsysattr.

To read the data you need to examine each attribute in turn. First check whether the field is NULL according to the null bitmap. If it is, go to the next field. Then make sure you have the right alignment. If the field is a fixed-width field, its bytes are simply stored in place. If it is a variable-length field (attlen == -1), it is a bit more complicated and uses the variable-length structure varattrib. Depending on the flags, the data may be inline, compressed, or stored in another table (TOAST).
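The attribute-by-attribute walk described above might look like the following C sketch. It is not the logic of heap_getattr or fastgetattr. All names are hypothetical, and several simplifications are assumed: attalign is given directly as a byte count, offsets are measured from the start of the data area, the null bitmask uses least-significant-bit-first order, and a variable-length value begins with a 4-byte native-endian length word that counts itself and whose top two bits are flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-attribute description, loosely modeled on the
 * attlen and attalign fields of pg_attribute. */
typedef struct
{
    int16_t attlen;    /* fixed width in bytes, or -1 for variable length */
    uint8_t attalign;  /* assumed: alignment in bytes (1, 2, 4, or 8) */
} SketchAttr;

/* Assumed mask that strips flag bits from the length word. */
#define SKETCH_VARSIZE_MASK 0x3FFFFFFFu

/* Round off up to the next multiple of align. */
static size_t
sketch_align(size_t off, size_t align)
{
    return (off + align - 1) / align * align;
}

/* Return the offset of attribute target within the data area,
 * or -1 if that attribute is NULL. nullbits may be NULL when the
 * tuple has no null bitmask. */
static long
sketch_att_offset(const SketchAttr *atts, int target,
                  const uint8_t *nullbits, const uint8_t *data)
{
    size_t off = 0;

    for (int i = 0; i <= target; i++)
    {
        bool isnull = nullbits != NULL &&
            !(nullbits[i >> 3] & (1u << (i & 7)));

        if (isnull)
        {
            if (i == target)
                return -1;
            continue;           /* NULL attributes occupy no space */
        }

        off = sketch_align(off, atts[i].attalign);
        if (i == target)
            return (long) off;

        if (atts[i].attlen >= 0)
            off += (size_t) atts[i].attlen;
        else
        {
            uint32_t len;

            memcpy(&len, data + off, sizeof len);
            off += len & SKETCH_VARSIZE_MASK;
        }
    }
    return -1;                  /* only reached if target is negative */
}
```

The sketch only locates an attribute. It does not handle compressed or out-of-line (TOAST) values, which would need the flag bits to be interpreted rather than masked away.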