Page Files
A description of the database file default page format.
This section provides an overview of the page format used by PostgreSQL
tables. User-defined access methods need not use this page format.
In the following explanation, a byte is assumed to contain 8 bits. In
addition, the term item refers to data that is stored in PostgreSQL tables.
The Sample Page Layout table below shows how pages in both normal
PostgreSQL tables and PostgreSQL indexes (e.g., a B-tree index)
are structured. This structure is also used for TOAST tables and sequences.
There are five parts to each page.
Sample Page Layout
Item             Description
PageHeaderData   20 bytes long. Contains general information about the page that is needed to access it.
ItemIdData       List of (offset,length) pairs pointing to the actual items.
Free space       The unallocated space. All new tuples are allocated from here, generally from the end.
Items            The actual items themselves. Different access methods have different data here.
Special space    Access-method-specific data. Different methods store different data. Unused by normal tables.
The first 20 bytes of each page consist of a page header
(PageHeaderData), whose format is declared in
src/include/storage/bufpage.h. The first two fields hold WAL-related
information. They are followed by three 2-byte integer fields
(lower, upper, and
special). These represent byte offsets to the start
of unallocated space, to the end of unallocated space, and to the start of
the special space.
Special space is a region at the end of the page that is allocated at page
initialization time and contains information specific to an access method.
The last 2 bytes of the page header, opaque,
currently store only the page size. Page size is stored in each page
because frames in the buffer pool may be subdivided into equal-sized pages
on a frame-by-frame basis within a table.
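As a rough illustration of the header fields described above, the following Python sketch decodes the first 20 bytes of a raw page image. It assumes little-endian byte order and assumes that the two WAL-related fields occupy 8 and 4 bytes respectively; neither detail is stated here, so bufpage.h remains the authoritative source. The names HEADER_FORMAT and parse_page_header are hypothetical.

```python
import struct

# Assumed layout of the 20-byte page header (little-endian, no padding):
#   8 bytes + 4 bytes   two WAL-related fields (widths are an assumption)
#   2 bytes each        lower, upper, special, opaque
HEADER_FORMAT = "<QIHHHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 + 4 + 2 + 2 + 2 + 2


def parse_page_header(page: bytes) -> dict:
    """Decode the header fields at the start of a raw page image."""
    if len(page) < HEADER_SIZE:
        raise ValueError("page image is shorter than the page header")
    wal_first, wal_second, lower, upper, special, opaque = struct.unpack_from(
        HEADER_FORMAT, page, 0
    )
    return {
        "wal_first": wal_first,
        "wal_second": wal_second,
        "lower": lower,      # start of unallocated space
        "upper": upper,      # end of unallocated space
        "special": special,  # start of the special space
        "opaque": opaque,    # currently holds only the page size
        "free_bytes": upper - lower,
    }
```

The derived free_bytes value is simply the distance between the two offsets that bound the unallocated space.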
Following the page header are item identifiers
(ItemIdData). New item identifiers are allocated
from the first four bytes of unallocated space. Because an item
identifier is never moved until it is freed, its index may be used to
indicate the location of an item on a page. In fact, every pointer to an
item (ItemPointer, also known as
CTID) created by
PostgreSQL consists of a frame number and an
index of an item identifier. An item identifier contains a byte-offset to
the start of an item, its length in bytes, and a set of attribute bits
which affect its interpretation.
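The 4-byte item identifier can be unpacked with a few bit operations. The sketch below is only illustrative: it assumes a little-endian 32-bit word in which the low 15 bits hold the offset, the next 2 bits hold the attribute bits, and the high 15 bits hold the length. That bit layout is an assumption (C bit-field packing depends on the compiler and platform), and the helper names are hypothetical.

```python
import struct

ITEM_ID_SIZE = 4  # each item identifier occupies four bytes


def decode_item_id(raw: bytes) -> dict:
    """Split one 4-byte item identifier into offset, attribute bits, and length.

    Assumed bit layout (not taken from the text above):
    bits 0-14 offset, bits 15-16 attribute bits, bits 17-31 length.
    """
    (word,) = struct.unpack("<I", raw)
    return {
        "offset": word & 0x7FFF,
        "flags": (word >> 15) & 0x3,
        "length": (word >> 17) & 0x7FFF,
    }


def item_id_position(slot: int, header_size: int = 20) -> int:
    """Byte position within the page of the identifier in the given slot.

    Slots are counted from 0 here purely for simplicity of the sketch.
    """
    return header_size + ITEM_ID_SIZE * slot
```

Because identifiers are fixed in size and never move until freed, the slot number alone is enough to find the identifier, and the identifier in turn gives the item's current offset and length.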
The items themselves are stored in space allocated backwards from the end
of unallocated space. The exact structure varies depending on what the
table is to contain. Sequences and tables both use a structure named
HeapTupleHeaderData, described below.
The final section is the "special section", which may contain anything the
access method wishes to store. Ordinary tables do not use this at all
(indicated by setting the offset to the page size).
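To make the five parts of a page concrete, the following sketch derives their byte ranges from the lower, upper, and special offsets. It is a minimal illustration that assumes the 20-byte header described above; page_regions and uses_special_space are hypothetical helper names.

```python
def page_regions(lower, upper, special, page_size, header_size=20):
    """Return (start, end) byte ranges for the five parts of a page.

    Each range is half-open: start is included, end is excluded.
    """
    if not (header_size <= lower <= upper <= special <= page_size):
        raise ValueError("offsets are not in the expected order")
    return {
        "page_header": (0, header_size),
        "item_identifiers": (header_size, lower),
        "free_space": (lower, upper),
        "items": (upper, special),
        "special_space": (special, page_size),
    }


def uses_special_space(special, page_size):
    """An ordinary table page sets the special offset to the page size."""
    return special < page_size
```

For an ordinary table page the special_space range is empty, since its start and end are both equal to the page size.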
All tuples are structured the same way: a header of around 31 bytes,
followed by an optional null bitmask and the data. The header is detailed
below. The null bitmask is
only present if the HEAP_HASNULL bit is set in the
t_infomask. If it is present, it takes up the space
between the end of the header and the beginning of the data, as indicated
by the t_hoff field. In this list of bits, a 1 bit
indicates not-null and a 0 bit indicates null.
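The null bitmask test can be sketched in a few lines. This example assumes that HEAP_HASNULL has the value 0x0001 and that bits are numbered from the least significant bit of each byte; both are assumptions not stated above, and attribute_is_null is a hypothetical helper name.

```python
HEAP_HASNULL = 0x0001  # assumed flag value; check the source headers


def attribute_is_null(t_infomask: int, null_bitmask: bytes, attnum: int) -> bool:
    """Report whether attribute attnum (counted from 0) is NULL.

    A 1 bit means not-null and a 0 bit means null. When the HEAP_HASNULL
    bit is clear there is no bitmask at all, so nothing can be NULL.
    """
    if not (t_infomask & HEAP_HASNULL):
        return False
    byte = null_bitmask[attnum >> 3]             # eight attributes per byte
    return (byte & (1 << (attnum & 7))) == 0     # assumed LSB-first bit order
```

For instance, with a one-byte bitmask of 0b00000101 and HEAP_HASNULL set, the second attribute is reported as NULL while the first and third are not.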
All the details may be found in src/include/storage/bufpage.h.
Interpreting the actual data can only be done with information obtained
from other tables, mostly pg_attribute. The
relevant fields are attlen and
attalign. There is no way to directly get a
particular attribute, except when there are only fixed-width fields and no
NULLs. All this trickery is wrapped up in the functions
heap_getattr, fastgetattr,
and heap_getsysattr.
To read the data you need to examine each attribute in turn. First check
whether the field is NULL according to the null bitmap. If it is, go on to
the next attribute. Then make sure you have the right alignment. If the
field is a fixed-width field, its bytes are simply stored in place. If it is
a variable-length field (attlen == -1), it is a bit more complicated,
using the variable-length structure varattrib.
Depending on the flags, the data may be inline, compressed, or in
another table (TOAST).
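The attribute-by-attribute procedure above can be sketched as follows. This is a simplified illustration, not the logic of heap_getattr: it assumes little-endian data, assumes the data area starts on a fully aligned boundary, and assumes a variable-length value begins with a 4-byte length word whose low 30 bits give the total size including that word. Compressed and out-of-line (TOAST) values are not decoded. The names walk_attributes, align_up, and SIZE_MASK are hypothetical.

```python
import struct

# attalign codes from pg_attribute mapped to byte boundaries
ALIGN_BYTES = {"c": 1, "s": 2, "i": 4, "d": 8}
SIZE_MASK = 0x3FFFFFFF  # assumed: low 30 bits of the length word hold the size


def align_up(offset: int, boundary: int) -> int:
    """Round offset up to the next multiple of boundary."""
    return (offset + boundary - 1) // boundary * boundary


def walk_attributes(data: bytes, attrs, is_null):
    """Return the raw bytes of each attribute, or None for NULLs.

    attrs is a list of (attlen, attalign) pairs; is_null is a parallel
    list of booleans taken from the null bitmask.
    """
    values = []
    offset = 0
    for (attlen, attalign), null in zip(attrs, is_null):
        if null:
            values.append(None)  # NULLs occupy no space in the data area
            continue
        offset = align_up(offset, ALIGN_BYTES[attalign])
        if attlen > 0:
            size = attlen  # fixed-width field
        elif attlen == -1:
            (word,) = struct.unpack_from("<I", data, offset)
            size = word & SIZE_MASK  # assumed to include the length word
        else:
            raise ValueError("unsupported attlen: %d" % attlen)
        values.append(data[offset:offset + size])
        offset += size
    return values
```

The loop shows why a particular attribute generally cannot be located directly: the offset of each field depends on the NULL status, alignment, and length of every field before it.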