70.2. System Catalog Initial Data
Each catalog that has any manually-created initial data (some do not) has a corresponding .dat file that contains its initial data in an editable format.
70.2.1. Data File Format
Each .dat file contains Perl data structure literals that are simply eval'd to produce an in-memory data structure consisting of an array of hash references, one per catalog row. A slightly modified excerpt from pg_database.dat will demonstrate the key features:
    [

    # A comment could appear here.
    { oid => '1', oid_symbol => 'TemplateDbOid',
      descr => 'database\'s default template',
      datname => 'template1', datdba => 'PGUID', encoding => 'ENCODING',
      datcollate => 'LC_COLLATE', datctype => 'LC_CTYPE', datistemplate => 't',
      datallowconn => 't', datconnlimit => '-1', datlastsysoid => '0',
      datfrozenxid => '0', datminmxid => '1', dattablespace => '1663',
      datacl => '_null_' },

    ]
Points to note:
- The overall file layout is: open square bracket, one or more sets of curly braces each of which represents a catalog row, close square bracket. Write a comma after each closing curly brace.
- Within each catalog row, write comma-separated key => value pairs. The allowed keys are the names of the catalog's columns, plus the metadata keys oid, oid_symbol, and descr. (The use of oid and oid_symbol is described in Section 70.2.2 below. descr supplies a description string for the object, which will be inserted into pg_description or pg_shdescription as appropriate.) While the metadata keys are optional, the catalog's defined columns must all be provided, except when the catalog's .h file specifies a default value for the column.
- All values must be single-quoted. Escape single quotes used within a value with a backslash. Backslashes meant as data can, but need not, be doubled; this follows Perl's rules for simple quoted literals. Note that backslashes appearing as data will be treated as escapes by the bootstrap scanner, according to the same rules as for escape string constants (see Section 4.1.2.2); for example \t converts to a tab character. If you actually want a backslash in the final value, you will need to write four of them: Perl strips two, leaving \\ for the bootstrap scanner to see.
- Null values are represented by _null_. (Note that there is no way to create a value that is just that string.)
- Comments are preceded by #, and must be on their own lines.
- To aid readability, field values that are OIDs of other catalog entries can be represented by names rather than numeric OIDs. This is described in Section 70.2.3 below.
- Since hashes are unordered data structures, field order and line layout aren't semantically significant. However, to maintain a consistent appearance, we set a few rules that are applied by the formatting script reformat_dat_file.pl:
  - Within each pair of curly braces, the metadata fields oid, oid_symbol, and descr (if present) come first, in that order, then the catalog's own fields appear in their defined order.
  - Newlines are inserted between fields as needed to limit line length to 80 characters, if possible. A newline is also inserted between the metadata fields and the regular fields.
  - If the catalog's .h file specifies a default value for a column, and a data entry has that same value, reformat_dat_file.pl will omit it from the data file. This keeps the data representation compact.
  - reformat_dat_file.pl preserves blank lines and comment lines as-is.
  It's recommended to run reformat_dat_file.pl before submitting catalog data patches. For convenience, you can simply change to src/include/catalog/ and run make reformat-dat-files.
- If you want to add a new method of making the data representation smaller, you must implement it in reformat_dat_file.pl and also teach Catalog::ParseData() how to expand the data back into the full representation.
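The two layers of backslash processing described in the quoting rule above can be traced with a small model. The following Python sketch is purely illustrative and is not part of the PostgreSQL tooling: perl_single_quoted, scanner_unescape, and final_value are hypothetical helper names, and the scanner model is a simplification that covers only single-character escapes (no octal, hex, or Unicode forms).

```python
# Illustrative model of the two unescaping passes applied to a .dat value.
# All names here are hypothetical; this is not PostgreSQL source code.

def perl_single_quoted(body: str) -> str:
    """Value Perl produces for a single-quoted literal, given the text
    between the quotes: only \\\\ and \\' are escapes, all else is literal."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body) and body[i + 1] in ("\\", "'"):
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


# Simplified escape table; the real escape-string rules also accept
# octal, hex, and Unicode escapes, which this sketch leaves out.
SCANNER_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}


def scanner_unescape(text: str) -> str:
    """Simplified model of the second pass: backslash plus a known letter
    becomes the control character, backslash plus anything else becomes
    that character itself."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append(SCANNER_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def final_value(body: str) -> str:
    """Apply both passes in order: Perl first, then the scanner model."""
    return scanner_unescape(perl_single_quoted(body))
```

Under this model, a value written with \t ends up containing a tab whether or not the backslash was doubled in the file, and only four consecutive backslashes in the file survive as a single backslash in the final value.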
70.2.2. OID Assignment
A catalog row appearing in the initial data can be given a manually-assigned OID by writing an oid => nnnn metadata field. Furthermore, if an OID is assigned, a C macro for that OID can be created by writing an oid_symbol => name metadata field.
Pre-loaded catalog rows must have preassigned OIDs if there are OID references to them in other pre-loaded rows. A preassigned OID is also needed if the row's OID must be referenced from C code. If neither case applies, the oid metadata field can be omitted, in which case the bootstrap code assigns an OID automatically, or leaves it zero in a catalog that has no OIDs. In practice we usually preassign OIDs for all or none of the pre-loaded rows in a given catalog, even if only some of them are actually cross-referenced.
Writing the actual numeric value of any OID in C code is considered very bad form; always use a macro instead. Direct references to pg_proc OIDs are common enough that there's a special mechanism to create the necessary macros automatically; see src/backend/utils/Gen_fmgrtab.pl. Similarly (but, for historical reasons, not done the same way) there's an automatic method for creating macros for pg_type OIDs. oid_symbol entries are therefore not necessary in those two catalogs. Likewise, macros for the pg_class OIDs of system catalogs and indexes are set up automatically. For all other system catalogs, you have to manually specify any macros you need via oid_symbol entries.
To find an available OID for a new pre-loaded row, run the script src/include/catalog/unused_oids. It prints inclusive ranges of unused OIDs (e.g., the output line "45-900" means OIDs 45 through 900 have not been allocated yet). Currently, OIDs 1-9999 are reserved for manual assignment; the unused_oids script simply looks through the catalog headers and .dat files to see which ones do not appear. You can also use the duplicate_oids script to check for mistakes. (genbki.pl will also detect duplicate OIDs at compile time.)
The OID counter starts at 10000 at the beginning of a bootstrap run. If a catalog row is in a table that requires OIDs, but no OID was preassigned by an oid field, then it will receive an OID of 10000 or above.
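The idea behind the two helper scripts can be illustrated with a short Python sketch. This is not a port of unused_oids or duplicate_oids: the function names, the input (a plain list of already-collected OIDs), and the exact formatting of single-OID ranges are assumptions made for the example. Only the 1-9999 manual range and the "45-900" style of inclusive ranges come from the text above.

```python
# Illustrative sketch only; the real scripts scan catalog headers and
# .dat files, whereas this example starts from a ready-made list of OIDs.
from collections import Counter


def unused_oid_ranges(used, first=1, last=9999):
    """Return inclusive (start, end) ranges in [first, last] that do not
    appear in `used`."""
    used = set(used)
    ranges = []
    start = None
    for oid in range(first, last + 1):
        if oid in used:
            if start is not None:
                ranges.append((start, oid - 1))
                start = None
        elif start is None:
            start = oid
    if start is not None:
        ranges.append((start, last))
    return ranges


def format_ranges(ranges):
    """Render ranges like "45-900"; a one-element range is shown as a bare
    number (an assumption of this sketch, not a documented format)."""
    return [str(a) if a == b else f"{a}-{b}" for a, b in ranges]


def duplicate_oids(oids):
    """Return the sorted list of OIDs that occur more than once."""
    return sorted(oid for oid, count in Counter(oids).items() if count > 1)
```

For instance, with OIDs 1, 2, and 5 in use and a toy upper bound of 8, unused_oid_ranges reports the gaps 3-4 and 6-8.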
70.2.3. OID Reference Lookup
Cross-references from one initial catalog row to another can be written by just writing the preassigned OID of the referenced row. But that's error-prone and hard to understand, so for frequently-referenced catalogs, genbki.pl provides mechanisms to write symbolic references instead. Currently this is possible for references to access methods, functions, operators, opclasses, opfamilies, and types. The rules are as follows:
- Use of symbolic references is enabled in a particular catalog column by attaching BKI_LOOKUP(lookuprule) to the column's definition, where lookuprule is pg_am, pg_proc, pg_operator, pg_opclass, pg_opfamily, or pg_type. BKI_LOOKUP can be attached to columns of type Oid, regproc, oidvector, or Oid[]; in the latter two cases it implies performing a lookup on each element of the array.
- In such a column, all entries must use the symbolic format except when writing 0 for InvalidOid. (If the column is declared regproc, you can optionally write - instead of 0.) genbki.pl will warn about unrecognized names.
- Access methods are just represented by their names, as are types. Type names must match the referenced pg_type entry's typname; you do not get to use any aliases such as integer for int4.
- A function can be represented by its proname, if that is unique among the pg_proc.dat entries (this works like regproc input). Otherwise, write it as proname(argtypename,argtypename,...), like regprocedure. The argument type names must be spelled exactly as they are in the pg_proc.dat entry's proargtypes field. Do not insert any spaces.
- Operators are represented by oprname(lefttype,righttype), writing the type names exactly as they appear in the pg_operator.dat entry's oprleft and oprright fields. (Write 0 for the omitted operand of a unary operator.)
- The names of opclasses and opfamilies are only unique within an access method, so they are represented by access_method_name/object_name.
- In none of these cases is there any provision for schema-qualification; all objects created during bootstrap are expected to be in the pg_catalog schema.
genbki.pl resolves all symbolic references while it runs, and puts simple numeric OIDs into the emitted BKI file. There is therefore no need for the bootstrap backend to deal with symbolic references.
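The lookup rules above can be summarized in a small model. The Python sketch below is an illustration only and does not reproduce genbki.pl: the function names are hypothetical, every OID in the toy tables is made up rather than taken from the real catalogs, and an unrecognized name raises an error here, whereas the text above says genbki.pl issues a warning.

```python
# Toy model of symbolic OID reference resolution. All OIDs below are
# invented for the example; they are not real catalog values.
INVALID_OID = 0


def build_proc_lookup(procs):
    """Map 'proname(argtype,argtype,...)' to an OID for every row, and the
    bare proname as well when it is unique among the rows."""
    lookup = {}
    by_name = {}
    for row in procs:
        by_name.setdefault(row["proname"], []).append(row["oid"])
        args = ",".join(row["proargtypes"].split())
        lookup[f'{row["proname"]}({args})'] = row["oid"]
    for name, oids in by_name.items():
        if len(oids) == 1:
            lookup[name] = oids[0]
    return lookup


def resolve(value, lookup, is_regproc=False):
    """Resolve one symbolic reference; '0' (or '-' in a regproc column)
    stands for InvalidOid."""
    if value == "0" or (is_regproc and value == "-"):
        return INVALID_OID
    if value not in lookup:
        # Simplification: this sketch fails hard instead of warning.
        raise KeyError(f"unresolved OID reference: {value}")
    return lookup[value]


def resolve_vector(value, lookup):
    """Resolve each whitespace-separated element of an array-like field."""
    return [resolve(item, lookup) for item in value.split()]


TYPE_LOOKUP = {"int4": 9023, "text": 9025, "bytea": 9017}
PROC_LOOKUP = build_proc_lookup([
    {"oid": 9101, "proname": "int4pl", "proargtypes": "int4 int4"},
    {"oid": 9102, "proname": "length", "proargtypes": "text"},
    {"oid": 9103, "proname": "length", "proargtypes": "bytea"},
])
# Opclass and opfamily names are qualified by access method name.
OPCLASS_LOOKUP = {"btree/int4_ops": 9201}
```

In this toy data, int4pl can be referenced by its bare name because it is unique, while length must be written as length(text) or length(bytea).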
70.2.4. Recipes for Editing Data Files
Here are some suggestions about the easiest ways to perform common tasks when updating catalog data files.
Add a new column with a default to a catalog:
Add the column to the header file with a BKI_DEFAULT(value) annotation. The data file need only be adjusted by adding the field in existing rows where a non-default value is needed.
Add a default value to an existing column that doesn't have one:
Add a BKI_DEFAULT annotation to the header file, then run make reformat-dat-files to remove now-redundant field entries.
Remove a column, whether it has a default or not:
Remove the column from the header, then run make reformat-dat-files to remove now-useless field entries.
Change or remove an existing default value:
You cannot simply change the header file, since that will cause the current data to be interpreted incorrectly. First run make expand-dat-files to rewrite the data files with all default values inserted explicitly, then change or remove the BKI_DEFAULT annotation, then run make reformat-dat-files to remove superfluous fields again.
Ad-hoc bulk editing:
reformat_dat_file.pl can be adapted to perform many kinds of bulk changes. Look for its block comments showing where one-off code can be inserted. In the following example, we are going to consolidate two boolean fields in pg_proc into a char field:
- Add the new column, with a default, to pg_proc.h:

    + /* see PROKIND_ categories below */
    + char prokind BKI_DEFAULT(f);

- Create a new script based on reformat_dat_file.pl to insert appropriate values on-the-fly:

    - # At this point we have the full row in memory as a hash
    - # and can do any operations we want. As written, it only
    - # removes default values, but this script can be adapted to
    - # do one-off bulk-editing.
    + # One-off change to migrate to prokind
    + # Default has already been filled in by now, so change to other
    + # values as appropriate
    + if ($values{proisagg} eq 't')
    + {
    +     $values{prokind} = 'a';
    + }
    + elsif ($values{proiswindow} eq 't')
    + {
    +     $values{prokind} = 'w';
    + }

- Run the new script:

    $ cd src/include/catalog
    $ perl rewrite_dat_with_prokind.pl pg_proc.dat

  At this point pg_proc.dat has all three columns, prokind, proisagg, and proiswindow, though they will appear only in rows where they have non-default values.
- Remove the old columns from pg_proc.h:

    - /* is it an aggregate? */
    - bool proisagg BKI_DEFAULT(f);
    -
    - /* is it a window function? */
    - bool proiswindow BKI_DEFAULT(f);

- Finally, run make reformat-dat-files to remove the useless old entries from pg_proc.dat.
For further examples of scripts used for bulk editing, see convert_oid2name.pl and remove_pg_type_oid_symbols.pl attached to this message: https://www.postgresql.org/message-id/CAJVSVGVX8gXnPm+Xa=DxR7kFYprcQ1tNcCT5D0O3ShfnM6jehA@mail.gmail.com
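Several of the recipes in this section depend on the round trip between the compact form of a row (default values omitted) and the expanded form (every column present). The Python sketch below models that round trip to show why a default cannot be changed by editing the header alone. It is an illustration under stated assumptions, not the logic of reformat_dat_file.pl or Catalog::ParseData(): the function names, the column list, and the default values are all hypothetical.

```python
# Hypothetical model of expanding and compacting catalog rows.
# Column metadata is a list of (name, default) pairs in defined order;
# a default of None means the column has no default in this model.
METADATA_KEYS = ("oid", "oid_symbol", "descr")


def expand_row(row, columns):
    """Fill in every omitted column from its default."""
    full = dict(row)
    for name, default in columns:
        if name not in full:
            if default is None:
                raise ValueError(f"missing value for column {name}")
            full[name] = default
    return full


def compact_row(row, columns):
    """Drop values equal to the column default; put metadata keys first,
    then the catalog's own columns in their defined order."""
    ordered = {}
    for key in METADATA_KEYS:
        if key in row:
            ordered[key] = row[key]
    for name, default in columns:
        if name in row and row[name] != default:
            ordered[name] = row[name]
    return ordered


# Made-up column definitions before and after changing a default.
OLD_COLUMNS = [("proname", None), ("prokind", "f"), ("proisstrict", "t")]
NEW_COLUMNS = [("proname", None), ("prokind", "a"), ("proisstrict", "t")]

stored = {"proname": "foo", "oid": "1"}

# Safe order: expand under the old defaults, then compact under the new.
migrated = compact_row(expand_row(stored, OLD_COLUMNS), NEW_COLUMNS)

# Unsafe shortcut: reading the unchanged file under the new defaults
# silently gives the row a different prokind.
misread = expand_row(stored, NEW_COLUMNS)
```

Here migrated keeps prokind explicitly as 'f' because it no longer matches the default, while misread shows the row being reinterpreted with prokind 'a', which is the incorrect interpretation the recipe warns about.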