Chapter 57. Writing a Table Sampling Method
PostgreSQL's implementation of the TABLESAMPLE clause supports custom table sampling methods, in addition to the BERNOULLI and SYSTEM methods that are required by the SQL standard. The sampling method determines which rows of the table will be selected when the TABLESAMPLE clause is used.
At the SQL level, a table sampling method is represented by a single SQL function, typically implemented in C, having the signature
    method_name(internal) RETURNS tsm_handler
The name of the function is the same method name appearing in the TABLESAMPLE clause. The internal argument is a dummy (always having value zero) that simply serves to prevent this function from being called directly from a SQL command.
The result of the function must be a palloc'd struct of type TsmRoutine, which contains pointers to support functions for the sampling method. These support functions are plain C functions and are not visible or callable at the SQL level. The support functions are described in Section 57.1.
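
The handler pattern described above (one entry point that returns a struct of function pointers, which the caller then invokes) can be sketched in standalone C. This is only a simplified model, not code that builds against PostgreSQL: DemoTsmRoutine, DemoScanState, and every demo_ function are hypothetical stand-ins, malloc is used where a real handler would use palloc, and the two support functions shown do not correspond to the actual members of TsmRoutine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative scan state; not a PostgreSQL type. */
typedef struct DemoScanState
{
    unsigned nblocks;   /* number of blocks in the imaginary table */
    unsigned next;      /* next block number to consider */
} DemoScanState;

/* Simplified stand-in for TsmRoutine: just two support-function pointers. */
typedef struct DemoTsmRoutine
{
    void (*begin_scan)(DemoScanState *state, unsigned nblocks);
    bool (*next_block)(DemoScanState *state, unsigned *blockno);
} DemoTsmRoutine;

/* Support function: reset the scan state for a table of nblocks blocks. */
static void
demo_begin_scan(DemoScanState *state, unsigned nblocks)
{
    state->nblocks = nblocks;
    state->next = 0;
}

/* Support function: select every second block; false when exhausted. */
static bool
demo_next_block(DemoScanState *state, unsigned *blockno)
{
    if (state->next >= state->nblocks)
        return false;
    *blockno = state->next;
    state->next += 2;
    return true;
}

/* Plays the role of the handler: allocate the struct, fill in the pointers. */
DemoTsmRoutine *
demo_every_other_handler(void)
{
    DemoTsmRoutine *tsm = malloc(sizeof(DemoTsmRoutine));

    assert(tsm != NULL);
    tsm->begin_scan = demo_begin_scan;
    tsm->next_block = demo_next_block;
    return tsm;
}

/* Plays the role of the caller: it reaches the method only via the pointers. */
unsigned
demo_count_sampled_blocks(const DemoTsmRoutine *tsm, unsigned nblocks)
{
    DemoScanState state;
    unsigned blockno;
    unsigned count = 0;

    tsm->begin_scan(&state, nblocks);
    while (tsm->next_block(&state, &blockno))
        count++;
    return count;
}
```

In this model the support functions are static, so nothing outside the file can call them by name. That mirrors the point made above: only the handler is exposed, and the support functions are reachable solely through the returned struct.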
In addition to function pointers, the TsmRoutine struct must provide these additional fields:
- List *parameterTypes
  This is an OID list containing the data type OIDs of the parameter(s) that will be accepted by the TABLESAMPLE clause when this sampling method is used. For example, for the built-in methods, this list contains a single item with value FLOAT4OID, which represents the sampling percentage. Custom sampling methods can have more or different parameters.
- bool repeatable_across_queries
  If true, the sampling method can deliver identical samples across successive queries, if the same parameters and REPEATABLE seed value are supplied each time and the table contents have not changed. When this is false, the REPEATABLE clause is not accepted for use with the sampling method.
- bool repeatable_across_scans
  If true, the sampling method can deliver identical samples across successive scans in the same query (assuming unchanging parameters, seed value, and snapshot). When this is false, the planner will not select plans that would require scanning the sampled table more than once, since that might result in inconsistent query output.
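
The effect of these three fields can be illustrated with a small standalone model of the checks a caller might make before using a sampling method. Everything here is an assumption made for illustration: DemoSamplingMethod, the DEMO_ constants, and the demo_ functions are hypothetical names, the parameter list is modeled as a plain array rather than a List, and the argument-count check is a simplification of whatever validation the server actually performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int DemoOid;

/* Stand-in type identifiers for this model only. */
#define DEMO_FLOAT4OID 700
#define DEMO_INT4OID    23

/* Models the non-function-pointer fields of TsmRoutine. */
typedef struct DemoSamplingMethod
{
    const DemoOid *parameter_types;     /* models List *parameterTypes */
    size_t         nparameters;         /* length of parameter_types */
    bool           repeatable_across_queries;
    bool           repeatable_across_scans;
} DemoSamplingMethod;

typedef enum DemoCheckResult
{
    DEMO_OK,
    DEMO_WRONG_ARGUMENT_COUNT,
    DEMO_REPEATABLE_NOT_SUPPORTED
} DemoCheckResult;

/* Shaped like the built-in methods: one float4 percentage, fully repeatable. */
static const DemoOid demo_percent_params[] = { DEMO_FLOAT4OID };
const DemoSamplingMethod demo_builtin_like = {
    demo_percent_params, 1, true, true
};

/* A hypothetical custom method: two parameters, no repeatability at all. */
static const DemoOid demo_two_params[] = { DEMO_INT4OID, DEMO_FLOAT4OID };
const DemoSamplingMethod demo_unrepeatable = {
    demo_two_params, 2, false, false
};

/* Validate a TABLESAMPLE clause: nargs arguments, optional REPEATABLE. */
DemoCheckResult
demo_check_tablesample(const DemoSamplingMethod *method,
                       size_t nargs, bool has_repeatable)
{
    assert(method != NULL);
    if (nargs != method->nparameters)
        return DEMO_WRONG_ARGUMENT_COUNT;
    if (has_repeatable && !method->repeatable_across_queries)
        return DEMO_REPEATABLE_NOT_SUPPORTED;
    return DEMO_OK;
}

/* A plan that scans the sampled table more than once is only acceptable
 * when the method promises identical samples across scans. */
bool
demo_may_rescan(const DemoSamplingMethod *method)
{
    assert(method != NULL);
    return method->repeatable_across_scans;
}
```

With these definitions, demo_check_tablesample(&demo_unrepeatable, 2, true) reports DEMO_REPEATABLE_NOT_SUPPORTED, while the same call without REPEATABLE is accepted, and demo_may_rescan(&demo_unrepeatable) is false.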
The TsmRoutine struct type is declared in src/include/access/tsmapi.h; see that file for additional details.
The table sampling methods included in the standard distribution are good references when trying to write your own. Look into the src/backend/access/tablesample subdirectory of the source tree for the built-in sampling methods, and into the contrib subdirectory for add-on methods.