Chapter 47. Background Worker Processes
PostgreSQL can be extended to run user-supplied code in separate processes.
Such processes are started, stopped and monitored by
postgres
,
which permits them to have a lifetime closely linked to the server's status.
These processes have the option to attach to
PostgreSQL
's
shared memory area and to connect to databases internally; they can also run
multiple transactions serially, just like a regular client-connected server
process. Also, by linking to
libpq
they can connect to the
server and behave like a regular client application.
Warning
There are considerable robustness and security risks in using background
worker processes because, being written in the
C
language,
they have unrestricted access to data. Administrators wishing to enable
modules that include background worker process should exercise extreme
caution. Only carefully audited modules should be permitted to run
background worker processes.
Background workers can be initialized at the time that
PostgreSQL
is started by including the module name in
shared_preload_libraries
. A module wishing to run a background
worker can register it by calling
RegisterBackgroundWorker(
from its
BackgroundWorker *worker
)
_PG_init()
. Background workers can also be started
after the system is up and running by calling the function
RegisterDynamicBackgroundWorker(
. Unlike
BackgroundWorker
*worker, BackgroundWorkerHandle **handle
)
RegisterBackgroundWorker
, which can only be called from within
the postmaster,
RegisterDynamicBackgroundWorker
must be
called from a regular backend.
The structure
BackgroundWorker
is defined thus:
typedef void (*bgworker_main_type)(Datum main_arg); typedef struct BackgroundWorker { char bgw_name[BGW_MAXLEN]; int bgw_flags; BgWorkerStartTime bgw_start_time; int bgw_restart_time; /* in seconds, or BGW_NEVER_RESTART */ char bgw_library_name[BGW_MAXLEN]; char bgw_function_name[BGW_MAXLEN]; Datum bgw_main_arg; char bgw_extra[BGW_EXTRALEN]; int bgw_notify_pid; } BackgroundWorker;
bgw_name
is a string to be used in log messages, process
listings and similar contexts.
bgw_flags
is a bitwise-or'd bit mask indicating the
capabilities that the module wants. Possible values are:
-
BGWORKER_SHMEM_ACCESS
-
Requests shared memory access. Workers without shared memory access cannot access any of PostgreSQL's shared data structures, such as heavyweight or lightweight locks, shared buffers, or any custom data structures which the worker itself may wish to create and use.
-
BGWORKER_BACKEND_DATABASE_CONNECTION
-
Requests the ability to establish a database connection through which it can later run transactions and queries. A background worker using
BGWORKER_BACKEND_DATABASE_CONNECTION
to connect to a database must also attach shared memory usingBGWORKER_SHMEM_ACCESS
, or worker start-up will fail.
bgw_start_time
is the server state during which
postgres
should start the process; it can be one of
BgWorkerStart_PostmasterStart
(start as soon as
postgres
itself has finished its own initialization; processes
requesting this are not eligible for database connections),
BgWorkerStart_ConsistentState
(start as soon as a consistent state
has been reached in a hot standby, allowing processes to connect to
databases and run read-only queries), and
BgWorkerStart_RecoveryFinished
(start as soon as the system has
entered normal read-write state). Note the last two values are equivalent
in a server that's not a hot standby. Note that this setting only indicates
when the processes are to be started; they do not stop when a different state
is reached.
bgw_restart_time
is the interval, in seconds, that
postgres
should wait before restarting the process, in
case it crashes. It can be any positive value,
or
BGW_NEVER_RESTART
, indicating not to restart the
process in case of a crash.
bgw_library_name
is the name of a library in
which the initial entry point for the background worker should be sought.
The named library will be dynamically loaded by the worker process and
bgw_function_name
will be used to identify the
function to be called. If loading a function from the core code, this must
be set to "postgres".
bgw_function_name
is the name of a function in
a dynamically loaded library which should be used as the initial entry point
for a new background worker.
bgw_main_arg
is the
Datum
argument
to the background worker main function. This main function should take a
single argument of type
Datum
and return
void
.
bgw_main_arg
will be passed as the argument.
In addition, the global variable
MyBgworkerEntry
points to a copy of the
BackgroundWorker
structure
passed at registration time; the worker may find it helpful to examine
this structure.
On Windows (and anywhere else where
EXEC_BACKEND
is
defined) or in dynamic background workers it is not safe to pass a
Datum
by reference, only by value. If an argument is required, it
is safest to pass an int32 or other small value and use that as an index
into an array allocated in shared memory. If a value like a
cstring
or
text
is passed then the pointer won't be valid from the
new background worker process.
bgw_extra
can contain extra data to be passed
to the background worker. Unlike
bgw_main_arg
, this data
is not passed as an argument to the worker's main function, but it can be
accessed via
MyBgworkerEntry
, as discussed above.
bgw_notify_pid
is the PID of a PostgreSQL
backend process to which the postmaster should send
SIGUSR1
when the process is started or exits. It should be 0 for workers registered
at postmaster startup time, or when the backend registering the worker does
not wish to wait for the worker to start up. Otherwise, it should be
initialized to
MyProcPid
.
Once running, the process can connect to a database by calling
BackgroundWorkerInitializeConnection(
or
char *dbname
,
char *username
)
BackgroundWorkerInitializeConnectionByOid(
.
This allows the process to run transactions and queries using the
Oid dboid
,
Oid useroid
)
SPI
interface. If
dbname
is NULL or
dboid
is
InvalidOid
, the session is not connected
to any particular database, but shared catalogs can be accessed.
If
username
is NULL or
useroid
is
InvalidOid
, the process will run as the superuser created
during
initdb
.
A background worker can only call one of these two functions, and only
once. It is not possible to switch databases.
Signals are initially blocked when control reaches the
background worker's main function, and must be unblocked by it; this is to
allow the process to customize its signal handlers, if necessary.
Signals can be unblocked in the new process by calling
BackgroundWorkerUnblockSignals
and blocked by calling
BackgroundWorkerBlockSignals
.
If
bgw_restart_time
for a background worker is
configured as
BGW_NEVER_RESTART
, or if it exits with an exit
code of 0 or is terminated by
TerminateBackgroundWorker
,
it will be automatically unregistered by the postmaster on exit.
Otherwise, it will be restarted after the time period configured via
bgw_restart_time
, or immediately if the postmaster
reinitializes the cluster due to a backend failure. Backends which need
to suspend execution only temporarily should use an interruptible sleep
rather than exiting; this can be achieved by calling
WaitLatch()
. Make sure the
WL_POSTMASTER_DEATH
flag is set when calling that function, and
verify the return code for a prompt exit in the emergency case that
postgres
itself has terminated.
When a background worker is registered using the
RegisterDynamicBackgroundWorker
function, it is
possible for the backend performing the registration to obtain information
regarding the status of the worker. Backends wishing to do this should
pass the address of a
BackgroundWorkerHandle *
as the second
argument to
RegisterDynamicBackgroundWorker
. If the
worker is successfully registered, this pointer will be initialized with an
opaque handle that can subsequently be passed to
GetBackgroundWorkerPid(
or
BackgroundWorkerHandle *
,
pid_t *
)
TerminateBackgroundWorker(
.
BackgroundWorkerHandle *
)
GetBackgroundWorkerPid
can be used to poll the status of the
worker: a return value of
BGWH_NOT_YET_STARTED
indicates that
the worker has not yet been started by the postmaster;
BGWH_STOPPED
indicates that it has been started but is
no longer running; and
BGWH_STARTED
indicates that it is
currently running. In this last case, the PID will also be returned via the
second argument.
TerminateBackgroundWorker
causes the postmaster to send
SIGTERM
to the worker if it is running, and to unregister it
as soon as it is not.
In some cases, a process which registers a background worker may wish to
wait for the worker to start up. This can be accomplished by initializing
bgw_notify_pid
to
MyProcPid
and
then passing the
BackgroundWorkerHandle *
obtained at
registration time to
WaitForBackgroundWorkerStartup(
function.
This function will block until the postmaster has attempted to start the
background worker, or until the postmaster dies. If the background runner
is running, the return value will
BackgroundWorkerHandle
*handle
,
pid_t *
)
BGWH_STARTED
, and
the PID will be written to the provided address. Otherwise, the return
value will be
BGWH_STOPPED
or
BGWH_POSTMASTER_DIED
.
If a background worker sends asynchronous notifications with the
NOTIFY
command via the Server Programming Interface
(
SPI
), it should call
ProcessCompletedNotifies
explicitly after committing
the enclosing transaction so that any notifications can be delivered. If a
background worker registers to receive asynchronous notifications with
the
LISTEN
through
SPI
, the worker
will log those notifications, but there is no programmatic way for the
worker to intercept and respond to those notifications.
The
src/test/modules/worker_spi
module
contains a working example,
which demonstrates some useful techniques.
The maximum number of registered background workers is limited by max_worker_processes .