Parallel-query-and-PLJava
Parallel query and PL/Java
PL/Java 1.5.1 adds support for PostgreSQL 9.6, and with that comes the
possibility of using PL/Java functions in parallel queries. Simple testing shows
that this actually works; PL/Java functions can even be declared PARALLEL SAFE
if they meet the requirements, and executed in the parallelized parts of
queries.
However, this is a substantial change to conditions in which PL/Java was
developed, so this wiki page is here to collect the notes that are likely to
come with experience using this new capability. Such experience might include
empirically-determined, good values for parallel_setup_cost
, nonobvious cases
where a function should not be declared RESTRICTED
or SAFE
, and so on.
Notes go here
Preview of new documentation
Until PL/Java 1.5.1 is released, here is a preview of the new section of the user’s guide.
PL/Java in parallel query or background worker
With some restrictions, PL/Java can be used in parallel queries, from PostgreSQL 9.6, and in some background worker processes (as introduced in PostgreSQL 9.3, though 9.5 or later is needed for support in PL/Java).
Background worker processes
Because PL/Java requires access to a database containing the sqlj
schema,
PL/Java is only usable in a worker process that initializes a database
connection, which must happen before the first use of any function that
depends on PL/Java.
Parallel queries
Like any user-defined function, a PL/Java function can be
annotated with a level of “parallel safety”, UNSAFE
by default.
When a function labeled UNSAFE
is used in a query, the query cannot be
parallelized at all. If a query contains a function labeled RESTRICTED
, parts
of the query may execute in parallel, but the part that calls the RESTRICTED
function will be executed only in the lead process. A function labeled SAFE
may be executed in every process participating in the query.
Parallel setup cost
PostgreSQL parallel query processing uses multiple operating-system processes,
and these processes are new for each parallel query. If a PL/Java function is
labeled PARALLEL SAFE
and is pushed by the query planner to run in the
parallel worker processes, each new process will start a Java virtual machine.
The cost of doing so will reduce the expected advantage of parallel execution.
To inform the query planner of this trade-off, the value of the PostgreSQL
configuration variable parallel_setup_cost
should be increased.
The startup cost can be minimized with attention to the
PL/Java VM option recommendations, including class data sharing.
Limits on RESTRICTED
/SAFE
function behavior
There are stringent limits on what a function labeled RESTRICTED
may do,
and even more stringent limits on what may be done in a function labeled SAFE
.
The PostgreSQL manual describes the limits in the section
Parallel Labeling for Functions and Aggregates.
While PostgreSQL does check for some inappropriate operations from a
PARALLEL SAFE
or RESTRICTED
function, for the most part it relies on
functions being labeled correctly. When in doubt, the conservative approach
is to label a function UNSAFE
, which can’t go wrong. A function mistakenly
labeled RESTRICTED
or SAFE
could produce unpredictable results.
Internal workings of PL/Java
While a given PL/Java function itself may clearly qualify as RESTRICTED
or
SAFE
by inspection, there may still be cases where a forbidden operation
results from the internal workings of PL/Java itself. This has not been seen
in testing (simple parallel queries with RESTRICTED
or SAFE
PL/Java
functions work fine), but to rule out the possibility would require a careful
audit of PL/Java’s code. Until then, it would be prudent for any application
involving parallel query with RESTRICTED
or SAFE
PL/Java functions
to be first tested in a non-production environment.
Further reading
README.parallel in the PostgreSQL source, for more detail on why parallel query works the way it does.