Parallel query and PL/Java
PL/Java 1.5.1 adds support for PostgreSQL 9.6, and with that comes the
possibility of using PL/Java functions in parallel queries. Simple testing shows
that this actually works; PL/Java functions can even be declared PARALLEL SAFE
if they meet the requirements, and executed in the parallelized parts of a query.
However, this is a substantial change to the conditions under which PL/Java was
developed, so this wiki page is here to collect the notes that are likely to
come with experience using this new capability. Such experience might include
empirically determined good values for
parallel_setup_cost, nonobvious cases
where a function should not be declared
PARALLEL SAFE, and so on.
Notes go here
Preview of new documentation
Until PL/Java 1.5.1 is released, here is a preview of the new section of the user’s guide.
PL/Java in parallel query or background worker
With some restrictions, PL/Java can be used in parallel queries, from PostgreSQL 9.6, and in some background worker processes (as introduced in PostgreSQL 9.3, though 9.5 or later is needed for support in PL/Java).
Background worker processes
Because PL/Java requires access to a database containing the sqlj schema,
PL/Java is only usable in a worker process that initializes a database
connection, which must happen before the first use of any function that
depends on PL/Java.
Like any user-defined function, a PL/Java function can be
annotated with a level of “parallel safety”,
PARALLEL UNSAFE by default.
When a function labeled
PARALLEL UNSAFE is used in a query, the query cannot be
parallelized at all. If a query contains a function labeled
PARALLEL RESTRICTED, parts
of the query may execute in parallel, but the part that calls the
RESTRICTED function will be executed only in the lead process. A function labeled
PARALLEL SAFE
may be executed in every process participating in the query.
Parallel setup cost
PostgreSQL parallel query processing uses multiple operating-system processes,
and these processes are new for each parallel query. If a PL/Java function is
PARALLEL SAFE and is pushed by the query planner to run in the
parallel worker processes, each new process will start a Java virtual machine.
The cost of doing so will reduce the expected advantage of parallel execution.
To inform the query planner of this trade-off, the value of the PostgreSQL
configuration parameter parallel_setup_cost should be increased.
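The trade-off can be pictured with a rough wall-clock estimate. The sketch below is a deliberately simplified model and not the PostgreSQL planner's cost formula: it assumes the work divides evenly among the participating processes and that each one pays the JVM startup time once, concurrently with the others. The class and method names are hypothetical.

```java
// Hypothetical back-of-envelope model; NOT the planner's actual cost formula.
final class ParallelBreakEven {

    // Estimated elapsed time (ms) when serialMs of work is split evenly over
    // `processes` processes, each of which first spends jvmStartupMs starting
    // a JVM (assumed to happen concurrently in all of them).
    static double parallelMs(double serialMs, int processes, double jvmStartupMs) {
        if (processes < 1) {
            throw new IllegalArgumentException("processes must be at least 1");
        }
        return jvmStartupMs + serialMs / processes;
    }

    // Serial duration (ms) at which the parallel estimate equals the serial
    // time. Derived from: serialMs = jvmStartupMs + serialMs / processes.
    static double breakEvenSerialMs(int processes, double jvmStartupMs) {
        if (processes < 2) {
            throw new IllegalArgumentException("need at least 2 processes");
        }
        return jvmStartupMs * processes / (processes - 1.0);
    }
}
```

Under this model, a query shorter than the break-even duration gains nothing from parallelism once JVM startup is counted, which is the situation a larger parallel_setup_cost is meant to steer the planner away from. Note that parallel_setup_cost is expressed in planner cost units (default 1000), not milliseconds, so an estimate like this only suggests the direction of the adjustment; suitable values still have to be found by measurement.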
The startup cost can be minimized with attention to the
PL/Java VM option recommendations, including class data sharing.
Limits on RESTRICTED and SAFE function behavior
There are stringent limits on what a function labeled
PARALLEL RESTRICTED may do,
and even more stringent limits on what may be done in a function labeled
PARALLEL SAFE.
The PostgreSQL manual describes the limits in the section
Parallel Labeling for Functions and Aggregates.
While PostgreSQL does check for some inappropriate operations from a
PARALLEL SAFE or
RESTRICTED function, for the most part it relies on
functions being labeled correctly. When in doubt, the conservative approach
is to label a function
PARALLEL UNSAFE, which can’t go wrong. A function mistakenly
labeled PARALLEL RESTRICTED or SAFE could produce unpredictable results.
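As an illustration of the labeling decision, the sketch below contrasts two Java methods of the kind that might be exposed as PL/Java functions. The class name, method names, and the SQL declaration in the comment are hypothetical; in a real deployment the class would be packaged in a jar and installed with sqlj.install_jar.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class; the names are illustrative only.
final class LabelingExamples {

    // Pure function: the result depends only on the argument. It touches no
    // database objects and keeps no state between calls, so it is a
    // candidate for a declaration such as (assumed names):
    //   CREATE FUNCTION fahrenheit(float8) RETURNS float8
    //     AS 'LabelingExamples.toFahrenheit'
    //     LANGUAGE java PARALLEL SAFE;
    public static double toFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }

    // This counter lives in a static field, that is, in one JVM and
    // therefore in one backend process. Each parallel worker would start
    // its own JVM with its own counter, so results would differ depending
    // on which process ran the call. A function like this should not be
    // labeled PARALLEL SAFE.
    private static final AtomicLong CALLS = new AtomicLong();

    public static long nextCallNumber() {
        return CALLS.incrementAndGet();
    }
}
```

The second method is an example of backend-local state that PostgreSQL cannot synchronize across workers: within a single process it behaves predictably, but its results would no longer be consistent if calls were spread over several worker processes.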
Internal workings of PL/Java
While a given PL/Java function itself may clearly qualify as
RESTRICTED or SAFE by inspection, there may still be cases where a forbidden operation
results from the internal workings of PL/Java itself. This has not been seen
in testing (simple parallel queries with
RESTRICTED or SAFE
functions work fine), but to rule out the possibility would require a careful
audit of PL/Java’s code. Until then, it would be prudent for any application
involving parallel query with
RESTRICTED or SAFE PL/Java functions
to be first tested in a non-production environment.
See README.parallel in the PostgreSQL source for more detail on why parallel query works the way it does.