42.11. PL/pgSQL under the Hood
This section discusses some implementation details that are frequently important for PL/pgSQL users to know.
42.11.1. Variable Substitution
SQL statements and expressions within a PL/pgSQL function can refer to variables and parameters of the function. Behind the scenes, PL/pgSQL substitutes query parameters for such references. Parameters will only be substituted in places where a parameter or column reference is syntactically allowed. As an extreme case, consider this example of poor programming style:
INSERT INTO foo (foo) VALUES (foo);
The first occurrence of
must syntactically be a table
name, so it will not be substituted, even if the function has a variable
. The second occurrence must be the name of a
column of the table, so it will not be substituted either. Only the
third occurrence is a candidate to be a reference to the function's
PostgreSQL versions before 9.0 would try to substitute the variable in all three cases, leading to syntax errors.
Since the names of variables are syntactically no different from the names of table columns, there can be ambiguity in statements that also refer to tables: is a given name meant to refer to a table column, or a variable? Let's change the previous example to
INSERT INTO dest (col) SELECT foo + bar FROM src;
must be table names, and
must be a column of
might reasonably be either variables of the function
or columns of
By default, PL/pgSQL will report an error if a name in a SQL statement could refer to either a variable or a table column. You can fix such a problem by renaming the variable or column, or by qualifying the ambiguous reference, or by telling PL/pgSQL which interpretation to prefer.
The simplest solution is to rename the variable or column.
A common coding rule is to use a
different naming convention for
variables than you use for column names. For example,
if you consistently name function variables
while none of your
column names start with
, no conflicts will occur.
Alternatively you can qualify ambiguous references to make them clear.
In the above example,
would be an unambiguous reference
to the table column. To create an unambiguous reference to a variable,
declare it in a labeled block and use the block's label
). For example,
> DECLARE foo int; BEGIN foo := ...; INSERT INTO dest (col) SELECT block.foo + bar FROM src;
means the variable even if there is a column
. Function parameters, as well as
special variables such as
, can be qualified by the
function's name, because they are implicitly declared in an outer block
labeled with the function's name.
Sometimes it is impractical to fix all the ambiguous references in a large body of PL/pgSQL code. In such cases you can specify that PL/pgSQL should resolve ambiguous references as the variable (which is compatible with PL/pgSQL 's behavior before PostgreSQL 9.0), or as the table column (which is compatible with some other systems such as Oracle ).
To change this behavior on a system-wide basis, set the configuration
to one of
is the factory default).
This parameter affects subsequent compilations
of statements in
functions, but not statements
already compiled in the current session.
Because changing this setting
can cause unexpected changes in the behavior of
functions, it can only be changed by a superuser.
You can also set the behavior on a function-by-function basis, by inserting one of these special commands at the start of the function text:
#variable_conflict error #variable_conflict use_variable #variable_conflict use_column
These commands affect only the function they are written in, and override
the setting of
. An example is
CREATE FUNCTION stamp_user(id int, comment text) RETURNS void AS $$ #variable_conflict use_variable DECLARE curtime timestamp := now(); BEGIN UPDATE users SET last_modified = curtime, comment = comment WHERE users.id = id; END; $$ LANGUAGE plpgsql;
will refer to the function's variable and parameters
whether or not
has columns of those names. Notice
that we had to qualify the reference to
clause to make it refer to the table column.
But we did not have to qualify the reference to
as a target in the
list, because syntactically
that must be a column of
. We could write the same
function without depending on the
in this way:
CREATE FUNCTION stamp_user(id int, comment text) RETURNS void AS $$ <
> DECLARE curtime timestamp := now(); BEGIN UPDATE users SET last_modified = fn.curtime, comment = stamp_user.comment WHERE users.id = stamp_user.id; END; $$ LANGUAGE plpgsql;
Variable substitution does not happen in the command string given
or one of its variants. If you need to
insert a varying value into such a command, do so as part of
constructing the string value, or use
, as illustrated in
Variable substitution currently works only in
because the main SQL engine allows query parameters only in these
commands. To use a non-constant name or value in other statement
types (generically called utility statements), you must construct
the utility statement as a string and
42.11.2. Plan Caching
The PL/pgSQL interpreter parses the function's source text and produces an internal binary instruction tree the first time the function is called (within each session). The instruction tree fully translates the PL/pgSQL statement structure, but individual SQL expressions and SQL commands used in the function are not translated immediately.
As each expression and
command is first
executed in the function, the
parses and analyzes the command to create a prepared statement,
Subsequent visits to that expression or command
reuse the prepared statement. Thus, a function with conditional code
paths that are seldom visited will never incur the overhead of
analyzing those commands that are never executed within the current
session. A disadvantage is that errors
in a specific expression or command cannot be detected until that
part of the function is reached in execution. (Trivial syntax
errors will be detected during the initial parsing pass, but
anything deeper will not be detected until execution.)
PL/pgSQL (or more precisely, the SPI manager) can furthermore attempt to cache the execution plan associated with any particular prepared statement. If a cached plan is not used, then a fresh execution plan is generated on each visit to the statement, and the current parameter values (that is, PL/pgSQL variable values) can be used to optimize the selected plan. If the statement has no parameters, or is executed many times, the SPI manager will consider creating a generic plan that is not dependent on specific parameter values, and caching that for re-use. Typically this will happen only if the execution plan is not very sensitive to the values of the PL/pgSQL variables referenced in it. If it is, generating a plan each time is a net win. See PREPARE for more information about the behavior of prepared statements.
saves prepared statements
and sometimes execution plans in this way,
SQL commands that appear directly in a
function must refer to the
same tables and columns on every execution; that is, you cannot use
a parameter as the name of a table or column in an SQL command. To get
around this restriction, you can construct dynamic commands using
statement - at the price of performing new parse analysis and
constructing a new execution plan on every execution.
The mutable nature of record variables presents another problem in this
connection. When fields of a record variable are used in
expressions or statements, the data types of the fields must not
change from one call of the function to the next, since each
expression will be analyzed using the data type that is present
when the expression is first reached.
used to get around this problem when necessary.
If the same function is used as a trigger for more than one table,
prepares and caches statements
independently for each such table - that is, there is a cache
for each trigger function and table combination, not just for each
function. This alleviates some of the problems with varying
data types; for instance, a trigger function will be able to work
successfully with a column named
even if it happens
to have different types in different tables.
Likewise, functions having polymorphic argument types have a separate statement cache for each combination of actual argument types they have been invoked for, so that data type differences do not cause unexpected failures.
Statement caching can sometimes have surprising effects on the interpretation of time-sensitive values. For example there is a difference between what these two functions do:
CREATE FUNCTION logfunc1(logtxt text) RETURNS void AS $$ BEGIN INSERT INTO logtable VALUES (logtxt, 'now'); END; $$ LANGUAGE plpgsql;
CREATE FUNCTION logfunc2(logtxt text) RETURNS void AS $$ DECLARE curtime timestamp; BEGIN curtime := 'now'; INSERT INTO logtable VALUES (logtxt, curtime); END; $$ LANGUAGE plpgsql;
In the case of
main parser knows when
should be interpreted as
, because the target column of
is of that type. Thus,
will be converted to a
constant when the
is analyzed, and then used in all
during the lifetime
of the session. Needless to say, this isn't what the programmer
wanted. A better idea is to use the
In the case of
main parser does not know
should become and therefore
it returns a data value of type
containing the string
. During the ensuing assignment
to the local variable
interpreter casts this
string to the
type by calling the
functions for the conversion. So, the computed time stamp is updated
on each execution as the programmer expects. Even though this
happens to work as expected, it's not terribly efficient, so
use of the
function would still be a better idea.