Load Balancing

Pgpool-II load balancing of SELECT queries works with any clustering mode except raw mode. When enabled Pgpool-II sends the writing queries to the primary node in Native Replication mode, all of the backend nodes in Replication mode, and other queries get load balanced among all backend nodes. To which node the load balancing mechanism sends read queries is decided at the session start time and will not be changed until the session ends unless statement_level_load_balance is specified. However there are some exceptions. See below for more details.

Note: Queries which are sent to primary node or replicated because they cannot be balanced are also accounted for in the load balancing algorithm.

Note: If you don't want a query that qualifies for the load balancing to be load balanced by Pgpool-II , you can put arbitrary comment( /*NO LOAD BALANCE*/ is usually used) before the SELECT statement. This will disable the load balance of the particular query and Pgpool-II will send it to the main node (the primary node in Native Replication mode).

Note: You can check which DB node is assigned as the load balancing node by using SHOW POOL NODES .

5.7.1. Condition for Load Balancing

For a query to be load balanced, all the following requirements must be met:

  • PostgreSQL version 7.4 or later

  • either in replication mode or native replication mode

  • the query must not be in an explicitly declared transaction (i.e. not in a BEGIN ~ END block)

    • However, if following conditions are met, load balance is possible even if in an explicit transaction

      • transaction isolation level is not SERIALIZABLE

      • transaction has not issued a write query yet (until a write query is issued, load balance is possible. Here "write query" means non SELECT DML or DDL. Before Pgpool-II 4.1 , SELECTs having write functions as specified in write or read_only function list is not regarded as a write query.)

      • If write and read_only function list is empty, SELECT having functions which are not volatile is regarded as a read only query.

  • it's not SELECT INTO

  • it's not SELECT FOR UPDATE nor FOR SHARE

  • it starts with "SELECT" or one of COPY TO STDOUT, EXPLAIN, EXPLAIN ANALYZE SELECT... ignore_leading_white_space = true will ignore leading white space. (Except for SELECTs using writing functions specified in write_function_list or read_only_function_list )

  • in native replication mode, in addition to above, following conditions must be met:

    • does not use temporary tables

    • does not use unlogged tables

    • does not use system catalogs

Note: You could suppress load balancing by inserting arbitrary comments just in front of the SELECT query:

/*REPLICATION*/ SELECT ...
   

If you want to use comments without suppressing load balancing, you can set allow_sql_comments to on. Please refer to replicate_select as well.

Note: The JDBC driver has an autocommit option. If the autocommit is false, the JDBC driver sends "BEGIN" and "COMMIT" by itself. In this case the same restriction above regarding load balancing will be applied.

5.7.2. Writing queries may affect Load Balancing

In general, read queries are load balanced if certain conditions are met. However, writing queries may affect the load balancing. Here "writing queries" mean all the queries except below:

  • SELECT/WITH without writing functions. Volatile functions are regarded writing functions. You can define your own writing functions by using write_function_list or read_only_function_list .

  • SELECT/WITH without FOR UPDATE/SHARE

  • WITH without DML statements

  • COPY TO STDOUT

  • EXPLAIN

  • EXPLAIN ANALYZE and the query is SELECT not including writing functions

  • SHOW

If writing queries appear, succeeding read queries may not be load balanced. i.e. sent to primary node (in streaming replication mode) or main node (in other mode) depending on the setting of disable_load_balance_on_write .

5.7.3. Load Balancing in Streaming Replication

While using Streaming replication and Hot Standby, it is important to determine which query can be sent to the primary or the standby, and which one should not be sent to the standby. Pgpool-II 's Streaming Replication mode carefully takes care of this.

We distinguish which query should be sent to which node by looking at the query itself.

  • These queries should be sent to the primary node only

    • INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE, CREATE, DROP, ALTER, COMMENT

    • SELECT ... FOR SHARE | UPDATE

    • SELECT in transaction isolation level SERIALIZABLE

    • LOCK command more strict than ROW EXCLUSIVE MODE

    • DECLARE, FETCH, CLOSE

    • SHOW

    • Some transactional commands:

      • BEGIN READ WRITE, START TRANSACTION READ WRITE

      • SET TRANSACTION READ WRITE, SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE

      • SET transaction_read_only = off

    • Two phase commit commands: PREPARE TRANSACTION, COMMIT PREPARED, ROLLBACK PREPARED

    • LISTEN, UNLISTEN, NOTIFY

    • VACUUM

    • Some sequence functions (nextval and setval)

    • Large objects creation commands

    • Multi-statement queries (multiple SQL commands on single line)

  • These queries can be sent to both the primary node and the standby node. If load balancing is enabled, these types of queries can be sent to the standby node. However, if delay_threshold is set and the replication delay is higher than delay_threshold , queries are sent to the primary node.

    • SELECT not listed above

    • COPY TO STDOUT

    • EXPLAIN

    • EXPLAIN ANALYZE and the query is SELECT not including writing functions

    • SHOW

  • These queries are sent to both the primary node and the standby node

    • SET

    • DISCARD

    • DEALLOCATE ALL

In an explicit transaction:

  • Transaction starting commands such as BEGIN are sent to both the primary node and the standby node.

  • Following SELECT and some other queries that can be sent to both primary or standby are executed in the transaction or on the standby node.

  • Commands which cannot be executed on the standby such as INSERT are sent to the primary. After one of these commands, even SELECTs are sent to the primary node, This is because these SELECTs might want to see the result of an INSERT immediately. This behavior continues until the transaction closes or aborts.

In the extended protocol, it is possible to determine if the query can be sent to standby or not in load balance mode while parsing the query. The rules are the same as for the non extended protocol. For example, INSERTs are sent to the primary node. Following bind, describe and execute will be sent to the primary node as well.

Note: If the parse of a SELECT statement is sent to the standby node due to load balancing, and then a DML statement, such as an INSERT, is sent to Pgpool-II , then the parsed SELECT will have to be executed on the primary node. Therefore, we re-parse the SELECT on the primary node.

Lastly, queries that Pgpool-II 's parser thinks to be an error are sent to the primary node.

5.7.4. Load Balancing Settings

load_balance_mode ( boolean )

When set to on, Pgpool-II enables the load balancing on incoming SELECT queries. i.e. SELECT queries from the clients gets distributed to the configured PostgreSQL backends. Default is off.

This parameter can only be set at server start.

ignore_leading_white_space ( boolean )

When set to on, Pgpool-II ignores the white spaces at the beginning of SQL queries in load balancing. It is useful if used with APIs like DBI/DBD:Pg which adds white spaces against the user's intention.

This parameter can be changed by reloading the Pgpool-II configurations.

read_only_function_list ( string )

Specifies a comma separated list of function names that DO NOT update the database. SELECTs including functions not specified in this list are not load balanced. These are replicated among all the DB nodes in Replication mode, sent to the primary node only in other mode.

You can use regular expression to match function names, to which ^ and $ are automatically added.

Example 5-2. Using regular expression

If you have prefixed all your read only function with 'get_' or 'select_', You can set the read_only_function_list like below:

read_only_function_list = 'get_.*,select_.*'
	

Note: If the queries can refer to the function with and without the schema qualification then you must add both entries (with and without schema name) in the list.

#For example:
#If the queries sometime use "f1()" and other times "public.f1()"
#to refer the function f1 then the read_only_function_list
#would be configured as follows.

read_only_function_list = "f1,public.f1"
       

Note: If this parameter and write_function_list is empty string, function's volatile proper will be checked. If the property is volatile, the function is regarded as a writing function. This is convenient and recommended way. However this requires one extra query against system catalog for the first time (in the next time cached query result is used and no extra query will be sent). If you don't want to send such query, you can keep on using this parameter.

This parameter can be changed by reloading the Pgpool-II configurations.

write_function_list ( string )

Specifies a comma separated list of function names that DO update the database. SELECTs including functions specified in this list are not load balanced. These are replicated among all the DB nodes in Replication mode, sent to the primary node only in other mode.

You can use regular expression to match function names, to which ^ and $ are automatically added.

Example 5-3. Using regular expression

If you have prefixed all your updating functions with 'set_', 'update_', 'delete_' or 'insert_', You can set the write_function_list like below:

write_function_list = 'nextval,setval,set_.*,update_.*,delete_.*,insert_.*'
	

Note: If the queries can refer the function with and without the schema qualification then you must add both entries(with and without schema name) in the list.

#For example:
#If the queries sometime use "f1()" and other times "public.f1()"
#to refer the function f1 then the write_function_list
#would be configured as follows.

write_function_list = "f1,public.f1"
       

Note: write_function_list and read_only_function_list are mutually exclusive and only one of the two lists can be set in the configuration.

Example 5-4. Configuring using nextval() and setval() to land on proper backend

Prior to Pgpool-II V3.0 , nextval() and setval() were known as functions writing to the database. You can configure this by setting write_function_list and read_only_function_list as follows

read_only_function_list = ''
write_function_list = 'nextval,setval,lastval,currval'
	 

Note: PostgreSQL also contains lastval() and currval() in addition to nextval() and setval() . Though lastval() and currval() are not writing function type, but it is advised to treat lastval() and currval() as writing functions to avoid errors which occur when these functions are accidentally load balanced.

Note: If this parameter and read_only_function_list is empty string, function's volatile proper will be checked. If the property is volatile, the function is regarded as a writing function. This is convenient and recommended way. However this requires one extra query against system catalog for the first time (in the next time cached query result is used and no extra query will be sent). If you don't want to send such query, you can keep on using this parameter.

This parameter can be changed by reloading the Pgpool-II configurations.

primary_routing_query_pattern_list ( string )

Specifies a semicolon separated list of SQL patterns that should be sent to primary node. SQL that matched patterns specified in this list are not load balanced. Other than Only Native Replication mode is supported.

You can use regular expression to match SQL patterns, to which ^ and $ are automatically added. When using special characters in regular expressions (such as "'", ";", "*", "(", ")", "|", "+", ".", "\", "?", "^", "$", "{","}", "{" or "}", etc.) in SQL patterns, you need to escape them by using "\". SQL pattern specified in this parameter is case-insensitive.

Example 5-5. Using regular expression

If the following SQL should be sent to the primary node only, You can set the primary_routing_query_pattern_list like below:

  • SELECT * FROM table_name1;

  • SELECT col1, col2 FROM table_name2 WHERE col1 LIKE '%a%';

  • SQL including table_name3

primary_routing_query_pattern_list = 'SELECT \* FROM table_name1\;;SELECT col1, col2 FROM table_name2 WHERE col1 LIKE \'%a%\'\;;.*table_name3.*'
       

Note: If SQL matches both write_function_list and read_only_function_list , read_only_function_list setting is ignored and the SQL should be sent only to the primary node.

Depending on the SQL patterns, performance might be 1-2% lower when using this feature.

This parameter can be changed by reloading the Pgpool-II configurations.

database_redirect_preference_list ( string )

Specifies the list of "database-name:node id(ratio)" pairs to send SELECT queries to a particular backend node for a particular database connection at a specified load balance ratio. The load balance ratio specifies a value between 0 and 1. The default is 1.0.

For example, by specifying "test:1(0.5)", Pgpool-II will redirect 50% SELECT queries to the backend node of ID 1 for the connection to "test" database. You can specify multiple "database name:node id" pair by separating them using comma (,).

Regular expressions are also accepted for database name. You can use special keywords as node id . If "primary" is specified, queries are sent to the primary node, and if "standby" is specified, one of the standby nodes are selected randomly based on weights ( backend_weight ).

Example 5-6. Using database_redirect_preference_list

If you want to configure the following SELECT query routing rules:

  • Route all SELECT queries on postgres database to the primary backend node.

  • Route 30% SELECT queries on mydb0 or on mydb1 databases to backend node of ID. The other 70% SELECT queries will be sent to other backend nodes.

  • Route all SELECT queries on mydb2 database to standby backend nodes.

then the database_redirect_preference_list will be configured as follows:

database_redirect_preference_list = 'postgres:primary,mydb[01]:1(0.3),mydb2:standby'
	

This parameter can be changed by reloading the Pgpool-II configurations.

app_name_redirect_preference_list ( string )

Specifies the list of "application-name:node id(ratio)" pairs to send SELECT queries to a particular backend node for a particular client application connection at a specified load balance ratio.

Note: In PostgreSQL V9.0 or later the "Application name" is a name specified by a client when it connects to database.

For example, application name of psql command is "psql" .

Note: Pgpool-II recognizes the application name only specified in the start-up packet. Although a client can provide the application name later in the session, but that does not get considered by the Pgpool-II for query routing.

The notion of app_name_redirect_preference_list is same as the database_redirect_preference_list thus you can also use the regular expressions for application names. Similarly special keyword "primary" indicates the primary node and "standby" indicates one of standby nodes. The load balance weight specifies a value between 0 and 1. The default is 1.0.

Example 5-7. Using app-name_redirect_preference_list

If you want to configure the following SELECT query routing rules:

  • Route all SELECT from psql client to the primary backend node.

  • Route 30% SELECT queries from myapp1 client to backend node of ID 1. The other 70% SELECT queries will be sent to other backend nodes.

  • Route all SELECT queries from myapp2 client to standby backend nodes.

then the app_name_redirect_preference_list will be configured as follows:

app_name_redirect_preference_list = 'psql:primary,myapp1:1(0.3),myapp2:standby'
	

Note: app_name_redirect_preference_list takes precedence over the database_redirect_preference_list .

For example, if you set database_redirect_preference_list = 'postgres:standby(1.0)' and app_name_redirect_preference_list = 'myapp1:primary(1.0)' , all SELECT from application myapp1 on postgres database will be sent to primary backend node.

Note: By specifying of app_name_redirect_preference_list and database_redirect_preference_list , when multiple database names and application names are matched, the first setting will be used.

For example, if you set database_redirect_preference_list = 'postgres:primary,postgres:standby' , "postgres: primary" will be used.

Caution

JDBC driver PostgreSQL-9.3 and earlier versions does not send the application name in the startup packet even if the application name is specified using the JDBC driver option "ApplicationName" and "assumeMinServerVersion=9.0" . So if you want to use the app_name_redirect_preference_list feature through JDBC , Use PostgreSQL-9.4 or later version of the driver.

This parameter can be changed by reloading the Pgpool-II configurations.

allow_sql_comments ( boolean )

When set to on, Pgpool-II ignore the SQL comments when identifying if the load balance or query cache is possible on the query. When this parameter is set to off, the SQL comments on the query could effectively prevent the query from being load balanced or cached (pre Pgpool-II V3.4 behavior).

This parameter can be changed by reloading the Pgpool-II configurations. You can also use PGPOOL SET command to alter the value of this parameter for a current session.

disable_load_balance_on_write ( string )

Specify load balance behavior after write queries appear. This parameter is especially useful in streaming replication mode. When write queries are sent to primary server, the changes are applied to standby servers but there's a time lag. So if a client read the same row right after the write query, the client may not see the latest value of the row. If that's the problem, clients should always read data from the primary server. However this effectively disables load balancing, which leads to lesser performance. This parameter allows a fine tuning for the trade off between not-clustering-aware applications compatibility and performance.

If this parameter is set to off , read queries are load balanced even if write queries appear. This gives the best load balance performance but clients may see older data. This is useful for an environment where PostgreSQL parameter synchronous_commit = 'remote_apply', or in the native replication mode, since there's no replication delay in such environments.

If this parameter is set to transaction and write queries appear in an explicit transaction, subsequent read queries are not load balanced until the transaction ends. Please note that read queries not in an explicit transaction are not affected by the parameter. This setting gives the best balance in most cases and you should start from this. This is the default and same behavior in Pgpool-II 3.7 or before.

If this parameter is set to trans_transaction and write queries appear in an explicit transaction, subsequent read queries are not load balanced in the transaction and subsequent explicit transaction until the session ends. So this parameter is safer for older applications but give lesser performance than transaction . Please note that read queries not in an explicit transaction are not affected by the parameter.

If this parameter is set to always and write queries appear, subsequent read queries are not load balanced until the session ends regardless they are in explicit transactions or not. This gives the highest compatibility with not-clustering-aware applications and the lowest performance.

If this parameter is set to dml_adaptive Pgpool-II keep track of each TABLE referenced in the WRITE statements within the explicit transactions and will not load balances the subsequent READ queries if the TABLE they are reading from is previously modified inside the same transaction. Dependent functions, triggers, and views on the tables can be configured using dml_adaptive_object_relationship_list

dml_adaptive_object_relationship_list ( string )

To prevent load balancing of READ dependent objects, you may specificy the object name followed by a colon( : ) and then a comma( , ) separated list of dependent object names. "[object]:[dependent-object]" In an explicit transaction block after a WRITE statement has been issues, this will prevent load balancing of any READ statements containing references of dependent object(s).

Example 5-8. Configuring dml adaptive object relationship

If you have a trigger installed on table_1 that do INSERT in table_2 for each INSERT on table_1 . Then you would want to make sure that read on table_2 must not get load-balanced within the same transaction after INSERT into table_1 . For this configuration you can set

dml_adaptive_object_relationship_list = 'table_1:table_2'
		

This parameter is only valid for disable_load_balance_on_write = 'dml_adaptive'

Note: To configure the dependency on the function, The function must be present in the write_function_list

statement_level_load_balance ( boolean )

When set to on, the load balancing node is decided for each read query. When set to off, load balancing node is decided at the session start time and will not be changed until the session ends. For example, in applications that use connection pooling remain connections open to the backend server, because the session may be held for a long time, the load balancing node does not change until the session ends. In such applications, When statement_level_load_balance is enabled, it is possible to decide load balancing node per query, not per session. The default is off.

This parameter can be changed by reloading the Pgpool-II configurations.