Load Balancing

Pgpool-II load balancing of SELECT queries works with Master Slave mode ( Section 5.3.1 ) and Replication mode ( Section 5.3.2 ). When enabled Pgpool-II sends the writing queries to the primary node in Master Slave mode, all of the backend nodes in Replication mode, and other queries get load balanced among all backend nodes. To which node the load balancing mechanism sends read queries is decided at the session start time and will not be changed until the session ends. However there are some exceptions. See below for more details.

Note: Queries which are sent to primary node or replicated because they cannot be balanced are also accounted for in the load balancing algorithm.

Note: If you don't want a query that qualifies for the load balancing to be load balanced by Pgpool-II , you can put arbitrary comment( /*NO LOAD BALANCE*/ is usually used) before the SELECT statement. This will disable the load balance of the particular query and Pgpool-II will send it to the master node (the primary node in Master Slave mode).

Note: You can check which DB node is assigned as the load balancing node by using SHOW POOL NODES .

5.7.1. Condition for Load Balancing

For a query to be load balanced, all the following requirements must be met:

  • PostgreSQL version 7.4 or later

  • either in replication mode or master slave mode

  • the query must not be in an explicitly declared transaction (i.e. not in a BEGIN ~ END block)

    • However, if following conditions are met, load balance is possible even if in an explicit transaction

      • transaction isolation level is not SERIALIZABLE

      • transaction has not issued a write query yet (until a write query is issued, load balance is possible. Here "write query" means non SELECT DML or DDL. Before Pgpool-II 4.1 , SELECTs having write functions as specified in black or white function list is not regarded as a write query.)

      • If black and white function list is empty, SELECTs having functions is regarded as a read only query.

  • it's not SELECT INTO

  • it's not SELECT FOR UPDATE nor FOR SHARE

  • it starts with "SELECT" or one of COPY TO STDOUT, EXPLAIN, EXPLAIN ANALYZE SELECT... ignore_leading_white_space = true will ignore leading white space. (Except for SELECTs using writing functions specified in black_function_list or white_function_list )

  • in master slave mode, in addition to above, following conditions must be met:

    • does not use temporary tables

    • does not use unlogged tables

    • does not use system catalogs

Note: You could suppress load balancing by inserting arbitrary comments just in front of the SELECT query:

    /*REPLICATION*/ SELECT ...
   

If you want to use comments without suppressing load balancing, you can set allow_sql_comments to on. Please refer to replicate_select as well.

Note: The JDBC driver has an autocommit option. If the autocommit is false, the JDBC driver sends "BEGIN" and "COMMIT" by itself. In this case the same restriction above regarding load balancing will be applied.

5.7.2. Writing queries may affect Load Balancing

In general, read queries are load balanced if certain conditions are met. However, writing queries may affect the load balancing. Here "writing queries" mean all the queries except below:

  • SELECT/WITH without FOR UPDATE/SHARE

  • COPY TO STODUT

  • EXPLAIN

  • EXPLAIN ANALYZE and the query is SELECT not including writing functions

  • SHOW

If writing queires appear, succeeding read queries may not be load balanced. i.e. sent to primary node (in streaming replication mode) or master node (in other mode) depending on the setting of disable_load_balance_on_write .

5.7.3. Load Balancing in Streaming Replication

While using Streaming replication and Hot Standby, it is important to determine which query can be sent to the primary or the standby, and which one should not be sent to the standby. Pgpool-II 's Streaming Replication mode carefully takes care of this.

We distinguish which query should be sent to which node by looking at the query itself.

  • These queries should be sent to the primary node only

    • INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE, CREATE, DROP, ALTER, COMMENT

    • SELECT ... FOR SHARE | UPDATE

    • SELECT in transaction isolation level SERIALIZABLE

    • LOCK command more strict than ROW EXCLUSIVE MODE

    • DECLARE, FETCH, CLOSE

    • SHOW

    • Some transactional commands:

      • BEGIN READ WRITE, START TRANSACTION READ WRITE

      • SET TRANSACTION READ WRITE, SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE

      • SET transaction_read_only = off

    • Two phase commit commands: PREPARE TRANSACTION, COMMIT PREPARED, ROLLBACK PREPARED

    • LISTEN, UNLISTEN, NOTIFY

    • VACUUM

    • Some sequence functions (nextval and setval)

    • Large objects creation commands

    • Multi-statement queries (multiple SQL commands on single line)

  • These queries can be sent to both the primary node and the standby node. If load balancing is enabled, these types of queries can be sent to the standby node. However, if delay_threshold is set and the replication delay is higher than delay_threshold , queries are sent to the primary node.

    • SELECT not listed above

    • COPY TO STDOUT

    • EXPLAIN

    • EXPLAIN ANALYZE and the query is SELECT not including writing functions

    • SHOW

  • These queries are sent to both the primary node and the standby node

    • SET

    • DISCARD

    • DEALLOCATE ALL

In an explicit transaction:

  • Transaction starting commands such as BEGIN are sent to both the primary node and the standby node.

  • Following SELECT and some other queries that can be sent to both primary or standby are executed in the transaction or on the standby node.

  • Commands which cannot be executed on the standby such as INSERT are sent to the primary. After one of these commands, even SELECTs are sent to the primary node, This is because these SELECTs might want to see the result of an INSERT immediately. This behavior continues until the transaction closes or aborts.

In the extended protocol, it is possible to determine if the query can be sent to standby or not in load balance mode while parsing the query. The rules are the same as for the non extended protocol. For example, INSERTs are sent to the primary node. Following bind, describe and execute will be sent to the primary node as well.

Note: If the parse of a SELECT statement is sent to the standby node due to load balancing, and then a DML statement, such as an INSERT, is sent to Pgpool-II , then the parsed SELECT will have to be executed on the primary node. Therefore, we re-parse the SELECT on the primary node.

Lastly, queries that Pgpool-II 's parser thinks to be an error are sent to the primary node.

5.7.4. Load Balancing Settings

load_balance_mode ( boolean )

When set to on, Pgpool-II enables the load balancing on incoming SELECT queries. i.e. SELECT queries from the clients gets distributed to the configured PostgreSQL backends. Default is off.

This parameter can only be set at server start.

ignore_leading_white_space ( boolean )

When set to on, Pgpool-II ignores the white spaces at the beginning of SQL queries in load balancing. It is useful if used with APIs like DBI/DBD:Pg which adds white spaces against the user's intention.

This parameter can be changed by reloading the Pgpool-II configurations.

white_function_list ( string )

Specifies a comma separated list of function names that DO NOT update the database. SELECTs including functions not specified in this list are not load balanced. These are replicated among all the DB nodes in Replication mode, sent to the primary node only in Master Slave mode.

You can use regular expression to match function names, to which ^ and $ are automatically added.

Example 5-2. Using regular expression

If you have prefixed all your read only function with 'get_' or 'select_', You can set the white_function_list like below:

	 white_function_list = 'get_.*,select_.*'
	

Note: Schema qualifications can not be used in white_function_list because Pgpool-II silently ignores a schema qualification in function names appearing in an input SQL while comparing the list and the input SQL. As a result, a schema qualified function name in the list never matches function names appearing in the input SQL.

This parameter can be changed by reloading the Pgpool-II configurations.

black_function_list ( string )

Specifies a comma separated list of function names that DO update the database. SELECTs including functions specified in this list are not load balanced. These are replicated among all the DB nodes in Replication mode, sent to the primary node only in Master Slave mode.

You can use regular expression to match function names, to which ^ and $ are automatically added.

Example 5-3. Using regular expression

If you have prefixed all your updating functions with 'set_', 'update_', 'delete_' or 'insert_', You can set the black_function_list like below:

	 black_function_list = 'nextval,setval,set_.*,update_.*,delete_.*,insert_.*'
	

Note: Schema qualifications can not be used in black_function_list because Pgpool-II silently ignores a schema qualification in function names appearing in an input SQL while comparing the list and the input SQL. As a result, a schema qualified function name in the list never matches function names appearing in the input SQL.

Note: black_function_list and white_function_list are mutually exclusive and only one of the two lists can be set in the configuration.

Example 5-4. Configuring using nextval() and setval() to land on proper backend

Prior to Pgpool-II V3.0 , nextval() and setval() were known as functions writing to the database. You can configure this by setting black_function_list and white_function_list as follows

	  white_function_list = ''
	  black_function_list = 'nextval,setval,lastval,currval'
	 

Note: PostgreSQL also contains lastval() and currval() in addition to nextval() and setval() . Though lastval() and currval() are not writing function type, but it is advised to treat lastval() and currval() as writing functions to avoid errors which occur when these functions are accidentally load balanced.

This parameter can be changed by reloading the Pgpool-II configurations.

black_query_pattern_list ( string )

Specifies a semicolon separated list of SQL patterns that should be sent to primary node. SQL that matched patterns specified in this list are not load balanced. Only Master Slave mode is supported.

You can use regular expression to match SQL patterns, to which ^ and $ are automatically added. When using special characters in regular expressions (such as "'", ";", "*", "(", ")", "|", "+", ".", "\", "?", "^", "$", "{","}", "{" or "}", etc.) in SQL patterns, you need to escape them by using "\". SQL pattern specified in this parameter is case-insensitive.

Example 5-5. Using regular expression

If the following SQL should be sent to the primary node only, You can set the black_query_pattern_list like below:

  • SELECT * FROM table_name1;

  • SELECT col1, col2 FROM table_name2 WHERE col1 LIKE '%a%';

  • SQL including table_nama3

	black_query_pattern_list = 'SELECT \* FROM table_name1\;;SELECT col1, col2 FROM table_name2 WHERE col1 LIKE \'%a%\'\;;.*table_name3.*'
       

Note: If SQL matches both black_function_list and white_function_list , white_function_list setting is ignored and the SQL should be sent only to the primary node.

Depending on the SQL patterns, performance might be 1-2% lower when using this feature.

This parameter can be changed by reloading the Pgpool-II configurations.

database_redirect_preference_list ( string )

Specifies the list of "database-name:node id(ratio)" pairs to send SELECT queries to a particular backend node for a particular database connection at a specified load balance ratio. The load balance ratio specifies a value between 0 and 1. The default is 1.0.

For example, by specifying "test:1(0.5)", Pgpool-II will redirect 50% SELECT queries to the backend node of ID 1 for the connection to "test" database. You can specify multiple "database name:node id" pair by separating them using comma (,).

Regular expressions are also accepted for database name. You can use special keywords as node id . If "primary" is specified, queries are sent to the primary node, and if "standby" is specified, one of the standby nodes are selected randomly based on weights ( backend_weight ).

Example 5-6. Using database_redirect_preference_list

If you want to configure the following SELECT query routing rules:

  • Route all SELECT queries on postgres database to the primary backend node.

  • Route 30% SELECT queries on mydb0 or on mydb1 databases to backend node of ID. The other 70% SELECT queries will be sent to other backend nodes.

  • Route all SELECT queries on mydb2 database to standby backend nodes.

then the database_redirect_preference_list will be configured as follows:

	 database_redirect_preference_list = 'postgres:primary,mydb[01]:1(0.3),mydb2:standby'
	

This parameter can be changed by reloading the Pgpool-II configurations.

app_name_redirect_preference_list ( string )

Specifies the list of "application-name:node id(ratio)" pairs to send SELECT queries to a particular backend node for a particular client application connection at a specified load balance ratio.

Note: In PostgreSQL V9.0 or later the "Application name" is a name specified by a client when it connects to database.

For example, application name of psql command is "psql" .

Note: Pgpool-II recognizes the application name only specified in the start-up packet. Although a client can provide the application name later in the session, but that does not get considered by the Pgpool-II for query routing.

The notion of app_name_redirect_preference_list is same as the database_redirect_preference_list thus you can also use the regular expressions for application names. Similarly special keyword "primary" indicates the primary node and "standby" indicates one of standby nodes. The load balance weight specifies a value between 0 and 1. The default is 1.0.

Example 5-7. Using app-name_redirect_preference_list

If you want to configure the following SELECT query routing rules:

  • Route all SELECT from psql client to the primary backend node.

  • Route 30% SELECT queries from myapp1 client to backend node of ID 1. The other 70% SELECT queries will be sent to other backend nodes.

  • Route all SELECT queries from myapp2 client to standby backend nodes.

then the app_name_redirect_preference_list will be configured as follows:

	 app_name_redirect_preference_list = 'psql:primary,myapp1:1(0.3),myapp2:standby'
	

Note: app_name_redirect_preference_list takes precedence over the database_redirect_preference_list .

For example, if you set database_redirect_preference_list = 'postgres:standby(1.0)' and app_name_redirect_preference_list = 'myapp1:primary(1.0)' , all SELECT from application myapp1 on postgres database will be sent to primary backend node.

Note: By specifying of app_name_redirect_preference_list and database_redirect_preference_list , when multiple database names and application names are matched, the first setting will be used.

For example, if you set database_redirect_preference_list = 'postgres:primary,postgres:standby' , "postgres: primary" will be used.

Caution

JDBC driver postgreSQL-9.3 and earlier versions does not send the application name in the startup packet even if the application name is specified using the JDBC driver option "ApplicationName" and "assumeMinServerVersion=9.0" . So if you want to use the app_name_redirect_preference_list feature through JDBC , Use postgreSQL-9.4 or later version of the driver.

This parameter can be changed by reloading the Pgpool-II configurations.

allow_sql_comments ( boolean )

When set to on, Pgpool-II ignore the SQL comments when identifying if the load balance or query cache is possible on the query. When this parameter is set to off, the SQL comments on the query could effectively prevent the query from being load balanced or cached (pre Pgpool-II V3.4 behavior).

This parameter can be changed by reloading the Pgpool-II configurations. You can also use PGPOOL SET command to alter the value of this parameter for a current session.

disable_load_balance_on_write ( string )

Specify load balance behavior after write queries appear. This parameter is especially useful in streaming replication mode. When write queries are sent to primary server, the changes are applied to standby servers but there's a time lag. So if a client read the same row right after the write query, the client may not see the latest value of the row. If that's the problem, clients should always read data from the primary server. However this effectively disables load balancing, which leads to lesser performance. This parameter allows a fine tuning for the trade off between not-clustering-aware applications compatibility and performance.

If this parameter is set to off , read queries are load balanced even if write queries appear. This gives the best load balance performance but clients may see older data. This is useful for an environment where PostgreSQL parameter synchronous_commit = 'remote_apply', or in the native replication mode, since there's no replication delay in such environments.

If this parameter is set to transaction and write queries appear in an explicit transaction, subsequent read queries are not load balanced until the transaction ends. Please note that read queries not in an explicit transaction are not affected by the parameter. This setting gives the best balance in most cases and you should start from this. This is the default and same behavior in Pgpool-II 3.7 or before.

If this parameter is set to trans_transaction and write queries appear in an explicit transaction, subsequent read queries are not load balanced in the transaction and subsequent explicit transaction until the session ends. So this parameter is safer for older applications but give lesser performance than transaction . Please note that read queries not in an explicit transaction are not affected by the parameter.

If this parameter is set to always and write queries appear, subsequent read queries are not load balanced until the session ends regardless they are in explicit transactions or not. This gives the highest compatibility with not-clustering-aware applications and the lowest performance.

statement_level_load_balance ( boolean )

When set to on, the load balancing node is decided for each read query. When set to off, load balancing node is decided at the session start time and will not be changed until the session ends. For example, in applications that use connection pooling remain connections open to the backend server, because the session may be held for a long time, the load balancing node does not change until the session ends. In such applications, When statement_level_load_balance is enabled, it is possible to decide load balancing node per query, not per session. The default is off.

This parameter can be changed by reloading the Pgpool-II configurations.