Load Balancing

Pgpool-II load balancing of SELECT queries works with Master Slave mode ( Section 5.3.1 ) and Replication mode ( Section 5.3.2 ). When enabled Pgpool-II sends the writing queries to the primay node in Master Slave mode, all of the backend nodes in Replication mode, and other queries get load balanced among all backend nodes. To which node the load balancing mechanism sends read queries is decided at the session start time and will not be changed until the session ends. The only exception is by writing special SQL comments. See below for more details.

Note: Queries which are sent to primary node or replicated because they cannot be balanced are also accounted for in the load balancing algorithm.

Note: If you don't want a query that qualifies for the load balancing to be load balanced by Pgpool-II , you can put /*NO LOAD BALANCE*/ comment before the SELECT statement. This will disable the load balance of the particular query and Pgpool-II will send it to the master node (the primary node in Master Slave mode).

5.7.1. Condition for Load Balancing

For a query to be load balanced, all the following requirements must be met:

  • PostgreSQL version 7.4 or later

  • either in replication mode or master slave mode

  • the query must not be in an explicitly declared transaction (i.e. not in a BEGIN ~ END block)

    • However, if following conditions are met, load balance is possible even if in an explicit transaction

      • transaction isolation level is not SERIALIZABLE

      • transaction has not issued a write query yet (until a write query is issued, load balance is possible. Here "write query" means non SELECT DML or DDL. SELECTs having write functions as specified in black or white function list is not regarded as a write query. This may be changed in the future.)

      • If black and white function list is empty, SELECTs having functions is regarded as a read only query.

  • it's not SELECT INTO

  • it's not SELECT FOR UPDATE nor FOR SHARE

  • it starts with "SELECT" or one of COPY TO STDOUT, EXPLAIN, EXPLAIN ANALYZE SELECT... ignore_leading_white_space = true will ignore leading white space. (Except for SELECTs using writing functions specified in black_function_list or white_function_list )

  • in master slave mode, in addition to above, following conditions must be met:

    • does not use temporary tables

    • does not use unlogged tables

    • does not use system catalogs

Note: You could suppress load balancing by inserting arbitrary comments just in front of the SELECT query:

/*REPLICATION*/ SELECT ...
   

If you want to use comments without supressing load balancing, you can set allow_sql_comments to on. Please refer to replicate_select as well.

Note: The JDBC driver has an autocommit option. If the autocommit is false, the JDBC driver sends "BEGIN" and "COMMIT" by itself. In this case the same restriction above regarding load balancing will be applied.

5.7.2. Load Balancing in Streaming Replication

While using Streaming replication and Hot Standby, it is important to determine which query can be sent to the primary or the standby, and which one should not be sent to the standby. Pgpool-II 's Streaming Replication mode carefully takes care of this.

We distinguish which query should be sent to which node by looking at the query itself.

  • These queries should be sent to the primary node only

    • INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE, CREATE, DROP, ALTER, COMMENT

    • SELECT ... FOR SHARE | UPDATE

    • SELECT in transaction isolation level SERIALIZABLE

    • LOCK command more strict than ROW EXCLUSIVE MODE

    • DECLARE, FETCH, CLOSE

    • SHOW

    • Some transactional commands:

      • BEGIN READ WRITE, START TRANSACTION READ WRITE

      • SET TRANSACTION READ WRITE, SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE

      • SET transaction_read_only = off

    • Two phase commit commands: PREPARE TRANSACTION, COMMIT PREPARED, ROLLBACK PREPARED

    • LISTEN, UNLISTEN, NOTIFY

    • VACUUM

    • Some sequence functions (nextval and setval)

    • Large objects creation commands

  • These queries can be sent to both the primary node and the standby node. If load balancing is enabled, these types of queries can be sent to the standby node. However, if delay_threshold is set and the replication delay is higher than delay_threshold , queries are sent to the primary node.

    • SELECT not listed above

    • COPY TO

  • These queries are sent to both the primary node and the standby node

    • SET

    • DISCARD

    • DEALLOCATE ALL

In an explicit transaction:

  • Transaction starting commands such as BEGIN are sent to the primary node.

  • Following SELECT and some other queries that can be sent to both primary or standby are executed in the transaction or on the standby node.

  • Commands which cannot be executed on the standby such as INSERT are sent to the primary. After one of these commands, even SELECTs are sent to the primary node, This is because these SELECTs might want to see the result of an INSERT immediately. This behavior continues until the transaction closes or aborts.

In the extended protocol, it is possible to determine if the query can be sent to standby or not in load balance mode while parsing the query. The rules are the same as for the non extended protocol. For example, INSERTs are sent to the primary node. Following bind, describe and execute will be sent to the primary node as well.

Note: If the parse of a SELECT statement is sent to the standby node due to load balancing, and then a DML statement, such as an INSERT, is sent to Pgpool-II , then the parsed SELECT will have to be executed on the primary node. Therefore, we re-parse the SELECT on the primary node.

Lastly, queries that Pgpool-II 's parser thinks to be an error are sent to the primary node.

5.7.3. Load Balancing Settings

load_balance_mode ( boolean )

When set to on, Pgpool-II enables the load balancing on incoming SELECT queries. i.e. SELECT queries from the clients gets distributed to the configured PostgreSQL backends. Default is off.

This parameter can only be set at server start.

ignore_leading_white_space ( boolean )

When set to on, Pgpool-II ignores the white spaces at the beginning of SQL queries in load balancing. It is useful if used with APIs like DBI/DBD:Pg which adds white spaces against the user's intention.

This parameter can be changed by reloading the Pgpool-II configurations.

white_function_list ( string )

Specifies a comma separated list of function names that DO NOT update the database. SELECTs including functions not specified in this list are not load balanced. These are replicated among all the DB nodes in Replication mode, sent to the primary node only in Maste Slave mode.

You can use regular expression to match function names, to which ^ and $ are automatically added.

Example 5-2. Using regular expression

If you have prefixed all your read only function with 'get_' or 'select_', You can set the white_function_list like below:

white_function_list = 'get_.*,select_.*'
          

This parameter can be changed by reloading the Pgpool-II configurations.

black_function_list ( string )

Specifies a comma separated list of function names that DO update the database. SELECTs including functions specified in this list are not load balanced. These are replicated among all the DB nodes in Replication mode, sent to the primary node only in Maste Slave mode.

You can use regular expression to match function names, to which ^ and $ are automatically added.

Example 5-3. Using regular expression

If you have prefixed all your updating functions with 'set_', 'update_', 'delete_' or 'insert_', You can set the black_function_list like below:

black_function_list = 'nextval,setval,set_.*,update_.*,delete_.*,insert_.*'
          

Note: black_function_list and white_function_list are mutually exclusive and only one of the two lists can be set in the configuration.

Example 5-4. Configuring using nextval() and setval() to land on proper backend

Prior to Pgpool-II V3.0 , nextval() and setval() were known as functions writing to the database. You can configure this by setting black_function_list and white_function_list as follows

white_function_list = ''
black_function_list = 'nextval,setval,lastval,currval'
              

Note: PostgreSQL also contains lastval() and currval() in addition to nextval() and setval() . Though lastval() and currval() are not writing function type, but it is advised to treat lastval() and currval() as writing functions to avoid errors which occur when these functions are accidentally load balanced.

This parameter can be changed by reloading the Pgpool-II configurations.

database_redirect_preference_list ( string )

Specifies the list of "database-name:node id" pairs to send SELECT queries to a particular backend node for a particular database connection. For example, by specifying "test:1", Pgpool-II will redirect all SELECT queries to the backend node of ID 1 for the connection to "test" database. You can specify multiple "database name:node id" pair by separating them using comma (,).

Regular expressions are also accepted for database name. You can use special keywords as node id . If "primary" is specified, queries are sent to the primary node, and if "standby" is specified, one of the standby nodes are selected randomly based on weights.

Example 5-5. Using database_redirect_preference_list

If you want to configure the following SELECT query routing rules:

  • Route all SELECT queries on postgres database to the primary backend node.

  • Route all SELECT queries on mydb0 or on mydb1 databases to backend node of ID 1.

  • Route all SELECT queries on mydb2 database to standby backend nodes.

then the database_redirect_preference_list will be configured as follows:

database_redirect_preference_list = 'postgres:primary,mydb[01]:1,mydb2:standby'
              

This parameter can be changed by reloading the Pgpool-II configurations.

app_name_redirect_preference_list ( string )

Specifies the list of "application-name:node id" pairs to send SELECT queries to a particular backend node for a particular client application connection.

Note: In PostgreSQL V9.0 or later the "Application name" is a name specified by a client when it connects to database.

For example, application name of psql command is "psql"

Note: Pgpool-II recognizes the application name only specified in the start-up packet. Although a client can provide the application name later in the session, but that does not get considered by the Pgpool-II for query routing.

The notion of app_name_redirect_preference_list is same as the database_redirect_preference_list thus you can also use the regular expressions for application names. Similarly special keyword "primary" indicates the primary node and "standby" indicates one of standby nodes.

Example 5-6. Using app-name_redirect_preference_list

If you want to configure the following SELECT query routing rules:

  • Route all SELECT from psql client to the primary backend node.

  • Route all SELECT queries from myapp1 client to backend node of ID 1.

  • Route all SELECT queries from myapp2 client to standby backend nodes.

then the app_name_redirect_preference_list will be configured as follows:

app_name_redirect_preference_list = 'psql:primary,myapp1:1,myapp2:standby'
              

Caution

JDBC driver postgreSQL-9.3 and earlier versions does not send the application name in the startup packet even if the application name is specified using the JDBC driver option "ApplicationName" and "assumeMinServerVersion=9.0" . So if you want to use the app_name_redirect_preference_list feature through JDBC , Use postgreSQL-9.4 or later version of the driver.

This parameter can be changed by reloading the Pgpool-II configurations.

allow_sql_comments ( boolean )

When set to on, Pgpool-II ignore the SQL comments when identifying if the load balance or query cache is possible on the query. When this parameter is set to off, the SQL comments on the query could effectively prevent the query from being load balanced or cached (pre Pgpool-II V3.4 behavior).

This parameter can be changed by reloading the Pgpool-II configurations. You can also use PGPOOL SET command to alter the value of this parameter for a current session.