Configuring Pgpool-II

3.3.1. Configuring pgpool.conf

pgpool.conf is the main configuration file of Pgpool-II . You need to specify the path to the file when starting Pgpool-II using -f option. pgpool.conf is located at $prefix/etc/pgpool.conf by default, if it installed from source code.

For each Pgpool-II clustering mode, there are sample configurations.

Table 3-1. pgpool.conf samples

Clustering mode Configuration file name
Streaming replication mode pgpool.conf.sample-stream
Replication mode pgpool.conf.sample-replication
Logical replication mode pgpool.conf.sample-logical
Slony mode pgpool.conf.sample-slony
Snapshot isolation mode pgpool.conf.sample-snapshot
Raw mode pgpool.conf.sample-raw

These configuration files are located at /usr/local/etc with default installation from source code. You can copy one of them as pgpool.conf . (probably you need root privilege for this)

# cd /usr/local/etc
# cp pgpool.conf.sample-stream pgpool.conf
    

3.3.2. Running mode of Pgpool-II

There are six different running modes in Pgpool-II : streaming replication mode, logical replication mode, main replica mode (slony mode), native replication mode, raw mode and snapshot isolaton mode. In any mode, Pgpool-II provides connection pooling, and automatic fail over. Online recovery can be used only with streaming replication mode and native replication mode.

Those modes are exclusive each other and cannot be changed after starting the server. You should make a decision which to use in the early stage of designing the system. If you are not sure, it is recommended to use the streaming replication mode.

The streaming replication mode can be used with PostgreSQL servers operating streaming replication. In this mode, PostgreSQL is responsible for synchronizing databases. This mode is widely used and most recommended way to use Pgpool-II . Load balancing is possible in the mode.

In the native replication mode , Pgpool-II is responsible for synchronizing databases. The advantage for the mode is the synchronization is done in synchronous way: writing to the database does not return until all of PostgreSQL servers finish the write operation. However, you could get a similar effect using PostgreSQL 9.6 or later with synchronous_commit = remote_apply being set in streaming replication. If you could use the setting, we strongly recommend to use it instead of native replication mode because you can avoid some restrictions in the native replication mode. Since PostgreSQL does not provide cross node snapshot control, it is possible that session X can see data on node A committed by session Y before session Y commits data on node B. If session X tries to update data on node B based on the data seen on node A, then data consistency between node A and B might be lost. To avoid the problem, user need to issue an explicit lock on the data. This is another reason why we recommend to use streaming replication mode with synchronous_commit = remote_apply .

Load balancing is possible in the mode.

The logical replication mode can be used with PostgreSQL servers operating logical replication. In this mode, PostgreSQL is responsible for synchronizing tables. Load balancing is possible in the mode. Since logical replication does not replicate all tables, it's user's responsibility to replicate the table which could be load balanced. Pgpool-II load balances all tables. This means that if a table is not replicated, Pgpool-II may lookup outdated tables in the subscriber side.

The main replica mode mode (slony mode) can be used with PostgreSQL servers operating Slony . In this mode, Slony / PostgreSQL is responsible for synchronizing databases. Since Slony-I is being obsoleted by streaming replication, we do not recommend to use this mode unless you have specific reason to use Slony . Load balancing is possible in the mode.

The snapshot isolation mode is similar to the native replication mode and it adds the visibility consistency among nodes. Please note that there are some limitations in this mode and currently (in Pgpool-II 4.2) this mode is regarded as "experimental" implementation. Be warned that careful testings are required before you implement this in a production system.

  • It is necessary to set the transaction isolation level to REPEATABLE READ. That means you need to set it in postgresql.conf like this:

    default_transaction_isolation = 'repeatable read'
          

  • Consistent visibility in SERIAL data type and sequences are not guaranteed.

In the raw mode , Pgpool-II does not care about the database synchronization. It's user's responsibility to make the whole system does a meaningful thing. Load balancing is not possible in the mode.