Pgpoo-II + Watchdog Setup Example

This section shows an example of streaming replication configuration using Pgpool-II . In this example, we use 3 Pgpool-II servers to manage PostgreSQL servers to create a robust cluster system and avoid the single point of failure or split brain.

7.3.1. Requirements

We assume that all the Pgpool-II servers and the PostgreSQL servers are in the same subnet.

7.3.2. Cluster System Configuration

We use 2 PostgreSQL servers and 3 Pgpool-II servers with CentOS7. Let these servers be osspc16 , osspc17 , osspc18 , osspc19 and osspc20 .

Figure 7-1. Cluster System Configuration

Note: The roles of Active , Standy , Primary , Standby are not fixed and may be changed by further operations.

7.3.3. Installation

In this example, we install Pgpool-II and PostgreSQL by using RPM packages.

Table 7-2. Pgpool-II , PostgreSQL version informations and Configuration

Server Version Host Name Port $PGDATA Directory
PostgreSQL server (primary) PostgreSQL 9.6.1 osspc19 5432 /var/lib/pgsql/9.6/data
PostgreSQL server (standby) PostgreSQL 9.6.1 osspc20 5432 /var/lib/pgsql/9.6/data
Pgpool-II server Pgpool-II 3.6.1 osspc16 9999 -
Pgpool-II server Pgpool-II 3.6.1 osspc17 9999 -
Pgpool-II server Pgpool-II 3.6.1 osspc18 9999 -

Install Pgpool-II by using Pgpool-II YUM repository.

# yum install http://www.pgpool.net/yum/rpms/3.6/redhat/rhel-7-x86_64/pgpool-II-release-3.6-1.noarch.rpm
# yum install pgpool-II-pg96
# yum install pgpool-II-pg96-debuginfo
# yum install pgpool-II-pg96-devel
# yum install pgpool-II-pg96-extensions
      

Install PostgreSQL by using PostgreSQL YUM repository.

# yum install https://yum.postgresql.org/9.6/redhat/rhel-7-x86_64/pgdg-centos96-9.6-3.noarch.rpm
# yum install postgresql96 postgresql96-devel postgresql96-server
      

7.3.4. Before Starting

Before you start the configuration process, please check the following prerequisites.

  • Set up PostgreSQL streaming replication on the primary server. In this example, we use WAL archiving.

    First, we create the directory /var/lib/pgsql/archivedir to store WAL segments on both PostgreSQL servers ( osspc19 and osspc20 ).

    [PostgreSQL server]$ mkdir /var/lib/pgsql/archivedir
              

    Then we edit the configuration file $PGDATA/postgresql.conf on osspc19 (primary) as follows.

    listen_addresses = '*'
    wal_level = hot_standby
    max_wal_senders = 2
    
    archive_mode = on
    archive_command = 'cp "%p" "/var/lib/pgsql/archivedir/%f"'
              

    We use the online recovery functionality of Pgpool-II to setup standby server after the primary server is started.

  • Because of the security reasons, we create a user repl solely used for replication purpose, and a user pgpool for streaming replication delay check and health check of Pgpool-II . Assuming that all the Pgpool-II servers and the PostgreSQL servers are in the network of 133.137.174.0/24 , and edit pg_hba.conf to enable md5 authentication method.

    host    all             pgpool          133.137.174.0/24         md5
    host    all             all             0.0.0.0/0                md5
    
    host    replication     repl            133.137.174.0/24         md5
              
  • To use the failover and online recovery of Pgpool-II , the settings that allow SSH without passowrd to other servers ( osspc16 - osspc20 ) are necessary.

  • To allow repl user without specifying password for streaming replication and online recovery, we create the .pgpass file in postgres user's home directory and change the permisson to 600 on both postgreSQL servers osspc19 and osspc20 .

    [osspc19]$ cat /var/lib/pgsql/.pgpass
    osspc20:5432:replication:repl:<password of repl user>
                
    [osspc20]$ cat /var/lib/pgsql/.pgpass
    osspc19:5432:replication:repl:<passowrd of repl user>
                
    $ chmod 600  /var/lib/pgsql/.pgpass
                
  • When Pgpool-II connects to other Pgpool-II servers or PostgreSQL servers, the target port must be accessible by enabling firewall management softwares. Following is an example for CentOS/RHEL7 .

    [PostgreSQL server]# firewall-cmd --permanent --zone=public --add-service=postgresql
    [PostgreSQL server]# firewall-cmd --reload
                
  • The following commands are to enable Pgpool-II and PostgreSQL start on system boot.

    [Pgpool-II server]# systemctl enable pgpool.service
                
    [PostgreSQL server]# systemctl enable postgresql.service
                

7.3.5. Pgpool-II Configuration

7.3.5.1. Common Settings

Here are the common settings on osspc16 , osspc17 and osspc18 .

When installing Pgpool-II from RPM, all the Pgpool-II configuration files are in /etc/pgpool-II . In this example, we copy the sample configuration file for streaming replicaton mode.

# cp /etc/pgpool-II/pgpool.conf.sample-stream /etc/pgpool-II/pgpool.conf
        

To allow Pgpool-II to accept all incoming connections, we set listen_addresses = '*' .

listen_addresses = '*'
        

Specifiy replication delay check user and password.

sr_check_user = 'pgpool'
sr_check_password = 'pgpool'
        

Enable health check so that pgpool-II performs failover. Also, if the network is unstable, the health check fails even though the backend is running properly, failover or degenerate operation may occur. In order to prevent such incorrect detection of health check, we set health_check_max_retries = 10 .

health_check_period = 5
                                   # Health check period
                                   # Disabled (0) by default
health_check_timeout = 20
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'pgpool'
health_check_password = 'pgpool'

health_check_max_retries = 10
        

Specify the backend informations with osspc19 and osspc20 .

# - Backend Connection Settings -

backend_hostname0 = 'osspc19'
                                   # Host name or IP address to connect to for backend 0
backend_port0 = 5432
                                   # Port number for backend 0
backend_weight0 = 1
                                   # Weight for backend 0 (only in load balancing mode)
backend_data_directory0 = '/var/lib/pgsql/9.6/data'
                                   # Data directory for backend 0
backend_flag0 = 'ALLOW_TO_FAILOVER'
                                   # Controls various backend behavior
                                   # ALLOW_TO_FAILOVER or DISALLOW_TO_FAILOVER
backend_hostname1 = 'osspc20'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/9.6/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
        

7.3.5.2. Failover configuration

Specify failover_command to execute failover.sh script. The special characters %d %P %H %R in failover_command are replcaed with DB node ID of the detached node , Old primary node ID , Hostname of the new master node , Database cluster directory of the new master node .

failover_command = '/etc/pgpool-II/failover.sh %d %P %H %R'
      

Create /etc/pgpool-II/failover.sh , and set the file permisson to 755 .

# vi /etc/pgpool-II/failover.sh
# chmod 755 /etc/pgpool-II/failover.sh
      

  • /etc/pgpool-II/failover.sh

    #! /bin/sh -x
    # Execute command by failover.
    # special values:  %d = node id
    #                  %h = host name
    #                  %p = port number
    #                  %D = database cluster path
    #                  %m = new master node id
    #                  %M = old master node id
    #                  %H = new master node host name
    #                  %P = old primary node id
    #                  %R = new master database cluster path
    #                  %r = new master port number
    #                  %% = '%' character
    
    falling_node=$1          # %d
    old_primary=$2           # %P
    new_primary=$3           # %H
    pgdata=$4                # %R
    
    pghome=/usr/pgsql-9.6
    log=/var/log/pgpool/failover.log
    
    date >> $log
    echo "failed_node_id=$falling_node new_primary=$new_primary" >> $log
    
    if [ $falling_node = $old_primary ]; then
        if [ $UID -eq 0 ]
        then
            su postgres -c "ssh -T postgres@$new_primary $pghome/bin/pg_ctl promote -D $pgdata"
        else
            ssh -T postgres@$new_primary $pghome/bin/pg_ctl promote -D $pgdata
        fi
        exit 0;
    fi;
    exit 0;
              

7.3.5.3. Pgpool-II Online Recovery Configurations

Next, in order to perform online recovery with Pgpool-II we specify the PostgreSQL user name and online recovery command recovery_1st_stage . Then, we create recovery_1st_stage and pgpool_remote_start in database cluster directory of PostgreSQL primary server, and set the file permissions to 755 .

recovery_user = 'postgres'
                                   # Online recovery user
recovery_password = 'postgres'
                                   # Online recovery password

recovery_1st_stage_command = 'recovery_1st_stage'
        
$ vi /var/lib/pgsql/9.6/data/recovery_1st_stage
$ vi /var/lib/pgsql/9.6/data/pgpool_remote_start
$ chmod 755 /var/lib/pgsql/9.6/data/recovery_1st_stage
$ chmod 755 /var/lib/pgsql/9.6/data/pgpool_remote_start
        

  • /var/lib/pgsql/9.6/data/recovery_1st_stage

    #!/bin/bash -x
    # Recovery script for streaming replication.
    
    pgdata=$1
    remote_host=$2
    remote_pgdata=$3
    port=$4
    
    pghome=/usr/pgsql-9.6
    archivedir=/var/lib/pgsql/archivedir
    hostname=$(hostname)
    
    ssh -T postgres@$remote_host "
    rm -rf $remote_pgdata
    $pghome/bin/pg_basebackup -h $hostname -U repl -D $remote_pgdata -x -c fast
    rm -rf $archivedir/*
    
    cd $remote_pgdata
    cp postgresql.conf postgresql.conf.bak
    sed -e 's/#*hot_standby = off/hot_standby = on/' postgresql.conf.bak > postgresql.conf
    rm -f postgresql.conf.bak
    cat > recovery.conf << EOT
    standby_mode = 'on'
    primary_conninfo = 'host="$hostname" port=$port user=repl'
    restore_command = 'scp $hostname:$archivedir/%f %p'
    EOT
    "
                
  • /var/lib/pgsql/9.6/data/pgpool_remote_start

    #! /bin/sh -x
    
    pghome=/usr/pgsql-9.6
    remote_host=$1
    remote_pgdata=$2
    
    # Start recovery target PostgreSQL server
    ssh -T $remote_host $pghome/bin/pg_ctl -w -D $remote_pgdata start > /dev/null 2>&1 < /dev/null &
              

In order to use the online recovery functionality, the functions of pgpool_recovery , pgpool_remote_start , pgpool_switch_xlog are required, so we need install pgpool_recovery on template1 of PostgreSQL server osspc19 .

# su - postgres
$ psql template1
=# CREATE EXTENSION pgpool_recovery;
        

7.3.5.4. Client Authentication Configuration

Because in the section Before Starting , we already set PostgreSQL authentication method to md5 , it is necessary to set a client authentication by Pgpool-II to connect to backend nodes. When installing from RPM, the Pgpool-II configuration file pool_hba.conf is in /etc/pgpool-II . By default, pool_hba authentication is disabled, and set enable_pool_hba = on to enable it.

enable_pool_hba = on
        

The format of pool_hba.conf file follows very closely PostgreSQL's pg_hba.conf format. Set pgpool and postgres user's authentication method to md5 .

host    all         pgpool           0.0.0.0/0          md5
host    all         postgres         0.0.0.0/0          md5
        

To use md5 authentication, we need to register the user name and password in file pool_passwd . Execute command pg_md5 --md5auth --username=<user name> <password> to regist user name and MD5-hashed password in file pool_passwd . If pool_passwd doesn't exist yet, it will be created in the same directory as pgpool.conf .

# pg_md5 --md5auth --username=pgpool <password of pgpool user>
# pg_md5 --md5auth --username=postgres <password of postgres user>
        

7.3.5.5. Watchdog Configuration

Enable watchdog functionality on osspc16 , osspc17 , osspc18 .

use_watchdog = on
        

Specify virtual IP address that accepts connections from clients on osspc16 , osspc17 , osspc18 . Ensure that the IP address set to virtual IP isn't used yet.

delegate_IP = '133.137.174.153'
        

Specify the hostname and port number of each Pgpool-II server.

  • osspc16

    wd_hostname = 'osspc16'
    wd_port = 9000
                
  • osspc17

    wd_hostname = 'osspc17'
    wd_port = 9000
                
  • osspc18

    wd_hostname = 'osspc18'
    wd_port = 9000
                

Specify the hostname, Pgpool-II port number, and watchdog port number of monitored Pgpool-II servers on each Pgpool-II server.

  • osspc16

    # - Other pgpool Connection Settings -
    
    other_pgpool_hostname0 = 'osspc17'
                                        # Host name or IP address to connect to for other pgpool 0
                                        # (change requires restart)
    other_pgpool_port0 = 9999
                                        # Port number for other pgpool 0
                                        # (change requires restart)
    other_wd_port0 = 9000
                                        # Port number for other watchdog 0
                                        # (change requires restart)
    other_pgpool_hostname1 = 'osspc18'
    other_pgpool_port1 = 9999
    other_wd_port1 = 9000
                
  • osspc17

    # - Other pgpool Connection Settings -
    
    other_pgpool_hostname0 = 'osspc16'
                                        # Host name or IP address to connect to for other pgpool 0
                                        # (change requires restart)
    other_pgpool_port0 = 9999
                                        # Port number for other pgpool 0
                                        # (change requires restart)
    other_wd_port0 = 9000
                                        # Port number for other watchdog 0
                                        # (change requires restart)
    other_pgpool_hostname1 = 'osspc18'
    other_pgpool_port1 = 9999
    other_wd_port1 = 9000
                
  • osspc18

    # - Other pgpool Connection Settings -
    
    other_pgpool_hostname0 = 'osspc16'
                                        # Host name or IP address to connect to for other pgpool 0
                                        # (change requires restart)
    other_pgpool_port0 = 9999
                                        # Port number for other pgpool 0
                                        # (change requires restart)
    other_wd_port0 = 9000
                                        # Port number for other watchdog 0
                                        # (change requires restart)
    other_pgpool_hostname1 = 'osspc17'
    other_pgpool_port1 = 9999
    other_wd_port1 = 9000
                

Specify the hostname and port number of destination for sending heartbeat signal on osspc16 , osspc17 , osspc18 .

  • osspc16

    heartbeat_destination0 = 'osspc17'
                                        # Host name or IP address of destination 0
                                        # for sending heartbeat signal.
                                        # (change requires restart)
    heartbeat_destination_port0 = 9694
                                        # Port number of destination 0 for sending
                                        # heartbeat signal. Usually this is the
                                        # same as wd_heartbeat_port.
                                        # (change requires restart)
    heartbeat_device0 = ''
                                        # Name of NIC device (such like 'eth0')
                                        # used for sending/receiving heartbeat
                                        # signal to/from destination 0.
                                        # This works only when this is not empty
                                        # and pgpool has root privilege.
                                        # (change requires restart)
    
    heartbeat_destination1 = 'osspc18'
    heartbeat_destination_port1 = 9694
    heartbeat_device1 = ''
    
                
  • osspc17

    heartbeat_destination0 = 'osspc16'
                                        # Host name or IP address of destination 0
                                        # for sending heartbeat signal.
                                        # (change requires restart)
    heartbeat_destination_port0 = 9694
                                        # Port number of destination 0 for sending
                                        # heartbeat signal. Usually this is the
                                        # same as wd_heartbeat_port.
                                        # (change requires restart)
    heartbeat_device0 = ''
                                        # Name of NIC device (such like 'eth0')
                                        # used for sending/receiving heartbeat
                                        # signal to/from destination 0.
                                        # This works only when this is not empty
                                        # and pgpool has root privilege.
                                        # (change requires restart)
    
    heartbeat_destination1 = 'osspc18'
    heartbeat_destination_port1 = 9694
    heartbeat_device1 = ''
    
                
  • osspc18

    heartbeat_destination0 = 'osspc16'
                                        # Host name or IP address of destination 0
                                        # for sending heartbeat signal.
                                        # (change requires restart)
    heartbeat_destination_port0 = 9694
                                        # Port number of destination 0 for sending
                                        # heartbeat signal. Usually this is the
                                        # same as wd_heartbeat_port.
                                        # (change requires restart)
    heartbeat_device0 = ''
                                        # Name of NIC device (such like 'eth0')
                                        # used for sending/receiving heartbeat
                                        # signal to/from destination 0.
                                        # This works only when this is not empty
                                        # and pgpool has root privilege.
                                        # (change requires restart)
    
    heartbeat_destination1 = 'osspc17'
    heartbeat_destination_port1 = 9694
    heartbeat_device1 = ''
                

7.3.5.6. PCP Command Configuration

Because user authentication is required to use the PCP command, we specify user name and md5 encrypted password in pcp.conf . Here we create the encrypted password for user postgres , and add <username: encrypted password&gl in /etc/pgpool-II/pcp.conf .

# pg_md5 -p
Password: (input password)
(paste the md5 encrypted password to pcp.conf)

# vi /etc/pgpool-II/pcp.conf
(add password entry)
user name:md5 encrypted password
        

The settings of Pgpool-II is completed.

7.3.6. Starting/Stopping Pgpool-II

Next we start Pgpool-II . Before starting Pgpool-II , please start PostgreSQL servers first. Also, when stopping PostgreSQL , it is necessary to stop Pgpool-II first.

  • Starting Pgpool-II

    In section Before Starting , we already set the auto-start of Pgpool-II . To start Pgpool-II , restart the whole system or execute the following command.

    # systemctl start pgpool.service
              
  • Stopping Pgpool-II

    # systemctl stop pgpool.service
              

7.3.7. Log

Use journalctl command to see Pgpool-II logs.

# journalctl -a | grep pgpool
      

7.3.8. How to use

Let's start to use Pgpool-II . First, let's start Pgpool-II on osspc16 , osspc17 , osspc18 by using the following command.

# systemctl start pgpool.service
      

7.3.8.1. Set up PostgreSQL standby server

First, we should set up PostgreSQL standby server by using Pgpool-II online recovery functionality. Ensure that recovery_1st_stage and pgpool_remote_start scripts used by pcp_recovery_node command are in database cluster directory of PostgreSQL primary server ( osspc19 ).

# pcp_recovery_node -h 133.137.174.153 -p 9898 -U postgres -n 1
      

After executing pcp_recovery_node command, vertify that osspc20 is started as a PostgreSQL standby server.

# psql -h 133.137.174.153 -p 9999 -U pgpool postgres

postgres=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | osspc19  | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | osspc20  | 5432 | up     | 0.500000  | standby | 0          | false             | 0
      

7.3.8.2. Switching active/standby watchdog

Confirm the watchdog status by using pcp_watchdog_info . The Pgpool-II server which is started first run as MASTER .

# pcp_watchdog_info -h 133.137.174.153 -p 9898 -U postgres
Password:
3 YES osspc16:9999 Linux osspc16 osspc16

osspc16:9999 Linux osspc16 osspc16 9999 9000 4 MASTER  #The Pgpool-II server started first becames "MASTER".
osspc17:9999 Linux osspc17 osspc17 9999 9000 7 STANDBY #run as standby
osspc18:9999 Linux osspc18 osspc18 9999 9000 7 STANDBY #run as standby
        

Stop active server osspc16 , then osspc17 or osspc18 will be promoted to active server. To stop osspc16 , we can stop Pgpool-II service or shutdown the whole system. Here, we stop Pgpool-II service.

[root@osspc16 ~]# systemctl stop pgpool.service
[root@osspc16 ~]# pcp_watchdog_info -h 133.137.174.153 -p 9898 -U postgres
Password:
3 YES osspc17:9999 Linux osspc17 osspc17

osspc17:9999 Linux osspc17 osspc17 9999 9000 4 MASTER     #osspc17 is promoted to MASTER
osspc16:9999 Linux osspc16 osspc16 9999 9000 10 SHUTDOWN  #osspc16 is stopped
osspc18:9999 Linux osspc18 osspc18 9999 9000 7 STANDBY    #osspc18 runs as STANDBY
        

Start Pgpool-II ( osspc16 ) which we have stopped again, and vertify that osspc16 runs as a standby.

[root@osspc16 ~]# systemctl start pgpool.service
[root@osspc16 ~]# pcp_watchdog_info -h 133.137.174.153 -p 9898 -U postgres
Password:
3 YES osspc17:9999 Linux osspc17 osspc17

osspc17:9999 Linux osspc17 osspc17 9999 9000 4 MASTER
osspc16:9999 Linux osspc16 osspc16 9999 9000 7 STANDBY
osspc18:9999 Linux osspc18 osspc18 9999 9000 7 STANDBY
        

7.3.8.3. Failover

First, use psql to connect to PostgreSQL via virtual IP, and verify the backend informations.

# psql -h 133.137.174.153 -p 9999 -U pgpool postgres

postgres=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | osspc19  | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | osspc20  | 5432 | up     | 0.500000  | standby | 0          | false             | 0
        

Next, stop primary PostgreSQL server osspc19 , and verify automatic failover.

$ pg_ctl -D /var/lib/pgsql/9.6/data -m immediate stop
        

After stopping PostgreSQL on osspc19 , failover occurs and PostgreSQL on osspc20 becomes new primary DB.

# psql -h 133.137.174.153 -p 9999 -U pgpool postgres

postgres=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | osspc19  | 5432 | down   | 0.500000  | standby | 0          | false             | 0
 1       | osspc20  | 5432 | up     | 0.500000  | primary | 0          | true              | 0
        

7.3.8.4. Online Recovery

Here, we use Pgpool-II online recovery functionality to restore osspc19 (old primary server) as a standby. Before restoring the old primary server, please ensure that recovery_1st_stage and pgpool_remote_start scripts exist in database cluster directory of current primary server osspc20 .

# pcp_recovery_node -h 133.137.174.153 -p 9898 -U postgres -n 0
        

Then verify that osspc19 is started as a standby.

# psql -h 133.137.174.153 -p 9999 -U pgpool postgres

postgres=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | osspc19  | 5432 | up     | 0.500000  | standby | 0          | false             | 0
 1       | osspc20  | 5432 | up     | 0.500000  | primary | 0          | true              | 0