Pgpool-II + Watchdog Setup Example
pgpool-II 4.1.0 Documentation | |||
---|---|---|---|
Prev | Up | Chapter 8. Configuration Examples | Next |
This section shows an example of streaming replication configuration using Pgpool-II . In this example, we use 3 Pgpool-II servers to manage PostgreSQL servers to create a robust cluster system and avoid the single point of failure or split brain.
PostgreSQL 11 is used in this configuration example, all scripts have also been tested with PostgreSQL 12.
8.3.1. Requirements
We assume that all the Pgpool-II servers and the PostgreSQL servers are in the same subnet.
8.3.2. Cluster System Configuration
We use 3 servers with CentOS 7.4. Let these servers be server1 , server2 , server3 . We install PostgreSQL and Pgpool-II on each server.
Note: The roles of Active , Standy , Primary , Standby are not fixed and may be changed by further operations.
Table 8-2. Hostname and IP address
Hostname | IP Address | Virtual IP |
---|---|---|
server1 | 192.168.137.101 | 192.168.137.150 |
server2 | 192.168.137.102 | |
server3 | 192.168.137.103 |
Table 8-3. PostgreSQL version and Configuration
Item | Value | Detail |
---|---|---|
PostgreSQL Version | 11.1 | - |
port | 5432 | - |
$PGDATA | /var/lib/pgsql/11/data | - |
Archive mode | on | /var/lib/pgsql/archivedir |
Replication Slots | Enable | - |
Start automatically | Disable | - |
Table 8-4. Pgpool-II version and Configuration
Item | Value | Detail |
---|---|---|
Pgpool-II Version | 4.1.0 | - |
port | 9999 | Pgpool-II accepts connections |
9898 | PCP process accepts connections | |
9000 | watchdog accepts connections | |
9694 | UDP port for receiving Watchdog's heartbeat signal | |
Config file | /etc/pgpool-II/pgpool.conf | Pgpool-II config file |
Pgpool-II start user | postgres (Pgpool-II 4.1 or later) | Pgpool-II 4.0 or before, the default startup user is root |
Running mode | streaming replication mode | - |
Watchdog | on | Life check method: heartbeat |
Start automatically | Disable | - |
8.3.3. Installation
In this example, we install Pgpool-II 4.1 and PostgreSQL 11.1 by using RPM packages.
Install PostgreSQL by using PostgreSQL YUM repository.
# yum install https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7-x86_64/pgdg-centos11-11-2.noarch.rpm # yum install postgresql11 postgresql11-libs postgresql11-devel postgresql11-server
Install Pgpool-II by using Pgpool-II YUM repository.
# yum install http://www.pgpool.net/yum/rpms/4.1/redhat/rhel-7-x86_64/pgpool-II-release-4.1-1.noarch.rpm # yum install pgpool-II-pg11-*
8.3.4. Before Starting
Before you start the configuration process, please check the following prerequisites.
-
Set up PostgreSQL streaming replication on the primary server. In this example, we use WAL archiving.
First, we create the directory /var/lib/pgsql/archivedir to store WAL segments on all servers. In this example, only Primary node archives WAL locally.
[all servers]# su - postgres [all servers]$ mkdir /var/lib/pgsql/archivedir
Then we edit the configuration file $PGDATA/postgresql.conf on server1 (primary) as follows. Enable wal_log_hints to use pg_rewind . Since the Primary may become a Standby later, we set hot_standby = on .
listen_addresses = '*' archive_mode = on archive_command = 'cp "%p" "/var/lib/pgsql/archivedir/%f"' max_wal_senders = 10 max_replication_slots = 10 wal_level = replica hot_standby = on wal_log_hints = on
We use the online recovery functionality of Pgpool-II to setup standby server after the primary server is started.
-
Because of the security reasons, we create a user repl solely used for replication purpose, and a user pgpool for streaming replication delay check and health check of Pgpool-II .
Table 8-5. Users
User Name Passowrd Detail repl repl PostgreSQL replication user pgpool pgpool Pgpool-II health check and replication delay check user postgres postgres User running online recovery [server1]# psql -U postgres -p 5432 postgres=# SET password_encryption = 'scram-sha-256'; postgres=# CREATE ROLE pgpool WITH LOGIN; postgres=# CREATE ROLE repl WITH REPLICATION LOGIN; postgres=# \password pgpool postgres=# \password repl postgres=# \password postgres
If you want to show "replication_state" and "replication_sync_state" column in SHOW POOL NODES command result, role pgpool needs to be PostgreSQL super user or or in pg_monitor group ( Pgpool-II 4.1 or later). Grant pg_monitor to pgpool :
GRANT pg_monitor TO pgpool;
Note: If you plan to use detach_false_primary ( Pgpool-II 4.0 or later), role "pgpool" needs to be PostgreSQL super user or or in "pg_monitor" group to use this feature.
Assuming that all the Pgpool-II servers and the PostgreSQL servers are in the same subnet and edit pg_hba.conf to enable scram-sha-256 authentication method.
host all all samenet scram-sha-256 host replication all samenet scram-sha-256
-
To use the automated failover and online recovery of Pgpool-II , the settings that allow passwordless SSH to all backend servers between Pgpool-II execution user (default root user) and postgres user and between postgres user and postgres user are necessary. Execute the following command on all servers to set up passwordless SSH . The generated key file name is id_rsa_pgpool .
[all servers]# cd ~/.ssh [all servers]# ssh-keygen -t rsa -f id_rsa_pgpool [all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server1 [all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server2 [all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server3 [all servers]# su - postgres [all servers]$ cd ~/.ssh [all servers]$ ssh-keygen -t rsa -f id_rsa_pgpool [all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server1 [all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server2 [all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server3
After setting, use ssh postgres@serverX -i ~/.ssh/id_rsa_pgpool command to make sure that you can log in without entering a password. Edit /etc/ssh/sshd_config if necessary and restart sshd.
-
To allow repl user without specifying password for streaming replication and online recovery, and execute pg_rewind using postgres , we create the .pgpass file in postgres user's home directory and change the permisson to 600 on each PostgreSQL server.
[all servers]# su - postgres [all servers]$ vi /var/lib/pgsql/.pgpass server1:5432:replication:repl:<repl user password> server2:5432:replication:repl:<repl user passowrd> server3:5432:replication:repl:<repl user passowrd> server1:5432:postgres:postgres:<postgres user passowrd> server2:5432:postgres:postgres:<postgres user passowrd> server3:5432:postgres:postgres:<postgres user passowrd> [all servers]$ chmod 600 /var/lib/pgsql/.pgpass
-
When connect to Pgpool-II and PostgreSQL servers, the target port must be accessible by enabling firewall management softwares. Following is an example for CentOS/RHEL7 .
[all servers]# firewall-cmd --permanent --zone=public --add-service=postgresql [all servers]# firewall-cmd --permanent --zone=public --add-port=9999/tcp --add-port=9898/tcp --add-port=9000/tcp --add-port=9694/udp [all servers]# firewall-cmd --reload
8.3.5. Pgpool-II Configuration
8.3.5.1. Common Settings
Here are the common settings on server1 , server2 and server3 .
When installing Pgpool-II from RPM, all the Pgpool-II configuration files are in /etc/pgpool-II . In this example, we copy the sample configuration file for streaming replicaton mode.
# cp -p /etc/pgpool-II/pgpool.conf.sample-stream /etc/pgpool-II/pgpool.conf
To allow Pgpool-II to accept all incoming connections, we set listen_addresses = '*' .
listen_addresses = '*'
Specify replication delay check user and password. In this example, we leave sr_check_user empty, and create the entry in pool_passwd . From Pgpool-II 4.0, if these parameters are left blank, Pgpool-II will first try to get the password for that specific user from sr_check_password file before using the empty password.
sr_check_user = 'pgpool' sr_check_password = ''
Enable health check so that Pgpool-II performs failover. Also, if the network is unstable, the health check fails even though the backend is running properly, failover or degenerate operation may occur. In order to prevent such incorrect detection of health check, we set health_check_max_retries = 3 . Specify health_check_user and health_check_password in the same way like sr_check_user and sr_check_password .
health_check_period = 5 # Health check period # Disabled (0) by default health_check_timeout = 30 # Health check timeout # 0 means no timeout health_check_user = 'pgpool' health_check_password = '' health_check_max_retries = 3
Specify the PostgreSQL backend informations. Multiple backends can be specified by adding a number at the end of the parameter name.
# - Backend Connection Settings - backend_hostname0 = 'server1' # Host name or IP address to connect to for backend 0 backend_port0 = 5432 # Port number for backend 0 backend_weight0 = 1 # Weight for backend 0 (only in load balancing mode) backend_data_directory0 = '/var/lib/pgsql/11/data' # Data directory for backend 0 backend_flag0 = 'ALLOW_TO_FAILOVER' # Controls various backend behavior # ALLOW_TO_FAILOVER or DISALLOW_TO_FAILOVER backend_hostname1 = 'server2' backend_port1 = 5432 backend_weight1 = 1 backend_data_directory1 = '/var/lib/pgsql/11/data' backend_flag1 = 'ALLOW_TO_FAILOVER' backend_hostname2 = 'server3' backend_port2 = 5432 backend_weight2 = 1 backend_data_directory2 = '/var/lib/pgsql/11/data' backend_flag2 = 'ALLOW_TO_FAILOVER'
To show "replication_state" and "replication_sync_state" column in SHOW POOL NODES command result, backend_application_name parameter is required. Here we specify each backend's hostname in these parameters. ( Pgpool-II 4.1 or later)
... backend_application_name0 = 'server1' ... backend_application_name1 = 'server2' ... backend_application_name2 = 'server3'
8.3.5.2. Failover configuration
Specify failover.sh script to be executed after failover in failover_command parameter. If we use 3 PostgreSQL servers, we need to specify follow_master_command to run after failover on the primary node failover. In case of two PostgreSQL servers, follow_master_command setting is not necessary.
Pgpool-II replaces the following special characters with the backend specific information while executing the scripts. See failover_command for more details about each character.
failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S' follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %m %M %H %P %r %R'
Note: %N and %S are added in Pgpool-II 4.1. Please note that these characters cannot be specified if using Pgpool-II 4.0 or earlier.
Create /etc/pgpool-II/failover.sh , and add execute permission.
# vi /etc/pgpool-II/failover.sh # vi /etc/pgpool-II/follow_master.sh # chmod +x /etc/pgpool-II/{failover.sh,follow_master.sh}
-
/etc/pgpool-II/failover.sh
#!/bin/bash # This script is run by failover_command. set -o xtrace exec > >(logger -i -p local1.info) 2>&1 # Special values: # %d = failed node id # %h = failed node hostname # %p = failed node port number # %D = failed node database cluster path # %m = new master node id # %H = new master node hostname # %M = old master node id # %P = old primary node id # %r = new master port number # %R = new master database cluster path # %N = old primary node hostname # %S = old primary node port number # %% = '%' character FAILED_NODE_ID="$1" FAILED_NODE_HOST="$2" FAILED_NODE_PORT="$3" FAILED_NODE_PGDATA="$4" NEW_MASTER_NODE_ID="$5" NEW_MASTER_NODE_HOST="$6" OLD_MASTER_NODE_ID="$7" OLD_PRIMARY_NODE_ID="$8" NEW_MASTER_NODE_PORT="$9" NEW_MASTER_NODE_PGDATA="${10}" OLD_PRIMARY_NODE_HOST="${11}" OLD_PRIMARY_NODE_PORT="${12}" PGHOME=/usr/pgsql-11 logger -i -p local1.info failover.sh: start: failed_node_id=$FAILED_NODE_ID old_primary_node_id=$OLD_PRIMARY_NODE_ID failed_host=$FAILED_NODE_HOST new_master_host=$NEW_MASTER_NODE_HOST ## If there's no master node anymore, skip failover. if [ $NEW_MASTER_NODE_ID -lt 0 ]; then logger -i -p local1.info failover.sh: All nodes are down. Skipping failover. exit 0 fi ## Test passwrodless SSH ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null if [ $? -ne 0 ]; then logger -i -p local1.info failover.sh: passwrodless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please setup passwrodless SSH. exit 1 fi ## If Standby node is down, skip failover. if [ $FAILED_NODE_ID -ne $OLD_PRIMARY_NODE_ID ]; then logger -i -p local1.info failover.sh: Standby node is down. Skipping failover. ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$OLD_PRIMARY_NODE_HOST -i ~/.ssh/id_rsa_pgpool " ${PGHOME}/bin/psql -p $OLD_PRIMARY_NODE_PORT -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\" " if [ $? -ne 0 ]; then logger -i -p local1.error failover.sh: drop replication slot "${FAILED_NODE_HOST}" failed exit 1 fi exit 0 fi ## Promote Standby node. logger -i -p local1.info failover.sh: Primary node is down, promote standby node ${NEW_MASTER_NODE_HOST}. ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ${PGHOME}/bin/pg_ctl -D ${NEW_MASTER_NODE_PGDATA} -w promote if [ $? -ne 0 ]; then logger -i -p local1.error failover.sh: new_master_host=$NEW_MASTER_NODE_HOST promote failed exit 1 fi logger -i -p local1.info failover.sh: end: new_master_node_id=$NEW_MASTER_NODE_ID started as the primary node exit 0
-
/etc/pgpool-II/follow_master.sh
#!/bin/bash # This script is run after failover_command to synchronize the Standby with the new Primary. # First try pg_rewind. If pg_rewind failed, use pg_basebackup. set -o xtrace exec > >(logger -i -p local1.info) 2>&1 # Special values: # %d = failed node id # %h = failed node hostname # %p = failed node port number # %D = failed node database cluster path # %m = new master node id # %H = new master node hostname # %M = old master node id # %P = old primary node id # %r = new master port number # %R = new master database cluster path # %N = old primary node hostname # %S = old primary node port number # %% = '%' character FAILED_NODE_ID="$1" FAILED_NODE_HOST="$2" FAILED_NODE_PORT="$3" FAILED_NODE_PGDATA="$4" NEW_MASTER_NODE_ID="$5" OLD_MASTER_NODE_ID="$6" NEW_MASTER_NODE_HOST="$7" OLD_PRIMARY_NODE_ID="$8" NEW_MASTER_NODE_PORT="$9" NEW_MASTER_NODE_PGDATA="${10}" PGHOME=/usr/pgsql-11 ARCHIVEDIR=/var/lib/pgsql/archivedir REPLUSER=repl PCP_USER=pgpool PGPOOL_PATH=/usr/bin PCP_PORT=9898 logger -i -p local1.info follow_master.sh: start: Standby node ${FAILED_NODE_ID} ## Test passwrodless SSH ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null if [ $? -ne 0 ]; then logger -i -p local1.info follow_master.sh: passwrodless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please setup passwrodless SSH. exit 1 fi ## Get PostgreSQL major version PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'` if [ $PGVERSION -ge 12 ]; then RECOVERYCONF=${FAILED_NODE_PGDATA}/myrecovery.conf else RECOVERYCONF=${FAILED_NODE_PGDATA}/recovery.conf fi ## Check the status of Standby ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ${PGHOME}/bin/pg_ctl -w -D ${FAILED_NODE_PGDATA} status ## If Standby is running, synchronize it with the new Primary. if [ $? -eq 0 ]; then logger -i -p local1.info follow_master.sh: pg_rewind for $FAILED_NODE_ID # Create replication slot "${FAILED_NODE_HOST}" ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool " ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_create_physical_replication_slot('${FAILED_NODE_HOST}');\" " ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool " set -o errexit ${PGHOME}/bin/pg_ctl -w -m f -D ${FAILED_NODE_PGDATA} stop cat > ${RECOVERYCONF} << EOT primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass''' recovery_target_timeline = 'latest' restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p' primary_slot_name = '${FAILED_NODE_HOST}' EOT if [ ${PGVERSION} -ge 12 ]; then touch ${FAILED_NODE_PGDATA}/standby.signal else echo \"standby_mode = 'on'\" >> ${RECOVERYCONF} fi ${PGHOME}/bin/pg_rewind -D ${FAILED_NODE_PGDATA} --source-server=\"user=postgres host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT}\" " if [ $? -ne 0 ]; then logger -i -p local1.error follow_master.sh: end: pg_rewind failed. Try pg_basebackup. ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool " set -o errexit # Execute pg_basebackup rm -rf ${FAILED_NODE_PGDATA} rm -rf ${ARCHIVEDIR}/* ${PGHOME}/bin/pg_basebackup -h ${NEW_MASTER_NODE_HOST} -U $REPLUSER -p ${NEW_MASTER_NODE_PORT} -D ${FAILED_NODE_PGDATA} -X stream if [ ${PGVERSION} -ge 12 ]; then sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \ -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${FAILED_NODE_PGDATA}/postgresql.conf fi cat > ${RECOVERYCONF} << EOT primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass''' recovery_target_timeline = 'latest' restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p' primary_slot_name = '${FAILED_NODE_HOST}' EOT if [ ${PGVERSION} -ge 12 ]; then touch ${FAILED_NODE_PGDATA}/standby.signal else echo \"standby_mode = 'on'\" >> ${RECOVERYCONF} fi " if [ $? -ne 0 ]; then # drop replication slot ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool " ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\" " logger -i -p local1.error follow_master.sh: end: pg_basebackup failed exit 1 fi fi # start Standby node on ${FAILED_NODE_HOST} ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool $PGHOME/bin/pg_ctl -l /dev/null -w -D ${FAILED_NODE_PGDATA} start # If start Standby successfully, attach this node if [ $? -eq 0 ]; then # Run pcp_attact_node to attach Standby node to Pgpool-II. ${PGPOOL_PATH}/pcp_attach_node -w -h localhost -U $PCP_USER -p ${PCP_PORT} -n ${FAILED_NODE_ID} if [ $? -ne 0 ]; then logger -i -p local1.error follow_master.sh: end: pcp_attach_node failed exit 1 fi # If start Standby failed, drop replication slot "${FAILED_NODE_HOST}" else ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool \ ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c "SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')" logger -i -p local1.error follow_master.sh: end: follow master command failed exit 1 fi else logger -i -p local1.info follow_master.sh: failed_nod_id=${FAILED_NODE_ID} is not running. skipping follow master command exit 0 fi logger -i -p local1.info follow_master.sh: end: follow master command complete exit 0
8.3.5.3. Pgpool-II Online Recovery Configurations
Next, in order to perform online recovery with Pgpool-II we specify the PostgreSQL user name and online recovery command recovery_1st_stage . Because Superuser privilege in PostgreSQL is required for performing online recovery, we specify postgres user in recovery_user . Then, we create recovery_1st_stage and pgpool_remote_start in database cluster directory of PostgreSQL primary server (server1), and add execute permission.
recovery_user = 'postgres' # Online recovery user recovery_password = '' # Online recovery password recovery_1st_stage_command = 'recovery_1st_stage'
[server1]# su - postgres [server1]$ vi /var/lib/pgsql/11/data/recovery_1st_stage [server1]$ vi /var/lib/pgsql/11/data/pgpool_remote_start [server1]$ chmod +x /var/lib/pgsql/11/data/{recovery_1st_stage,pgpool_remote_start}
-
/var/lib/pgsql/11/data/recovery_1st_stage
#!/bin/bash # This script is executed by "recovery_1st_stage" to recovery a Standby node. set -o xtrace exec > >(logger -i -p local1.info) 2>&1 PRIMARY_NODE_PGDATA="$1" DEST_NODE_HOST="$2" DEST_NODE_PGDATA="$3" PRIMARY_NODE_PORT="$4" DEST_NODE_ID="$5" DEST_NODE_PORT="$6" PRIMARY_NODE_HOST=$(hostname) PGHOME=/usr/pgsql-11 ARCHIVEDIR=/var/lib/pgsql/archivedir REPLUSER=repl logger -i -p local1.info recovery_1st_stage: start: pg_basebackup for Standby node $DEST_NODE_ID ## Test passwrodless SSH ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null if [ $? -ne 0 ]; then logger -i -p local1.info recovery_1st_stage: passwrodless SSH to postgres@${DEST_NODE_HOST} failed. Please setup passwrodless SSH. exit 1 fi ## Get PostgreSQL major version PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'` if [ $PGVERSION -ge 12 ]; then RECOVERYCONF=${DEST_NODE_PGDATA}/myrecovery.conf else RECOVERYCONF=${DEST_NODE_PGDATA}/recovery.conf fi ## Create replication slot "${DEST_NODE_HOST}" ${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ SELECT pg_create_physical_replication_slot('${DEST_NODE_HOST}'); EOQ ## Execute pg_basebackup to recovery Standby node ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa_pgpool " set -o errexit rm -rf $DEST_NODE_PGDATA rm -rf $ARCHIVEDIR/* ${PGHOME}/bin/pg_basebackup -h $PRIMARY_NODE_HOST -U $REPLUSER -p $PRIMARY_NODE_PORT -D $DEST_NODE_PGDATA -X stream if [ ${PGVERSION} -ge 12 ]; then sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \ -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${DEST_NODE_PGDATA}/postgresql.conf fi cat > ${RECOVERYCONF} << EOT primary_conninfo = 'host=${PRIMARY_NODE_HOST} port=${PRIMARY_NODE_PORT} user=${REPLUSER} application_name=${DEST_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass''' recovery_target_timeline = 'latest' restore_command = 'scp ${PRIMARY_NODE_HOST}:${ARCHIVEDIR}/%f %p' primary_slot_name = '${DEST_NODE_HOST}' EOT if [ ${PGVERSION} -ge 12 ]; then touch ${DEST_NODE_PGDATA}/standby.signal else echo \"standby_mode = 'on'\" >> ${RECOVERYCONF} fi sed -i \"s/#*port = .*/port = ${DEST_NODE_PORT}/\" ${DEST_NODE_PGDATA}/postgresql.conf " if [ $? -ne 0 ]; then ${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ SELECT pg_drop_replication_slot('${DEST_NODE_HOST}'); EOQ logger -i -p local1.error recovery_1st_stage: end: pg_basebackup failed. online recovery failed exit 1 fi logger -i -p local1.info recovery_1st_stage: end: recovery_1st_stage complete exit 0
-
/var/lib/pgsql/11/data/pgpool_remote_start
#!/bin/bash # This script is run after recovery_1st_stage to start Standby node. set -o xtrace exec > >(logger -i -p local1.info) 2>&1 PGHOME=/usr/pgsql-11 DEST_NODE_HOST="$1" DEST_NODE_PGDATA="$2" logger -i -p local1.info pgpool_remote_start: start: remote start Standby node $DEST_NODE_HOST ## Test passwrodless SSH ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null if [ $? -ne 0 ]; then logger -i -p local1.info pgpool_remote_start: passwrodless SSH to postgres@${DEST_NODE_HOST} failed. Please setup passwrodless SSH. exit 1 fi ## Start Standby node ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa_pgpool " $PGHOME/bin/pg_ctl -l /dev/null -w -D $DEST_NODE_PGDATA start " if [ $? -ne 0 ]; then logger -i -p local1.error pgpool_remote_start: $DEST_NODE_HOST PostgreSQL start failed. exit 1 fi logger -i -p local1.info pgpool_remote_start: end: $DEST_NODE_HOST PostgreSQL started successfully. exit 0
In order to use the online recovery functionality, the functions of
pgpool_recovery
,
pgpool_remote_start
,
pgpool_switch_xlog
are required, so we need install
pgpool_recovery
on template1 of
PostgreSQL
server
server1
.
[server1]# su - postgres [server1]$ psql template1 -c "CREATE EXTENSION pgpool_recovery"
8.3.5.4. Client Authentication Configuration
Because in the section Before Starting , we already set PostgreSQL authentication method to scram-sha-256 , it is necessary to set a client authentication by Pgpool-II to connect to backend nodes. When installing with RPM, the Pgpool-II configuration file pool_hba.conf is in /etc/pgpool-II . By default, pool_hba authentication is disabled, set enable_pool_hba = on to enable it.
enable_pool_hba = on
The format of pool_hba.conf file follows very closely PostgreSQL's pg_hba.conf format. Set pgpool and postgres user's authentication method to scram-sha-256 .
host all pgpool 0.0.0.0/0 scram-sha-256 host all postgres 0.0.0.0/0 scram-sha-256
Note: Please note that in Pgpool-II 4.0 only AES encrypted password or clear text password can be specified in health_check_password , sr_check_password , wd_lifecheck_password , recovery_password in pgpool.conf .
The default password file name for authentication is pool_passwd . To use scram-sha-256 authentication, the decryption key to decrypt the passwords is required. We create the .pgpoolkey file in root user's home directory.
[all servers]# echo 'some string' > ~/.pgpoolkey [all servers]# chmod 600 ~/.pgpoolkey
Execute command pg_enc -m -k /path/to/.pgpoolkey -u username -p to regist user name and AES encrypted password in file pool_passwd . If pool_passwd doesn't exist yet, it will be created in the same directory as pgpool.conf .
[all servers]# pg_enc -m -k /root/.pgpoolkey -u pgpool -p db password: [pgpool user's password] [all servers]# pg_enc -m -k /root/.pgpoolkey -u postgres -p db password: [postgres user's passowrd] # cat /etc/pgpool-II/pool_passwd pgpool:AESheq2ZMZjynddMWk5sKP/Rw== postgres:AESHs/pWL5rtXy2IwuzroHfqg==
8.3.5.5. Watchdog Configuration
Enable watchdog functionality on server1 , server2 , server3 .
use_watchdog = on
Specify virtual IP address that accepts connections from clients on server1 , server2 , server3 . Ensure that the IP address set to virtual IP isn't used yet.
delegate_IP = '192.168.137.150'
To bring up/down the virtual IP and send the ARP requests, we set if_up_cmd , if_down_cmd and arping_cmd . The network interface used in this example is "enp0s8". Since root privilege is required to execute if_up/down_cmd or arping_cmd command, use setuid on these command or allow Pgpool-II startup user, postgres user (Pgpool-II 4.1 or later) to run sudo command without a password. If installed from RPM, the postgres user has been configured to run ip/arping via sudo without a password.
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev enp0s8 label enp0s8:0' if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev enp0s8' arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I enp0s8'
Set if_cmd_path and arping_path according to the command path. If if_up/down_cmd or arping_cmd starts with "/", these parameters will be ignored.
if_cmd_path = '/sbin' arping_path = '/usr/sbin'
Specify the hostname and port number of each Pgpool-II server.
-
server1
wd_hostname = 'server1' wd_port = 9000
-
server2
wd_hostname = 'server2' wd_port = 9000
-
server3
wd_hostname = 'server3' wd_port = 9000
Specify the hostname, Pgpool-II port number, and watchdog port number of monitored Pgpool-II servers on each Pgpool-II server.
-
server1
# - Other pgpool Connection Settings - other_pgpool_hostname0 = 'server2' # Host name or IP address to connect to for other pgpool 0 # (change requires restart) other_pgpool_port0 = 9999 # Port number for other pgpool 0 # (change requires restart) other_wd_port0 = 9000 # Port number for other watchdog 0 # (change requires restart) other_pgpool_hostname1 = 'server3' other_pgpool_port1 = 9999 other_wd_port1 = 9000
-
server2
# - Other pgpool Connection Settings - other_pgpool_hostname0 = 'server1' # Host name or IP address to connect to for other pgpool 0 # (change requires restart) other_pgpool_port0 = 9999 # Port number for other pgpool 0 # (change requires restart) other_wd_port0 = 9000 # Port number for other watchdog 0 # (change requires restart) other_pgpool_hostname1 = 'server3' other_pgpool_port1 = 9999 other_wd_port1 = 9000
-
server3
# - Other pgpool Connection Settings - other_pgpool_hostname0 = 'server1' # Host name or IP address to connect to for other pgpool 0 # (change requires restart) other_pgpool_port0 = 9999 # Port number for other pgpool 0 # (change requires restart) other_wd_port0 = 9000 # Port number for other watchdog 0 # (change requires restart) other_pgpool_hostname1 = 'server2' other_pgpool_port1 = 9999 other_wd_port1 = 9000
Specify the hostname and port number of destination for sending heartbeat signal on server1 , server2 , server3 .
-
server1
heartbeat_destination0 = 'server2' # Host name or IP address of destination 0 # for sending heartbeat signal. # (change requires restart) heartbeat_destination_port0 = 9694 # Port number of destination 0 for sending # heartbeat signal. Usually this is the # same as wd_heartbeat_port. # (change requires restart) heartbeat_device0 = '' # Name of NIC device (such like 'eth0') # used for sending/receiving heartbeat # signal to/from destination 0. # This works only when this is not empty # and pgpool has root privilege. # (change requires restart) heartbeat_destination1 = 'server3' heartbeat_destination_port1 = 9694 heartbeat_device1 = ''
-
server2
heartbeat_destination0 = 'server1' # Host name or IP address of destination 0 # for sending heartbeat signal. # (change requires restart) heartbeat_destination_port0 = 9694 # Port number of destination 0 for sending # heartbeat signal. Usually this is the # same as wd_heartbeat_port. # (change requires restart) heartbeat_device0 = '' # Name of NIC device (such like 'eth0') # used for sending/receiving heartbeat # signal to/from destination 0. # This works only when this is not empty # and pgpool has root privilege. # (change requires restart) heartbeat_destination1 = 'server3' heartbeat_destination_port1 = 9694 heartbeat_device1 = ''
-
server3
heartbeat_destination0 = 'server1' # Host name or IP address of destination 0 # for sending heartbeat signal. # (change requires restart) heartbeat_destination_port0 = 9694 # Port number of destination 0 for sending # heartbeat signal. Usually this is the # same as wd_heartbeat_port. # (change requires restart) heartbeat_device0 = '' # Name of NIC device (such like 'eth0') # used for sending/receiving heartbeat # signal to/from destination 0. # This works only when this is not empty # and pgpool has root privilege. # (change requires restart) heartbeat_destination1 = 'server2' heartbeat_destination_port1 = 9694 heartbeat_device1 = ''
8.3.5.6. /etc/sysconfig/pgpool Configuration
If you want to ignore the pgpool_status file at startup of Pgpool-II , add "- D" to the start option OPTS to /etc/sysconfig/pgpool .
[all servers]# vi /etc/sysconfig/pgpool ... OPTS=" -D -n"
8.3.5.7. Logging
In the example, we output Pgpool-II 's log to syslog .
log_destination = 'syslog' # Where to log # Valid values are combinations of stderr, # and syslog. Default to stderr. syslog_facility = 'LOCAL1' # Syslog local facility. Default to LOCAL0
Create Pgpool-II log file.
[all servers]# mkdir /var/log/pgpool-II [all servers]# touch /var/log/pgpool-II/pgpool.log
Edit config file of syslog /etc/rsyslog.conf .
[all servers]# vi /etc/rsyslog.conf ... *.info;mail.none;authpriv.none;cron.none;LOCAL1.none /var/log/messages LOCAL1.* /var/log/pgpool-II/pgpool.log
Setting logrotate same as /var/log/messages .
[all servers]# vi /etc/logrotate.d/syslog ... /var/log/messages /var/log/pgpool-II/pgpool.log /var/log/secure
Restart rsyslog service.
[all servers]# systemctl restart rsyslog
8.3.5.8. PCP Command Configuration
Since user authentication is required to use the PCP command, specify user name and md5 encrypted password in pcp.conf . Here we create the encrypted password for pgpool user, and add " username:encrypted password " in /etc/pgpool-II/pcp.conf .
[all servers]# echo 'pgpool:'`pg_md5 PCP passowrd` >> /etc/pgpool-II/pcp.conf
8.3.5.9. .pcppass
Since follow_master_command script has to execute PCP command without entering the password, we create .pcppass in the home directory of Pgpool-II startup user (root user).
[all servers]# echo 'localhost:9898:pgpool:pgpool' > ~/.pcppass [all servers]# chmod 600 ~/.pcppass
The settings of Pgpool-II is completed.
8.3.6. Starting/Stopping Pgpool-II
Next we start Pgpool-II . Before starting Pgpool-II , please start PostgreSQL servers first. Also, when stopping PostgreSQL , it is necessary to stop Pgpool-II first.
-
Starting Pgpool-II
In section Before Starting , we already set the auto-start of Pgpool-II . To start Pgpool-II , restart the whole system or execute the following command.
# systemctl start pgpool.service
-
Stopping Pgpool-II
# systemctl stop pgpool.service
8.3.7. How to use
Let's start to use Pgpool-II . First, let's start Pgpool-II on server1 , server2 , server3 by using the following command.
# systemctl start pgpool.service
8.3.7.1. Set up PostgreSQL standby server
First, we should set up PostgreSQL standby server by using Pgpool-II online recovery functionality. Ensure that recovery_1st_stage and pgpool_remote_start scripts used by pcp_recovery_node command are in database cluster directory of PostgreSQL primary server ( server1 ).
# pcp_recovery_node -h 192.168.137.150 -p 9898 -U pgpool -n 1 Password: pcp_recovery_node -- Command Successful # pcp_recovery_node -h 192.168.137.150 -p 9898 -U pgpool -n 2 Password: pcp_recovery_node -- Command Successful
After executing pcp_recovery_node command, vertify that server2 and server3 are started as PostgreSQL standby server.
# psql -h 192.168.137.150 -p 9999 -U pgpool postgres -c "show pool_nodes" Password for user pgpool node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | server1 | 5432 | up | 0.333333 | primary | 0 | false | 0 | | | 2019-08-06 11:13:17 1 | server2 | 5432 | up | 0.333333 | standby | 0 | true | 0 | streaming | async | 2019-08-06 11:13:25 2 | server3 | 5432 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2019-08-06 11:14:20 (3 rows)
8.3.7.2. Switching active/standby watchdog
Confirm the watchdog status by using pcp_watchdog_info . The Pgpool-II server which is started first run as MASTER .
# pcp_watchdog_info -h 192.168.137.150 -p 9898 -U pgpool Password: 3 YES server1:9999 Linux server1 server1 server1:9999 Linux server1 server1 9999 9000 4 MASTER #The Pgpool-II server started first becames "MASTER". server2:9999 Linux server2 server2 9999 9000 7 STANDBY #run as standby server3:9999 Linux server3 server3 9999 9000 7 STANDBY #run as standby
Stop active server server1 , then server2 or server3 will be promoted to active server. To stop server1 , we can stop Pgpool-II service or shutdown the whole system. Here, we stop Pgpool-II service.
[server1]# systemctl stop pgpool.service # pcp_watchdog_info -p 9898 -h 192.168.137.150 -U pgpool Password: 3 YES server2:9999 Linux server2 server2 server2:9999 Linux server2 server2 9999 9000 4 MASTER #server2 is promoted to MASTER server1:9999 Linux server1 server1 9999 9000 10 SHUTDOWN #server1 is stopped server3:9999 Linux server3 server3 9999 9000 7 STANDBY #server3 runs as STANDBY
Start Pgpool-II ( server1 ) which we have stopped again, and vertify that server1 runs as a standby.
[server1]# systemctl start pgpool.service [server1]# pcp_watchdog_info -p 9898 -h 192.168.137.150 -U pgpool Password: 3 YES server2:9999 Linux server2 server2 server2:9999 Linux server2 server2 9999 9000 4 MASTER server1:9999 Linux server1 server1 9999 9000 7 STANDBY server3:9999 Linux server3 server3 9999 9000 7 STANDBY
8.3.7.3. Failover
First, use psql to connect to PostgreSQL via virtual IP, and verify the backend informations.
# psql -h 192.168.137.150 -p 9999 -U pgpool postgres -c "show pool_nodes" Password for user pgpool: node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | server1 | 5432 | up | 0.333333 | primary | 0 | false | 0 | | | 2019-08-06 11:13:17 1 | server2 | 5432 | up | 0.333333 | standby | 0 | true | 0 | streaming | async | 2019-08-06 11:13:25 2 | server3 | 5432 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2019-08-06 11:14:20 (3 rows)
Next, stop primary PostgreSQL server server1 , and verify automatic failover.
[server1]$ pg_ctl -D /var/lib/pgsql/11/data -m immediate stop
After stopping PostgreSQL on server1 , failover occurs and PostgreSQL on server2 becomes new primary DB.
# psql -h 192.168.137.150 -p 9999 -U pgpool postgres -c "show pool_nodes" Password for user pgpool: node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | server1 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2019-08-06 11:36:03 1 | server2 | 5432 | up | 0.333333 | primary | 0 | true | 0 | | | 2019-08-06 11:36:03 2 | server3 | 5432 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2019-08-06 11:36:15 (3 rows)
server3 is running as standby of new primary server2 .
[server3]# psql -h server3 -p 5432 -U pgpool postgres -c "select pg_is_in_recovery()" pg_is_in_recovery ------------------- t [server2]# psql -h server2 -p 5432 -U pgpool postgres -c "select pg_is_in_recovery()" pg_is_in_recovery ------------------- f [server2]# psql -h server2 -p 5432 -U pgpool postgres -c "select * from pg_stat_replication" -x -[ RECORD 1 ]----+------------------------------ pid | 11059 usesysid | 16392 usename | repl application_name | server3 client_addr | 192.168.137.103 client_hostname | client_port | 48694 backend_start | 2019-08-06 11:36:07.479161+09 backend_xmin | state | streaming sent_lsn | 0/75000148 write_lsn | 0/75000148 flush_lsn | 0/75000148 replay_lsn | 0/75000148 write_lag | flush_lag | replay_lag | sync_priority | 0 sync_state | async reply_time | 2019-08-06 11:42:59.823961+09
8.3.7.4. Online Recovery
Here, we use Pgpool-II online recovery functionality to restore server1 (old primary server) as a standby. Before restoring the old primary server, please ensure that recovery_1st_stage and pgpool_remote_start scripts exist in database cluster directory of current primary server server2 .
# pcp_recovery_node -h 192.168.137.150 -p 9898 -U pgpool -n 0 Password: pcp_recovery_node -- Command Successful
Then verify that server1 is started as a standby.
# psql -h 192.168.137.150 -p 9999 -U pgpool postgres -c "show pool_nodes" Password for user pgpool: node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | server1 | 5432 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2019-08-06 11:48:05 1 | server2 | 5432 | up | 0.333333 | primary | 0 | false | 0 | | | 2019-08-06 11:36:03 2 | server3 | 5432 | up | 0.333333 | standby | 0 | true | 0 | streaming | async | 2019-08-06 11:36:15 (3 rows)