Testing Online Recovery

Pgpool-II allows to recover a downed node by technique called "Online Recovery". This copies data from the primary node to a standby node so that it sync with the primary. This may take long time and database may be updated during the process. That's no problem because in the streaming configuration, the standby will receive WAL log and applies it to catch up the primary. To test online recovery, let's start with previous cluster, where node 0 is in down state.

$ pcp_recovery_node -p 11001 0
Password: 
pcp_recovery_node -- Command Successful

$ psql -p 11000 -c "show pool_nodes" test
 node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------
 0       | /tmp     | 11002 | up     | 0.500000  | standby | 2173       | true              | 0
 1       | /tmp     | 11003 | up     | 0.500000  | primary | 0          | false             | 0
(2 rows)
      

pcp_recovery_node is one of control commands coming with Pgpool-II installation. The argument -p is to specify the port number assigned to the command, which is 11001 set by pgpool_setup . The second argument is the target node id. After executing the command, node 0 returned to "up" status.

The script executed by pcp_recovery_node is specified as "recovery_1st_stage_command" in pgpool.conf . Here is the file installed by pgpool_setup .

#! /bin/sh
psql=/usr/local/pgsql/bin/psql
DATADIR_BASE=/home/t-ishii/tmp/Tutorial
PGSUPERUSER=t-ishii

master_db_cluster=$1
recovery_node_host_name=$2
DEST_CLUSTER=$3
PORT=$4

log=$DATADIR_BASE/log/recovery.log

$psql -p $PORT -c "SELECT pg_start_backup('Streaming Replication', true)" postgres

echo "source: $master_db_cluster dest: $DEST_CLUSTER" > $log

rsync -C -a -c --delete --exclude postgresql.conf --exclude postmaster.pid \
--exclude postmaster.opts --exclude pg_log \
--exclude recovery.conf --exclude recovery.done \
--exclude pg_xlog \
$master_db_cluster/ $DEST_CLUSTER/

rm -fr $DEST_CLUSTER/pg_xlog 
mkdir $DEST_CLUSTER/pg_xlog
chmod 700 $DEST_CLUSTER/pg_xlog
rm $DEST_CLUSTER/recovery.done
cat > $DEST_CLUSTER/recovery.conf $lt;$lt;REOF
standby_mode          = 'on'
primary_conninfo      = 'port=$PORT user=$PGSUPERUSER'
recovery_target_timeline='latest'
REOF

$psql -p $PORT -c "SELECT pg_stop_backup()" postgres