Integrating external lifecheck with watchdog

The Pgpool-II watchdog process uses BSD sockets to communicate with all the other Pgpool-II processes, and any third-party system can use the same BSD socket to provide the lifecheck function for local and remote Pgpool-II watchdog nodes. The IPC socket file name is the string "s.PGPOOLWD_CMD." followed by the Pgpool-II wd_port value, and the socket file is placed in the wd_ipc_socket_dir directory.
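
The naming rule above can be turned into a socket path and a connection with a short sketch. It assumes a UNIX-domain stream socket and a file name built exactly as described ("s.PGPOOLWD_CMD." plus wd_port). The helper names watchdog_ipc_socket_path and connect_to_watchdog are hypothetical. Check the real file name in wd_ipc_socket_dir before relying on it, because an installation may name it slightly differently (for example with a leading dot).

```python
import os
import socket


def watchdog_ipc_socket_path(wd_ipc_socket_dir: str, wd_port: int,
                             prefix: str = "s.PGPOOLWD_CMD.") -> str:
    """Build the watchdog IPC socket path (hypothetical helper).

    Assumption: the file name is the prefix followed by wd_port, as the
    text describes; pass a different prefix if the file on disk differs.
    """
    return os.path.join(wd_ipc_socket_dir, prefix + str(wd_port))


def connect_to_watchdog(path: str, timeout: float = 5.0) -> socket.socket:
    """Open a UNIX-domain stream connection (assumed SOCK_STREAM)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except OSError:
        # Missing socket file, permission problem, or watchdog not running.
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    # With wd_ipc_socket_dir = '/tmp' and wd_port = 9000:
    print(watchdog_ipc_socket_path("/tmp", 9000))  # /tmp/s.PGPOOLWD_CMD.9000
```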

2.2.1. Watchdog IPC command packet format

The watchdog IPC command packet consists of three fields. The table below describes each field.

Table 2-1. Watchdog IPC command packet format

Field    Type                          Description
TYPE     BYTE1                         Command type
LENGTH   INT32 in network byte order   Length of the data that follows
DATA     Data in JSON format           Command data in JSON format
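
The three fields of Table 2-1 map directly onto a byte string. The sketch below encodes a command packet with Python's struct module ("!i" is a 4-byte integer in network byte order). It assumes that DATA is the raw UTF-8 JSON text with no terminator and that LENGTH counts only the DATA bytes; build_command_packet is a hypothetical name.

```python
import struct


def build_command_packet(command_type: bytes, data: bytes = b"") -> bytes:
    """Encode TYPE (1 byte), LENGTH (INT32, network order) and DATA."""
    if len(command_type) != 1:
        raise ValueError("command type must be exactly one byte")
    # Assumption: LENGTH is the size of DATA alone, not of the whole packet.
    return command_type + struct.pack("!i", len(data)) + data


if __name__ == "__main__":
    # A GET NODES LIST ('3') packet carrying the JSON text {}
    print(build_command_packet(b"3", b"{}"))  # b'3\x00\x00\x00\x02{}'
```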

2.2.2. Watchdog IPC result packet format

The watchdog IPC command result packet consists of three fields. The table below describes each field.

Table 2-2. Watchdog IPC result packet format

Field    Type                          Description
TYPE     BYTE1                         Command result type
LENGTH   INT32 in network byte order   Length of the data that follows
DATA     Data in JSON format           Command result data in JSON format
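
Reading a result packet means reading exactly 1 + 4 bytes of header and then LENGTH bytes of data, because a single recv() call may return fewer bytes than requested. The helpers recv_exact and read_result_packet below are hypothetical; stripping a trailing NUL byte is a defensive assumption, not documented behavior.

```python
import json
import socket
import struct
from typing import Optional, Tuple


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("watchdog closed the connection mid-packet")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_result_packet(sock: socket.socket) -> Tuple[bytes, Optional[dict]]:
    """Return (result type byte, decoded JSON data or None)."""
    result_type = recv_exact(sock, 1)
    (length,) = struct.unpack("!i", recv_exact(sock, 4))
    if length <= 0:
        return result_type, None
    raw = recv_exact(sock, length)
    # Defensive assumption: tolerate a trailing NUL terminator.
    text = raw.rstrip(b"\x00").decode("utf-8")
    return result_type, (json.loads(text) if text else None)
```

For testing without a running watchdog, socket.socketpair() provides a connected pair on which one end can play the watchdog role.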

2.2.3. Watchdog IPC command packet types

The first byte of a command packet sent to the watchdog process, and of a result packet returned by it, identifies the command type or the command result type. The table below lists all valid types and their meanings.

Table 2-3. Watchdog IPC command packet types

Name                        Byte value  Type            Description
REGISTER FOR NOTIFICATIONS  '0'         Command packet  Registers the current connection to receive watchdog notifications
NODE STATUS CHANGE          '2'         Command packet  Informs watchdog about a status change of a watchdog node
GET NODES LIST              '3'         Command packet  Requests the list of all configured watchdog nodes
NODES LIST DATA             '4'         Result packet   The JSON data in the packet contains the list of all configured watchdog nodes
CLUSTER IN TRANSITION       '7'         Result packet   Returned when watchdog cannot process the command because the cluster is in transition
RESULT BAD                  '8'         Result packet   Returned when the IPC command fails
RESULT OK                   '9'         Result packet   Returned when the IPC command succeeds
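
The byte values of Table 2-3 can be kept as constants so client code never hard-codes bare characters. The classify_result function is a hypothetical client-side policy, not watchdog behavior: it suggests retrying while the cluster is in transition, treats RESULT OK and NODES LIST DATA as success, and treats RESULT BAD or any unknown byte as an error.

```python
# Packet type bytes from Table 2-3.
REGISTER_FOR_NOTIFICATIONS = b"0"
NODE_STATUS_CHANGE = b"2"
GET_NODES_LIST = b"3"
NODES_LIST_DATA = b"4"
CLUSTER_IN_TRANSITION = b"7"
RESULT_BAD = b"8"
RESULT_OK = b"9"

RESULT_NAMES = {
    NODES_LIST_DATA: "NODES LIST DATA",
    CLUSTER_IN_TRANSITION: "CLUSTER IN TRANSITION",
    RESULT_BAD: "RESULT BAD",
    RESULT_OK: "RESULT OK",
}


def classify_result(result_type: bytes) -> str:
    """Map a result type byte to 'ok', 'retry' or 'error' (hypothetical policy)."""
    if result_type == CLUSTER_IN_TRANSITION:
        return "retry"  # watchdog could not process the command yet
    if result_type in (RESULT_OK, NODES_LIST_DATA):
        return "ok"
    return "error"  # RESULT BAD or an unexpected byte
```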

2.2.4. External lifecheck IPC packets and data

The "GET NODES LIST", "NODES LIST DATA" and "NODE STATUS CHANGE" watchdog IPC messages can be used to integrate an external lifecheck system. Note that the built-in lifecheck of Pgpool-II also uses the same channel and technique.

2.2.4.1. Getting list of configured watchdog nodes

To get the list of configured watchdog nodes, a third-party lifecheck system sends a "GET NODES LIST" packet on the watchdog IPC socket. If wd_authkey is set, the packet data is JSON containing the authorization key and its value; if wd_authkey is not configured, the packet data is empty. Watchdog replies with a "NODES LIST DATA" result packet.
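
A "GET NODES LIST" request can be built as follows. The sketch assumes that the authorization key travels in an "IPCAuthKey" member, the same member name used in the "NODE STATUS CHANGE" example of this section; build_get_nodes_list is a hypothetical helper.

```python
import json
import struct
from typing import Optional

GET_NODES_LIST = b"3"


def build_get_nodes_list(wd_authkey: Optional[str]) -> bytes:
    """Build a GET NODES LIST packet, with or without authorization data."""
    if wd_authkey:
        # Assumption: same "IPCAuthKey" member as in NODE STATUS CHANGE.
        data = json.dumps({"IPCAuthKey": wd_authkey}).encode("utf-8")
    else:
        data = b""  # wd_authkey not configured: empty packet data
    return GET_NODES_LIST + struct.pack("!i", len(data)) + data
```

The reply to this packet should be a "NODES LIST DATA" ('4') result packet; any other result type indicates that the request was not served.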

The result packet that watchdog returns for "GET NODES LIST" contains, in JSON format, the list of all configured watchdog nodes to health-check. The JSON has a "NodeCount" member and a "WatchdogNodes" array with one object per watchdog node. Each node object contains the "ID", "State", "NodeName", "HostName", "DelegateIP", "WdPort" and "PgpoolPort" of that node.

      -- Example JSON data contained in "NODES LIST DATA"

      {
        "NodeCount":3,
        "WatchdogNodes":
        [
          {
            "ID":0,
            "State":1,
            "NodeName":"Linux_ubuntu_9999",
            "HostName":"watchdog-host1",
            "DelegateIP":"172.16.5.133",
            "WdPort":9000,
            "PgpoolPort":9999
          },
          {
            "ID":1,
            "State":1,
            "NodeName":"Linux_ubuntu_9991",
            "HostName":"watchdog-host2",
            "DelegateIP":"172.16.5.133",
            "WdPort":9000,
            "PgpoolPort":9991
          },
          {
            "ID":2,
            "State":1,
            "NodeName":"Linux_ubuntu_9992",
            "HostName":"watchdog-host3",
            "DelegateIP":"172.16.5.133",
            "WdPort":9000,
            "PgpoolPort":9992
          }
        ]
      }

      -- Note that ID 0 is always reserved for the local watchdog node
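
Turning "NODES LIST DATA" into a list of health-check targets is mostly JSON parsing. In the sketch below, skipping ID 0 (the local node) by default is a design choice and can be changed with include_local. The WatchdogNode type and nodes_to_check function are hypothetical names.

```python
import json
from typing import List, NamedTuple


class WatchdogNode(NamedTuple):
    node_id: int
    node_name: str
    host: str
    pgpool_port: int
    wd_port: int


def nodes_to_check(nodes_list_json: str,
                   include_local: bool = False) -> List[WatchdogNode]:
    """Extract health-check targets from NODES LIST DATA JSON."""
    doc = json.loads(nodes_list_json)
    nodes = []
    for n in doc.get("WatchdogNodes", []):
        if n["ID"] == 0 and not include_local:
            continue  # ID 0 is always the local watchdog node
        nodes.append(WatchdogNode(n["ID"], n["NodeName"], n["HostName"],
                                  n["PgpoolPort"], n["WdPort"]))
    return nodes
```

Keep the "ID" values exactly as returned, because later status reports must refer to nodes by these IDs.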

After getting the configured watchdog nodes from watchdog, the external lifecheck system can proceed with health checking those nodes. When it detects a status change on any node, it informs watchdog with a "NODE STATUS CHANGE" IPC message. The message data must be JSON containing the ID of the node whose status changed (the same ID that watchdog returned for that node in the "WatchdogNodes" list) and the node's new status.
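
The text leaves the health-check method to the external system. The sketch below is a minimal illustration that probes a node with a plain TCP connect (a hypothetical choice made only for illustration) and reports a status only when it differs from the last one reported, so watchdog is told about changes rather than every probe. tcp_probe and StatusTracker are hypothetical names; the first observation of a node always counts as a change.

```python
import socket
from typing import Dict, Optional

NODE_STATUS_DEAD = 1
NODE_STATUS_ALIVE = 2


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Hypothetical probe: alive if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class StatusTracker:
    """Remember the last reported status per node ID."""

    def __init__(self) -> None:
        self._last: Dict[int, int] = {}

    def update(self, node_id: int, alive: bool) -> Optional[int]:
        """Return the new NodeStatus if it changed, otherwise None."""
        status = NODE_STATUS_ALIVE if alive else NODE_STATUS_DEAD
        if self._last.get(node_id) == status:
            return None  # unchanged: nothing to send to watchdog
        self._last[node_id] = status
        return status
```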

      -- Example JSON informing the Pgpool-II watchdog that the health check
      -- failed on the node with ID 1

      {
        "NodeID":1,
        "NodeStatus":1,
        "Message":"optional message string to log by watchdog for this event",
        "IPCAuthKey":"wd_authkey configuration parameter value"
      }

      -- NodeStatus values and their meanings
      NODE STATUS DEAD  =  1
      NODE STATUS ALIVE =  2
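
A "NODE STATUS CHANGE" packet combines the JSON above with the packet framing of Table 2-1. In this sketch, including "IPCAuthKey" only when wd_authkey is configured and omitting an empty "Message" are assumptions; build_node_status_change is a hypothetical helper.

```python
import json
import struct
from typing import Optional

NODE_STATUS_CHANGE = b"2"
NODE_STATUS_DEAD = 1
NODE_STATUS_ALIVE = 2


def build_node_status_change(node_id: int, node_status: int,
                             message: Optional[str] = None,
                             wd_authkey: Optional[str] = None) -> bytes:
    """Build a NODE STATUS CHANGE packet for one watchdog node."""
    if node_status not in (NODE_STATUS_DEAD, NODE_STATUS_ALIVE):
        raise ValueError("node_status must be 1 (dead) or 2 (alive)")
    body = {"NodeID": node_id, "NodeStatus": node_status}
    if message:
        body["Message"] = message  # optional text for watchdog to log
    if wd_authkey:
        body["IPCAuthKey"] = wd_authkey  # assumed needed only if wd_authkey is set
    data = json.dumps(body).encode("utf-8")
    return NODE_STATUS_CHANGE + struct.pack("!i", len(data)) + data
```

After sending the packet, the client can read the result packet; according to Table 2-3, "RESULT OK" ('9') indicates success and "CLUSTER IN TRANSITION" ('7') means watchdog could not process the command at that moment.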