Release 3.5.5

Release Date: 2016-12-26

A.125.1. Bug fixes

  • Tighten up the watchdog security. (Muhammad Usama)

    wd_authkey now uses HMAC SHA-256 hashing.

  • Add pgpool_adm extension in Pgpool-II RPM. (Bo Peng)

  • Fix occasional segfault when query cache is enabled. (bug 263) (Tatsuo Ishii)

  • Fix a packet kind mismatch error in the extended protocol. (bug 231) (Tatsuo Ishii)

    According to bug 231, the problem occurs when all of the following conditions are met:

    • Streaming replication mode

    • Load balance node is not node 0

    • Extended protocol is used

    • SELECT is executed, the statement is closed, then a transaction command is executed

    The problem occurs in the following sequence:

    1. A SELECT is executed using statement S1 on load balance node 1.

    2. The frontend sends a Close (statement) message.

    3. Pgpool-II forwards it to backend 1.

    4. The frontend sends Parse, Bind and Execute messages for COMMIT.

    5. Pgpool-II forwards them to backends 0 and 1.

    6. The frontend sends a Sync message.

    7. Pgpool-II forwards it to backends 0 and 1.

    8. Backend 0 replies with ParseComplete ("1"), while backend 1 replies with CloseComplete ("3") because of #3.

    9. A kind mismatch occurs.

    The solution: in #3, Pgpool-II waits for the response from backend 1 but does not read the response message. Pgpool-II's state machine later reads that response before the Sync message is sent in #6. With this, backend 1 replies with "1" in #8, and the kind mismatch error no longer occurs.

    Also, fix a missing call to pool_set_doing_extended_query_message() when a Close message is received. (I don't know why it was missed.)

    A new regression test, "067.bug231", was added.
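
    The sequence above can be replayed with a small simulation. This is a hypothetical C model, not Pgpool-II code: each backend is reduced to a queue of one-byte reply kinds it has sent but Pgpool-II has not yet read, Bind/Execute replies are left out, and Backend, backend_reply(), read_reply() and replay_bug231() are illustrative names. It shows why backend 1's next reply is "3" rather than "1" unless its pending CloseComplete is consumed before the Sync message is forwarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_PENDING 8

/* Hypothetical model of one backend connection: a FIFO of one-byte
 * message kinds the backend has sent but Pgpool-II has not read yet. */
typedef struct {
    char pending[MAX_PENDING];
    int head;
    int tail;
} Backend;

static void backend_reply(Backend *b, char kind)
{
    assert(b->tail < MAX_PENDING);
    b->pending[b->tail++] = kind;
}

static char read_reply(Backend *b)
{
    assert(b->head < b->tail);
    return b->pending[b->head++];
}

/* Replays the bug 231 sequence. With apply_fix, backend 1's pending
 * CloseComplete is consumed before Sync is forwarded (#6). Returns true
 * when the next replies from both backends have the same kind (#8). */
static bool replay_bug231(bool apply_fix)
{
    Backend b0 = {0};
    Backend b1 = {0};

    backend_reply(&b1, '3');   /* #2-#3: Close is forwarded to backend 1 */
    backend_reply(&b0, '1');   /* #4-#5: Parse of COMMIT reaches both */
    backend_reply(&b1, '1');

    if (apply_fix)
        (void) read_reply(&b1);  /* drain CloseComplete before Sync */

    char k0 = read_reply(&b0);
    char k1 = read_reply(&b1);
    if (k0 != k1)
        printf("kind mismatch: backend 0 sent '%c', backend 1 sent '%c'\n", k0, k1);
    return k0 == k1;
}
```

    In this model, replay_bug231(false) reports the mismatch ('1' versus '3'), while replay_bug231(true) sees "1" from both backends.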

  • Fix a race condition in a signal handler. (bug 265) (Tatsuo Ishii)

    child.c has a signal handler that calls elog. Because other signals were not blocked while the handler was running, a deadlock could occur in system calls during the pgpool shutdown sequence. To fix the problem, signals are now blocked inside the handler by using POOL_SETMASK.

    Ideally, though, we should avoid calling elog in signal handlers at all.

  • Backport the improved failover command propagation mechanism from Pgpool-II 3.6. (Muhammad Usama)

    Overhaul the design of how failover, failback and promote node commands are propagated to the watchdog nodes. Previously, the watchdog on the Pgpool-II node that needed to perform a node command (failover, failback or promote node) broadcast the command to all attached Pgpool-II nodes. This sometimes caused synchronization issues, especially when the watchdog cluster contained a large number of nodes, and as a result the failover command was sometimes executed by more than one Pgpool-II node.

    With this commit, all node commands are forwarded to the master/coordinator watchdog, which in turn propagates them to all standby nodes. The commit also changes the failover command interlocking mechanism: only the master/coordinator node can now become the lock holder, so failover commands are executed only on the master/coordinator node.
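
    The forwarding and interlocking scheme can be illustrated with an in-memory model. This is a hypothetical sketch, not the watchdog implementation: Cluster, request_failover() and coordinator_handle() are illustrative names, there is no networking, and requests that arrive while the lock is held are simply treated as duplicates. It shows that when several nodes detect the same failure, the command is executed once, on the coordinator, and then propagated to the standbys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_NODES 5
#define NO_LOCK_HOLDER (-1)

/* Hypothetical in-memory model of a watchdog cluster. */
typedef struct {
    int coordinator;            /* index of the master/coordinator node */
    int lock_holder;            /* NO_LOCK_HOLDER when the interlock is free */
    int executions;             /* times the failover command actually ran */
    bool notified[NUM_NODES];   /* standby nodes told about the command */
} Cluster;

/* Runs on the coordinator only: take the interlock, execute the command
 * once and propagate it to every standby. Requests arriving while the
 * lock is held are treated as duplicates of the running command. */
static void coordinator_handle(Cluster *c, int backend_id)
{
    if (c->lock_holder != NO_LOCK_HOLDER)
        return;
    c->lock_holder = c->coordinator;
    c->executions++;
    printf("node %d: executing failover for backend %d\n", c->coordinator, backend_id);
    for (int i = 0; i < NUM_NODES; i++)
        if (i != c->coordinator)
            c->notified[i] = true;
}

/* Any node that detects the failure forwards the request to the
 * coordinator instead of broadcasting it to all nodes. */
static void request_failover(Cluster *c, int from_node, int backend_id)
{
    assert(from_node >= 0 && from_node < NUM_NODES);
    coordinator_handle(c, backend_id);
}
```

    For example, if nodes 1, 3 and 4 all request failover of backend 2 in a cluster whose coordinator is node 0, executions ends up as 1 and only node 0 holds the lock.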

  • Do not cancel a query that resulted in an error, except in native replication mode. (Tatsuo Ishii)

    Cancelling was intended to keep the backends consistent, but it serves no purpose in modes other than native replication mode.

  • Remove the obsolete "-c" option from the pgpool command. (Tatsuo Ishii)

    Also fix a typo in the help message.

  • Fix an authentication failure error when a PCP command is cancelled. (bug 252) (Muhammad Usama)

  • Change the default value of search_primary_node_timeout from 10 to 300 seconds. (Tatsuo Ishii)

    The previous default of 10 seconds was sometimes too short for a standby to be promoted.
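
    Why a short timeout can fail is easy to see with a simulated clock. This is a simplified, hypothetical model of a timeout-bounded search, not how Pgpool-II actually probes backends: search_primary() and primary_found_at() are illustrative names, time advances one simulated second per iteration without sleeping, and the standby is assumed to finish promotion a fixed number of seconds after the search starts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical probe: in this model the standby finishes promotion
 * promote_after seconds after the search starts. */
static bool primary_found_at(int elapsed, int promote_after)
{
    return elapsed >= promote_after;
}

/* Poll once per simulated second until a primary appears or the timeout
 * (in seconds) expires. Returns the seconds waited, or -1 if it gave up. */
static int search_primary(int timeout, int promote_after)
{
    assert(timeout > 0);
    for (int elapsed = 0; elapsed <= timeout; elapsed++) {
        if (primary_found_at(elapsed, promote_after))
            return elapsed;
    }
    return -1;
}
```

    With a standby that takes 30 seconds to be promoted (an arbitrary example), search_primary(10, 30) gives up and returns -1, while search_primary(300, 30) returns 30.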

  • Fix the case where all backends are down and then one node is attached. (bug 248) (Tatsuo Ishii)

    When all backends are down, no connections are accepted. If one PostgreSQL server then comes up and the node is attached using pcp_attach_node, the command finishes successfully. However, new connections are still refused, because the pgpool child processes look at the cached status, in which the recovered node is still marked as down in streaming replication mode (native replication and other modes are fine). The solution is to force a restart of all pgpool child processes if all nodes are down.
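
    The stale-cache problem and the restart fix can be modelled as follows. This is a hypothetical sketch, not Pgpool-II code: Pool, restart_child(), attach_node() and the other names are illustrative, and the sketch assumes the children are restarted at the moment the node is attached. Each child keeps a snapshot of backend status taken when it starts, so it keeps refusing connections until it is restarted and re-reads the shared status.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_BACKENDS 2
#define NUM_CHILDREN 3

typedef enum { NODE_DOWN, NODE_UP } NodeStatus;

/* Hypothetical model: the main process owns the authoritative status,
 * and each child keeps the snapshot it took when it was (re)started. */
typedef struct {
    NodeStatus shared[NUM_BACKENDS];
    NodeStatus cached[NUM_CHILDREN][NUM_BACKENDS];
} Pool;

static void restart_child(Pool *p, int child)
{
    assert(child >= 0 && child < NUM_CHILDREN);
    memcpy(p->cached[child], p->shared, sizeof p->shared);
}

static void init_all_down(Pool *p)
{
    for (int i = 0; i < NUM_BACKENDS; i++)
        p->shared[i] = NODE_DOWN;
    for (int c = 0; c < NUM_CHILDREN; c++)
        restart_child(p, c);
}

/* A child refuses new connections if its cached view shows no usable node. */
static bool child_accepts(const Pool *p, int child)
{
    for (int i = 0; i < NUM_BACKENDS; i++)
        if (p->cached[child][i] == NODE_UP)
            return true;
    return false;
}

static bool all_nodes_down(const Pool *p)
{
    for (int i = 0; i < NUM_BACKENDS; i++)
        if (p->shared[i] == NODE_UP)
            return false;
    return true;
}

/* Attach a node. With apply_fix, every child is restarted when all nodes
 * were down, so the children pick up the new status. */
static void attach_node(Pool *p, int node, bool apply_fix)
{
    bool was_all_down = all_nodes_down(p);

    p->shared[node] = NODE_UP;
    if (apply_fix && was_all_down)
        for (int c = 0; c < NUM_CHILDREN; c++)
            restart_child(p, c);
}
```

    Without the fix, child_accepts() stays false after attach_node(); with apply_fix set, every child sees the attached node as up.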

  • Fix for: [pgpool-general: 4997] Avoiding downtime when pgpool changes require a restart (Muhammad Usama)

    To fix this, the verification of configuration parameter values is reversed. Previously, the standby nodes verified their parameter values against the corresponding values on the master Pgpool-II node and threw a FATAL error when an inconsistency was found. With this commit, the verification is delegated to the master Pgpool-II node: the master verifies the configuration parameter values of each joining standby node against its local values and emits a WARNING message, instead of an error, if they differ. Nodes with different configurations are therefore allowed to join the watchdog cluster, and users must watch for configuration inconsistency warnings in the master Pgpool-II log to avoid surprises when the Pgpool-II master switches over.
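
    A minimal sketch of the reversed check, assuming parameters are exchanged as name/value string pairs listed in the same order on both nodes. This is an assumption for illustration: ConfigParam and verify_joining_node() are hypothetical names, and the warning text is illustrative rather than Pgpool-II's actual log format. The important point is that a mismatch produces a warning and a count, never a rejection of the joining node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of one configuration parameter. */
typedef struct {
    const char *name;
    const char *value;
} ConfigParam;

/* Run on the master when a standby joins: compare each parameter the
 * standby reports with the master's local value and print a WARNING for
 * every difference. The standby is never rejected; the return value is
 * only the number of mismatches found. */
static int verify_joining_node(const ConfigParam *local, const ConfigParam *remote,
                               size_t n, const char *node_name)
{
    int mismatches = 0;

    for (size_t i = 0; i < n; i++) {
        assert(strcmp(local[i].name, remote[i].name) == 0);
        if (strcmp(local[i].value, remote[i].value) != 0) {
            fprintf(stderr,
                    "WARNING: \"%s\" is \"%s\" on node \"%s\" but \"%s\" on the master\n",
                    local[i].name, remote[i].value, node_name, local[i].value);
            mismatches++;
        }
    }
    return mismatches;
}
```

    The caller would log the warnings and let the node join regardless of the returned count, which is why the warnings in the master's log need to be checked manually.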

  • Add the compiler flag "-fno-strict-aliasing" to configure.ac to fix a compiler error. (Tatsuo Ishii)

  • Do not use random() while generating MD5 salt. (Tatsuo Ishii)

    random() should not be used in security-related applications. To replace random(), PostmasterRandom() is imported from PostgreSQL. The current time is also stored at startup of the Pgpool-II main process for later use.
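
    The reason for storing the start time can be shown with a sketch modelled on our understanding of PostgreSQL's PostmasterRandom(). Under that reading, random() is seeded once, lazily, by mixing the microseconds of the startup time with those of the first call, so the seed depends on when the first request arrives instead of being a fixed default. The names here are hypothetical, this is not the imported code, and it is not a cryptographically strong generator; it only illustrates the seeding idea.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>

static struct timeval start_time;     /* recorded when the main process starts */
static unsigned int random_seed = 0;  /* 0 means "not seeded yet" */

static void record_start_time(void)
{
    gettimeofday(&start_time, NULL);
}

/* Seed random() once, on first use, from the start time and the current
 * time, then return random(). */
static long seeded_random(void)
{
    if (random_seed == 0) {
        do {
            struct timeval now;
            gettimeofday(&now, NULL);
            /* Swap the high and low 16 bits of now.tv_usec and XOR them
             * with the start time's microseconds; retry if the result is 0. */
            unsigned int stop_usec = (unsigned int) now.tv_usec;
            random_seed = (unsigned int) start_time.tv_usec ^
                ((stop_usec << 16) | ((stop_usec >> 16) & 0xffff));
        } while (random_seed == 0);
        srandom(random_seed);
    }
    return random();
}

/* Fill the 4-byte salt used by MD5 authentication. */
static void make_md5_salt(unsigned char salt[4])
{
    long r = seeded_random();

    for (int i = 0; i < 4; i++)
        salt[i] = (unsigned char) ((r >> (8 * i)) & 0xff);
}
```

    record_start_time() would be called once at startup of the main process; make_md5_salt() can then be called for each MD5 authentication request.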

  • Do not ignore Sync messages from the frontend when the query cache is enabled. (Tatsuo Ishii)