Having multiple PostgreSQL servers running as primary can result in lost transactions due to diverging timelines. This situation is also called a split-brain problem. To avoid split-brain, Patroni needs to ensure that PostgreSQL will not accept any transaction commits after the leader key expires in the DCS. Under normal circumstances Patroni tries to achieve this by stopping PostgreSQL when the leader lock update fails for any reason. However, this may fail to happen for several reasons:
- Patroni has crashed due to a bug, an out-of-memory condition, or by being accidentally killed by a system administrator.
- Shutting down PostgreSQL is too slow.
- Patroni does not get to run due to high load on the system, the VM being paused by the hypervisor, or other infrastructure issues.
To guarantee correct behavior under these conditions Patroni supports watchdog devices. Watchdog devices are software or hardware mechanisms that reset the whole system when they do not get a keepalive heartbeat within a specified timeframe. This adds an additional fail-safe layer in case the usual Patroni split-brain protection mechanisms fail.
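The keepalive behaviour described above can be illustrated with a toy model. This is only a sketch: the class `SimulatedWatchdog` and its methods are hypothetical names, it uses a caller-supplied clock, and it never touches a real watchdog device or resets anything.

```python
class SimulatedWatchdog:
    """Toy model of a watchdog timer (hypothetical, for illustration only)."""

    def __init__(self, timeout, now):
        self.timeout = timeout          # seconds allowed between heartbeats
        self.last_keepalive = now       # time of the most recent heartbeat

    def keepalive(self, now):
        # A heartbeat restarts the countdown.
        self.last_keepalive = now

    def would_reset(self, now):
        # A real watchdog would reset the system once the timeout has
        # elapsed without a heartbeat; here we only report it.
        return now - self.last_keepalive >= self.timeout


wd = SimulatedWatchdog(timeout=25, now=0)
wd.keepalive(now=10)
print(wd.would_reset(now=30))  # False: 20 s since the last heartbeat
print(wd.would_reset(now=35))  # True: 25 s since the last heartbeat
```

If the process that sends heartbeats crashes or is suspended, `keepalive` is simply never called again and the timeout eventually elapses, which is the property the watchdog relies on.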
Patroni will try to activate the watchdog before promoting PostgreSQL to primary. If watchdog activation fails and watchdog mode is set to required, then the node will refuse to become leader. When deciding whether to participate in leader election, Patroni will also check that the watchdog configuration allows it to become leader at all. After demoting PostgreSQL (for example due to a manual failover) Patroni will disable the watchdog again. The watchdog will also be disabled while Patroni is in paused state.
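The promotion rule in this paragraph can be sketched as a small decision function. The function name is hypothetical and this is not Patroni's actual code. The text here only names the mode required; the other mode values used in the example (off and automatic) are assumptions about the configuration.

```python
def may_become_leader(watchdog_mode, activation_succeeded):
    """Hypothetical helper: may this node be promoted to primary?

    watchdog_mode: "required" as described in the text; "off" and
                   "automatic" are assumed names for the other modes.
    activation_succeeded: whether activating the watchdog worked.
    """
    if watchdog_mode == "required":
        # Without a working watchdog the node refuses to become leader.
        return activation_succeeded
    # In any other mode a failed activation does not block promotion.
    return True


print(may_become_leader("required", activation_succeeded=False))   # False
print(may_become_leader("required", activation_succeeded=True))    # True
print(may_become_leader("automatic", activation_succeeded=False))  # True
```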
By default Patroni will set up the watchdog to expire 5 seconds before TTL expires. With the default setup of loop_wait=10 and ttl=30 this gives the HA loop at least 15 seconds (ttl - safety_margin - loop_wait) to complete before the system gets forcefully reset. By default accessing DCS is configured to time out after 10 seconds. This means that when DCS is unavailable, for example due to network issues, Patroni and PostgreSQL will have at least 5 seconds (ttl - safety_margin - loop_wait - retry_timeout) to come to a state where all client connections are terminated.
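The arithmetic above can be checked with a few lines of Python. The function name `time_budgets` is hypothetical; the parameter names and default values are the ones quoted in the text.

```python
def time_budgets(ttl=30, loop_wait=10, retry_timeout=10, safety_margin=5):
    """Return (HA loop budget, budget when DCS is unavailable) in seconds."""
    # Time the HA loop has to complete before the watchdog fires.
    ha_loop = ttl - safety_margin - loop_wait
    # Time left after a DCS request has run into its timeout.
    dcs_unavailable = ha_loop - retry_timeout
    return ha_loop, dcs_unavailable


print(time_budgets())  # (15, 5)
```

Plugging in other values shows how quickly the second budget shrinks: for example, `time_budgets(ttl=20)` leaves no time at all for terminating client connections when DCS is unreachable.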
Safety margin is the amount of time that Patroni reserves between the leader key update and the watchdog keepalive. Patroni will try to send a keepalive immediately after confirmation of the leader key update. If the Patroni process is suspended for an extended amount of time at exactly the right moment, the keepalive may be delayed for more than the safety margin without triggering the watchdog. This results in a window of time where the watchdog will not trigger before leader key expiration, invalidating the guarantee. To be absolutely sure that the watchdog will trigger under all circumstances, set safety_margin to -1, which sets the watchdog timeout to ttl // 2 so that it expires after half of TTL. If you need this guarantee you probably should increase ttl and/or reduce loop_wait and retry_timeout.
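A minimal sketch of how the watchdog timeout follows from safety_margin, based only on the rules stated in this section (expire safety_margin seconds before TTL, or at ttl // 2 when safety_margin is -1). The function name is hypothetical and this is not Patroni's implementation.

```python
def watchdog_timeout(ttl, safety_margin):
    """Watchdog timeout in seconds, per the rules described in the text."""
    if safety_margin == -1:
        # Special value: expire after half of TTL.
        return ttl // 2
    # Otherwise expire safety_margin seconds before TTL runs out.
    return ttl - safety_margin


loop_wait, retry_timeout = 10, 10

default = watchdog_timeout(ttl=30, safety_margin=5)
strict = watchdog_timeout(ttl=30, safety_margin=-1)
print(default, strict)  # 25 15

# Time left for the HA loop, and for the case where DCS is unavailable.
print(strict - loop_wait)                  # 5
print(strict - loop_wait - retry_timeout)  # -5
```

With the default ttl=30 the stricter setting leaves a negative budget when DCS is unavailable, which suggests why a larger ttl or smaller loop_wait and retry_timeout are recommended in that case.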
Currently watchdogs are only supported through the Linux watchdog device interface.
Setting up software watchdog on Linux
The default Patroni configuration will try to use /dev/watchdog on Linux if it is accessible to Patroni. For most use cases the software watchdog built into the Linux kernel is secure enough.
To enable software watchdog issue the following commands as root before starting Patroni:
modprobe softdog
# Replace postgres with the user you will be running patroni under
chown postgres /dev/watchdog
For testing it may be helpful to disable rebooting by adding soft_noboot=1 to the modprobe command line. In this case the watchdog will just log a line in the kernel ring buffer, visible via dmesg.
Patroni will log information about the watchdog when it is successfully enabled.
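Before starting Patroni it can be useful to confirm that the device is accessible to the user Patroni runs as. The following is a sketch, assuming that write permission on the device node is what matters; the function name is hypothetical. It deliberately does not open the device, because opening a Linux watchdog device starts the timer.

```python
import os


def watchdog_device_writable(path="/dev/watchdog"):
    """Return True if the current user has write permission on the device.

    Only permissions are checked. The device is never opened, since
    opening it would arm the watchdog.
    """
    return os.access(path, os.W_OK)


# Run this as the same user that will run Patroni.
print(watchdog_device_writable())
```

A result of False usually means that either the softdog module is not loaded or the chown step was skipped.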