29.3. Logical Replication Failover
To allow subscriber nodes to continue replicating data from the publisher
node even when the publisher node goes down, there must be a physical standby
corresponding to the publisher node. The logical slots on the primary server
corresponding to the subscriptions can be synchronized to the standby server by
specifying
failover = true
when creating subscriptions. See
Section 47.2.3
for details.
Enabling the
failover
parameter ensures a seamless transition of those subscriptions after the
standby is promoted. They can continue subscribing to publications on the
new primary server.
Because the slot synchronization logic copies asynchronously, it is
necessary to confirm that replication slots have been synced to the standby
server before the failover happens. To ensure a successful failover, the
standby server must be ahead of the subscriber. This can be achieved by
configuring
synchronized_standby_slots
.
To confirm that the standby server is indeed ready for failover, follow these steps to verify that all necessary logical replication slots have been synchronized to the standby server:
-
On the subscriber node, use the following SQL to identify which replication slots should be synced to the standby that we plan to promote. This query will return the relevant replication slots associated with the failover-enabled subscriptions.
test_sub=# SELECT array_agg(quote_literal(s.subslotname)) AS slots FROM pg_subscription s WHERE s.subfailover AND s.subslotname IS NOT NULL; slots ------- {'sub1','sub2','sub3'} (1 row)
-
On the subscriber node, use the following SQL to identify which table synchronization slots should be synced to the standby that we plan to promote. This query needs to be run on each database that includes the failover-enabled subscription(s). Note that the table sync slot should be synced to the standby server only if the table copy is finished (See Section 51.55 ). We don't need to ensure that the table sync slots are synced in other scenarios as they will either be dropped or re-created on the new primary server in those cases.
test_sub=# SELECT array_agg(quote_literal(slot_name)) AS slots FROM ( SELECT CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slot_name FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover ); slots ------- {'pg_16394_sync_16385_7394666715149055164'} (1 row)
-
Check that the logical replication slots identified above exist on the standby server and are ready for failover.
test_standby=# SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS failover_ready FROM pg_replication_slots WHERE slot_name IN ('sub1','sub2','sub3', 'pg_16394_sync_16385_7394666715149055164'); slot_name | failover_ready --------------------------------------------+---------------- sub1 | t sub2 | t sub3 | t pg_16394_sync_16385_7394666715149055164 | t (4 rows)
If all the slots are present on the standby server and the result
(
failover_ready
) of the above SQL query is true, then
existing subscriptions can continue subscribing to publications now on the
new primary server.