Implement a state machine to manage the synchronous_standby_names GUC and the /sync key in DCS.

exception patroni.quorum.QuorumError(value: Any)

Bases: PatroniException

Exception indicating that the quorum state is broken.

class patroni.quorum.QuorumStateResolver(leader: str, quorum: int, voters: Collection[str], numsync: int, sync: Collection[str], numsync_confirmed: int, active: Collection[str], sync_wanted: int, leader_wanted: str)

Bases: object

Calculates a list of state transitions and yields them as Transition named tuples.

Synchronous replication state is set in two places:

  • PostgreSQL configuration sets how many and which nodes are needed for a commit to succeed, abbreviated here as numsync and the sync set;

  • DCS contains information about how many and which nodes need to be interrogated to be sure to see a WAL position containing the latest confirmed commit, abbreviated as quorum and the voters set.

Note

Both of the above pairs have the meaning "ANY n OF set".

The number of nodes needed for a commit to succeed, numsync, is also called the replication factor.

To guarantee zero transaction loss on failover, we need to keep the invariant that, at all times, any subset of nodes that can acknowledge a commit overlaps with any subset of nodes that can achieve quorum to promote a new leader. Given a desired replication factor and a set of nodes able to participate in sync replication, there is one optimal state satisfying this condition. Given the node set active, the optimal state is:

sync = voters = active

numsync = min(sync_wanted, len(active))

quorum = len(active) - numsync
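
As a minimal sketch (not Patroni's implementation), the three formulas above can be evaluated directly. The names `OptimalState` and `optimal_state`, and the use of a plain `frozenset` for node names, are assumptions made for illustration only.

```python
from typing import Collection, FrozenSet, NamedTuple


class OptimalState(NamedTuple):
    """Hypothetical container for the target values; not part of patroni.quorum."""
    sync: FrozenSet[str]
    voters: FrozenSet[str]
    numsync: int
    quorum: int


def optimal_state(active: Collection[str], sync_wanted: int) -> OptimalState:
    """Apply the three formulas above to a set of active nodes."""
    nodes = frozenset(active)
    numsync = min(sync_wanted, len(nodes))  # cannot require more nodes than exist
    quorum = len(nodes) - numsync
    return OptimalState(sync=nodes, voters=nodes, numsync=numsync, quorum=quorum)


state = optimal_state({"node1", "node2", "node3"}, sync_wanted=1)
print(state.numsync, state.quorum)  # 1 2
```

With three active nodes and sync_wanted = 1 this gives numsync = 1 and quorum = 2. When sync_wanted exceeds the number of active nodes, numsync is capped at len(active) and quorum becomes 0.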

We need to be able to produce a series of state changes that take the system to this desired state from any other arbitrary state, given arbitrary changes in node availability, configuration and interrupted transitions.

To keep the invariant, the rule to follow is that when increasing numsync or quorum, the increasing operation must be performed first. When decreasing either, the decreasing operation must be performed last. In other words:

  • If a user increases the synchronous_node_count configuration, first we increase synchronous_standby_names (numsync), then we decrease the quorum field in the /sync key;

  • If a user decreases the synchronous_node_count configuration, first we increase the quorum field in the /sync key, then we decrease synchronous_standby_names (numsync).
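
The ordering rule can be illustrated with simple counting. The sketch below assumes that sync and voters contain the same `nodes` names and that the overlap invariant takes the form nodes <= quorum + numsync; this form is consistent with the optimal state given earlier (where the two sides are equal), but it is an assumption here, not a quotation of Patroni's check. The helper names `overlap_guaranteed` and `safe_throughout` are hypothetical.

```python
from typing import List, Tuple


def overlap_guaranteed(nodes: int, numsync: int, quorum: int) -> bool:
    """Assumed form of the invariant: nodes <= quorum + numsync."""
    return nodes <= quorum + numsync


def safe_throughout(nodes: int, numsync: int, quorum: int,
                    steps: List[Tuple[str, int]]) -> bool:
    """Apply (field, new_value) steps in order, checking the invariant after each."""
    state = {"numsync": numsync, "quorum": quorum}
    for field, value in steps:
        state[field] = value
        if not overlap_guaranteed(nodes, state["numsync"], state["quorum"]):
            return False
    return True


# Three nodes, synchronous_node_count raised from 1 to 2:
# numsync goes 1 -> 2 and quorum goes 2 -> 1.
increase_first = [("numsync", 2), ("quorum", 1)]
decrease_first = [("quorum", 1), ("numsync", 2)]
print(safe_throughout(3, 1, 2, increase_first))  # True
print(safe_throughout(3, 1, 2, decrease_first))  # False
```

Performing the decrease first passes through numsync = 1, quorum = 1 on three nodes, where a commit acknowledged by one node and a quorum reached among the others are no longer forced to share a node.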

The order of adding or removing nodes from sync and voters depends on the state of synchronous_standby_names.

When adding new nodes:

if ``sync`` (``synchronous_standby_names``) is empty:
    add new nodes first to ``sync`` and then to ``voters`` when ``numsync_confirmed`` > ``0``.
else:
    add new nodes first to ``voters`` and then to ``sync``.

When removing nodes:

if ``sync`` (``synchronous_standby_names``) will become empty after removal:
    first remove nodes from ``voters`` and then from ``sync``.
else:
    first remove nodes from ``sync`` and then from ``voters``.
    Make ``voters`` empty if ``numsync_confirmed`` == ``0``.
Variables :
  • leader – name of the leader, according to the /sync key.

  • quorum – quorum value from the /sync key, the minimal number of nodes we need to see when doing the leader race.

  • voters – sync_standby value from the /sync key, the set of node names we will be running the leader race against.

  • numsync – the number of synchronous nodes from synchronous_standby_names.

  • sync – set of node names listed in synchronous_standby_names.

  • numsync_confirmed – the number of nodes that are confirmed to have reached a "safe" LSN after they were added to synchronous_standby_names.

  • active – set of node names that are replicating from the primary (according to pg_stat_replication) and are eligible to be listed in synchronous_standby_names.

  • sync_wanted – desired number of synchronous nodes (synchronous_node_count from the global configuration).

  • leader_wanted – the desired leader (could be different from the leader right after a failover).

__add_new_nodes() → Iterator[Transition]

Add new active nodes to synchronous_standby_names and to the /sync key.

Yields :

transitions as Transition objects.

__handle_non_steady_cases() → Iterator[Transition]

Handle cases when the set of transitions produced on the previous run was interrupted.

Yields :

transitions as Transition objects.

__handle_replication_factor_change() → Iterator[Transition]

Handle a change of the replication factor (sync_wanted, aka synchronous_node_count).

Yields :

transitions as Transition objects.

__init__(leader: str, quorum: int, voters: Collection[str], numsync: int, sync: Collection[str], numsync_confirmed: int, active: Collection[str], sync_wanted: int, leader_wanted: str) → None

Instantiate QuorumStateResolver based on input parameters.

Parameters :
  • leader – name of the leader, according to the /sync key.

  • quorum – quorum value from the /sync key, the minimal number of nodes we need to see when doing the leader race.

  • voters – sync_standby value from the /sync key, the set of node names we will be running the leader race against.

  • numsync – the number of synchronous nodes from synchronous_standby_names.

  • sync – set of node names listed in synchronous_standby_names.

  • numsync_confirmed – the number of nodes that are confirmed to have reached a "safe" LSN after they were added to synchronous_standby_names.

  • active – set of node names that are replicating from the primary (according to pg_stat_replication) and are eligible to be listed in synchronous_standby_names.

  • sync_wanted – desired number of synchronous nodes (synchronous_node_count from the global configuration).

  • leader_wanted – the desired leader (could be different from the leader right after a failover).

__remove_gone_nodes() → Iterator[Transition]

Remove inactive nodes from synchronous_standby_names and from the /sync key.

Yields :

transitions as Transition objects.

_generate_transitions() → Iterator[Transition]

Produce a set of changes to safely transition from the current state to the desired state.

Yields :

transitions as Transition objects.

check_invariants() → None

Checks the invariants of synchronous_standby_names and the /sync key in DCS.

See also

Check QuorumStateResolver’s docstring for more information.

Raises :

QuorumError in case of a broken state.

quorum_update(quorum: int, voters: CaseInsensitiveSet, leader: str | None = None, adjust_quorum: bool | None = True) → Iterator[Transition]

Updates quorum, voters and optionally leader fields.

Parameters :
  • quorum – the new value for quorum, could be adjusted depending on values of numsync_confirmed and adjust_quorum.

  • voters – the new value for voters, could be adjusted if numsync_confirmed == 0.

  • leader – the new value for leader, optional.

  • adjust_quorum – if set to True the quorum requirement will be increased by the difference between numsync and numsync_confirmed.

Yields :

the new state of the /sync key as a Transition object.

Raises :

QuorumError in case of invalid data or if the invariant after transition could not be satisfied.
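
The arithmetic described for adjust_quorum can be sketched as follows. This is an illustration of the documented rule only, not Patroni's code: the function name `adjusted_quorum` is hypothetical, and the sketch assumes numsync_confirmed never exceeds numsync, so the difference is never negative.

```python
def adjusted_quorum(quorum: int, numsync: int, numsync_confirmed: int,
                    adjust_quorum: bool = True) -> int:
    """Return the quorum value after the adjustment described for adjust_quorum.

    Assumes numsync_confirmed <= numsync (an assumption of this sketch).
    """
    if adjust_quorum:
        # Raise the requirement by the number of not-yet-confirmed sync nodes.
        return quorum + (numsync - numsync_confirmed)
    return quorum


print(adjusted_quorum(quorum=1, numsync=2, numsync_confirmed=1))  # 2
print(adjusted_quorum(quorum=1, numsync=2, numsync_confirmed=1,
                      adjust_quorum=False))  # 1
```

The intent, as far as the parameter description suggests, is that nodes listed in synchronous_standby_names but not yet confirmed cannot be counted on to hold the latest commit, so more voters have to be consulted until they are confirmed.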

sync_update(numsync: int, sync: CaseInsensitiveSet) → Iterator[Transition]

Updates numsync and sync fields.

Parameters :
  • numsync – the new value for numsync.

  • sync – the new value for sync.

Yields :

the new state of synchronous_standby_names as a Transition object.

Raises :

QuorumError in case of invalid data or if the invariant after transition could not be satisfied.

class patroni.quorum.Transition(transition_type: str, leader: str, num: int, names: CaseInsensitiveSet)

Bases: NamedTuple

Object describing a transition of /sync or synchronous_standby_names to the new state.

Note

Object attributes represent the new state.

Variables :
  • transition_type –

    possible values:

    • sync - indicates that we need to update synchronous_standby_names.

    • quorum - indicates that we need to update the /sync key in DCS.

    • restart - caller should stop iterating over transitions and restart QuorumStateResolver.

  • leader – the new value of the leader field in the /sync key.

  • num – the new value of the synchronous nodes count in synchronous_standby_names or the value of the quorum field in the /sync key, for transition_type values sync and quorum respectively.

  • names – the new value of node names listed in synchronous_standby_names or the value of the voters field in the /sync key, for transition_type values sync and quorum respectively.
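
A caller might dispatch on transition_type roughly as sketched below. To stay self-contained, the example defines a local stand-in `Transition` with the same four fields (using `frozenset` in place of CaseInsensitiveSet); the function `apply_transitions` and the action strings it returns are hypothetical and only describe what a real caller would do.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Transition(NamedTuple):
    """Local stand-in with the same four fields as patroni.quorum.Transition."""
    transition_type: str
    leader: str
    num: int
    names: FrozenSet[str]  # the real class uses CaseInsensitiveSet


def apply_transitions(transitions: Iterable[Transition]) -> List[str]:
    """Describe the action for each transition, stopping at the first 'restart'."""
    actions = []
    for t in transitions:
        if t.transition_type == "sync":
            actions.append(
                f"update synchronous_standby_names: num={t.num} names={sorted(t.names)}")
        elif t.transition_type == "quorum":
            actions.append(
                f"update /sync: leader={t.leader} quorum={t.num} voters={sorted(t.names)}")
        elif t.transition_type == "restart":
            # Stop iterating; the caller is expected to build a fresh resolver.
            actions.append("restart resolver")
            break
        else:
            raise ValueError(f"unknown transition_type: {t.transition_type}")
    return actions


steps = [
    Transition("sync", "node1", 1, frozenset({"node2"})),
    Transition("restart", "node1", 0, frozenset()),
    Transition("quorum", "node1", 0, frozenset({"node2"})),
]
for line in apply_transitions(steps):
    print(line)
```

In this example the third transition is never applied, because iteration stops at the restart transition.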

_asdict()

Return a new dict which maps field names to their values.

_field_defaults = {}

_fields = ('transition_type', 'leader', 'num', 'names')

classmethod _make(iterable)

Make a new Transition object from a sequence or iterable.

_replace(**kwds)

Return a new Transition object replacing specified fields with new values.

leader: str

Alias for field number 1

names: CaseInsensitiveSet

Alias for field number 3

num: int

Alias for field number 2

transition_type: str

Alias for field number 0