Implement state machine to manage synchronous_standby_names GUC and /sync key in DCS.
- exception patroni.quorum.QuorumError(value: Any)
-
Bases: PatroniException
Exception indicating that the quorum state is broken.
- class patroni.quorum.QuorumStateResolver(leader: str, quorum: int, voters: Collection[str], numsync: int, sync: Collection[str], numsync_confirmed: int, active: Collection[str], sync_wanted: int, leader_wanted: str)
-
Bases: object
Calculates a list of state transitions and yields them as Transition named tuples.
Synchronous replication state is set in two places:
- PostgreSQL configuration sets how many and which nodes are needed for a commit to succeed, abbreviated as numsync and sync set here;
- DCS contains information about how many and which nodes need to be interrogated to be sure to see a WAL position containing the latest confirmed commit, abbreviated as quorum and voters set.
Note
Both of the above pairs have the meaning "ANY n OF set".
The number of nodes needed for commit to succeed, numsync, is also called the replication factor.
To guarantee zero transaction loss on failover we need to keep the invariant that at all times any subset of nodes that can acknowledge a commit overlaps with any subset of nodes that can achieve quorum to promote a new leader. Given a desired replication factor and a set of nodes able to participate in sync replication there is one optimal state satisfying this condition. Given the node set active, the optimal state is:
    sync = voters = active
    numsync = min(sync_wanted, len(active))
    quorum = len(active) - numsync
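The optimal state formula can be evaluated directly. The following is a minimal sketch, not Patroni's implementation: the names OptimalState and optimal_state are hypothetical, and plain frozensets stand in for whatever set type the real code uses.

```python
from typing import Collection, FrozenSet, NamedTuple


class OptimalState(NamedTuple):
    """Hypothetical container for the target state (not part of patroni.quorum)."""
    sync: FrozenSet[str]
    voters: FrozenSet[str]
    numsync: int
    quorum: int


def optimal_state(active: Collection[str], sync_wanted: int) -> OptimalState:
    """Apply the optimal-state formula to a set of active nodes."""
    nodes = frozenset(active)
    numsync = min(sync_wanted, len(nodes))   # cannot require more nodes than exist
    quorum = len(nodes) - numsync            # the remainder goes to the quorum side
    return OptimalState(sync=nodes, voters=nodes, numsync=numsync, quorum=quorum)


state = optimal_state({"node2", "node3", "node4"}, sync_wanted=2)
print(state.numsync, state.quorum)  # 2 1
```

With three active nodes and a desired replication factor of 2, numsync is 2 and quorum is 1. If sync_wanted exceeds the number of active nodes, numsync is capped and quorum becomes 0.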
We need to be able to produce a series of state changes that take the system to this desired state from any other arbitrary state, given arbitrary changes in node availability, configuration and interrupted transitions.
To keep the invariant the rule to follow is that when increasing numsync or quorum, we need to perform the increasing operation first. When decreasing either, the decreasing operation needs to be performed later. In other words:
- If a user increases synchronous_node_count configuration, first we increase synchronous_standby_names (numsync), then we decrease quorum field in the /sync key;
- If a user decreases synchronous_node_count configuration, first we increase quorum field in the /sync key, then we decrease synchronous_standby_names (numsync).
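The "increase first, decrease later" rule can be illustrated with a small planner. This is a sketch under simplifying assumptions: the set of active nodes does not change, the system starts in the optimal state, and plan_factor_change and Step are hypothetical names that do not exist in patroni.quorum.

```python
from typing import List, Tuple

Step = Tuple[str, int]  # (which setting is written, its new value)


def plan_factor_change(active_count: int, old_wanted: int, new_wanted: int) -> List[Step]:
    """Order the two writes so that the increasing one always happens first."""
    old_numsync = min(old_wanted, active_count)
    new_numsync = min(new_wanted, active_count)
    new_quorum = active_count - new_numsync
    if new_numsync > old_numsync:
        # numsync grows and quorum shrinks: raise numsync first.
        return [("sync", new_numsync), ("quorum", new_quorum)]
    if new_numsync < old_numsync:
        # numsync shrinks and quorum grows: raise quorum first.
        return [("quorum", new_quorum), ("sync", new_numsync)]
    return []  # effective replication factor is unchanged


print(plan_factor_change(3, old_wanted=1, new_wanted=2))
# [('sync', 2), ('quorum', 1)]
print(plan_factor_change(3, old_wanted=2, new_wanted=1))
# [('quorum', 2), ('sync', 1)]
```

In both directions the intermediate state is stricter than necessary rather than looser, which is what keeps the commit and quorum subsets overlapping if the second write never happens.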
Order of adding or removing nodes from sync and voters depends on the state of synchronous_standby_names.
When adding new nodes:
    if ``sync`` (``synchronous_standby_names``) is empty:
        add new nodes first to ``sync`` and then to ``voters`` when ``numsync_confirmed`` > ``0``.
    else:
        add new nodes first to ``voters`` and then to ``sync``.
When removing nodes:
    if ``sync`` (``synchronous_standby_names``) will become empty after removal:
        first remove nodes from ``voters`` and then from ``sync``.
    else:
        first remove nodes from ``sync`` and then from ``voters``.
        Make ``voters`` empty if ``numsync_confirmed`` == ``0``.
- Variables :
-
- leader – name of the leader, according to the /sync key.
- quorum – quorum value from the /sync key, the minimal number of nodes we need to see when doing the leader race.
- voters – sync_standby value from the /sync key, set of node names we will be running the leader race against.
- numsync – the number of synchronous nodes from the synchronous_standby_names.
- sync – set of node names listed in the synchronous_standby_names.
- numsync_confirmed – the number of nodes that are confirmed to reach "safe" LSN after they were added to the synchronous_standby_names.
- active – set of node names that are replicating from the primary (according to pg_stat_replication) and are eligible to be listed in synchronous_standby_names.
- sync_wanted – desired number of synchronous nodes (synchronous_node_count from the global configuration).
- leader_wanted – the desired leader (could be different from the leader right after a failover).
- __add_new_nodes() → Iterator[Transition]
-
Add new active nodes to synchronous_standby_names and to /sync key.
- Yields :
-
transitions as Transition objects.
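The ordering rule for adding nodes, given in the class description, can be sketched as follows. This is an illustration only, not the body of the private method: plan_add_nodes and Step are hypothetical names, only node sets are tracked (the numsync and quorum numbers are left out), and numsync_confirmed is taken as whatever value the caller currently observes.

```python
from typing import Collection, FrozenSet, List, Tuple

Step = Tuple[str, FrozenSet[str]]  # (target that is written, its new node set)


def plan_add_nodes(sync: Collection[str], voters: Collection[str],
                   new_nodes: Collection[str], numsync_confirmed: int) -> List[Step]:
    """Order the writes for adding nodes, following the documented rule."""
    sync_set, voter_set, added = frozenset(sync), frozenset(voters), frozenset(new_nodes)
    if not sync_set:
        # sync is empty: synchronous_standby_names is extended first ...
        steps: List[Step] = [("sync", sync_set | added)]
        if numsync_confirmed > 0:
            # ... and voters only once some node is confirmed at a "safe" LSN.
            steps.append(("quorum", voter_set | added))
        return steps
    # sync is not empty: voters first, then synchronous_standby_names.
    return [("quorum", voter_set | added), ("sync", sync_set | added)]


steps = plan_add_nodes(sync={"node2"}, voters={"node2"},
                       new_nodes={"node3"}, numsync_confirmed=1)
print([target for target, _ in steps])  # ['quorum', 'sync']
```

When sync is empty and nothing is confirmed yet, the sketch emits only the sync step; the voters update is left for a later run.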
- __handle_non_steady_cases() → Iterator[Transition]
-
Handle cases when the set of transitions produced on the previous run was interrupted.
- Yields :
-
transitions as Transition objects.
- __handle_replication_factor_change() → Iterator[Transition]
-
Handle change of the replication factor (sync_wanted, aka synchronous_node_count).
- Yields :
-
transitions as Transition objects.
- __init__(leader: str, quorum: int, voters: Collection[str], numsync: int, sync: Collection[str], numsync_confirmed: int, active: Collection[str], sync_wanted: int, leader_wanted: str) → None
-
Instantiate QuorumStateResolver based on input parameters.
- Parameters :
-
- leader – name of the leader, according to the /sync key.
- quorum – quorum value from the /sync key, the minimal number of nodes we need to see when doing the leader race.
- voters – sync_standby value from the /sync key, set of node names we will be running the leader race against.
- numsync – the number of synchronous nodes from the synchronous_standby_names.
- sync – set of node names listed in the synchronous_standby_names.
- numsync_confirmed – the number of nodes that are confirmed to reach "safe" LSN after they were added to the synchronous_standby_names.
- active – set of node names that are replicating from the primary (according to pg_stat_replication) and are eligible to be listed in synchronous_standby_names.
- sync_wanted – desired number of synchronous nodes (synchronous_node_count from the global configuration).
- leader_wanted – the desired leader (could be different from the leader right after a failover).
- __remove_gone_nodes() → Iterator[Transition]
-
Remove inactive nodes from synchronous_standby_names and from /sync key.
- Yields :
-
transitions as Transition objects.
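The ordering rule for removing nodes, given in the class description, can be sketched in the same spirit. This is not the private method's implementation: plan_remove_nodes and Step are hypothetical names, only node sets are tracked, and the "make voters empty" clause is assumed to belong to the else branch of the rule.

```python
from typing import Collection, FrozenSet, List, Tuple

Step = Tuple[str, FrozenSet[str]]  # (target that is written, its new node set)


def plan_remove_nodes(sync: Collection[str], voters: Collection[str],
                      gone: Collection[str], numsync_confirmed: int) -> List[Step]:
    """Order the writes for removing nodes, following the documented rule."""
    removed = frozenset(gone)
    sync_left = frozenset(sync) - removed
    voters_left = frozenset(voters) - removed
    if not sync_left:
        # sync would become empty: shrink voters first, then sync.
        return [("quorum", voters_left), ("sync", sync_left)]
    if numsync_confirmed == 0:
        # Assumption: this clause applies only when sync stays non-empty.
        voters_left = frozenset()
    # sync stays non-empty: shrink sync first, then voters.
    return [("sync", sync_left), ("quorum", voters_left)]


steps = plan_remove_nodes(sync={"node2", "node3"}, voters={"node2", "node3"},
                          gone={"node3"}, numsync_confirmed=1)
print([target for target, _ in steps])  # ['sync', 'quorum']
```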
- _generate_transitions() → Iterator[Transition]
-
Produce a set of changes to safely transition from the current state to the desired state.
- Yields :
-
transitions as Transition objects.
- check_invariants() → None
-
Checks invariant of synchronous_standby_names and /sync key in DCS.
See also
Check QuorumStateResolver's docstring for more information.
- Raises :
-
QuorumError in case of broken state.
- quorum_update(quorum: int, voters: CaseInsensitiveSet, leader: str | None = None, adjust_quorum: bool | None = True) → Iterator[Transition]
-
Updates quorum, voters and optionally leader fields.
- Parameters :
-
- quorum – the new value for quorum, could be adjusted depending on values of numsync_confirmed and adjust_quorum.
- voters – the new value for voters, could be adjusted if numsync_confirmed == 0.
- leader – the new value for leader, optional.
- adjust_quorum – if set to True the quorum requirement will be increased by the difference between numsync and numsync_confirmed.
- Yields :
-
the new state of the /sync key as a Transition object.
- Raises :
-
QuorumError in case of invalid data or if the invariant after transition could not be satisfied.
- sync_update(numsync: int, sync: CaseInsensitiveSet) → Iterator[Transition]
-
Updates numsync and sync fields.
- Parameters :
-
- numsync – the new value for numsync.
- sync – the new value for sync.
- Yields :
-
the new state of synchronous_standby_names as a Transition object.
- Raises :
-
QuorumError in case of invalid data or if the invariant after transition could not be satisfied.
-
- class patroni.quorum.Transition(transition_type: str, leader: str, num: int, names: CaseInsensitiveSet)
-
Bases: NamedTuple
Object describing transition of /sync or synchronous_standby_names to the new state.
Note
Object attributes represent the new state.
- Variables :
-
- transition_type – possible values:
  - sync - indicates that we need to update synchronous_standby_names.
  - quorum - indicates that we need to update /sync key in DCS.
  - restart - caller should stop iterating over transitions and restart QuorumStateResolver.
- leader – the new value of the leader field in the /sync key.
- num – the new value of the synchronous nodes count in synchronous_standby_names or value of the quorum field in the /sync key for transition_type values sync and quorum respectively.
- names – the new value of node names listed in synchronous_standby_names or value of voters field in the /sync key for transition_type values sync and quorum respectively.
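A caller consuming transitions might dispatch on transition_type roughly as sketched below. To keep the example dependency-free, Transition here is a local stand-in that mirrors the documented fields, with a frozenset in place of CaseInsensitiveSet. The function apply_transitions and the log strings are hypothetical and only describe the action a real caller would take.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Transition(NamedTuple):
    """Local stand-in mirroring the documented fields of patroni.quorum.Transition."""
    transition_type: str
    leader: str
    num: int
    names: FrozenSet[str]


def apply_transitions(transitions: Iterable[Transition]) -> List[str]:
    """Return a log of the action a caller might take for each transition."""
    log: List[str] = []
    for t in transitions:
        if t.transition_type == "sync":
            # New state of synchronous_standby_names: ANY num OF names.
            log.append(f"sync: ANY {t.num} OF {sorted(t.names)}")
        elif t.transition_type == "quorum":
            # New state of the /sync key.
            log.append(f"quorum: leader={t.leader} quorum={t.num} voters={sorted(t.names)}")
        elif t.transition_type == "restart":
            # Stop iterating; the resolver should be re-created with fresh state.
            log.append("restart")
            break
    return log


demo = [
    Transition("sync", "node1", 1, frozenset({"node2", "node3"})),
    Transition("restart", "node1", 0, frozenset()),
    Transition("quorum", "node1", 1, frozenset({"node2", "node3"})),
]
print(apply_transitions(demo))
# ["sync: ANY 1 OF ['node2', 'node3']", 'restart']
```

The quorum transition placed after restart is never processed, matching the description that the caller should stop iterating once restart is seen.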
- _asdict()
-
Return a new dict which maps field names to their values.
- _field_defaults = {}
- _fields = ('transition_type', 'leader', 'num', 'names')
- classmethod _make(iterable)
-
Make a new Transition object from a sequence or iterable
- _replace(**kwds)
-
Return a new Transition object replacing specified fields with new values
- leader: str
-
Alias for field number 1
- names: CaseInsensitiveSet
-
Alias for field number 3
- num: int
-
Alias for field number 2
- transition_type: str
-
Alias for field number 0