Implement a state machine to manage the synchronous_standby_names GUC and the /sync key in DCS.
- exception patroni.quorum.QuorumError(value: Any)
Bases: PatroniException
Exception indicating that the quorum state is broken.
- class patroni.quorum.QuorumStateResolver(leader: str, quorum: int, voters: Collection[str], numsync: int, sync: Collection[str], numsync_confirmed: int, active: Collection[str], sync_wanted: int, leader_wanted: str)
Bases: object
Calculates a list of state transitions and yields them as Transition named tuples.
Synchronous replication state is set in two places:
- PostgreSQL configuration sets how many and which nodes are needed for a commit to succeed, abbreviated here as numsync and the sync set;
- DCS contains information about how many and which nodes need to be interrogated to be sure to see a WAL position containing the latest confirmed commit, abbreviated as quorum and the voters set.
Note: both of the above pairs have the meaning "ANY n OF set".
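On the PostgreSQL side, "ANY n OF set" corresponds to the quorum-based form of synchronous_standby_names, ANY num_sync (standby_name [, ...]). The sketch below renders a (numsync, sync) pair in that syntax. format_synchronous_standby_names is a hypothetical helper, not part of patroni.quorum, and the quoting Patroni actually applies may differ.

```python
from typing import Collection


def format_synchronous_standby_names(numsync: int, sync: Collection[str]) -> str:
    """Render an "ANY n OF set" pair as a synchronous_standby_names value.

    Hypothetical helper: names are sorted for deterministic output and wrapped
    in double quotes; it assumes no name itself contains a double quote.
    """
    if not sync:
        return ""  # an empty value disables synchronous replication
    names = ", ".join('"{0}"'.format(name) for name in sorted(sync))
    return "ANY {0} ({1})".format(numsync, names)


print(format_synchronous_standby_names(2, {"node2", "node3", "node1"}))
# ANY 2 ("node1", "node2", "node3")
```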
The number of nodes needed for a commit to succeed, numsync, is also called the replication factor.
To guarantee zero transaction loss on failover we need to keep the invariant that at all times any subset of nodes that can acknowledge a commit overlaps with any subset of nodes that can achieve quorum to promote a new leader. Given a desired replication factor and a set of nodes able to participate in sync replication, there is one optimal state satisfying this condition. Given the node set active, the optimal state is:
sync = voters = active
numsync = min(sync_wanted, len(active))
quorum = len(active) - numsync
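The formulas translate directly into code. A minimal sketch, using plain frozensets where Patroni uses case-insensitive sets; TargetState and optimal_state are hypothetical names. In the resulting state numsync + quorum always equals len(active).

```python
from typing import Collection, FrozenSet, NamedTuple


class TargetState(NamedTuple):
    sync: FrozenSet[str]
    voters: FrozenSet[str]
    numsync: int
    quorum: int


def optimal_state(active: Collection[str], sync_wanted: int) -> TargetState:
    """Apply the optimal-state formulas to a set of active node names."""
    nodes = frozenset(active)
    numsync = min(sync_wanted, len(nodes))  # cannot wait for more nodes than exist
    quorum = len(nodes) - numsync           # active nodes not required to acknowledge
    return TargetState(sync=nodes, voters=nodes, numsync=numsync, quorum=quorum)


print(optimal_state({"node1", "node2", "node3"}, sync_wanted=2)[2:])  # (2, 1)
print(optimal_state({"node1", "node2"}, sync_wanted=5)[2:])           # (2, 0)
```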
We need to be able to produce a series of state changes that take the system to this desired state from any other arbitrary state, given arbitrary changes in node availability, configuration and interrupted transitions.
To keep the invariant, the rule to follow is that whenever numsync or quorum changes, an increase must be applied first and a decrease afterwards. In other words:
- If a user increases the synchronous_node_count configuration, first we increase synchronous_standby_names (numsync), then we decrease the quorum field in the /sync key;
- If a user decreases the synchronous_node_count configuration, first we increase the quorum field in the /sync key, then we decrease synchronous_standby_names (numsync).
The order of adding or removing nodes from sync and voters depends on the state of synchronous_standby_names.
When adding new nodes:
- if sync (synchronous_standby_names) is empty: add new nodes first to sync and then to voters when numsync_confirmed > 0;
- else: add new nodes first to voters and then to sync.
When removing nodes:
- if sync (synchronous_standby_names) will become empty after removal: first remove nodes from voters and then from sync;
- else: first remove nodes from sync and then from voters.
Make voters empty if numsync_confirmed == 0.
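The ordering rules can be written down as two small planning functions. This is a sketch only: plan_add and plan_remove are hypothetical, they use plain frozensets, and they leave out the accompanying numsync/quorum adjustments and the numsync_confirmed handling, which the real resolver has to take care of.

```python
from typing import FrozenSet, List, Tuple

Plan = List[Tuple[str, FrozenSet[str]]]  # (field to update, its new value)


def plan_add(sync: FrozenSet[str], voters: FrozenSet[str], new: FrozenSet[str]) -> Plan:
    if not sync:
        # Empty synchronous_standby_names: extend sync first. Per the rule above,
        # the voters step waits for numsync_confirmed > 0 (not enforced here).
        return [("sync", sync | new), ("voters", voters | new)]
    return [("voters", voters | new), ("sync", sync | new)]


def plan_remove(sync: FrozenSet[str], voters: FrozenSet[str], gone: FrozenSet[str]) -> Plan:
    remaining = sync - gone
    if not remaining:
        # sync would become empty: shrink voters first
        return [("voters", voters - gone), ("sync", remaining)]
    return [("sync", remaining), ("voters", voters - gone)]


print(plan_add(frozenset({"a"}), frozenset({"a"}), frozenset({"b"}))[0][0])  # voters
```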
- Variables:
- leader – name of the leader, according to the /sync key.
- quorum – quorum value from the /sync key, the minimal number of nodes we need to see when doing the leader race.
- voters – sync_standby value from the /sync key, the set of node names we will be running the leader race against.
- numsync – the number of synchronous nodes from synchronous_standby_names.
- sync – set of node names listed in synchronous_standby_names.
- numsync_confirmed – the number of nodes that are confirmed to reach the "safe" LSN after they were added to synchronous_standby_names.
- active – set of node names that are replicating from the primary (according to pg_stat_replication) and are eligible to be listed in synchronous_standby_names.
- sync_wanted – desired number of synchronous nodes (synchronous_node_count from the global configuration).
- leader_wanted – the desired leader (could be different from the leader right after a failover).
- __add_new_nodes() -> Iterator[Transition]
Add new active nodes to synchronous_standby_names and to the /sync key.
- Yields: transitions as Transition objects.
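Deciding which nodes are new (for __add_new_nodes) or gone (for __remove_gone_nodes) comes down to a set difference, and the CaseInsensitiveSet parameter types on this page suggest node names are compared without regard to case. A minimal sketch under those assumptions; the helper names and the exact definitions of "new" and "gone" are hypothetical, not taken from Patroni's code.

```python
from typing import Dict, Iterable, List


def ci_union(*groups: Iterable[str]) -> List[str]:
    """Merge name collections, keeping the first spelling seen for each name."""
    merged: Dict[str, str] = {}
    for group in groups:
        for name in group:
            merged.setdefault(name.lower(), name)
    return list(merged.values())


def ci_difference(left: Iterable[str], right: Iterable[str]) -> List[str]:
    """Names in left that are absent from right, ignoring case (sorted)."""
    exclude = {name.lower() for name in right}
    return sorted(name for name in left if name.lower() not in exclude)


def new_nodes(active: Iterable[str], sync: Iterable[str], voters: Iterable[str]) -> List[str]:
    # Assumed definition: active nodes listed in neither sync nor voters
    return ci_difference(active, ci_union(sync, voters))


def gone_nodes(active: Iterable[str], sync: Iterable[str], voters: Iterable[str]) -> List[str]:
    # Assumed definition: listed in sync or voters but no longer active
    return ci_difference(ci_union(sync, voters), active)


print(new_nodes(["Node1", "node3"], ["node1", "node2"], ["NODE2"]))   # ['node3']
print(gone_nodes(["Node1", "node3"], ["node1", "node2"], ["NODE2"]))  # ['node2']
```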
- __handle_non_steady_cases() -> Iterator[Transition]
Handle cases when the set of transitions produced on the previous run was interrupted.
- Yields: transitions as Transition objects.
- __handle_replication_factor_change() -> Iterator[Transition]
Handle change of the replication factor (sync_wanted, aka synchronous_node_count).
- Yields: transitions as Transition objects.
- __init__(leader: str, quorum: int, voters: Collection[str], numsync: int, sync: Collection[str], numsync_confirmed: int, active: Collection[str], sync_wanted: int, leader_wanted: str) -> None
Instantiate QuorumStateResolver based on input parameters.
- Parameters:
- leader – name of the leader, according to the /sync key.
- quorum – quorum value from the /sync key, the minimal number of nodes we need to see when doing the leader race.
- voters – sync_standby value from the /sync key, the set of node names we will be running the leader race against.
- numsync – the number of synchronous nodes from synchronous_standby_names.
- sync – set of node names listed in synchronous_standby_names.
- numsync_confirmed – the number of nodes that are confirmed to reach the "safe" LSN after they were added to synchronous_standby_names.
- active – set of node names that are replicating from the primary (according to pg_stat_replication) and are eligible to be listed in synchronous_standby_names.
- sync_wanted – desired number of synchronous nodes (synchronous_node_count from the global configuration).
- leader_wanted – the desired leader (could be different from the leader right after a failover).
- __remove_gone_nodes() -> Iterator[Transition]
Remove inactive nodes from synchronous_standby_names and from the /sync key.
- Yields: transitions as Transition objects.
- _generate_transitions() -> Iterator[Transition]
Produce a set of changes to safely transition from the current state to the desired one.
- Yields: transitions as Transition objects.
- check_invariants() -> None
Checks the invariant of synchronous_standby_names and the /sync key in DCS.
See also: QuorumStateResolver's docstring for more information.
- Raises: QuorumError: in case of broken state.
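A sketch of what such a check can look like, with QuorumError redefined locally so the block runs on its own. Both conditions are inferences from the description of QuorumStateResolver rather than Patroni's code: the size test generalises numsync + quorum == len(active) from the optimal state, and the subset test follows from the add/remove ordering, which only ever lets one set run ahead of the other. The real check may differ, for example in how it accounts for the leader.

```python
from typing import Collection


class QuorumError(Exception):
    """Local stand-in for patroni.quorum.QuorumError."""


def check_invariants(quorum: int, voters: Collection[str], numsync: int, sync: Collection[str]) -> None:
    voters_l = {name.lower() for name in voters}
    sync_l = {name.lower() for name in sync}
    nodes = voters_l | sync_l
    # Overlap: generalises numsync + quorum == len(active) from the optimal state.
    # Assumption: an empty voters set has nothing to overlap with, so it is skipped.
    if voters_l and len(nodes) > quorum + numsync:
        raise QuorumError(f"quorum {quorum} + numsync {numsync} < {len(nodes)} nodes")
    # Mid-transition one set may run ahead of the other, but only as a superset.
    if not (voters_l <= sync_l or sync_l <= voters_l):
        raise QuorumError(f"mismatched sets: voters={sorted(voters_l)} sync={sorted(sync_l)}")


check_invariants(1, {"a", "b", "c"}, 2, {"A", "b", "c"})  # passes silently
```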
- quorum_update(quorum: int, voters: CaseInsensitiveSet, leader: str | None = None, adjust_quorum: bool | None = True) -> Iterator[Transition]
Updates the quorum, voters and, optionally, leader fields.
- Parameters:
- quorum – the new value for quorum; could be adjusted depending on the values of numsync_confirmed and adjust_quorum.
- voters – the new value for voters; could be adjusted if numsync_confirmed == 0.
- leader – the new value for leader, optional.
- adjust_quorum – if set to True, the quorum requirement will be increased by the difference between numsync and numsync_confirmed.
- Yields: the new state of the /sync key as a Transition object.
- Raises: QuorumError in case of invalid data or if the invariant after the transition could not be satisfied.
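Worked through on the optimal state for three active nodes (numsync = 2, quorum = 1): if only one of the two sync nodes has confirmed the safe LSN, the leader race plausibly has to hear from one more node, and the adjust_quorum rule raises quorum to 2 (so quorum + numsync_confirmed is again 3 = len(active), by the same inference as the optimal state). adjusted_quorum is a hypothetical helper that applies only the rule as stated; it does not cover the separate numsync_confirmed == 0 handling of voters.

```python
def adjusted_quorum(quorum: int, numsync: int, numsync_confirmed: int, adjust_quorum: bool = True) -> int:
    """Quorum value after the documented adjust_quorum rule (hypothetical helper).

    Assumes 0 <= numsync_confirmed <= numsync.
    """
    if not adjust_quorum:
        return quorum
    return quorum + (numsync - numsync_confirmed)


# Three active nodes, optimal numsync=2 / quorum=1, but only one node confirmed:
print(adjusted_quorum(1, numsync=2, numsync_confirmed=1))  # 2
```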
- sync_update(numsync: int, sync: CaseInsensitiveSet) -> Iterator[Transition]
Updates the numsync and sync fields.
- Parameters:
- numsync – the new value for numsync.
- sync – the new value for sync.
- Yields: the new state of synchronous_standby_names as a Transition object.
- Raises: QuorumError in case of invalid data or if the invariant after the transition could not be satisfied.
- class patroni.quorum.Transition(transition_type: str, leader: str, num: int, names: CaseInsensitiveSet)
Bases: NamedTuple
Object describing the transition of /sync or synchronous_standby_names to the new state.
Note: object attributes represent the new state.
- Variables:
- transition_type – possible values:
  - sync – indicates that we need to update synchronous_standby_names.
  - quorum – indicates that we need to update the /sync key in DCS.
  - restart – the caller should stop iterating over transitions and restart QuorumStateResolver.
- leader – the new value of the leader field in the /sync key.
- num – the new value of the synchronous nodes count in synchronous_standby_names or the value of the quorum field in the /sync key, for transition_type values sync and quorum respectively.
- names – the new value of node names listed in synchronous_standby_names or the value of the voters field in the /sync key, for transition_type values sync and quorum respectively.
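A caller consumes these transitions in order and applies each one. The sketch below shows one way to honour the restart value: Transition is redefined locally (a plain set replaces CaseInsensitiveSet), rebuilding the resolver via make_resolver stands in for the documented "restart QuorumStateResolver" step, and drive, apply_sync, apply_quorum and max_restarts are hypothetical names.

```python
from typing import Callable, Iterable, NamedTuple, Set


class Transition(NamedTuple):
    """Stand-in with the documented fields."""
    transition_type: str
    leader: str
    num: int
    names: Set[str]


def drive(make_resolver: Callable[[], Iterable[Transition]],
          apply_sync: Callable[[int, Set[str]], None],
          apply_quorum: Callable[[str, int, Set[str]], None],
          max_restarts: int = 3) -> None:
    """Apply transitions in order, rebuilding the resolver on "restart"."""
    for _ in range(max_restarts + 1):
        for t in make_resolver():
            if t.transition_type == "sync":
                apply_sync(t.num, t.names)              # new synchronous_standby_names
            elif t.transition_type == "quorum":
                apply_quorum(t.leader, t.num, t.names)  # new /sync key contents
            elif t.transition_type == "restart":
                break                                   # stop and re-read state
            else:
                raise ValueError(t.transition_type)
        else:
            return  # all transitions applied without a restart request
    raise RuntimeError("resolver kept asking for a restart")


runs = iter([
    [Transition("sync", "node1", 1, {"node2"}), Transition("restart", "node1", 0, set())],
    [Transition("quorum", "node1", 0, {"node2"})],
])
log = []
drive(lambda: next(runs),
      lambda n, s: log.append(("sync", n)),
      lambda l, n, s: log.append(("quorum", n)))
print(log)  # [('sync', 1), ('quorum', 0)]
```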
- _asdict()
Return a new dict which maps field names to their values.
- _field_defaults = {}
- _fields = ('transition_type', 'leader', 'num', 'names')
- classmethod _make(iterable)
Make a new Transition object from a sequence or iterable.
- _replace(**kwds)
Return a new Transition object replacing specified fields with new values.
- leader: str
Alias for field number 1
- names: CaseInsensitiveSet
Alias for field number 3
- num: int
Alias for field number 2
- transition_type: str
Alias for field number 0
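Because Transition is a NamedTuple, the inherited helpers listed above behave as they do for any named tuple. A short illustration with a local stand-in class (frozenset replaces CaseInsensitiveSet; the field values are made up):

```python
from typing import FrozenSet, NamedTuple


class Transition(NamedTuple):
    # Local mirror of the documented fields
    transition_type: str
    leader: str
    num: int
    names: FrozenSet[str]


t = Transition("quorum", "node1", 1, frozenset({"node2", "node3"}))
print(Transition._fields)          # ('transition_type', 'leader', 'num', 'names')
print(t[2] == t.num)               # True: num is field number 2
print(t._asdict()["leader"])       # node1
bumped = t._replace(num=2)         # new object; t itself is unchanged
rebuilt = Transition._make(["sync", "node1", 2, frozenset({"node2"})])
print(bumped.num, t.num, rebuilt.transition_type)  # 2 1 sync
```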