High Availability

One of the great things about PostgreSQL is its reliability: it is very stable and typically “just works.” However, there are certain things that can happen in the environment that PostgreSQL is deployed in that can affect its uptime, including:

The database storage disk fails or some other hardware failure occurs
The network on which the database resides becomes unreachable
The host operating system becomes unstable and crashes
A key database file becomes corrupted
A data center is lost

There may also be downtime events that are due to the normal case of operations, such as performing a minor upgrade, security patching of operating system, hardware upgrade, or other maintenance.

Fortunately, the Crunchy PostgreSQL Operator is prepared for this.

PostgreSQL Operator High-Availability Overview

The Crunchy PostgreSQL Operator supports a distributed-consensus based high-availability (HA) system that keeps its managed PostgreSQL clusters up and running, even if the PostgreSQL Operator disappears. Additionally, it leverages Kubernetes specific features such as Pod Anti-Affinity to limit the surface area that could lead to a PostgreSQL cluster becoming unavailable. The PostgreSQL Operator also supports automatic healing of failed primaries and leverages the efficient pgBackRest “delta restore” method, which eliminates the need to fully reprovision a failed cluster!

This tutorial will cover the “howtos” of high availbility. For more information on the topic, please review the detailed high availability architecture section.

Create a HA PostgreSQL Cluster

High availability is enabled in the PostgreSQL Operator by default so long as you have more than one replica. To create a high availability PostgreSQL cluster, you can execute the following command:

pgo create cluster hippo --replica-count=1

Scale a PostgreSQL Cluster

You can scale an existing PostgreSQL cluster to add HA to it by using the pgo scale command:

pgo scale hippo

Scale Down a PostgreSQL Cluster

To scale down a PostgreSQL cluster, you will have to provide a target of which instance you want to scale down. You can do this with the pgo scaledown command:

pgo scaledown hippo --query

which will yield something similar to:

Cluster: hippo
REPLICA             	STATUS    	NODE      	REPLICATION LAG     	PENDING RESTART
hippo-ojnd          	running   	node01    	           0 MB     	          false

Once you have determined which instance you want to scale down, you can run the following command:

pgo scaledown hippo --target=hippo-ojnd

Manual Failover

Each PostgreSQL cluster will manage its own availability. If you wish to manually fail over, you will need to use the pgo failover command. First, determine which instance you want to fail over to:

pgo failover hippo --query

which will yield something similar to:

Cluster: hippo
REPLICA             	STATUS    	NODE      	REPLICATION LAG     	PENDING RESTART
hippo-ojnd          	running   	node01    	           0 MB     	          false

Once you have determine your failover target, you can run the following command:

pgo failover hippo --target==hippo-ojnd

Synchronous Replication

If you have a write sensitive workload and wish to use synchronous replication, you can create your PostgreSQL cluster with synchronous replication turned on:

pgo create cluster hippo --sync-replication

Please understand the tradeoffs of synchronous replication before using it.

Pod Anti-Affinity and Node Affinity

To leran how to use pod anti-affinity and node affinity, please refer to the high availability architecture documentation

Next Steps

Backups, restores, point-in-time-recoveries: disaster recovery is a big topic! We’ll learn about you can perform disaster recovery and more in the PostgreSQL Operator.