4.2.0
Crunchy Data announces the release of the PostgreSQL Operator 4.2.0 on December 31, 2019.
The focus of the 4.2.0 release of the PostgreSQL Operator is the resiliency and uptime of the PostgreSQL clusters that the PostgreSQL Operator manages, with an emphasis on high availability and removing the Operator as a single point of failure in the HA process. This release introduces support for a distributed-consensus based high-availability approach that uses the Kubernetes distributed consensus store as its backing; in other words, PostgreSQL clusters now manage their own availability rather than relying on the PostgreSQL Operator. This is accomplished by leveraging the open source high-availability framework Patroni as well as the open source, high-performance PostgreSQL disaster recovery management tool pgBackRest.
To accomplish this, we have introduced a new container called `crunchy-postgres-ha` (and, for geospatial workloads, `crunchy-postgres-gis-ha`). If you are upgrading from an older version of the PostgreSQL Operator, you will need to modify your installation to use these containers.
PostgreSQL Operator 4.2.0 introduces the following new features:
- An improved PostgreSQL HA (high-availability) solution using distributed consensus that is backed by Kubernetes. This includes:
- Elimination of the PostgreSQL Operator as the arbiter that decides when a cluster should fail over
- Support for pod anti-affinity, which instructs Kubernetes to schedule pods (e.g. PostgreSQL instances) on separate nodes
- Failed primaries now automatically heal, which significantly reduces the time it takes for them to rejoin the cluster.
- Introduction of synchronous replication for workloads that are sensitive to transaction loss (with a tradeoff of performance and potentially availability)
- Standardization of physical backups and restores on pgBackRest, with native support for `pg_basebackup` removed.
- Introduction of the ability to clone PostgreSQL clusters using the `pgo clone` command. This feature copies the pgBackRest repository from a cluster and creates a new, single-instance primary as its own cluster.
- Allow one to use their own certificate authority (CA) when interfacing with the Operator API, and to specify usage of the CA from the `pgo` command-line interface (CLI)
The container building process has also been optimized, with build speeds reported to be 70% faster.
The PostgreSQL Operator 4.2.0 release also includes the following software version upgrades:
- The PostgreSQL containers now use versions 12.1, 11.6, 10.11, 9.6.16, and 9.5.20.
- pgBackRest is upgraded to use version 2.20
- pgBouncer is upgraded to use version 1.12
- Patroni uses version 1.6.3
PostgreSQL Operator is tested with Kubernetes 1.13 - 1.15, OpenShift 3.11+, Google Kubernetes Engine (GKE), and VMware Enterprise PKS 1.3+. We have taken steps to ensure the PostgreSQL Operator is compatible with Kubernetes 1.16+, but did not test thoroughly on it for this release. Cursory testing indicates that the PostgreSQL Operator is compatible with Kubernetes 1.16 and beyond, but we advise that you run your own tests.
Major Features
High-Availability & Disaster Recovery
PostgreSQL Operator 4.2.0 makes significant enhancements to the high-availability and disaster recovery capabilities of the PostgreSQL Operator by moving to a distributed-consensus based model for maintaining availability, standardizing around pgBackRest for backups and restores, and removing the Operator itself as a single-point-of-failure in relation to PostgreSQL cluster resiliency.
As the high-availability environment introduced by PostgreSQL Operator 4.2.0 is now the default, setting up a HA cluster is as easy as:
pgo create cluster hacluster
pgo scale hacluster --replica-count=2
If you wish to disable high-availability for a cluster, you can use the following command:
pgo create cluster boringcluster --disable-autofail
New Required HA PostgreSQL Containers: `crunchy-postgres-ha` and `crunchy-postgres-gis-ha`
Using the PostgreSQL Operator 4.2.0 requires replacing your `crunchy-postgres` and `crunchy-postgres-gis` containers with the `crunchy-postgres-ha` and `crunchy-postgres-gis-ha` containers, respectively. The underlying PostgreSQL installations in the containers remain the same, but they are now optimized for Kubernetes environments to provide the new high-availability functionality.
A major change to this container is that the PostgreSQL process is now managed by Patroni. This allows a PostgreSQL cluster that is deployed by the PostgreSQL Operator to manage its own uptime and availability, to elect a new leader in the event of a downtime scenario, and to automatically heal after a failover event.
Upgrading to these new containers is as simple as modifying the `ccpimage` parameter in your CRD to use `crunchy-postgres-ha`, which enables the HA containers. Please see our upgrade instructions to select your preferred upgrade strategy.
pgBackRest Standardization
pgBackRest is now the only backup and restore method supported by the PostgreSQL Operator. This has allowed for the following features:
- Faster creation of new replicas when a scale up request is made
- Automatic healing of PostgreSQL instances after a failover event, leveraging the pgBackRest delta restore feature. This allows for a significantly shorter healing process
- The ability to clone PostgreSQL clusters
As part of this standardization, one change to note is that after a PostgreSQL cluster is created, the PostgreSQL Operator will schedule a full backup of the cluster. This is to ensure that a new replica can be created from a pgBackRest backup. If this initial backup fails, no new replicas will be provisioned.
When upgrading from an earlier version, please ensure that you have at least one pgBackRest full backup in your backup repository.
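For example, a full pgBackRest backup could be taken ahead of the upgrade with the `pgo backup` command (the cluster name is illustrative):

```
pgo backup mycluster --backup-type="pgbackrest" --backup-opts="--type=full"
```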
Pod Anti-Affinity
PostgreSQL Operator 4.2.0 adds support for Kubernetes pod anti-affinity, which provides guidance on how Kubernetes should schedule pods relative to each other. This is helpful in high-availability architectures to ensure that PostgreSQL pods are spread out in order to withstand node failures. For example, in a setup with two PostgreSQL instances, you would not want both instances scheduled to the same node: if that node goes down or becomes unreachable, then your cluster will be unavailable!
The PostgreSQL Operator uses pod anti-affinity to try to ensure that none of the managed pods within the same cluster are scheduled to the same node. These include:
- Any PostgreSQL instances
- The pod that manages the pgBackRest repository
- If deployed, any pgBouncer pods
This helps improve the likelihood that a cluster can remain up even if a downtime event occurs.
There are three options available for pod anti-affinity:
- `preferred`: Kubernetes will try to schedule any pods within a PostgreSQL cluster to different nodes, but in the event it must schedule two pods on the same node, it will. This is the default option.
- `required`: Kubernetes will schedule pods within a PostgreSQL cluster to different nodes, but in the event it cannot schedule a pod to a different node, it will not schedule the pod until a different node is available. While this guarantees that no pod will share the same node, it can also lead to downtime events.
- `disabled`: Pod anti-affinity is not used.
These options can be combined with the existing node affinity functionality (`--node-label`) to group the scheduling of pods to particular node labels!
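For example, a cluster could require pod anti-affinity while also constraining scheduling to a set of labeled nodes; the cluster name and node label below are illustrative:

```
pgo create cluster hippo --pod-anti-affinity=required --node-label=region=us-east-1
```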
Synchronous Replication
PostgreSQL Operator 4.2 introduces support for synchronous replication by leveraging the “synchronous mode” functionality provided by Patroni. Synchronous replication is useful for workloads that are sensitive to losing transactions, as PostgreSQL will not consider a transaction to be committed until it is committed to all synchronous replicas connected to a primary. This provides a higher guarantee of data consistency and, when a healthy synchronous replica is present, a guarantee of the most up-to-date data during a failover event.
This comes at a cost of performance: as PostgreSQL has to wait for a transaction to be committed on all synchronous replicas, a connected client will have to wait longer than if the transaction only had to be committed on the primary (which is how asynchronous replication works). Additionally, there is a potential impact to availability: if a synchronous replica crashes, any writes to the primary will be blocked until a replica is promoted to become the new synchronous replica of the primary.
You can enable synchronous replication by using the `--sync-replication` flag with the `pgo create` command.
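For example (the cluster name is illustrative):

```
pgo create cluster hippo --sync-replication
```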
Updated pgo CLI Flags
- `pgo create` now has a CLI flag for pod anti-affinity called `--pod-anti-affinity`, which accepts the values `required`, `preferred`, and `disabled`
- `pgo create --sync-replication` specifies to create a PostgreSQL HA cluster with synchronous replication
Global Configuration
To support high availability, there are some new settings that you can manage from your `pgo.yaml` file:
- `DisableAutofail` - when set to `true`, this will disable the new HA functionality in any newly created PostgreSQL clusters. By default, this is `false`.
- `DisableReplicaStartFailReinit` - when set to `true`, this will disable attempting to re-provision a PostgreSQL replica when it is stuck in a "start failed" state. By default, this is `false`.
- `PodAntiAffinity` - determines the type of pod anti-affinity rules to apply to the pods within a newly created PostgreSQL cluster. If set to `required`, pods within a PostgreSQL cluster must be scheduled on different nodes, otherwise a pod will fail to schedule. If set to `preferred`, Kubernetes will make a best effort to schedule pods of the same PostgreSQL cluster on different nodes. If set to `disabled`, this feature is disabled. By default, this is `preferred`.
- `SyncReplication` - if set to `true`, enables synchronous replication in newly created PostgreSQL clusters. Defaults to `false`.
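A minimal sketch of how these settings might look in `pgo.yaml`; placing them under the `Cluster` section follows the standard configuration layout and is an assumption here:

```
Cluster:
  DisableAutofail: false
  DisableReplicaStartFailReinit: false
  PodAntiAffinity: preferred
  SyncReplication: false
```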
pgo clone
PostgreSQL Operator 4.2.0 introduces the ability to clone the data from one PostgreSQL cluster into a brand new PostgreSQL cluster. The command to do so is simple:
pgo clone oldcluster newcluster
After the command is executed, the PostgreSQL Operator checks to see if a) the `oldcluster` exists and b) the `newcluster` does not exist. If both of these conditions hold, the PostgreSQL Operator creates two new PVCs that match the specs of the `oldcluster` PostgreSQL data PVC (`PrimaryStorage`) and its pgBackRest repository PVC (`BackrestStorage`).
If these PVCs are successfully created, the PostgreSQL Operator copies the contents of the pgBackRest repository from the `oldcluster` to the one set up for the `newcluster` by means of a Kubernetes Job that runs `rsync`, provided by the `pgo-backrest-repo-sync` container. We are able to do this because all changes to the pgBackRest repository are atomic.
If this successfully completes, the PostgreSQL Operator then runs a pgBackRest restore job to restore the PostgreSQL cluster. On a successful restore, the new PostgreSQL cluster is scheduled, runs in recovery mode until it reaches a consistent state, and then comes online as a brand new cluster.
To optimize the time it takes to restore for a clone, we recommend taking a backup of the cluster you want to clone. You can do this with the `pgo backup` command, and choose whether you want to take a full, differential, or incremental backup.
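A minimal sketch of that workflow (cluster names are illustrative):

```
# take a fresh full backup of the cluster that will be cloned
pgo backup oldcluster --backup-type="pgbackrest" --backup-opts="--type=full"

# clone it into a brand new cluster
pgo clone oldcluster newcluster
```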
Future work will be focused on additional options, such as being able to clone a PostgreSQL cluster to a particular point-in-time (so long as the backup is available to support it) and supporting other `pgo create` flags.
Schedule Backups With Retention Policies
While the PostgreSQL Operator has had the ability to schedule full, incremental, and differential pgBackRest backups for a while, it has not been possible to set the retention policy on these backups. Backup retention policies allow users to manage their backup storage while maintaining enough backups to be able to recover to a specific point-in-time, or perform forensic work on data in a particular state.
For example, one can schedule a full backup to occur nightly at midnight and keep up to 21 full backups (e.g. a 21 day retention policy):
pgo create schedule mycluster --schedule="0 0 * * *" --schedule-type="pgbackrest" --pgbackrest-backup-type=full --schedule-opts="--repo1-retention-full=21"
Breaking Changes
Feature Removals
- Physical backups using `pg_basebackup` are no longer supported. Any command-line option that references using this method has been removed. The API endpoints where one can specify a `pg_basebackup` remain, but will be removed in a future release (likely the next one).
- Removed the `pgo-lspvc` container. This container was used with the `pgo show pvc` command and performed searches on the mounted filesystem. This would cause issues both in environments that could not support a PVC being mounted multiple times and for underlying volumes that contained numerous files. Now, `pgo show pvc` only lists the PVCs associated with the PostgreSQL clusters managed by the PostgreSQL Operator.
- Native support for pgpool has been removed.
Command Line (`pgo`)
pgo create cluster
- The `--pgbackrest` option is removed as it is no longer needed. pgBackRest is enabled by default.
pgo delete cluster
The default behavior for `pgo delete cluster` has changed so that all backups and PostgreSQL data are deleted by default.
To keep a backup after a cluster is deleted, one can use the `--keep-backups` flag with `pgo delete cluster`, and to keep the PostgreSQL data directory, one can specify the `--keep-data` flag. There is a plan to remove the `--keep-data` flag in a future release, though this has not been determined yet.
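For example, to delete a cluster while retaining both its backups and its PostgreSQL data directory (the cluster name is illustrative):

```
pgo delete cluster mycluster --keep-backups --keep-data
```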
The `-b`, `--delete-backups`, `-d`, and `--delete-data` flags are all deprecated and will be removed in the next release.
pgo scaledown
With this release, `pgo scaledown` will delete the PostgreSQL data directory of the replica by default. To keep the PostgreSQL data directory after the replica has scaled down, one can use the `--keep-data` flag.
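A sketch of scaling down a specific replica while keeping its data directory; using `--target` to select the replica, and the replica name itself, are assumptions:

```
# list the replicas eligible for scale down
pgo scaledown hacluster --query

# remove a chosen replica, keeping its PostgreSQL data directory
pgo scaledown hacluster --target=hacluster-abcd --keep-data
```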
pgo test
`pgo test` is optimized to provide faster results about the availability of a PostgreSQL cluster. Instead of attempting to make PostgreSQL connections to each PostgreSQL instance with each user, `pgo test` now checks the availability of the service endpoints for each PostgreSQL cluster as well as the output of the PostgreSQL readiness checks, which check the connectivity of a PostgreSQL cluster.
Both the API and the output of `pgo test` are modified for this optimization.
Additional apiserver Changes
- An authorization failure in the `apiserver` (i.e. not having the correct RBAC permission for a `pgouser`) will return a status code of `403` instead of `401`
- The pgorole permissions now support the `"*"` permission to specify that all pgorole RBAC permissions are granted to a pgouser. Users upgrading from an earlier version should note this change if they want to give their users access to new features (see the example below).
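For example, an administrative role could be granted every permission with the wildcard; the role name and the exact `pgo create pgorole` syntax shown here are assumptions:

```
pgo create pgorole superuser --permissions="*"
```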
Additional Features
pgo (Operator CLI)
- Support the pgBackRest options for backup retention, including `--repo1-retention-full`, `--repo1-retention-diff`, `--repo1-retention-archive`, and `--repo1-retention-archive-type`, which can be added to the `--backup-opts` flag in the `pgo backup` command. For example:

  # create a pgBackRest incremental backup with one full backup being retained and two differential backups being retained, along with incremental backups associated with each
  pgo backup mycluster --backup-type="pgbackrest" --backup-opts="--type=incr --repo1-retention-diff=2 --repo1-retention-full=1"

  # create a pgBackRest full backup where 2 other full backups are retained, with WAL archive retained for full and differential backups
  pgo backup mycluster --backup-opts="--type=full --repo1-retention-full=2 --repo1-retention-archive=4 --repo1-retention-archive-type=diff"
- Allow users to define S3 credentials and other options for pgBackRest backups on a per-cluster basis, in addition to leveraging the globally provided templates. This introduces the following flags on the `pgo create cluster` command (a sketch of their usage follows this list):
  - `--pgbackrest-s3-bucket` - specifies the AWS S3 bucket that should be utilized
  - `--pgbackrest-s3-endpoint` - specifies the S3 endpoint that should be utilized
  - `--pgbackrest-s3-key` - specifies the AWS S3 key that should be utilized
  - `--pgbackrest-s3-key-secret` - specifies the AWS S3 key secret that should be utilized
  - `--pgbackrest-s3-region` - specifies the AWS S3 region that should be utilized
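A minimal sketch of a cluster created with per-cluster S3 settings; the cluster name, bucket, endpoint, key, and region values below are placeholders:

```
pgo create cluster hippo \
  --pgbackrest-s3-bucket=my-backup-bucket \
  --pgbackrest-s3-endpoint=s3.amazonaws.com \
  --pgbackrest-s3-key=<AWS_S3_KEY> \
  --pgbackrest-s3-key-secret=<AWS_S3_KEY_SECRET> \
  --pgbackrest-s3-region=us-east-1
```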
- Add the `--disable-tls` flag to the `pgo` command-line client, so as to be compatible with an Operator API server that is deployed with `DISABLE_TLS` enabled.
- Improved output for the `pgo scaledown --query` and `pgo failover --query` commands, including providing easy-to-understand results on replication lag
- Containerized `pgo` via the `pgo-client` container. This can be installed from the Ansible installer using the `pgo_client_container_install` flag, and it installs into the same namespace as the PostgreSQL Operator. You can connect to the container via `kubectl exec` and execute `pgo` commands!
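A sketch of running a command through the containerized client; the `pgo` namespace and the label used to locate the `pgo-client` pod are assumptions based on a default installation:

```
# locate the pgo-client pod (namespace and label selector are assumptions)
kubectl get pods -n pgo -l name=pgo-client

# run a pgo command inside the container
kubectl exec -it -n pgo <pgo-client-pod-name> -- pgo version
```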
Builds
- Refactor the Dockerfiles to rely on a "base" definition for ease of management and to ensure consistent builds across all container images during a full `make`
- Selecting which image builder to use is now argument based, using the `IMGBUILDER` environmental variable. Default is `buildah`
- Optimize the `yum clean` invocation to occur on the same line as the `RUN` directive, which leads to smaller image builds.
Installation
- Add the `pgo_noauth_routes` (Ansible) / `NOAUTH_ROUTES` (Bash) configuration variables to disable TLS/BasicAuth authentication on particular API routes. This defaults to `'/health'`
- Add the `pgo_tls_ca_store` (Ansible) / `TLS_CA_TRUST` (Bash) configuration variables to specify a PEM encoded list of trusted certificate authorities (CA) for the Operator to use when authenticating API requests over TLS
- Add the `pgo_add_os_ca_store` (Ansible) / `ADD_OS_TRUSTSTORE` (Bash) configuration variables to specify to use the trusted CA that is provided by the operating system. Defaults to `true`
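A sketch of how the Bash-installer variants above might be set, with placeholder values; the export-style usage is an assumption about how the Bash installer reads its configuration:

```
export NOAUTH_ROUTES='/health'
export TLS_CA_TRUST='/path/to/trusted-cas.pem'   # PEM encoded list of trusted CAs
export ADD_OS_TRUSTSTORE='true'
```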
Configuration
- Enable individual ConfigMap files to be customized without having to upload every single ConfigMap file available in `pgo-config`. Patch by Conor Quin (@Conor-Quin)
- Add the `EXCLUDE_OS_TRUST` environmental variable that allows the `pgo` client to specify that it does not want to use the trusted certificate authorities (CA) provided by the operating system.
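For example, the operating system trust store could be excluded for a single invocation of the client; treating `EXCLUDE_OS_TRUST` as a standard environment variable read by the client is an assumption:

```
EXCLUDE_OS_TRUST=true pgo version
```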
Miscellaneous
- Migrated Kubernetes API groups using API version `extensions/v1beta1` to their respective updated API groups/versions. This improves compatibility with Kubernetes 1.16 and beyond. Original patch by Lucas Bickel (@hairmare)
- Add a Kubernetes Service Account to every Pod that is managed by the PostgreSQL Operator
- Add support for private repositories using `imagePullSecret`. This can be configured during installation by setting the `pgo_image_pull_secret` and `pgo_image_pull_secret_manifest` variables in the inventory file when using the Ansible installer, or with the `PGO_IMAGE_PULL_SECRET` and `PGO_IMAGE_PULL_SECRET_MANIFEST` environmental variables when using the Bash installer. The "pull secret" is the name of the pull secret, whereas the manifest is what is used to define the secret
- The pgorole permissions now support the `"*"` permission to specify all pgorole RBAC permissions are granted to a pgouser
- Policies that are executed using `pgo apply` and `pgo create cluster --policies` are now executed over a UNIX socket directly on the Pod of the primary PostgreSQL instance. Reported by @yuanlinios
- A new sidecar, `crunchyadm`, is available for running management commands on a PostgreSQL cluster. As this is experimental, this feature is disabled by default.
Fixes
- Update the YAML library to v2.2.4 to mitigate issues presented in CVE-2019-11253
- Specify the `pgbackrest` user by its ID number (2000) in the backrest-repo container image so that containers instantiated with the `runAsNonRoot` option enabled recognize the `pgbackrest` user as non-root.
- Ensure any Kubernetes Secret associated with a PostgreSQL backup is deleted when the `--delete-backups` flag is specified on `pgo delete cluster`
- The pgBouncer pod can now support connecting to databases that are added after a PostgreSQL cluster is deployed
- Remove misleading error messages from the logs that were caused by the readiness/liveness probes on the `apiserver` and `event` containers in the `postgres-operator` pod
- Several fixes to the cleanup of a PostgreSQL cluster after a deletion event (e.g. `pgo delete cluster`) to ensure data is safely removed. This includes ensuring schedules managed by `pgo schedule` are removed, as well as PostgreSQL cluster and backup data
- Skip the HTTP Basic Authorization check if the `BasicAuth` parameter in `pgo.yaml` is set to `false`
- Ensure all available backup types for `pgo schedule` are listed (full, incr, diff)
- Ensure scheduled tasks created with `pgo create schedule` are deleted when `pgo delete cluster` is called
- Fix the missing readiness/liveness probes used to check the status of the `apiserver` and `event` containers in the `postgres-operator` pod
- Fix a race condition where the `pgo-rmdata` job could fail when doing its final pass on deleting PVCs. This became noticeable after adding in the task to clean up any configmaps that a PostgreSQL cluster relied on
- Improved logging around authorization failures in the `apiserver`