4.2.0

Crunchy Data announces the release of the PostgreSQL Operator 4.2.0 on December 31, 2019.

The focus of the 4.2.0 release of the PostgreSQL Operator is the resiliency and uptime of the PostgreSQL clusters that the PostgreSQL Operator manages, with an emphasis on high availability and removing the Operator as a single point of failure in the HA process. This release introduces support for a distributed-consensus based high-availability approach backed by the Kubernetes distributed consensus store, which allows PostgreSQL clusters to manage their own availability rather than relying on the PostgreSQL Operator. This is accomplished by leveraging the open source high-availability framework Patroni as well as the open source, high-performance PostgreSQL disaster recovery management tool pgBackRest.

To accomplish this, we have introduced a new container called crunchy-postgres-ha (and for geospatial workloads, crunchy-postgres-gis-ha). If you are upgrading from an older version of the PostgreSQL Operator, you will need to modify your installation to use these containers.

PostgreSQL Operator 4.2.0 introduces the following new features:

  • An improved PostgreSQL HA (high-availability) solution using distributed consensus that is backed by Kubernetes. This includes:
    • Elimination of the PostgreSQL Operator as the arbiter that decides when a cluster should fail over
    • Support for Pod anti-affinity, which indicates that Kubernetes should schedule pods (e.g. PostgreSQL instances) on separate nodes
    • Failed primaries now automatically heal, which significantly reduces the time it takes for them to rejoin the cluster.
    • Introduction of synchronous replication for workloads that are sensitive to transaction loss (with a tradeoff of performance and potentially availability)
  • Standardization of physical backups and restores on pgBackRest, with native support for pg_basebackup removed.
  • Introduction of the ability to clone PostgreSQL clusters using the pgo clone command. This feature copies the pgBackRest repository from a cluster and creates a new, single instance primary as its own cluster.
  • The ability to use your own certificate authority (CA) when interfacing with the Operator API, and to specify usage of that CA from the pgo command-line interface (CLI)

The container building process has been optimized, with builds reported to be 70% faster.

The PostgreSQL Operator 4.2.0 release also includes the following software version upgrades:

  • The PostgreSQL containers now use versions 12.1, 11.6, 10.11, 9.6.16, and 9.5.20.
  • pgBackRest is upgraded to use version 2.20
  • pgBouncer is upgraded to use version 1.12
  • Patroni uses version 1.6.3

PostgreSQL Operator is tested with Kubernetes 1.13 - 1.15, OpenShift 3.11+, Google Kubernetes Engine (GKE), and VMware Enterprise PKS 1.3+. We have taken steps to ensure the PostgreSQL Operator is compatible with Kubernetes 1.16+, but have not tested it thoroughly for this release. Cursory testing indicates that the PostgreSQL Operator is compatible with Kubernetes 1.16 and beyond, but we advise that you run your own tests.

Major Features

High-Availability & Disaster Recovery

PostgreSQL Operator 4.2.0 makes significant enhancements to the high-availability and disaster recovery capabilities of the PostgreSQL Operator by moving to a distributed-consensus based model for maintaining availability, standardizing around pgBackRest for backups and restores, and removing the Operator itself as a single-point-of-failure in relation to PostgreSQL cluster resiliency.

As the high-availability environment introduced by PostgreSQL Operator 4.2.0 is now the default, setting up an HA cluster is as easy as:

pgo create cluster hacluster
pgo scale hacluster --replica-count=2

If you wish to disable high-availability for a cluster, you can use the following command:

pgo create cluster boringcluster --disable-autofail

New Required HA PostgreSQL Containers: crunchy-postgres-ha and crunchy-postgres-gis-ha

Using PostgreSQL Operator 4.2.0 requires replacing your crunchy-postgres and crunchy-postgres-gis containers with the crunchy-postgres-ha and crunchy-postgres-gis-ha containers, respectively. The underlying PostgreSQL installations in the containers remain the same, but they are now optimized for Kubernetes environments to provide the new high-availability functionality.

A major change to this container is that the PostgreSQL process is now managed by Patroni. This allows a PostgreSQL cluster that is deployed by the PostgreSQL Operator to manage its own uptime and availability, to elect a new leader in the event of a downtime scenario, and to automatically heal after a failover event.

Upgrading to these new containers is as simple as modifying the ccpimage parameter in your cluster's CRD to crunchy-postgres-ha. Please see our upgrade instructions to select your preferred upgrade strategy.
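
For example, a minimal sketch of this change for a cluster named hacluster, assuming the cluster's pgcluster custom resource lives in the pgo namespace and exposes the image name as spec.ccpimage (adjust the names and namespace to your environment):

kubectl -n pgo patch pgcluster hacluster --type=merge -p '{"spec":{"ccpimage":"crunchy-postgres-ha"}}'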

pgBackRest Standardization

pgBackRest is now the only backup and restore method supported by the PostgreSQL Operator. This has allowed for the following features:

  • Faster creation of new replicas when a scale up request is made
  • Automatic healing of PostgreSQL instances after a failover event, leveraging the pgBackRest delta restore feature. This allows for a significantly shorter healing process
  • The ability to clone PostgreSQL clusters

As part of this standardization, one change to note is that after a PostgreSQL cluster is created, the PostgreSQL Operator will schedule a full backup of the cluster. This is to ensure that a new replica can be created from a pgBackRest backup. If this initial backup fails, no new replicas will be provisioned.

When upgrading from an earlier version, please ensure that you have at least one pgBackRest full backup in your backup repository.
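
If you are unsure whether such a backup exists, you can take one manually before upgrading; for example, for a cluster named mycluster:

pgo backup mycluster --backup-type="pgbackrest" --backup-opts="--type=full"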

Pod Anti-Affinity

PostgreSQL Operator 4.2.0 adds support for Kubernetes pod anti-affinity, which provides guidance on how Kubernetes should schedule pods relative to each other. This is helpful in high-availability architectures to ensure that PostgreSQL pods are spread out in order to withstand node failures. For example, in a setup with two PostgreSQL instances, you would not want both instances scheduled to the same node: if that node goes down or becomes unreachable, then your cluster will be unavailable!

The PostgreSQL Operator uses pod anti-affinity to try to ensure that none of the managed pods within the same cluster are scheduled to the same node. These include:

  • Any PostgreSQL instances
  • The pod that manages the pgBackRest repository
  • If deployed, any pgBouncer pods

This helps improve the likelihood that a cluster can remain up even if a downtime event occurs.

There are three options available for pod anti-affinity:

  • preferred: Kubernetes will try to schedule any pods within a PostgreSQL cluster to different nodes, but in the event it must schedule two pods on the same node, it will. This is the default option.
  • required: Kubernetes will schedule pods within a PostgreSQL cluster to different nodes, but in the event it cannot schedule a pod to a different node, it will not schedule the pod until a different node is available. While this guarantees that no pod will share the same node, it can also lead to downtime events.
  • disabled: Pod anti-affinity is not used.

These options can be combined with the existing node affinity functionality (--node-label) to group the scheduling of pods to particular node labels!
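
For example, the following creates a cluster that requires its pods to be scheduled to different nodes and additionally constrains scheduling to nodes carrying a particular label (the label key/value below is illustrative):

pgo create cluster mycluster --pod-anti-affinity=required --node-label=disktype=ssd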

Synchronous Replication

PostgreSQL Operator 4.2 introduces support for synchronous replication by leveraging the “synchronous mode” functionality provided by Patroni. Synchronous replication is useful for workloads that are sensitive to losing transactions, as PostgreSQL will not consider a transaction to be committed until it is committed to all synchronous replicas connected to a primary. This provides a higher guarantee of data consistency and, when a healthy synchronous replica is present, a guarantee of the most up-to-date data during a failover event.

This comes at a cost of performance: as PostgreSQL has to wait for a transaction to be committed on all synchronous replicas, a connected client will have to wait longer than if the transaction only had to be committed on the primary (which is how asynchronous replication works). Additionally, there is a potential impact to availability: if a synchronous replica crashes, any writes to the primary will be blocked until a replica is promoted to become a new synchronous replica of the primary.

You can enable synchronous replication by using the --sync-replication flag with the pgo create command.
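
For example, to create a cluster with synchronous replication and then add two replicas:

pgo create cluster mycluster --sync-replication
pgo scale mycluster --replica-count=2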

Updated pgo CLI Flags

  • pgo create now has a CLI flag for pod anti-affinity called --pod-anti-affinity, which accepts the values required, preferred, and disabled
  • pgo create --sync-replication specifies to create a PostgreSQL HA cluster with synchronous replication

Global Configuration

To support high availability, there are some new settings that you can manage from your pgo.yaml file (a sample excerpt follows the list):

  • DisableAutofail - when set to true, this will disable the new HA functionality in any newly created PostgreSQL clusters. By default, this is false.
  • DisableReplicaStartFailReinit - when set to true, this will disable attempting to re-provision a PostgreSQL replica when it is stuck in a “start failed” state. By default, this is false.
  • PodAntiAffinity - Determines the type of pod anti-affinity rules to apply to the pods within a newly created PostgreSQL cluster. If set to required, pods within a PostgreSQL cluster must be scheduled on different nodes, otherwise a pod will fail to schedule. If set to preferred, Kubernetes will make a best effort to schedule pods of the same PostgreSQL cluster on different nodes. If set to disabled, this feature is disabled. By default, this is preferred.
  • SyncReplication - If set to true, enables synchronous replication in newly created PostgreSQL clusters. Defaults to false.
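
For example, a minimal pgo.yaml excerpt using these settings might look like the following sketch, which assumes the keys live under the Cluster section (consult your installed pgo.yaml for the exact layout):

Cluster:
  DisableAutofail: false
  DisableReplicaStartFailReinit: false
  PodAntiAffinity: preferred
  SyncReplication: false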

pgo clone

PostgreSQL Operator 4.2.0 introduces the ability to clone the data from one PostgreSQL cluster into a brand new PostgreSQL cluster. The command to do so is simple:

pgo clone oldcluster newcluster

After the command is executed, the PostgreSQL Operator checks to see that a) the oldcluster exists and b) the newcluster does not exist. If both of these conditions hold, the PostgreSQL Operator creates two new PVCs that match the specs of the oldcluster PostgreSQL data PVC (PrimaryStorage) and its pgBackRest repository PVC (BackrestStorage).

If these PVCs are successfully created, the PostgreSQL Operator copies the contents of the pgBackRest repository from the oldcluster to the one set up for the newcluster by means of a Kubernetes Job that runs rsync, provided by the pgo-backrest-repo-sync container. We are able to do this because all changes to the pgBackRest repository are atomic.

If this successfully completes, the PostgreSQL Operator then runs a pgBackRest restore job to restore the PostgreSQL cluster. On a successful restore, the new PostgreSQL cluster is scheduled, runs in recovery mode until it reaches a consistent state, and then comes online as a brand new cluster.

To optimize the time it takes to restore for a clone, we recommend taking a backup of the cluster you want to clone. You can do this with the pgo backup command, and choose if you want to take a full, differential, or incremental backup.
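
For example, to take a differential backup of the source cluster immediately before cloning it:

pgo backup oldcluster --backup-type="pgbackrest" --backup-opts="--type=diff"
pgo clone oldcluster newcluster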

Future work will be focused on additional options, such as being able to clone a PostgreSQL cluster to a particular point-in-time (so long as the backup is available to support it) and supporting other pgo create flags.

Schedule Backups With Retention Policies

While the PostgreSQL Operator has had the ability to schedule full, incremental, and differential pgBackRest backups for a while, it has not been possible to set the retention policy on these backups. Backup retention policies allow users to manage their backup storage while maintaining enough backups to be able to recover to a specific point-in-time, or perform forensic work on data in a particular state.

For example, one can schedule a full backup to occur nightly at midnight and keep up to 21 full backups (e.g. a 21 day retention policy):

pgo create schedule mycluster --schedule="0 0 * * *" --schedule-type="pgbackrest" --pgbackrest-backup-type=full --schedule-opts="--repo1-retention-full=21"

Breaking Changes

Feature Removals

  • Physical backups using pg_basebackup are no longer supported. Any command-line option that references using this method has been removed. The API endpoints where one can specify a pg_basebackup remain, but will be removed in a future release (likely the next one).
  • Removed the pgo-lspvc container. This container was used with the pgo show pvc command and performed searches on the mounted filesystem. This caused issues both in environments that could not support a PVC being mounted multiple times and with underlying volumes that contained numerous files. Now, pgo show pvc only lists the PVCs associated with the PostgreSQL clusters managed by the PostgreSQL Operator.
  • Native support for pgpool has been removed.

Command Line (pgo)

pgo create cluster

  • The --pgbackrest option is removed as it is no longer needed. pgBackRest is enabled by default

pgo delete cluster

The default behavior for pgo delete cluster has changed so that all backups and PostgreSQL data are deleted by default.

To keep a backup after a cluster is deleted, one can use the --keep-backups flag with pgo delete cluster, and to keep the PostgreSQL data directory, one can specify the --keep-data flag. There is a plan to remove the --keep-data flag in a future release, though this has not been determined yet.
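
For example, to delete a cluster while retaining both its backups and its PostgreSQL data directory:

pgo delete cluster mycluster --keep-backups --keep-data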

The -b, --delete-backups, -d, and --delete-data flags are all deprecated and will be removed in the next release.

pgo scaledown

With this release, pgo scaledown will delete the PostgreSQL data directory of the replica by default. To keep the PostgreSQL directory after the replica has scaled down, one can use the --keep-data flag.
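
For example, you can query for scale down candidates and then remove a replica while preserving its data directory (this sketch assumes the replica is selected with the --target flag; the instance name shown is a placeholder):

pgo scaledown mycluster --query
pgo scaledown mycluster --target=<replica-instance-name> --keep-data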

pgo test

pgo test is optimized to provide faster results about the availability of a PostgreSQL cluster. Instead of attempting to make PostgreSQL connections to each PostgreSQL instance with each user, pgo test now checks the availability of the service endpoints for each PostgreSQL cluster as well as the output of the PostgreSQL readiness checks, which check the connectivity of a PostgreSQL cluster.

Both the API and the output of pgo test are modified for this optimization.

Additional apiserver Changes

  • An authorization failure in the apiserver (i.e. not having the correct RBAC permission for a pgouser) will return a status code of 403 instead of 401
  • The pgorole permissions now support the "*" permission to specify all pgorole RBAC permissions are granted to a pgouser. Users upgrading from an earlier version should note this change if they want to assign their users to access new features.

Additional Features

pgo (Operator CLI)

  • Support the pgBackRest options for backup retention, including --repo1-retention-full, --repo1-retention-diff, --repo1-retention-archive, and --repo1-retention-archive-type, which can be passed via the --backup-opts flag of the pgo backup command. For example:

    # create a pgBackRest incremental backup with one full backup being retained and two differential backups being retained, along with incremental backups associated with each
    pgo backup mycluster --backup-type="pgbackrest" --backup-opts="--type=incr --repo1-retention-diff=2 --repo1-retention-full=1"
    
    # create a pgBackRest full backup where 2 other full backups are retained, with WAL archive retained for full and differential backups
    pgo backup mycluster --backup-opts="--type=full --repo1-retention-full=2 --repo1-retention-archive=4 --repo1-retention-archive-type=diff"
    
  • Allow users to define S3 credentials and other options for pgBackRest backups on a per-cluster basis, in addition to leveraging the globally provided templates. This introduces the following flags on the pgo create cluster command:

    • --pgbackrest-s3-bucket - specifies the AWS S3 bucket that should be utilized
    • --pgbackrest-s3-endpoint - specifies the S3 endpoint that should be utilized
    • --pgbackrest-s3-key - specifies the AWS S3 key that should be utilized
    • --pgbackrest-s3-key-secret - specifies the AWS S3 key secret that should be utilized
    • --pgbackrest-s3-region - specifies the AWS S3 region that should be utilized
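
    For example (the bucket, endpoint, region, and credential values below are placeholders):

    pgo create cluster mycluster --pgbackrest-s3-bucket=my-bucket --pgbackrest-s3-endpoint=s3.amazonaws.com --pgbackrest-s3-region=us-east-1 --pgbackrest-s3-key=<aws-key> --pgbackrest-s3-key-secret=<aws-key-secret>
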
  • Add the --disable-tls flag to the pgo command-line client, to be compatible with an Operator API server that is deployed with DISABLE_TLS enabled.

  • Improved output for the pgo scaledown --query and pgo failover --query commands, including providing easy-to-understand results on replication lag

  • Containerized pgo via the pgo-client container. This can be installed from the Ansible installer using the pgo_client_container_install flag, and it installs into the same namespace as the PostgreSQL Operator. You can connect to the container via kubectl exec and execute pgo commands!
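
    For example, assuming the Operator and the containerized client are installed in the pgo namespace, you can locate the pgo-client pod and run commands inside it (the pod name below is a placeholder):

    kubectl -n pgo get pods
    kubectl -n pgo exec -it <pgo-client-pod> -- pgo version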

Builds

  • Refactor the Dockerfiles to rely on a “base” definition for ease of management and to ensure consistent builds across all container images during a full make
  • Selecting which image builder to use is now argument-based, using the IMGBUILDER environment variable. The default is buildah
  • Optimize the yum clean invocation to occur on the same line as the RUN instruction, which leads to smaller image builds.

Installation

  • Add the pgo_noauth_routes (Ansible) / NOAUTH_ROUTES (Bash) configuration variables to disable TLS/BasicAuth authentication on particular API routes. This defaults to '/health'
  • Add the pgo_tls_ca_store (Ansible) / TLS_CA_TRUST (Bash) configuration variables to specify a PEM encoded list of trusted certificate authorities (CAs) for the Operator to use when authenticating API requests over TLS
  • Add the pgo_add_os_ca_store (Ansible) / ADD_OS_TRUSTSTORE (Bash) configuration variables to specify using the trusted CAs provided by the operating system. Defaults to true
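
For example, when using the Bash installer, these could be provided as environment variables before running the installation (the values shown are illustrative, and this sketch assumes the installer reads the variables from the environment):

export NOAUTH_ROUTES='/health'
export TLS_CA_TRUST='/path/to/trusted-ca-bundle.pem'
export ADD_OS_TRUSTSTORE='true'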

Configuration

  • Enable individual ConfigMap files to be customized without having to upload every single ConfigMap file available in pgo-config. Patch by Conor Quin (@Conor-Quin)
  • Add the EXCLUDE_OS_TRUST environment variable, which allows the pgo client to specify that it does not want to use the trusted certificate authorities (CAs) provided by the operating system.

Miscellaneous

  • Migrated Kubernetes API groups using API version extensions/v1beta1 to their respective updated API groups/versions. This improves compatibility with Kubernetes 1.16 and beyond. Original patch by Lucas Bickel (@hairmare)
  • Add a Kubernetes Service Account to every Pod that is managed by the PostgreSQL Operator
  • Add support for private repositories using imagePullSecret. This can be configured during installation by setting pgo_image_pull_secret and pgo_image_pull_secret_manifest in the inventory file when using the Ansible installer, or with the PGO_IMAGE_PULL_SECRET and PGO_IMAGE_PULL_SECRET_MANIFEST environment variables when using the Bash installer. The “pull secret” is the name of the pull secret, whereas the manifest is what is used to define the secret
  • Policies that are executed using pgo apply and pgo create cluster --policies are now executed over a UNIX socket directly on the Pod of the primary PostgreSQL instance. Reported by @yuanlinios
  • A new sidecar, crunchyadm, is available for running management commands on a PostgreSQL cluster. As this is experimental, this feature is disabled by default.

Fixes

  • Update the YAML library to v2.2.4 to mitigate issues presented in CVE-2019-11253
  • Specify the pgbackrest user by its ID number (2000) in the backrest-repo container image so that containers instantiated with the runAsNonRoot option enabled recognize the pgbackrest user as non-root.
  • Ensure any Kubernetes Secret associated with a PostgreSQL backup is deleted when the --delete-backups flag is specified on pgo delete cluster
  • The pgBouncer pod can now support connecting to databases that are added after a PostgreSQL cluster is deployed
  • Remove misleading error messages from the logs that were caused by the readiness/liveness probes on the apiserver and event containers in the postgres-operator pod
  • Several fixes to the cleanup of a PostgreSQL cluster after a deletion event (e.g. pgo delete cluster) to ensure data is safely removed. This includes ensuring schedules managed by pgo schedule are removed, as well as PostgreSQL cluster and backup data
  • Skip the HTTP Basic Authorization check if the BasicAuth parameter in pgo.yaml is set to false
  • Ensure all available backup types (full, incr, diff) are listed in pgo schedule
  • Ensure scheduled tasks created with pgo create schedule are deleted when pgo delete cluster is called
  • Fix the missing readiness/liveness probes used to check the status of the apiserver and event containers in the postgres-operator pod
  • Fix a race condition where the pgo-rmdata job could fail when doing its final pass on deleting PVCs. This became noticeable after adding in the task to clean up any configmaps that a PostgreSQL cluster relied on
  • Improved logging around authorization failures in the apiserver