Optional Backups

Info

FEATURE AVAILABILITY: Available in v5.7.0 and above

Because Crunchy Postgres for Kubernetes (CPK) was originally designed for production use, disaster recovery was built in from day one, largely by making backups required.

However, there are use cases where you may not want backups. For instance, you might want to start a temporary PostgresCluster for testing and not dedicate resources to backups.

For this use case and others, CPK v5.7+ allows backups to be turned on or off for each PostgresCluster.

Running without backups: a few considerations

Running a PostgresCluster without backups means some features are no longer available.

First, and most importantly: without backups, there is no practical recovery mechanism. If you run a cluster with backups and accidentally drop an important table, you can restore an older backup and recover that table. Without backups, you do not have that recovery option. For this reason, we do not recommend running a cluster without backups outside of a few use cases, such as temporary test clusters.

Second, a PostgresCluster without backups uses pg_basebackup to create each replica, which then streams subsequent changes from the primary. A base backup begins with a checkpoint on the primary, so running CHECKPOINT on the primary before starting a replica may speed up the process.

Third, you cannot clone a cluster that has no backups, since cloning relies on backups. You can, however, delete a cluster while retaining its pgdata volume and then reuse that volume as described in our Data Migration guide.

Fourth, when setting up a standby cluster, you cannot use any repo-based streaming, but you can stream from the primary as described in our streaming tutorial.

Fifth, when monitoring a PostgresCluster without backups, the pgbackrest-related metrics will be blank, as expected.
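For the replica consideration above, a manual checkpoint could be issued with something like the following sketch. It assumes that the primary pod carries the postgres-operator.crunchydata.com/role=master label, that the Postgres container is named database, and that psql can connect locally inside that container; the function name checkpoint_primary is hypothetical. Verify these against your own cluster before relying on it.

```shell
# checkpoint_primary CLUSTER_NAME
# Runs CHECKPOINT on the current primary of the given PostgresCluster,
# using the namespace of the current kubectl context.
checkpoint_primary() {
  # Assumption: the primary pod is labeled role=master and its Postgres
  # container is named "database".
  primary_pod="$(kubectl get pod \
    --selector "postgres-operator.crunchydata.com/cluster=$1,postgres-operator.crunchydata.com/role=master" \
    --output name)"

  # "kubectl get --output name" prints pod/<name>, which kubectl exec accepts.
  kubectl exec "$primary_pod" --container database -- psql --command 'CHECKPOINT;'
}

# Example (hypothetical cluster name):
#   checkpoint_primary hippo
```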

Optional Backups: a user guide

Starting a PostgresCluster without backups

With CPK v5.7+, nothing has changed about starting a cluster with backups: you need to have a defined spec.backups section in your cluster spec.

To start a cluster without backups, simply remove the spec.backups section.

The spec.backups section used to be required. If you omit it while running CPK v5.6 or older, you will get an error from the Kubernetes API saying that the spec is invalid.

However, if you are running CPK v5.7+, a PostgresCluster without a spec.backups field is valid, and will result in a PostgresCluster being created without backups.
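As an illustration, the sketch below writes a minimal PostgresCluster manifest that simply has no spec.backups section. The cluster name hippo, the Postgres version, the instance name, and the storage size are placeholder values chosen for this example, not requirements.

```shell
# Write a minimal PostgresCluster manifest with no spec.backups section.
# All names and sizes below are illustrative placeholders.
cat > postgrescluster-no-backups.yaml <<'EOF'
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 16
  instances:
    - name: instance1
      dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
EOF

# After reviewing the file, apply it where CPK v5.7+ is installed:
#   kubectl apply -f postgrescluster-no-backups.yaml
```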

Turning on backups

To turn on backups for a cluster that doesn't have them, fill in the spec.backups section with your requirements.

To learn more about backup options, see our tutorial on configuring backups for your Postgres cluster.

Once the spec.backups section is filled in, CPK will start reconciling the required Kubernetes objects for regular backups.
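A filled-in spec.backups section might look like the sketch below, which writes a complete manifest with a single volume-backed pgBackRest repository. The cluster name, repository size, and other values are placeholders for this example; your own requirements (cloud repositories, schedules, retention) will differ, so treat this as a starting point rather than a recommended configuration.

```shell
# Write a PostgresCluster manifest that includes a spec.backups section
# with one volume-backed pgBackRest repo. Values are illustrative.
cat > postgrescluster-with-backups.yaml <<'EOF'
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 16
  instances:
    - name: instance1
      dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
EOF

# After reviewing the file, apply it to turn backups on:
#   kubectl apply -f postgrescluster-with-backups.yaml
```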

Turning off backups

Starting a cluster without backups only requires that you remove or leave blank the spec.backups section. Turning off backups on a cluster that already has them, however, also requires adding an annotation to the PostgresCluster.

Why? Because turning off backups means removing that backup data, and actions that remove data require additional confirmation.

In this case, to confirm that you want your backups removed, add this annotation to your cluster:

postgres-operator.crunchydata.com/authorizeBackupRemoval="true"

A sample command to add this annotation is:

kubectl annotate postgrescluster <CLUSTER_NAME> postgres-operator.crunchydata.com/authorizeBackupRemoval="true"

Adding that annotation to your cluster will remove the backups and all associated Kubernetes objects: the PersistentVolume that held the data, the StatefulSet that represented the repo-host, the RBAC Kubernetes objects that allowed the expected access, etc.

Note: CPK will only remove Kubernetes-local data. If you are using cloud-based backups for a PostgresCluster and you turn off backups for that cluster, CPK will stop backing up to the cloud, but it will not remove the existing cloud-based backups. If you want those removed, you are responsible for cleaning up the storage yourself (for example, the S3 buckets).

If you remove the spec.backups section from a cluster that previously had backups but have not yet added the annotation, CPK will pause reconciling that cluster. You can check for this in the cluster status, which will have a message saying that CPK has paused progress on that cluster because the annotation is missing. At this point, you can either add the annotation to remove backups or re-add the spec.backups section.
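One way to inspect the cluster status is to print the conditions recorded on the PostgresCluster, as in the sketch below. The function name show_cluster_conditions is hypothetical, and the exact condition type and wording that CPK uses for the paused state are not shown here; look for the message mentioning the missing annotation.

```shell
# show_cluster_conditions CLUSTER_NAME
# Prints one line per status condition: type, status, reason, message.
show_cluster_conditions() {
  kubectl get postgrescluster "$1" \
    --output jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}

# Example (hypothetical cluster name):
#   show_cluster_conditions hippo
```

`kubectl describe postgrescluster <CLUSTER_NAME>` shows the same conditions in a longer, human-oriented layout.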

Note: After the backups are removed, it is a best practice to remove the annotation. That way, if you turn backups on and then off again at a later date, you will have the opportunity to confirm that you want the backups removed.
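Removing the annotation can be done with kubectl's trailing-dash syntax, sketched below. The function name remove_backup_removal_annotation is hypothetical; the annotation key is the one given above.

```shell
# remove_backup_removal_annotation CLUSTER_NAME
# A key followed by "-" tells kubectl annotate to remove that annotation.
remove_backup_removal_annotation() {
  kubectl annotate postgrescluster "$1" \
    postgres-operator.crunchydata.com/authorizeBackupRemoval-
}

# Example (hypothetical cluster name):
#   remove_backup_removal_annotation hippo
```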

How we achieve this

In order to make backups optional, we made two changes to the operator and the PostgresCluster CRD:

  1. We made the spec.backups section optional in the CRD.
  2. CPK now manages the archive_command depending on whether the spec.backups section is present.

By making spec.backups optional, CPK can now add or remove the Kubernetes objects related to backups, just like CPK does with monitoring or other features. (That said, see Turning off backups above for the case where CPK requires additional confirmation to reconcile and remove Kubernetes objects.)

If spec.backups is present, CPK sets the archive_command to the usual pgbackrest command that archives WAL segments to the backup repository. If spec.backups is not present, CPK sets the archive_command to a command that always returns true. Postgres runs the archive_command for each completed WAL segment and, once the command reports success, considers that segment archived and is free to recycle or remove it. With an always-true command, nothing is actually stored, so completed WAL segments are dropped as soon as they are "archived" instead of accumulating on the pgdata volume.
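The role of the command's exit status can be illustrated with a toy sketch. This is not Postgres or CPK code; the function try_archive is hypothetical and only mimics how an archiver interprets success and failure of the configured command.

```shell
# try_archive CMD
# Runs CMD in a shell and reports how an archiver would interpret the result:
# exit status 0 means the segment counts as archived, anything else means
# the segment is kept and archiving is retried later.
try_archive() {
  if sh -c "$1"; then
    echo "archived"
  else
    echo "retry later"
  fi
}

try_archive 'true'    # prints "archived"
try_archive 'false'   # prints "retry later"
```

To see what a running cluster is actually using, `SHOW archive_command;` in psql prints the current value.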

We chose to change archiving behavior through the archive_command because this setting can be changed with a configuration reload, without restarting the Postgres process. For more on archive_command, see the Postgres docs.