Backup Configuration

An important part of a healthy Postgres cluster is maintaining backups. PGO optimizes its use of open source pgBackRest to support terabyte-sized databases. What's more, PGO makes it convenient to perform many common and advanced actions that can occur during the lifecycle of a database, including:

  • Setting automatic backup schedules and retention policies
  • Backing data up to multiple locations
    • Support for backup storage in Kubernetes, AWS S3 (or S3-compatible systems like MinIO), Google Cloud Storage (GCS), and Azure Blob Storage
  • Taking one-off / ad hoc backups
  • Performing a "point-in-time-recovery"
  • Cloning data to a new instance

and more.

Let's explore the various disaster recovery features in PGO by first looking at how to set up backups.

Understanding Backup Configuration and Basic Operations

The backup configuration for a PGO managed Postgres cluster resides in the spec.backups.pgbackrest section of a custom resource. In addition to indicating which version of pgBackRest to use, this section allows you to configure the fundamental backup settings for your Postgres cluster, including:

  • spec.backups.pgbackrest.image - image to use for pgBackRest containers. Keep in mind the pgBackRest version used needs to be compatible with operator and Postgres images according to the compatibility matrix.
  • spec.backups.pgbackrest.configuration - additional configuration and references to Secrets that are needed for configuration of your backups. For example, this may reference a Secret that contains your S3 credentials.
  • spec.backups.pgbackrest.global - global pgBackRest configuration. An example of this may be setting the global pgBackRest logging level (e.g. log-level-console: info), or providing configuration to optimize performance.
  • spec.backups.pgbackrest.repos - information on each specific pgBackRest backup repository. This allows you to configure where and how your backups and WAL archive are stored. You can keep backups in up to four (4) different locations!
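To show how these four fields fit together, here is a minimal sketch of the spec.backups.pgbackrest section. The image reference and the Secret name are placeholders (assumptions, not values from this guide), and the repository definition simply reuses a Kubernetes volume.

```yaml
spec:
  backups:
    pgbackrest:
      # Placeholder: substitute a pgBackRest image that matches the compatibility matrix.
      image: registry.example.com/pgbackrest:placeholder-tag
      configuration:
        # Hypothetical Secret holding credentials, such as S3 keys.
        - secret:
            name: example-pgbackrest-secrets
      global:
        # Global pgBackRest settings, here the console logging level.
        log-level-console: info
      repos:
        # Repository names must follow the repoN format.
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - "ReadWriteOnce"
              resources:
                requests:
                  storage: 1Gi
```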

You can configure the repos section based on the backup storage system you are looking to use. There are four storage types supported in spec.backups.pgbackrest.repos:

Storage Type   Description
azure          For use with Azure Blob Storage.
gcs            For use with Google Cloud Storage (GCS).
s3             For use with Amazon S3 or any S3 compatible storage system such as MinIO.
volume         For use with a Kubernetes Persistent Volume.

Each repository in spec.backups.pgbackrest.repos requires a name (spec.backups.pgbackrest.repos.name), and that name must follow pgBackRest's convention of assigning configuration to a specific repository using a repoN format, e.g. repo1, repo2, etc. You can customize your configuration based upon the name that you assign in the spec. Please see Set up Multiple Backup Repositories.

By default, backups are stored in a directory that follows the pattern pgbackrest/repoN where N is the number of the repo. This typically does not present issues when storing your backup information in a Kubernetes volume, but it can present complications if you are storing the backups for several clusters in the same bucket or container of a blob storage system like S3/GCS/Azure. You can avoid conflicts by setting the repoN-path variable in spec.backups.pgbackrest.global. The convention we recommend for setting this variable is /pgbackrest/$NAMESPACE/$CLUSTER_NAME/repoN. For example, if I have a cluster named hippo in the namespace postgres-operator, I would set the following:

spec:
  backups:
    pgbackrest:
      global:
        repo1-path: /pgbackrest/postgres-operator/hippo/repo1

As mentioned earlier, you can store backups in up to four different repositories. You can also mix and match, e.g. you could store your backups in two different S3 repositories. Each storage type does have its own required attributes that you need to set. We will cover that later in this section.

Now that we've covered the basics, let's learn how to set up our backup repositories.

Setting Up a Backup Repository

As mentioned above, PGO, the Postgres Operator from Crunchy Data, supports multiple ways to store backups. Regardless of which way you choose to store your backups, PGO will create a repo host Pod that functions as a command execution server for your pgBackRest backups. This Pod will be the primary location for running pgBackRest commands and will be configured to work with all Postgres Instances. It will also be the main storage location of your pgBackRest logs, assuming at least one Kubernetes storage volume repo is defined.

With all that in mind, let's look into each method and see how you can ensure your backups and archives are being safely stored.

Using Kubernetes Volumes

The simplest way to get started storing backups is to use a Kubernetes Volume. This was already configured as part of the create a Postgres cluster example. Let's take a closer look at some of that configuration:

- name: repo1
  volume:
    volumeClaimSpec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 1Gi

The one requirement of volume is that you need to fill out the volumeClaimSpec attribute. This attribute uses the same format as a persistent volume claim spec. In fact, we performed a similar set up when we created a Postgres cluster.

In the above example, we assume that the Kubernetes cluster is using a default storage class. If your cluster does not have a default storage class, or you wish to use a different storage class, you will have to set spec.backups.pgbackrest.repos.volume.volumeClaimSpec.storageClassName.
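As a sketch of that setting, the repo definition below adds storageClassName to the volume claim. The class name example-storage-class is hypothetical; use a storage class that actually exists in your Kubernetes cluster.

```yaml
spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              # Hypothetical storage class name; replace with one from your cluster.
              storageClassName: example-storage-class
              accessModes:
                - "ReadWriteOnce"
              resources:
                requests:
                  storage: 1Gi
```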

Using S3

Setting up backups in S3 requires a few additional modifications to your custom resource spec and either

  • the use of a Secret to protect your S3 credentials, or
  • setting up identity providers in AWS to allow pgBackRest to assume a role with permissions.

Using S3 Credentials

There is an example for creating a Postgres cluster that uses S3 for backups in the kustomize/s3 directory in the Postgres Operator examples repository. In this directory, there is a file called s3.conf.example. Copy this example file to s3.conf:

cp s3.conf.example s3.conf

Note that s3.conf is protected from commit by a .gitignore.

Open up s3.conf and you will see something similar to:

repo1-s3-key=$YOUR_AWS_S3_KEY
repo1-s3-key-secret=$YOUR_AWS_S3_KEY_SECRET

Replace the values with your AWS S3 credentials and save.

Now, open up kustomize/s3/postgres.yaml. In the s3 section, you will see something similar to:

s3:
  bucket: "$YOUR_AWS_S3_BUCKET_NAME"
  endpoint: "$YOUR_AWS_S3_ENDPOINT"
  region: "$YOUR_AWS_S3_REGION"

Again, replace these values with the values that match your S3 configuration. For endpoint, only use the domain and, if necessary, the port (e.g. s3.us-east-2.amazonaws.com).
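For illustration, a filled-in repository might look roughly like the sketch below. The bucket name is hypothetical, and the Secret name pgo-s3-creds is an assumption about what the example's kustomization generates from s3.conf; check the files in kustomize/s3 for the actual name.

```yaml
spec:
  backups:
    pgbackrest:
      configuration:
        # Assumed name of the Secret generated from s3.conf.
        - secret:
            name: pgo-s3-creds
      repos:
        - name: repo1
          s3:
            # Hypothetical bucket name.
            bucket: "example-hippo-backups"
            # Domain only (plus port if needed), no scheme or path.
            endpoint: "s3.us-east-2.amazonaws.com"
            region: "us-east-2"
```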

Note that region is required by both S3 and pgBackRest. If you are using a storage system with an S3 compatibility layer that does not require region, you can fill in region with an arbitrary value.

If you are using MinIO, you may need to set the URI style to use path mode. You can do this from the global settings, e.g. for repo1:

spec:
  backups:
    pgbackrest:
      global:
        repo1-s3-uri-style: path

When your configuration is saved, you can deploy your cluster:

kubectl apply -k kustomize/s3

Watch your cluster: you will see that your backups and archives are now being stored in S3!

Using an AWS-integrated identity provider and role

If you deploy PostgresClusters to AWS Elastic Kubernetes Service, you can take advantage of their IAM role integration. When you attach a certain annotation to your PostgresCluster spec, AWS will automatically mount an AWS token and other needed environment variables. These environment variables will then be used by pgBackRest to assume the identity of a role that has permissions to upload to an S3 repository.

This method requires additional setup in AWS IAM. Use the procedure in the linked documentation for the first two steps described below:

  1. Create an OIDC provider for your EKS cluster.
  2. Create an IAM policy for bucket access and an IAM role with a trust relationship with the OIDC provider in step 1.

The third step is to associate that IAM role with a ServiceAccount, but there's no need to do that manually, as PGO does that for you. First, make a note of the IAM role's ARN.

You can then make the following changes to the files in the kustomize/s3 directory in the Postgres Operator examples repository:

1. Add the s3 section to the spec in kustomize/s3/postgres.yaml as discussed in the Using S3 Credentials section above. In addition to that, add the required eks.amazonaws.com/role-arn annotation to the PostgresCluster spec using the IAM ARN that you noted above.

For instance, given an IAM role with the ARN arn:aws:iam::123456768901:role/allow_bucket_access, you would add the following to the PostgresCluster spec:

spec:
  metadata:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::123456768901:role/allow_bucket_access"

That annotations field will get propagated to the ServiceAccounts that require it automatically.

2. Copy the s3.conf.example file to s3.conf:

cp s3.conf.example s3.conf

Update that kustomize/s3/s3.conf file so that it looks like this:

repo1-s3-key-type=web-id

That repo1-s3-key-type=web-id line will tell pgBackRest to use the IAM integration.

With those changes saved, you can deploy your cluster:

kubectl apply -k kustomize/s3

And watch as it spins up and backs up to S3 using pgBackRest's IAM integration.

Using Google Cloud Storage (GCS)

Similar to S3, setting up backups in Google Cloud Storage (GCS) requires a few additional modifications to your custom resource spec and the use of a Secret to protect your GCS credentials.

There is an example for creating a Postgres cluster that uses GCS for backups in the kustomize/gcs directory in the Postgres Operator examples repository. In order to configure this example to use GCS for backups, you will need to do two things.

First, copy your GCS key secret (which is a JSON file) into kustomize/gcs/gcs-key.json. Note that a .gitignore directive prevents you from committing this file.

Next, open the postgres.yaml file and edit spec.backups.pgbackrest.repos.gcs.bucket to the name of the GCS bucket that you want to back up to.
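As a rough sketch, the edited repository entry could look like the following. The bucket name is hypothetical, and the wiring of the gcs-key.json Secret is left to the example's own files, so it is not reproduced here.

```yaml
spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          gcs:
            # Hypothetical bucket name; replace with your own GCS bucket.
            bucket: "example-hippo-backups"
```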

Save this file, and then run:

kubectl apply -k kustomize/gcs

Watch your cluster: you will see that your backups and archives are now being stored in GCS!

Using Azure Blob Storage

Similar to the above, setting up backups in Azure Blob Storage requires a few additional modifications to your custom resource spec and the use of a Secret to protect your Azure Storage credentials.

There is an example for creating a Postgres cluster that uses Azure for backups in the kustomize/azure directory in the Postgres Operator examples repository. In this directory, there is a file called azure.conf.example. Copy this example file to azure.conf:

cp azure.conf.example azure.conf

Note that azure.conf is protected from commit by a .gitignore.

Open up azure.conf and you will see something similar to:

repo1-azure-account=$YOUR_AZURE_ACCOUNT
repo1-azure-key=$YOUR_AZURE_KEY

Replace the values with your Azure credentials and save.

Now, open up kustomize/azure/postgres.yaml. In the azure section, you will see something similar to:

azure:
  container: "$YOUR_AZURE_CONTAINER"

Again, replace these values with the values that match your Azure configuration.

When your configuration is saved, you can deploy your cluster:

kubectl apply -k kustomize/azure

Watch your cluster: you will see that your backups and archives are now being stored in Azure!

Set Up Multiple Backup Repositories

It is possible to store backups in multiple locations. For example, you may want to keep your backups both within your Kubernetes cluster and in S3. There are many reasons for doing this:

  • It is typically faster to heal Postgres instances when your backups are closer
  • You can set different backup retention policies based upon your available storage
  • You want to ensure that your backups are distributed geographically

and more.

PGO lets you store your backups in up to four locations simultaneously. You can mix and match: for example, you can store backups both locally and in GCS, or store your backups in two different GCS repositories. Note that regardless of how many repo Volumes are defined, only one repo host Pod will be created.

The multi-backup-repo example in the Postgres Operator examples repository sets up backups in four different locations using each storage type. You can modify this example to match your desired backup topology.
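A smaller topology, with one local repository and one in S3, might be sketched as follows. This is an illustration under assumptions rather than the contents of the multi-backup-repo example: the bucket name is hypothetical, and the S3 credentials for the second repository would need to use the repo2- prefix (for instance repo2-s3-key and repo2-s3-key-secret) in the referenced Secret.

```yaml
spec:
  backups:
    pgbackrest:
      global:
        # Keep this cluster's S3 backups under a unique path within the bucket.
        repo2-path: /pgbackrest/postgres-operator/hippo/repo2
      repos:
        # repo1: backups kept inside the Kubernetes cluster.
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - "ReadWriteOnce"
              resources:
                requests:
                  storage: 1Gi
        # repo2: backups kept in S3 (hypothetical bucket name).
        - name: repo2
          s3:
            bucket: "example-hippo-backups"
            endpoint: "s3.us-east-2.amazonaws.com"
            region: "us-east-2"
```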

Additional Notes

While storing Postgres archives (write-ahead log [WAL] files) occurs in parallel when saving data to multiple pgBackRest repos, you cannot take parallel backups to different repos at the same time. PGO will ensure that all backups are taken serially. Future work in pgBackRest will address parallel backups to different repos. Please don't confuse this with parallel backup: pgBackRest does allow for backups to use parallel processes when storing them to a single repo!

Encryption

You can encrypt your backups using AES-256 encryption using the CBC mode. This can be used independent of any encryption that may be supported by an external backup system.

To encrypt your backups, you need to set the cipher type and provide a passphrase. The passphrase should be long and random (e.g. the pgBackRest documentation recommends openssl rand -base64 48). The passphrase should be kept in a Secret.

Let's use our hippo cluster as an example. First, create a new directory, and in that directory create a file called pgbackrest-secrets.conf. It should look something like this:

repo1-cipher-pass=your-super-secure-encryption-key-passphrase

This contains the passphrase used to encrypt your data.

Next, create a kustomization.yaml file that looks like this:

namespace: postgres-operator

secretGenerator:
- name: hippo-pgbackrest-secrets
  files:
  - pgbackrest-secrets.conf

generatorOptions:
  disableNameSuffixHash: true

resources:
- postgres.yaml

Finally, create the manifest for the Postgres cluster in a file named postgres.yaml that is similar to the following:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 16
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      configuration:
        - secret:
            name: hippo-pgbackrest-secrets
      global:
        repo1-cipher-type: aes-256-cbc
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 1Gi

Notice the reference to the Secret that contains the encryption key:

spec:
  backups:
    pgbackrest:
      configuration:
        - secret:
            name: hippo-pgbackrest-secrets

as well as the configuration for enabling AES-256 encryption using the CBC mode:

spec:
  backups:
    pgbackrest:
      global:
        repo1-cipher-type: aes-256-cbc

You can now create a Postgres cluster that has encrypted backups!

Limitations

Currently the encryption settings cannot be changed on backups after they are established.

Custom Backup Configuration

Most of your backup configuration can be set through the spec.backups.pgbackrest.global attribute, or through information that you supply in the ConfigMap or Secret that you refer to in spec.backups.pgbackrest.configuration. You can also provide additional Secret values if need be, e.g. repo1-cipher-pass for encrypting backups.
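As a minimal sketch of combining these sources, the fragment below references both a ConfigMap and a Secret and sets one global option. The ConfigMap name example-pgbackrest-config is hypothetical; the Secret name matches the one used in the encryption example.

```yaml
spec:
  backups:
    pgbackrest:
      configuration:
        # Hypothetical ConfigMap with non-sensitive pgBackRest settings.
        - configMap:
            name: example-pgbackrest-config
        # Secret with sensitive values such as repo1-cipher-pass.
        - secret:
            name: hippo-pgbackrest-secrets
      global:
        # Options set directly in the spec.
        log-level-console: info
```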

The full list of pgBackRest configuration options is available at https://pgbackrest.org/configuration.html.

Warning

Some pgBackRest options require write access to paths with adequate storage capacity within your container. For example, if you enable archive-async, make sure you also add a proper spool-path.
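For instance, an asynchronous archiving setup could be sketched as below. The spool path shown is an assumption: it presumes the Postgres data volume is mounted at /pgdata and has room for the spool, so verify the mount point and available capacity in your own deployment before using it.

```yaml
spec:
  backups:
    pgbackrest:
      global:
        # Enable asynchronous WAL archiving.
        archive-async: "y"
        # Assumed writable location with adequate capacity; adjust to your environment.
        spool-path: /pgdata/pgbackrest-spool
```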

Reducing Primary Instance Load with the Backup from Standby Option

Info

FEATURE AVAILABILITY: Available in v5.7.0 and above

You can now configure the pgBackRest Backup from Standby Option in order to reduce the load on the primary Postgres Instance Pod. The necessary settings can be configured as follows:

spec:
  instances:
    - name: instance1
      replicas: 2
...
  backups:
    pgbackrest:
      global:
        backup-standby: "y"

Warning

As shown above, the backup-standby option requires at least one Postgres Instance replica. If no replica is accessible when taking a backup, the backup will fail with the following error: "ERROR: [056]: unable to find standby cluster - cannot proceed."

As described in the pgBackRest documentation, configuring the backup-standby option causes the vast majority of the backup files to be pulled from a replica Postgres Instance (that is, a "standby database") rather than all of them coming from the primary Postgres Instance (the "primary database"). Additionally, this pgBackRest backup Job will always execute on the repo host Pod. Taken together, this will greatly reduce the load on the primary Postgres Instance when performing a backup.

IPv6 Support

If you are running your cluster in an IPv6-only environment, you will need to add an annotation to your PostgresCluster so that PGO knows to set pgBackRest's tls-server-address to an IPv6 address. Otherwise, tls-server-address will be set to 0.0.0.0, making pgBackRest inaccessible, and backups will not run. The annotation should be added as shown below:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
  annotations:
    postgres-operator.crunchydata.com/pgbackrest-ip-version: IPv6

Next Steps

We've now seen how to use PGO to get our backups and archives set up and safely stored. Now let's take a look at backup management and how we can do things such as set backup frequency, set retention policies, and even take one-off backups!