Backup Configuration
An important part of a healthy Postgres cluster is maintaining backups. PGO optimizes its use of open source pgBackRest to support terabyte-size databases. What's more, PGO makes it convenient to perform many common and advanced actions that can occur during the lifecycle of a database, including:
- Setting automatic backup schedules and retention policies
- Backing data up to multiple locations
- Support for backup storage in Kubernetes, AWS S3 (or S3-compatible systems like MinIO), Google Cloud Storage (GCS), and Azure Blob Storage
- Taking one-off / ad hoc backups
- Performing a "point-in-time recovery"
- Cloning data to a new instance
and more.
Let's explore the various disaster recovery features in PGO by first looking at how to set up backups.
Understanding Backup Configuration and Basic Operations
The backup configuration for a PGO-managed Postgres cluster resides in the spec.backups.pgbackrest section of a custom resource. In addition to indicating which version of pgBackRest to use, this section allows you to configure the fundamental backup settings for your Postgres cluster, including:
- spec.backups.pgbackrest.image: the image to use for pgBackRest containers. Keep in mind that the pgBackRest version needs to be compatible with the operator and Postgres images according to the compatibility matrix.
- spec.backups.pgbackrest.configuration: additional configuration and references to Secrets that are needed to configure your backups. For example, this may reference a Secret that contains your S3 credentials.
- spec.backups.pgbackrest.global: global pgBackRest configuration. For example, this may set the global pgBackRest logging level (e.g. log-level-console: info) or provide configuration to optimize performance.
- spec.backups.pgbackrest.repos: information on each specific pgBackRest backup repository. This allows you to configure where and how your backups and WAL archive are stored. You can keep backups in up to four (4) different locations!
You can configure the repos section based on the backup storage system you are looking to use. There are four storage types supported in spec.backups.pgbackrest.repos:
Storage Type | Description |
---|---|
azure | For use with Azure Blob Storage. |
gcs | For use with Google Cloud Storage (GCS). |
s3 | For use with Amazon S3 or any S3 compatible storage system such as MinIO. |
volume | For use with a Kubernetes Persistent Volume. |
Each repo requires a name (spec.backups.pgbackrest.repos.name), and that name must follow pgBackRest's convention of assigning configuration to a specific repository using a repoN format, e.g. repo1, repo2, etc. You can customize your configuration based upon the name that you assign in the spec. Please see Set Up Multiple Backup Repositories.
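To make the naming rule concrete, here is a minimal Python sketch of a pre-flight check. It assumes that valid names run from repo1 to repo4 (one per supported location) and that repo-scoped pgBackRest options are formed by prefixing the option with the repo name, as in repo1-path or repo1-s3-uri-style. The helper names are hypothetical and not part of PGO, which performs its own validation.

```python
import re

# Hypothetical helpers for illustration; PGO enforces its own naming rules.
# Assumption: valid names are repo1 through repo4 (one per supported location).
REPO_NAME = re.compile(r"repo[1-4]")


def is_valid_repo_name(name: str) -> bool:
    """Return True if name follows the repoN convention with N in 1..4."""
    return REPO_NAME.fullmatch(name) is not None


def repo_index(name: str) -> int:
    """Return N from a repoN name, e.g. 'repo2' -> 2."""
    if not is_valid_repo_name(name):
        raise ValueError(f"invalid repo name: {name!r}")
    return int(name[len("repo"):])


def repo_option(name: str, option: str) -> str:
    """Build a repo-scoped pgBackRest option key, e.g. ('repo1', 'path') -> 'repo1-path'."""
    repo_index(name)  # raises ValueError for an invalid name
    return f"{name}-{option}"


if __name__ == "__main__":
    for candidate in ("repo1", "repo4", "repo5", "backup1"):
        print(candidate, is_valid_repo_name(candidate))
    print(repo_option("repo2", "retention-full"))
```

Running it reports repo1 and repo4 as valid, rejects repo5 and backup1, and prints the repo-scoped key repo2-retention-full.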
By default, backups are stored in a directory that follows the pattern pgbackrest/repoN, where N is the number of the repo. This typically does not present issues when storing your backup information in a Kubernetes volume, but it can present complications if you are storing all of your backups in the same bucket in a blob storage system like S3/GCS/Azure. You can avoid conflicts by setting the repoN-path variable in spec.backups.pgbackrest.global. The convention we recommend for setting this variable is /pgbackrest/$NAMESPACE/$CLUSTER_NAME/repoN. For example, if I have a cluster named hippo in the namespace postgres-operator, I would set the following:
```yaml
spec:
  backups:
    pgbackrest:
      global:
        repo1-path: /pgbackrest/postgres-operator/hippo/repo1
```
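If you manage several clusters that share a bucket, you may want to generate these settings rather than type them. The following is a minimal sketch under that assumption. The function names are hypothetical, and they only apply the /pgbackrest/$NAMESPACE/$CLUSTER_NAME/repoN convention recommended above.

```python
def recommended_repo_path(namespace: str, cluster: str, repo: int) -> str:
    """Return /pgbackrest/$NAMESPACE/$CLUSTER_NAME/repoN for the given repo number."""
    if not 1 <= repo <= 4:
        raise ValueError("PGO supports at most four repositories (repo1 to repo4)")
    return f"/pgbackrest/{namespace}/{cluster}/repo{repo}"


def global_path_settings(namespace: str, cluster: str, repos: list[int]) -> dict[str, str]:
    """Build the repoN-path entries to place under spec.backups.pgbackrest.global."""
    return {f"repo{n}-path": recommended_repo_path(namespace, cluster, n) for n in repos}


if __name__ == "__main__":
    print(global_path_settings("postgres-operator", "hippo", [1]))
```

For the hippo cluster above, this yields the same repo1-path value shown in the YAML.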
As mentioned earlier, you can store backups in up to four different repositories. You can also mix and match, e.g. you could store your backups in two different S3 repositories. Each storage type does have its own required attributes that you need to set. We will cover that later in this section.
Now that we've covered the basics, let's learn how to set up our backup repositories.
Setting Up a Backup Repository
As mentioned above, PGO, the Postgres Operator from Crunchy Data, supports multiple ways to store backups. Regardless of which way you choose to store your backups, PGO will create a repo host Pod that functions as a command execution server for your pgBackRest backups. This Pod will be the primary location for running pgBackRest commands and will be configured to work with all Postgres Instances. It will also be the main storage location of your pgBackRest logs, assuming at least one Kubernetes storage volume repo is defined.
With all that in mind, let's look into each method and see how you can ensure your backups and archives are being safely stored.
Using Kubernetes Volumes
The simplest way to get started storing backups is to use a Kubernetes Volume. This was already configured as part of the create a Postgres cluster example. Let's take a closer look at some of that configuration:
```yaml
- name: repo1
  volume:
    volumeClaimSpec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 1Gi
```
The one requirement of volume is that you need to fill out the volumeClaimSpec attribute. This attribute uses the same format as a persistent volume claim spec. In fact, we performed a similar setup when we created a Postgres cluster.
In the above example, we assume that the Kubernetes cluster is using a default storage class. If your cluster does not have a default storage class, or you wish to use a different storage class, you will have to set spec.backups.pgbackrest.repos.volume.volumeClaimSpec.storageClassName.
Using S3
Setting up backups in S3 requires a few additional modifications to your custom resource spec and either
- the use of a Secret to protect your S3 credentials, or
- setting up identity providers in AWS to allow pgBackRest to assume a role with permissions.
Using S3 Credentials
There is an example for creating a Postgres cluster that uses S3 for backups in the kustomize/s3 directory in the Postgres Operator examples repository. In this directory, there is a file called s3.conf.example. Copy this example file to s3.conf:
```shell
cp s3.conf.example s3.conf
```
Note that s3.conf is protected from commit by a .gitignore.
Open up s3.conf and you will see something similar to:
```ini
repo1-s3-key=$YOUR_AWS_S3_KEY
repo1-s3-key-secret=$YOUR_AWS_S3_KEY_SECRET
```
Replace the values with your AWS S3 credentials and save.
Now, open up kustomize/s3/postgres.yaml. In the s3 section, you will see something similar to:
```yaml
s3:
  bucket: "$YOUR_AWS_S3_BUCKET_NAME"
  endpoint: "$YOUR_AWS_S3_ENDPOINT"
  region: "$YOUR_AWS_S3_REGION"
```
Again, replace these values with the values that match your S3 configuration. For endpoint, only use the domain and, if necessary, the port (e.g. s3.us-east-2.amazonaws.com).
Note that region is required by both S3 and pgBackRest. If you are using a storage system with an S3 compatibility layer that does not require region, you can fill in region with a random value.
If you are using MinIO, you may need to set the URI style to use path mode. You can do this from the global settings, e.g. for repo1:
```yaml
spec:
  backups:
    pgbackrest:
      global:
        repo1-s3-uri-style: path
```
When your configuration is saved, you can deploy your cluster:
```shell
kubectl apply -k kustomize/s3
```
Watch your cluster: you will see that your backups and archives are now being stored in S3!
Using an AWS-integrated identity provider and role
If you deploy PostgresClusters to AWS Elastic Kubernetes Service, you can take advantage of their IAM role integration. When you attach a certain annotation to your PostgresCluster spec, AWS will automatically mount an AWS token and other needed environment variables. These environment variables will then be used by pgBackRest to assume the identity of a role that has permissions to upload to an S3 repository.
This method requires additional setup in AWS IAM. Use the procedure in the linked documentation for the first two steps described below:
- Create an OIDC provider for your EKS cluster.
- Create an IAM policy for bucket access and an IAM role with a trust relationship with the OIDC provider in step 1.
The third step is to associate that IAM role with a ServiceAccount, but there's no need to do that manually, as PGO does that for you. First, make a note of the IAM role's ARN.
You can then make the following changes to the files in the kustomize/s3 directory in the Postgres Operator examples repository:
1. Add the s3 section to the spec in kustomize/s3/postgres.yaml as discussed in the Using S3 Credentials section above. In addition to that, add the required eks.amazonaws.com/role-arn annotation to the PostgresCluster spec using the IAM ARN that you noted above.
For instance, given an IAM role with the ARN arn:aws:iam::123456768901:role/allow_bucket_access, you would add the following to the PostgresCluster spec:
```yaml
spec:
  metadata:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::123456768901:role/allow_bucket_access"
```
That annotations field will automatically get propagated to the ServiceAccounts that require it.
2. Copy the s3.conf.example file to s3.conf:
```shell
cp s3.conf.example s3.conf
```
Update that kustomize/s3/s3.conf file so that it looks like this:
```ini
repo1-s3-key-type=web-id
```
That repo1-s3-key-type=web-id line will tell pgBackRest to use the IAM integration.
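Before pasting a role ARN into the manifest, it can help to sanity-check it. The sketch below assumes the common IAM role ARN shape (arn:<partition>:iam::<12-digit account ID>:role/<name>) and builds the spec.metadata.annotations fragment shown in step 1. The regular expression and the helper name are illustrative assumptions, not an AWS or PGO API.

```python
import re

# Assumed IAM role ARN shape: arn:<partition>:iam::<12-digit account>:role/<optional path/><name>
ROLE_ARN = re.compile(r"arn:aws(-[a-z]+)*:iam::\d{12}:role/[\w+=,.@/-]+")
ANNOTATION = "eks.amazonaws.com/role-arn"


def role_annotation_fragment(role_arn: str) -> dict:
    """Return the spec.metadata.annotations fragment for a PostgresCluster (hypothetical helper)."""
    if not ROLE_ARN.fullmatch(role_arn):
        raise ValueError(f"does not look like an IAM role ARN: {role_arn!r}")
    return {"spec": {"metadata": {"annotations": {ANNOTATION: role_arn}}}}


if __name__ == "__main__":
    print(role_annotation_fragment("arn:aws:iam::123456768901:role/allow_bucket_access"))
```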
With those changes saved, you can deploy your cluster:
```shell
kubectl apply -k kustomize/s3
```
And watch as it spins up and backs up to S3 using pgBackRest's IAM integration.
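The two S3 approaches differ only in what goes into s3.conf. As a minimal sketch, the hypothetical helper below renders either the static key pair or the web-id line for a given repo. The key names come from the examples above, and nothing here is part of PGO or the examples repository.

```python
from typing import Optional


def render_s3_conf(repo: str = "repo1", key: Optional[str] = None,
                   key_secret: Optional[str] = None, web_id: bool = False) -> str:
    """Render s3.conf contents for static credentials or for the IAM (web-id) integration."""
    if web_id:
        # IAM integration: no static keys; pgBackRest uses the identity AWS provides to the Pod.
        if key or key_secret:
            raise ValueError("web-id mode should not include static S3 keys")
        lines = [f"{repo}-s3-key-type=web-id"]
    else:
        # Static credentials: both the key and its secret are required.
        if not (key and key_secret):
            raise ValueError("static credentials need both key and key_secret")
        lines = [f"{repo}-s3-key={key}", f"{repo}-s3-key-secret={key_secret}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_s3_conf(web_id=True), end="")
```

For example, writing the output of render_s3_conf(web_id=True) to kustomize/s3/s3.conf produces the single line shown in step 2 of the IAM procedure.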
Using Google Cloud Storage (GCS)
Similar to S3, setting up backups in Google Cloud Storage (GCS) requires a few additional modifications to your custom resource spec and the use of a Secret to protect your GCS credentials.
There is an example for creating a Postgres cluster that uses GCS for backups in the kustomize/gcs directory in the Postgres Operator examples repository. In order to configure this example to use GCS for backups, you will need to do two things.
First, copy your GCS key secret (which is a JSON file) into kustomize/gcs/gcs-key.json. Note that a .gitignore directive prevents you from committing this file.
Next, open the postgres.yaml file and set spec.backups.pgbackrest.repos.gcs.bucket to the name of the GCS bucket that you want to back up to.
Save this file, and then run:
```shell
kubectl apply -k kustomize/gcs
```
Watch your cluster: you will see that your backups and archives are now being stored in GCS!
Using Azure Blob Storage
Similar to the above, setting up backups in Azure Blob Storage requires a few additional modifications to your custom resource spec and the use of a Secret to protect your Azure Storage credentials.
There is an example for creating a Postgres cluster that uses Azure for backups in the kustomize/azure directory in the Postgres Operator examples repository. In this directory, there is a file called azure.conf.example. Copy this example file to azure.conf:
```shell
cp azure.conf.example azure.conf
```
Note that azure.conf is protected from commit by a .gitignore.
Open up azure.conf and you will see something similar to:
```ini
repo1-azure-account=$YOUR_AZURE_ACCOUNT
repo1-azure-key=$YOUR_AZURE_KEY
```
Replace the values with your Azure credentials and save.
Now, open up kustomize/azure/postgres.yaml. In the azure section, you will see something similar to:
```yaml
azure:
  container: "$YOUR_AZURE_CONTAINER"
```
Again, replace this value with the value that matches your Azure configuration.
When your configuration is saved, you can deploy your cluster:
```shell
kubectl apply -k kustomize/azure
```
Watch your cluster: you will see that your backups and archives are now being stored in Azure!
Set Up Multiple Backup Repositories
It is possible to store backups in multiple locations. For example, you may want to keep your backups both within your Kubernetes cluster and S3. There are many reasons for doing this:
- It is typically faster to heal Postgres instances when your backups are closer
- You can set different backup retention policies based upon your available storage
- You want to ensure that your backups are distributed geographically
and more.
PGO lets you store your backups in up to four locations simultaneously. You can mix and match: for example, you can store backups both locally and in GCS, or store your backups in two different GCS repositories. Note that regardless of how many repo Volumes are defined, only one repo host Pod will be created.
The multi-backup-repo example in the Postgres Operator examples repository sets up backups in four different locations using each storage type. You can modify this example to match your desired backup topology.
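When you combine storage types, a quick pre-flight check can catch missing fields before you apply the manifest. This Python sketch encodes the fields that the examples in this section fill in for each storage type, plus the four-repository limit and unique names. Treating all of bucket, endpoint and region as required for s3 is an assumption. The check is hypothetical and rough, and PGO's CRD validation remains the authority.

```python
# Required fields per storage type, as filled in by the examples above.
# Assumption: a rough pre-flight check only; PGO's CRD schema remains the authority.
REQUIRED = {
    "volume": {"volumeClaimSpec"},
    "s3": {"bucket", "endpoint", "region"},
    "gcs": {"bucket"},
    "azure": {"container"},
}
MAX_REPOS = 4


def check_repos(repos: list[dict]) -> list[str]:
    """Return problems found in a spec.backups.pgbackrest.repos list (empty if none)."""
    problems = []
    if len(repos) > MAX_REPOS:
        problems.append(f"{len(repos)} repos defined; at most {MAX_REPOS} are supported")
    names = [repo.get("name") for repo in repos]
    for name in sorted({n for n in names if n is not None and names.count(n) > 1}):
        problems.append(f"duplicate repo name {name!r}")
    for repo in repos:
        # Each repo entry should use exactly one storage type.
        kinds = [kind for kind in REQUIRED if kind in repo]
        if len(kinds) != 1:
            problems.append(f"{repo.get('name')}: expected exactly one storage type, found {kinds}")
            continue
        missing = REQUIRED[kinds[0]] - set(repo[kinds[0]])
        if missing:
            problems.append(f"{repo.get('name')}: {kinds[0]} is missing {sorted(missing)}")
    return problems


if __name__ == "__main__":
    repos = [
        {"name": "repo1", "volume": {"volumeClaimSpec": {}}},
        {"name": "repo2", "s3": {"bucket": "b", "endpoint": "s3.us-east-2.amazonaws.com"}},
    ]
    for problem in check_repos(repos):
        print(problem)
```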
Additional Notes
While storing Postgres archives (write-ahead log [WAL] files) occurs in parallel when saving data to multiple pgBackRest repos, you cannot take parallel backups to different repos at the same time. PGO will ensure that all backups are taken serially. Future work in pgBackRest will address parallel backups to different repos. Please don't confuse this with parallel backup: pgBackRest does allow for backups to use parallel processes when storing them to a single repo!
Encryption
You can encrypt your backups using AES-256 encryption in CBC mode. This can be used independently of any encryption that may be supported by an external backup system.
To encrypt your backups, you need to set the cipher type and provide a passphrase. The passphrase should be long and random (e.g. the pgBackRest documentation recommends openssl rand -base64 48). The passphrase should be kept in a Secret.
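If openssl is not at hand, Python's standard library can produce an equivalent passphrase. This sketch base64-encodes 48 cryptographically secure random bytes, which yields a 64-character string, the same length that openssl rand -base64 48 prints. The function names are hypothetical. The printed line has the repo1-cipher-pass=... form used in the pgbackrest-secrets.conf file in the example that follows.

```python
import base64
import secrets


def generate_cipher_pass(num_bytes: int = 48) -> str:
    """Return num_bytes of cryptographically secure random data, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def cipher_pass_line(repo: str = "repo1") -> str:
    """Return a repoN-cipher-pass line suitable for a pgBackRest Secret file."""
    return f"{repo}-cipher-pass={generate_cipher_pass()}\n"


if __name__ == "__main__":
    print(cipher_pass_line(), end="")
```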
Let's use our hippo cluster as an example. First, create a new directory. In this directory, create a file called pgbackrest-secrets.conf. It should look something like this:
```ini
repo1-cipher-pass=your-super-secure-encryption-key-passphrase
```
This contains the passphrase used to encrypt your data.
Next, create a kustomization.yaml file that looks like this:
```yaml
namespace: postgres-operator

secretGenerator:
  - name: hippo-pgbackrest-secrets
    files:
      - pgbackrest-secrets.conf

generatorOptions:
  disableNameSuffixHash: true

resources:
  - postgres.yaml
```
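For reference, the secretGenerator above turns pgbackrest-secrets.conf into a Secret. The Secret's data key is the file name, and its value is the base64-encoded file contents. disableNameSuffixHash keeps the name exactly hippo-pgbackrest-secrets, so the PostgresCluster can refer to it. The Python sketch below approximates that object for illustration only, and the helper is hypothetical. Use kubectl apply -k to create the real Secret.

```python
import base64


def secret_from_file(name: str, namespace: str, filename: str, content: str) -> dict:
    """Approximate the Secret that kustomize's secretGenerator emits for one file.

    The file name becomes the data key and its contents are base64-encoded.
    With disableNameSuffixHash: true, the name is used without a hash suffix.
    """
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {filename: encoded},
    }


if __name__ == "__main__":
    secret = secret_from_file(
        "hippo-pgbackrest-secrets",
        "postgres-operator",
        "pgbackrest-secrets.conf",
        "repo1-cipher-pass=your-super-secure-encryption-key-passphrase\n",
    )
    print(secret["metadata"], list(secret["data"]))
```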
Finally, create the manifest for the Postgres cluster in a file named postgres.yaml that is similar to the following:
```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 16
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      configuration:
        - secret:
            name: hippo-pgbackrest-secrets
      global:
        repo1-cipher-type: aes-256-cbc
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 1Gi
```
Notice the reference to the Secret that contains the encryption key:
```yaml
spec:
  backups:
    pgbackrest:
      configuration:
        - secret:
            name: hippo-pgbackrest-secrets
```
as well as the configuration for enabling AES-256 encryption using the CBC mode:
```yaml
spec:
  backups:
    pgbackrest:
      global:
        repo1-cipher-type: aes-256-cbc
```
You can now create a Postgres cluster that has encrypted backups!
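The cipher type lives in spec.backups.pgbackrest.global, while the passphrase lives in the referenced Secret, so it is easy to set one without the other. Below is a minimal sketch of a consistency check. It assumes the Secret file uses simple key=value lines like the example above, and the helper names are hypothetical.

```python
import re


def parse_conf(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blanks, comments and [section] headers."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options


def repos_missing_cipher_pass(global_opts: dict[str, str], secret_conf: str) -> list[str]:
    """List repos whose repoN-cipher-type enables encryption but lack a repoN-cipher-pass."""
    secret_opts = parse_conf(secret_conf)
    missing = []
    for key, value in global_opts.items():
        match = re.fullmatch(r"(repo\d+)-cipher-type", key)
        # pgBackRest treats cipher-type "none" as unencrypted, so no passphrase is needed.
        if match and value != "none" and f"{match.group(1)}-cipher-pass" not in secret_opts:
            missing.append(match.group(1))
    return sorted(missing)


if __name__ == "__main__":
    print(repos_missing_cipher_pass({"repo1-cipher-type": "aes-256-cbc"},
                                    "repo1-cipher-pass=your-super-secure-encryption-key-passphrase\n"))
```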
Limitations
Currently the encryption settings cannot be changed on backups after they are established.
Custom Backup Configuration
Most of your backup configuration can be set through the spec.backups.pgbackrest.global attribute, or through information that you supply in the ConfigMap or Secret that you refer to in spec.backups.pgbackrest.configuration. You can also provide additional Secret values if needed, e.g. repo1-cipher-pass for encrypting backups.
The full list of pgBackRest configuration options is available at https://pgbackrest.org/configuration.html.
Warning
Some pgBackRest options require write access to paths with adequate storage capacity within your container. For example, if you enable archive-async, make sure you also add a proper spool-path.
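As an illustration of the warning above, the hypothetical check below flags archive-async when no spool-path is given. It accepts both the string "y" and a boolean, because some YAML parsers read an unquoted y as a boolean. It cannot verify that the chosen path is writable or large enough.

```python
def check_archive_async(global_opts: dict) -> list[str]:
    """Flag archive-async settings that lack an explicit spool-path (hypothetical helper)."""
    # Accept "y" as well as a parsed boolean True (str(True).lower() == "true").
    enabled = str(global_opts.get("archive-async", "n")).lower() in ("y", "true")
    if enabled and not global_opts.get("spool-path"):
        return ["archive-async is enabled but spool-path is not set; "
                "choose a writable path with enough space"]
    return []


if __name__ == "__main__":
    print(check_archive_async({"archive-async": "y"}))
```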
Reducing Primary Instance Load with the Backup from Standby Option
Info
FEATURE AVAILABILITY: Available in v5.7.0 and above
You can now configure the pgBackRest Backup from Standby Option in order to reduce the load on the primary Postgres Instance Pod. The necessary settings can be configured as follows:
```yaml
spec:
  instances:
    - name: instance1
      replicas: 2
      ...
  backups:
    pgbackrest:
      global:
        backup-standby: "y"
```
Warning
As shown above, the backup-standby option requires at least one Postgres Instance replica. If no replica is accessible when taking a backup, the backup will fail with the following error: "ERROR: [056]: unable to find standby cluster - cannot proceed."
As described in the pgBackRest documentation, configuring the backup-standby option causes the vast majority of the backup files to be pulled from a replica Postgres Instance (that is, a "standby database") rather than all of them coming from the primary Postgres Instance (the "primary database"). Additionally, this pgBackRest backup Job will always execute on the repo host Pod. Taken together, this will greatly reduce the load on the primary Postgres Instance when performing a backup.
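The replica requirement is easy to miss, so here is a minimal sketch of a static check over a PostgresCluster spec expressed as a Python dict. It assumes that replicas defaults to 1 for an instance set and that exactly one Pod acts as primary, so at least two Pods in total are needed for a standby to exist. The helper names are hypothetical. Passing the check does not guarantee that a replica is reachable when the backup runs.

```python
def total_pods(spec: dict) -> int:
    """Count Postgres Pods across instance sets, assuming replicas defaults to 1."""
    return sum(instance_set.get("replicas", 1) for instance_set in spec.get("instances", []))


def check_backup_standby(spec: dict) -> list[str]:
    """Flag backup-standby: "y" when the spec leaves no room for a standby."""
    global_opts = spec.get("backups", {}).get("pgbackrest", {}).get("global", {})
    # One Pod is the primary, so at least two Pods are needed for one standby.
    if global_opts.get("backup-standby") == "y" and total_pods(spec) < 2:
        return ["backup-standby is 'y' but the spec defines no replica; "
                "backups would fail with ERROR: [056]"]
    return []


if __name__ == "__main__":
    spec = {
        "instances": [{"name": "instance1", "replicas": 2}],
        "backups": {"pgbackrest": {"global": {"backup-standby": "y"}}},
    }
    print(check_backup_standby(spec))
```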
IPv6 Support
If you are running your cluster in an IPv6-only environment, you will need to add an annotation to your PostgresCluster so that PGO knows to set pgBackRest's tls-server-address to an IPv6 address. Otherwise, tls-server-address will be set to 0.0.0.0, making pgBackRest inaccessible, and backups will not run. The annotation should be added as shown below:
```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
  annotations:
    postgres-operator.crunchydata.com/pgbackrest-ip-version: IPv6
```
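The sketch below shows two things. First, a hypothetical helper adds the annotation to a manifest held as a Python dict. Second, a standard-library check confirms that 0.0.0.0 is an IPv4 wildcard address, which is why it cannot serve an IPv6-only network. The sketch makes no claim about which IPv6 address PGO chooses.

```python
import copy
import ipaddress

IP_VERSION_ANNOTATION = "postgres-operator.crunchydata.com/pgbackrest-ip-version"


def with_ipv6_annotation(manifest: dict) -> dict:
    """Return a copy of a PostgresCluster manifest with the pgBackRest IPv6 annotation set."""
    updated = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    annotations = updated.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[IP_VERSION_ANNOTATION] = "IPv6"
    return updated


if __name__ == "__main__":
    cluster = {
        "apiVersion": "postgres-operator.crunchydata.com/v1beta1",
        "kind": "PostgresCluster",
        "metadata": {"name": "hippo"},
    }
    print(with_ipv6_annotation(cluster)["metadata"])
    # 0.0.0.0 is an IPv4 address, so it does not listen on IPv6 interfaces.
    print(ipaddress.ip_address("0.0.0.0").version)  # prints 4
```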
Next Steps
We've now seen how to use PGO to get our backups and archives set up and safely stored. Now let's take a look at backup management and how we can do things such as set backup frequency, set retention policies, and even take one-off backups!