Migrate Data Volumes to New Clusters
There are certain cases where you may want to migrate existing volumes to a new cluster. If so, read on for an in-depth look at the steps required.
Prerequisites
While your existing Postgres instance is still running, confirm that the following three conditions hold:
- Your volume has its persistentVolumeReclaimPolicy set to Retain.
- The postgres superuser exists in your Postgres instance.
- Your volume's data directory is owned by the operating system's postgres user, with user ID 26.
Warning
If the reclaim policy of the PersistentVolume bound to your PVC isn't set to Retain, your data will be lost. If you don't have a postgres database user, or if the data directory isn't owned by a postgres operating system user, the bootstrap process will fail.
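You can check the three conditions from the command line while the v4 instance is still running. The following is a minimal sketch. It assumes bash, a kubectl context pointed at the namespace holding the v4 cluster, GNU stat in the database image, and a primary pod name that you supply. The helper function names are hypothetical. Because persistentVolumeReclaimPolicy is a field of the PersistentVolume rather than the PVC, the helpers first look up the volume bound to the claim.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the migration prerequisites.
# Assumes kubectl targets the namespace that holds the PGO v4 cluster.

# Print the name of the PersistentVolume bound to a PVC.
bound_pv() {
  kubectl get pvc "$1" -o jsonpath='{.spec.volumeName}'
}

# Print the reclaim policy of the PersistentVolume behind a PVC.
pv_reclaim_policy() {
  local pv
  pv="$(bound_pv "$1")" || return 1
  kubectl get pv "$pv" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
}

# Set the reclaim policy of the PersistentVolume behind a PVC to Retain.
pv_set_retain() {
  local pv
  pv="$(bound_pv "$1")" || return 1
  kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
}

# Print "t" if the postgres role is a superuser, "f" if it is not,
# and nothing if the role does not exist.
postgres_is_superuser() {
  kubectl exec "$1" -- psql -Atc \
    "SELECT rolsuper FROM pg_roles WHERE rolname = 'postgres';"
}

# Print the numeric owner UID of a directory inside the pod (expect 26).
dir_owner_uid() {
  kubectl exec "$1" -- stat -c '%u' "$2"
}

# Example usage ("$V4_PRIMARY_POD" is a placeholder for your primary pod):
#   pv_reclaim_policy oldhippo                          # expect Retain
#   postgres_is_superuser "$V4_PRIMARY_POD"             # expect t
#   dir_owner_uid "$V4_PRIMARY_POD" /pgdata/oldhippo    # expect 26
```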
Once all three of these conditions have been met, consider performing a test run to familiarize yourself with the process and identify pain points unique to your system and configuration.
Configure your PostgresCluster
In order to use existing pgData, pg_wal or pgBackRest repo volumes in a new PostgresCluster, you will need to configure the spec.dataSource.volumes section of your PostgresCluster manifest. As shown below, there are three possible volumes you may configure: pgDataVolume, pgWALVolume and pgBackRestVolume. Under each, you must define the PVC name to use in the new cluster. A directory may also be defined, as needed, for cases where the existing directory name does not match the v5 directory.
To help explain how these fields are used, we will consider a pgcluster named "oldhippo" from PGO v4. We will assume that the pgcluster has been deleted and only the PVCs have been left in place.
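Before writing the manifest, confirm which PVCs the deleted v4 cluster left behind. Here is a minimal sketch with a hypothetical function name. It assumes the PGO v4 pg-cluster label (the same label removed under Considerations) is still present on the PVCs and is set to the cluster name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of PVCs that carry the PGO v4
# pg-cluster label for the given cluster, one per line.
list_v4_pvcs() {
  kubectl get pvc --selector "pg-cluster=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Example usage:
#   list_v4_pvcs oldhippo
```

For the example cluster, the output should include the pgData, pg_wal and pgBackRest repo claims referenced in the spec.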
Info
Any differences in configuration or other datasources will alter this procedure significantly. Certain storage options require additional steps (see Considerations).
In a standard PGO v4.7 cluster, a primary database pod with a separate pg_wal PVC will mount its pgData PVC, named "oldhippo", at /pgdata and its pg_wal PVC, named "oldhippo-wal", at /pgwal within the pod's file system. In this pod, the standard pgData directory will be /pgdata/oldhippo and the standard pg_wal directory will be /pgwal/oldhippo-wal. The pgBackRest repo pod will mount its PVC at /backrestrepo and the repo directory will be /backrestrepo/oldhippo-backrest-shared-repo.
With the above in mind, we need to reference the three PVCs we wish to migrate in the dataSource.volumes portion of the PostgresCluster spec. Additionally, to accommodate the PGO v5 file structure, we must also reference the pgData and pgBackRest repo directories. Note that the pg_wal directory does not need to be moved when migrating from v4 to v5!
Now we just need to populate our PostgresCluster spec with the information described above:
spec:
  dataSource:
    volumes:
      pgDataVolume:
        pvcName: oldhippo
        directory: oldhippo
      pgWALVolume:
        pvcName: oldhippo-wal
      pgBackRestVolume:
        pvcName: oldhippo-pgbr-repo
        directory: oldhippo-backrest-shared-repo
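To understand how to set pgDataVolume.directory, think of subtracting the mount path of your volume from the PGDATA path. For example, if your volume is mounted at "/data" and PGDATA is set to "/data/pg15/oldhippo", you'll set pgDataVolume.directory to "pg15/oldhippo". In the v4 layout above, subtracting the /pgdata mount path from /pgdata/oldhippo leaves oldhippo, which is the value used in the example.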
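The same subtraction can be done mechanically. Here is a minimal sketch with a hypothetical function name. It strips the mount path prefix from the full path and refuses paths that do not sit under the mount. It works the same way for pgBackRestVolume.directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a dataSource.volumes "directory" value by
# removing the volume's mount path from the full directory path.
# Prints nothing and returns 1 if the path is not under the mount path.
pgdata_directory() {
  local mount="${1%/}" path="${2%/}"   # drop trailing slashes
  case "$path" in
    "$mount"/*) printf '%s\n' "${path#"$mount"/}" ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   pgdata_directory /data /data/pg15/oldhippo
#   pgdata_directory /backrestrepo /backrestrepo/oldhippo-backrest-shared-repo
```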
Lastly, the PostgreSQL version and storage configuration in your PostgresCluster must exactly match those of the existing volumes. If the volumes were used with PostgreSQL 13, the spec.postgresVersion value should be 13 and the associated spec.image value should refer to a PostgreSQL 13 image.
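PostgreSQL keeps a PG_VERSION file at the top of every data directory, and you can use it to confirm which major version wrote the data. Here is a minimal sketch with a hypothetical function name. It assumes the v4 instance is still running and that you supply its pod name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PostgreSQL major version recorded in the
# PG_VERSION file of a data directory, read through a running pod.
data_dir_major_version() {
  local pod="$1" datadir="$2"
  kubectl exec "$pod" -- cat "$datadir/PG_VERSION"
}

# Example usage ("$V4_PRIMARY_POD" is a placeholder):
#   data_dir_major_version "$V4_PRIMARY_POD" /pgdata/oldhippo
# Use the printed major version for spec.postgresVersion and spec.image.
```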
Similarly, the data volume definitions in your PostgresCluster spec should match your existing volumes. For example, if the existing pgData PVC has a ReadWriteOnce (RWO) access mode and is 1 Gigabyte, the relevant dataVolumeClaimSpec should be configured as:
dataVolumeClaimSpec:
  accessModes:
    - "ReadWriteOnce"
  resources:
    requests:
      storage: 1G
With the above configuration in place, your existing PVCs will be used when creating your PostgresCluster. They will be given appropriate labels and ownership references, and the necessary directory updates will be made so that your cluster is able to find the existing directories.
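You can read the access modes and requested size straight from an existing claim and copy them into dataVolumeClaimSpec, walVolumeClaimSpec, or the repo's volumeClaimSpec. Here is a minimal sketch with a hypothetical function name. It prints the access modes and the requested storage, separated by a tab.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a PVC's access modes and requested storage,
# separated by a tab, for copying into the matching *VolumeClaimSpec.
pvc_claim_summary() {
  kubectl get pvc "$1" \
    -o jsonpath='{.spec.accessModes[*]}{"\t"}{.spec.resources.requests.storage}{"\n"}'
}

# Example usage:
#   pvc_claim_summary oldhippo
#   pvc_claim_summary oldhippo-wal
#   pvc_claim_summary oldhippo-pgbr-repo
```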
Considerations
Removing PGO v4 labels
When migrating data volumes from v4 to v5, PGO relabels all volumes for PGO v5, but will not remove existing PGO v4 labels. This results in PVCs that are labeled for both PGO v4 and v5, which can lead to unintended behavior.
To avoid that, you must manually remove the pg-cluster and vendor labels, which you can do with a kubectl command. For instance, given a cluster named hippo with a dedicated pgBackRest repo, the PVC will be hippo-pgbr-repo, and the PGO v4 labels can be removed with the following command:
kubectl label pvc hippo-pgbr-repo pg-cluster- vendor-
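If you migrated several PVCs, repeat the same command for each of them. Here is a minimal sketch with a hypothetical function name. The trailing - after each key is kubectl label's syntax for removing that label.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the PGO v4 pg-cluster and vendor labels from
# each PVC given as an argument. Returns 1 if any kubectl call failed.
remove_v4_labels() {
  local pvc status=0
  for pvc in "$@"; do
    # "key-" tells kubectl label to delete that label from the object.
    kubectl label pvc "$pvc" pg-cluster- vendor- || status=1
  done
  return "$status"
}

# Example usage, then confirm with --show-labels:
#   remove_v4_labels oldhippo oldhippo-wal oldhippo-pgbr-repo
#   kubectl get pvc oldhippo --show-labels
```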
Proper file permissions for certain storage options
Additional steps are required to set proper file permissions when using certain storage options, such as NFS and HostPath storage, due to a known issue with how fsGroups are applied.
When migrating from PGO v4, this will require the user to manually set the group value of the pgBackRest repo directory, and all subdirectories, to 26 to match the postgres group used in PGO v5. Please see this example for more information.
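The group change is typically made from a machine or privileged pod that has the NFS export or HostPath directory mounted. Here is a minimal sketch with a hypothetical function name and placeholder path. It assumes you have enough privileges there to change group ownership. chgrp -R applies the group to the directory and everything beneath it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the group of a pgBackRest repo directory and all
# of its contents. The group ID defaults to 26, the postgres group in PGO v5.
set_repo_group() {
  local dir="$1" gid="${2:-26}"
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  chgrp -R "$gid" "$dir"
}

# Example usage (the mount point is a placeholder):
#   set_repo_group /mnt/backrestrepo/oldhippo-backrest-shared-repo
#   stat -c '%g' /mnt/backrestrepo/oldhippo-backrest-shared-repo
```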
Additional Considerations
- An existing pg_wal volume is not required when the pg_wal directory is located on the same PVC as the pgData directory.
- When using existing pg_wal volumes, an existing pgData volume must also be defined to ensure consistent naming and proper bootstrapping.
- When migrating from PGO v4 volumes, it is recommended to use the most recently available version of PGO v4.
- As there are many factors that may impact this procedure, it is strongly recommended that a test run be completed beforehand to ensure successful operation.
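Before adding a pgWALVolume entry, you can check where pg_wal actually lives. When PostgreSQL's WAL directory is placed outside the data directory (for example with initdb --waldir), pg_wal inside the data directory is a symbolic link to that location. Here is a minimal sketch with a hypothetical function name. It assumes the v4 instance is still running, that its image provides readlink, and that you supply the pod name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print where a data directory's WAL is stored.
wal_location() {
  local pod="$1" datadir="$2" target
  # Fail early if the pod or the data directory cannot be reached.
  kubectl exec "$pod" -- test -d "$datadir" || return 1
  # readlink prints the target and succeeds only when pg_wal is a symlink.
  if target="$(kubectl exec "$pod" -- readlink "$datadir/pg_wal")"; then
    printf '%s\n' "$target"
  else
    # Not a symlink: WAL lives inside the data directory on the same volume.
    printf '%s\n' "$datadir/pg_wal"
  fi
}

# Example usage ("$V4_PRIMARY_POD" is a placeholder):
#   wal_location "$V4_PRIMARY_POD" /pgdata/oldhippo
```

If the printed path is inside the data directory, the pg_wal directory shares the pgData PVC and no pgWALVolume is needed.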
Putting it all together
Now that we've identified all of our volumes and required directories, we're ready to create our new cluster!
Below is a complete PostgresCluster that includes everything we've talked about. After your PostgresCluster is created, you should remove the spec.dataSource.volumes section.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: oldhippo
spec:
  postgresVersion: 16
  dataSource:
    volumes:
      pgDataVolume:
        pvcName: oldhippo
        directory: oldhippo
      pgWALVolume:
        pvcName: oldhippo-wal
      pgBackRestVolume:
        pvcName: oldhippo-pgbr-repo
        directory: oldhippo-backrest-shared-repo
  instances:
    - name: instance1
      dataVolumeClaimSpec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 1G
      walVolumeClaimSpec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 1G
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - "ReadWriteOnce"
              resources:
                requests:
                  storage: 1G
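You can remove the spec.dataSource.volumes section after creation either by editing the manifest and re-applying it, or with a JSON patch. Here is a minimal sketch using kubectl patch, with a hypothetical function name and manifest file name. If you keep the manifest in a file, delete the section there as well so that a later kubectl apply does not add it back.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop spec.dataSource.volumes from an existing
# PostgresCluster using a JSON patch "remove" operation on that field only.
remove_volumes_datasource() {
  local cluster="$1"
  kubectl patch postgrescluster "$cluster" --type=json \
    -p '[{"op": "remove", "path": "/spec/dataSource/volumes"}]'
}

# Example usage (oldhippo.yaml is a placeholder file name):
#   kubectl apply -f oldhippo.yaml
#   kubectl get pods --selector postgres-operator.crunchydata.com/cluster=oldhippo
#   remove_volumes_datasource oldhippo
```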