Migrate Data Volumes to New Clusters

In some cases you may want to migrate existing volumes to a new cluster. This guide takes an in-depth look at the steps required.

Prerequisites

While your existing Postgres instance is still running, confirm that the following three conditions hold:

  1. Your volume has its persistentVolumeReclaimPolicy set to Retain.
  2. The postgres superuser exists in your Postgres instance.
  3. Your volume's data directory is owned by the operating system's postgres user, with user ID 26.

Warning

If your volume's reclaim policy isn't set to Retain, your data will be lost. If the postgres database user doesn't exist, or if the data directory isn't owned by the postgres operating system user, the bootstrap process will fail.
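The first condition can be checked from the command line. The sketch below assumes kubectl access to the cluster that holds the existing volumes and that the PVC lives in the current namespace. The helper names are hypothetical and not part of PGO, and oldhippo is only an example PVC name. Because the reclaim policy is a property of the PersistentVolume bound to a PVC, the PV name is looked up first.

```shell
# Hypothetical helpers; the names are illustrative and not part of PGO.

# Print the reclaim policy of the PV bound to the given PVC.
pvc_reclaim_policy() {
  pvc="$1"
  pv="$(kubectl get pvc "$pvc" -o jsonpath='{.spec.volumeName}')" || return 1
  kubectl get pv "$pv" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
}

# Switch the PV bound to the given PVC to the Retain policy.
retain_pvc_volume() {
  pvc="$1"
  pv="$(kubectl get pvc "$pvc" -o jsonpath='{.spec.volumeName}')" || return 1
  kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
}

# Example usage:
#   pvc_reclaim_policy oldhippo
#   retain_pvc_volume oldhippo
```

Repeat the check for every PVC you plan to migrate, including any pg_wal and pgBackRest repo volumes.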

Once all three of these conditions have been met, consider performing a test run to familiarize yourself with the process and identify pain points unique to your system and configuration.

Configure your PostgresCluster

In order to use existing pgData, pg_wal or pgBackRest repo volumes in a new PostgresCluster, you will need to configure the spec.dataSource.volumes section of your PostgresCluster manifest. As shown below, there are three possible volumes you may configure: pgDataVolume, pgWALVolume and pgBackRestVolume. Under each, you must define the PVC name to use in the new cluster. A directory may also be defined, as needed, for cases where the existing directory name does not match the v5 directory.

To help explain how these fields are used, we will consider a pgcluster named "oldhippo" from PGO v4. We will assume that the pgcluster has been deleted and only the PVCs have been left in place.

Info

Any differences in configuration or other datasources will alter this procedure significantly. Certain storage options require additional steps (see Considerations).

In a standard PGO v4.7 cluster, a primary database pod with a separate pg_wal PVC will mount its pgData PVC, named "oldhippo", at /pgdata and its pg_wal PVC, named "oldhippo-wal", at /pgwal within the pod's file system. In this pod, the standard pgData directory will be /pgdata/oldhippo and the standard pg_wal directory will be /pgwal/oldhippo-wal. The pgBackRest repo pod will mount its PVC at /backrestrepo and the repo directory will be /backrestrepo/oldhippo-backrest-shared-repo.

With the above in mind, we need to reference the three PVCs we wish to migrate in the dataSource.volumes portion of the PostgresCluster spec. Additionally, to accommodate the PGO v5 file structure, we must also reference the pgData and pgBackRest repo directories. Note that the pg_wal directory does not need to be moved when migrating from v4 to v5!

Now, we just need to populate our PostgresCluster spec with the information described above:

spec:
  dataSource:
    volumes:
      pgDataVolume:
        pvcName: oldhippo
        directory: oldhippo
      pgWALVolume:
        pvcName: oldhippo-wal
      pgBackRestVolume:
        pvcName: oldhippo-pgbr-repo
        directory: oldhippo-backrest-shared-repo

To understand how to set pgDataVolume.directory, think of subtracting the mount path of your volume from the PGDATA path. If your volume is mounted at "/data", and PGDATA is set to "/data/pg15/oldhippo", you'll set pgDataVolume.directory to "pg15/oldhippo".
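As a quick way to check that subtraction, it can be written as a prefix strip in shell. This is only an illustration: pgdata_directory is a hypothetical helper, not a PGO command, and it does nothing beyond string manipulation.

```shell
# Hypothetical helper: print the part of PGDATA that lies below the mount path.
pgdata_directory() {
  mount_path="${1%/}"   # drop a trailing slash, if any
  pgdata="${2%/}"
  case "$pgdata" in
    "$mount_path"/*)
      rel=${pgdata#"$mount_path"/}   # strip the mount path and its slash
      printf '%s\n' "$rel"
      ;;
    *)
      echo "PGDATA is not below the mount path" >&2
      return 1
      ;;
  esac
}

pgdata_directory /data /data/pg15/oldhippo   # prints: pg15/oldhippo
pgdata_directory /pgdata /pgdata/oldhippo    # prints: oldhippo
```

The second call mirrors the PGO v4 layout described earlier, where the volume is mounted at /pgdata and the data directory is /pgdata/oldhippo.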

Lastly, it is very important that the PostgreSQL version and storage configuration in your PostgresCluster match exactly the existing volumes being used.

If the volumes were used with PostgreSQL 13, the spec.postgresVersion value should be 13 and the associated spec.image value should refer to a PostgreSQL 13 image.

Similarly, the configured data volume definitions in your PostgresCluster spec should match your existing volumes. For example, if the existing pgData PVC has a RWO access mode and is 1 Gigabyte, the relevant dataVolumeClaimSpec should be configured as follows:

dataVolumeClaimSpec:
  accessModes:
  - "ReadWriteOnce"
  resources:
    requests:
      storage: 1G

With the above configuration in place, your existing PVCs will be used when creating your PostgresCluster. They will be given appropriate labels and ownership references, and the necessary directory updates will be made so that your cluster is able to find the existing directories.

Considerations

Removing PGO v4 labels

When migrating data volumes from v4 to v5, PGO relabels all volumes for PGO v5, but will not remove existing PGO v4 labels. This results in PVCs that are labeled for both PGO v4 and v5, which can lead to unintended behavior.

To avoid that, you must manually remove the pg-cluster and vendor labels, which you can do with a kubectl command. For instance, given a cluster named hippo with a dedicated pgBackRest repo, the PVC will be hippo-pgbr-repo, and the PGO v4 labels can be removed with the command below:

kubectl label pvc hippo-pgbr-repo pg-cluster- vendor-
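When several PVCs were migrated, the same command can be repeated for each of them. The loop below is a minimal sketch: remove_v4_labels is a hypothetical helper name, and the PVC names in the usage comment are taken from the oldhippo example rather than from your cluster.

```shell
# Hypothetical helper: strip the PGO v4 labels from each PVC passed as an argument.
remove_v4_labels() {
  for pvc in "$@"; do
    kubectl label pvc "$pvc" pg-cluster- vendor- || return 1
  done
}

# Example usage with the PVC names from the oldhippo example:
#   remove_v4_labels oldhippo oldhippo-wal oldhippo-pgbr-repo
# Then confirm which labels remain on a PVC:
#   kubectl get pvc oldhippo --show-labels
```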

Proper file permissions for certain storage options

Additional steps are required to set proper file permissions when using certain storage options, such as NFS and HostPath storage, due to a known issue with how fsGroups are applied.

When migrating from PGO v4, you must manually set the group of the pgBackRest repo directory, and all of its subdirectories, to 26 to match the postgres group used in PGO v5. Please see this example for more information.
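Independently of the linked example, the ownership change itself can be sketched as a recursive chgrp. Where and how you run it is an assumption here: it needs a shell with the repo volume mounted and enough privileges to change group ownership. The helper name set_repo_group is hypothetical, and the path in the usage comment is the PGO v4 repo directory described earlier.

```shell
# Hypothetical helper: recursively set the group of a directory tree.
# Usage: set_repo_group <directory> [gid]
# The gid defaults to 26, the postgres group used in PGO v5.
set_repo_group() {
  repo_dir="$1"
  gid="${2:-26}"
  [ -d "$repo_dir" ] || { echo "not a directory: $repo_dir" >&2; return 1; }
  chgrp -R "$gid" "$repo_dir"
}

# Example usage, wherever the pgBackRest repo volume is mounted:
#   set_repo_group /backrestrepo/oldhippo-backrest-shared-repo
```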

Additional Considerations

  • An existing pg_wal volume is not required when the pg_wal directory is located on the same PVC as the pgData directory.
  • When using existing pg_wal volumes, an existing pgData volume must also be defined to ensure consistent naming and proper bootstrapping.
  • When migrating from PGO v4 volumes, it is recommended to use the most recently available version of PGO v4.
  • As there are many factors that may impact this procedure, it is strongly recommended that a test run be completed beforehand to ensure successful operation.

Putting it all together

Now that we've identified all of our volumes and required directories, we're ready to create our new cluster!

Below is a complete PostgresCluster that includes everything we've talked about. After your PostgresCluster is created, you should remove the spec.dataSource.volumes section.

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: oldhippo
spec:
  postgresVersion: 16  # must match the PostgreSQL version of the existing volumes
  dataSource:
    volumes:
      pgDataVolume:
        pvcName: oldhippo
        directory: oldhippo
      pgWALVolume:
        pvcName: oldhippo-wal
      pgBackRestVolume:
        pvcName: oldhippo-pgbr-repo
        directory: oldhippo-backrest-shared-repo
  instances:
    - name: instance1
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1G
      walVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1G
  backups:
    pgbackrest:
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1G
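
Once the new cluster is running, the spec.dataSource.volumes section can be removed either by deleting it from the manifest and re-applying, or with a JSON patch. The sketch below shows the patch route; remove_datasource_volumes is a hypothetical helper name, and it assumes the cluster is named oldhippo and lives in the current namespace.

```shell
# Hypothetical helper: drop spec.dataSource.volumes from a PostgresCluster.
remove_datasource_volumes() {
  cluster="$1"
  kubectl patch postgrescluster "$cluster" --type json \
    -p '[{"op": "remove", "path": "/spec/dataSource/volumes"}]'
}

# Example usage, once the new cluster is up:
#   remove_datasource_volumes oldhippo
```

If you keep the manifest under version control, delete the section there as well, so that a later kubectl apply does not add it back.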