Tablespaces

A Tablespace is a PostgreSQL feature that is used to store data on a volume that is different from the primary data directory. While most workloads do not require them, tablespaces can be particularly helpful for larger data sets or utilizing particular hardware to optimize performance on a particular PostgreSQL object (a table, index, etc.). Some examples of use cases for tablespaces include:

Partitioning larger data sets across different volumes
Putting data onto archival systems
Utilizing hardware (or a storage class) for a particular database
Storing sensitive data on a volume that supports transparent data-encryption (TDE)

and others.

In order to use PostgreSQL tablespaces properly in a highly-available, distributed system, there are several considerations that need to be accounted for to ensure proper operations:

Each tablespace must have its own volume; this means that every tablespace for every replica in a system must have its own volume.
The filesystem map must be consistent across the cluster
The backup & disaster recovery management system must be able to safely backup and restore data to tablespaces

Additionally, a tablespace is a critical piece of a PostgreSQL instance: if PostgreSQL expects a tablespace to exist and it is unavailable, this could trigger a downtime scenario.

While there are certain challenges with creating a PostgreSQL cluster with high-availability along with tablespaces in a Kubernetes-based environment, the PostgreSQL Operator adds many conveniences to make it easier to use tablespaces in applications.

How Tablespaces Work in the PostgreSQL Operator

As stated above, it is important to ensure that every tablespace created has its own volume (i.e. its own persistent volume claim). This is especially true for any replicas in a cluster: you don’t want multiple PostgreSQL instances writing to the same volume, as this is a recipe for disaster!

One of the keys to working with tablespaces in a high-availability cluster is to ensure the filesystem that the tablespaces map to is consistent. Specifically, it is imperative to have the LOCATION parameter that is used by PostgreSQL to indicate where a tablespace resides to match in each instance in a cluster.

The PostgreSQL Operator achieves this by mounting all of its tablespaces to a directory called /tablespaces in the container. While each tablespace will exist in a unique PVC across all PostgreSQL instances in a cluster, each instance’s tablespaces will mount in a predictable way in /tablespaces.

The PostgreSQL Operator takes this one step further and abstracts this away from you. When your PostgreSQL cluster initialized, the tablespace definition is automatically created in PostgreSQL; you can start using it immediately! An example of this is demonstrated in the next section.

The PostgreSQL Operator ensures the availability of the tablespaces across the different lifecycle events that occur on a PostgreSQL cluster, including:

High-Availability: Data in the tablespaces is replicated across the cluster, and is available after a downtime event
Disaster Recovery: Tablespaces are backed up and are properly restored during a recovery
Clone: Tablespaces are created in any cloned cluster
Deprovisioining: Tablespaces are deleted when a PostgreSQL instance or cluster is deleted

Adding Tablespaces to a New Cluster

Tablespaces can be used in a cluster with the pgo create cluster command. The command follows this general format:

pgo create cluster hacluster \
    --tablespace=name=tablespace1:storageconfig=storageconfigname \
    --tablespace=name=tablespace2:storageconfig=storageconfigname

For example, to create tablespaces name faststorage1 and faststorage2 on PVCs that use the nfsstorage storage type, you would execute the following command:

pgo create cluster hacluster \
    --tablespace=name=faststorage1:storageconfig=nfsstorage \
    --tablespace=name=faststorage2:storageconfig=nfsstorage

Once the cluster is initialized, you can immediately interface with the tablespaces! For example, if you wanted to create a table called sensor_data on the faststorage1 tablespace, you could execute the following SQL:

CREATE TABLE sensor_data (
  sensor_id int,
  sensor_value numeric,
  created_at timestamptz DEFAULT CURRENT_TIMESTAMP
)
TABLESPACE faststorage1;

Adding Tablespaces to Existing Clusters

You can also add a tablespace to an existing PostgreSQL cluster with the pgo update cluster command. Adding a tablespace to a cluster uses a similar syntax to creating a cluster with tablespaces, for example:

pgo update cluster hacluster \
    --tablespace=name=tablespace3:storageconfig=storageconfigname

NOTE: This operation can cause downtime. In order to add a tablespace to a PostgreSQL cluster, persistent volume claims (PVCs) need to be created and mounted to each PostgreSQL instance in the cluster. The act of mounting a new PVC to a Kubernetes Deployment causes the Pods in the deployment to restart.

When the operation completes, the tablespace will be set up and accessible to use within the PostgreSQL cluster.

More Information

For more information on how tablespaces work in PostgreSQL please refer to the PostgreSQL manual.