Common pgo Client Tasks
While the full pgo
client reference will tell you
everything you need to know about how to use pgo
, it may be helpful to see
several examples on how to conduct “day-in-the-life” tasks for administrating
PostgreSQL cluster with the PostgreSQL Operator.
The below guide covers many of the common operations that are required when managing PostgreSQL clusters. The guide is broken up by different administrative topics, such as provisioning, high-availability, etc.
Setup Before Running the Examples
Many of the pgo
client commands require you to specify a namespace via the
-n
or --namespace
flag. While this is a very helpful tool when managing
PostgreSQL deployxments across many Kubernetes namespaces, this can become
onerous for the intents of this guide.
If you install the PostgreSQL Operator using the quickstart
guide, you will install the PostgreSQL Operator to a namespace called pgo
. We
can choose to always use one of these namespaces by setting the PGO_NAMESPACE
environmental variable, which is detailed in the global pgo
Client
reference,
For convenience, we will use the pgo
namespace in the examples below.
For even more convenience, we recommend setting pgo
to be the value of
the PGO_NAMESPACE
variable. In the shell that you will be executing the pgo
commands in, run the following command:
export PGO_NAMESPACE=pgo
If you do not wish to set this environmental variable, or are in an environment
where you are unable to use environmental variables, you will have to use the
--namespace
(or -n
) flag for most commands, e.g.
pgo version -n pgo
JSON Output
The default for the pgo
client commands is to output their results in a
readable format. However, there are times where it may be helpful to you to have
the format output in a machine parseable format like JSON.
Several commands support the -o
/--output
flags that delivers the results of
the command in the specified output. Presently, the only output that is
supported is json
.
As an example of using this feature, if you wanted to get the results of the
pgo test
command in JSON, you could run the following:
pgo test hacluster -o json
PostgreSQL Operator System Basics
To get started, it’s first important to understand the basics of working with the PostgreSQL Operator itself. You should know how to test if the PostgreSQL Operator is working, check the overall status of the PostgreSQL Operator, view the current configuration that the PostgreSQL Operator us using, and seeing which Kubernetes Namespaces the PostgreSQL Operator has access to.
While this may not be as fun as creating high-availability PostgreSQL clusters, these commands will help you to perform basic troubleshooting tasks in your environment.
Checking Connectivity to the PostgreSQL Operator
A common task when working with the PostgreSQL Operator is to check connectivity
to the PostgreSQL Operator. This can be accomplish with the pgo version
command:
pgo version
which, if working, will yield results similar to:
pgo client version 4.4.1
pgo-apiserver version 4.4.1
Inspecting the PostgreSQL Operator Configuration
The pgo show config
command allows you to
view the current configuration that the PostgreSQL Operator is using. This can
be helpful for troubleshooting issues such as which PostgreSQL images are being
deployed by default, which storage classes are being used, etc.
You can run the pgo show config
command by running:
pgo show config
which yields output similar to:
BasicAuth: ""
Cluster:
CCPImagePrefix: crunchydata
CCPImageTag: centos7-12.4-4.4.1
Policies: ""
Metrics: false
Badger: false
Port: "5432"
PGBadgerPort: "10000"
ExporterPort: "9187"
User: testuser
Database: userdb
PasswordAgeDays: "60"
PasswordLength: "8"
Replicas: "0"
ServiceType: ClusterIP
BackrestPort: 2022
Backrest: true
BackrestS3Bucket: ""
BackrestS3Endpoint: ""
BackrestS3Region: ""
BackrestS3URIStyle: ""
BackrestS3VerifyTLS: true
DisableAutofail: false
PgmonitorPassword: ""
EnableCrunchyadm: false
DisableReplicaStartFailReinit: false
PodAntiAffinity: preferred
SyncReplication: false
Pgo:
Audit: false
PGOImagePrefix: crunchydata
PGOImageTag: centos7-4.4.1
PrimaryStorage: nfsstorage
BackupStorage: nfsstorage
ReplicaStorage: nfsstorage
BackrestStorage: nfsstorage
Storage:
nfsstorage:
AccessMode: ReadWriteMany
Size: 1G
StorageType: create
StorageClass: ""
SupplementalGroups: "65534"
MatchLabels: ""
Viewing PostgreSQL Operator Key Metrics
The pgo status
command provides a
generalized statistical view of the overall resource consumption of the
PostgreSQL Operator. These stats include:
- The total number of PostgreSQL instances
- The total number of Persistent Volume Claims (PVC) that are allocated, along with the total amount of disk the claims specify
- The types of container images that are deployed, along with how many are deployed
- The nodes that are used by the PostgreSQL Operator
and more
You can use the pgo status
command by running:
pgo status
which yields output similar to:
Databases: 8
Claims: 8
Total Volume Size: 8Gi
Database Images:
4 crunchydata/crunchy-postgres-ha:centos7-12.4-4.4.1
4 crunchydata/pgo-backrest-repo:centos7-4.4.1
8 crunchydata/pgo-backrest:centos7-4.4.1
Databases Not Ready:
Labels (count > 1): [count] [label]
[8] [vendor=crunchydata]
[4] [pgo-backrest-repo=true]
[4] [pgouser=admin]
[4] [pgo-pg-database=true]
[4] [crunchy_collect=false]
[4] [pg-pod-anti-affinity=]
[4] [pgo-version=4.4.1]
[4] [archive-timeout=60]
[2] [pg-cluster=hacluster]
Viewing PostgreSQL Operator Managed Namespaces
The PostgreSQL Operator has the ability to manage PostgreSQL clusters across Kubernetes Namespaces. During the course of Operations, it can be helpful to know which namespaces the PostgreSQL Operator can use for deploying PostgreSQL clusters.
You can view which namespaces the PostgreSQL Operator can utilize by using
the pgo show namespace
command. To
list out the namespaces that the PostgreSQL Operator has access to, you can run
the following command:
pgo show namespace --all
which yields output similar to:
pgo username: admin
namespace useraccess installaccess
default accessible no access
kube-node-lease accessible no access
kube-public accessible no access
kube-system accessible no access
pgo accessible no access
pgouser1 accessible accessible
pgouser2 accessible accessible
somethingelse no access no access
NOTE: Based on your deployment, your Kubernetes administrator may restrict
access to the multi-namespace feature of the PostgreSQL Operator. In this case,
you do not need to worry about managing your namespaces and as such do not need
to use this command, but we recommend setting the PGO_NAMESPACE
variable as
described in the general notes on this page.
Provisioning: Create, View, Destroy
Creating a PostgreSQL Cluster
You can create a cluster using the pgo create cluster
command:
pgo create cluster hacluster
which if successfully, will yield output similar to this:
created Pgcluster hacluster
workflow id ae714d12-f5d0-4fa9-910f-21944b41dec8
Create a PostgreSQL Cluster with Different PVC Sizes
You can also create a PostgreSQL cluster with an arbitrary PVC size using the
pgo create cluster
command. For
example, if you want to create a PostgreSQL cluster with with a 128GB PVC, you
can use the following command:
pgo create cluster hacluster --pvc-size=128Gi
The above command sets the PVC size for all PostgreSQL instances in the cluster, i.e. the primary and replicas.
This also extends to the size of the pgBackRest repository as well, if you are using the local Kubernetes cluster storage for your backup repository. To create a PostgreSQL cluster with a pgBackRest repository that uses a 1TB PVC, you can use the following command:
pgo create cluster hacluster --pgbackrest-pvc-size=1Ti
Specify CPU / Memory for a PostgreSQL Cluster
To specify the amount of CPU and memory to request for a PostgreSQL cluster, you
can use the --cpu
and --memory
flags of the
pgo create cluster
command. Both
of these values utilize the Kubernetes quantity format
for specifying how to allocate resources.
For example, to create a PostgreSQL cluster that requests 4 CPU cores and has 16 gibibytes of memory, you can use the following command:
pgo create cluster hacluster --cpu=4 --memory=16Gi
Create a PostgreSQL Cluster with PostGIS
To create a PostgreSQL cluster that uses the geospatial extension PostGIS, you can execute the following command, updated with your desired image tag. In the example below, the cluster will use PostgreSQL 12.4 and PostGIS 3.0:
pgo create cluster hagiscluster \
--ccp-image=crunchy-postgres-gis-ha \
--ccp-image-tag=centos7-12.4-3.0-4.4.1
Create a PostgreSQL Cluster with a Tablespace
Tablespaces are a PostgreSQL feature that allows a user to select specific volumes to store data to, which is helpful in several types of scenarios. Often your workload does not require a tablespace, but the PostgreSQL Operator provides support for tablespaces throughout the lifecycle of a PostgreSQL cluster.
To create a PostgreSQL cluster that uses the tablespace feature with NFS storage, you can execute the following command:
pgo create cluster hactsluster --tablespace=name=ts1:storageconfig=nfsstorage
You can use your preferred storage engine instead of nfsstorage
. For example,
to create multiple tablespaces on GKE, you can execute the following command:
pgo create cluster hactsluster \
--tablespace=name=ts1:storageconfig=gce \
--tablespace=name=ts2:storageconfig=gce
Tablespaces are immediately available once the PostgreSQL cluster is
provisioned. For example, to create a table using the tablespace ts1
, you can
run the following SQL on your PostgreSQL cluster:
CREATE TABLE sensor_data (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
sensor1 numeric,
sensor2 numeric,
sensor3 numeric,
sensor4 numeric
)
TABLESPACE ts1;
You can also create tablespaces that have different sized PVCs from the ones defined in the storage specification. For instance, to create two tablespaces, one that uses a 10GiB PVC and one that uses a 20GiB PVC, you can execute the following command:
pgo create cluster hactsluster \
--tablespace=name=ts1:storageconfig=gce:pvcsize=10Gi \
--tablespace=name=ts2:storageconfig=gce:pvcsize=20Gi
Create a PostgreSQL Cluster Using a Backup from Another PostgreSQL Cluster
It is also possible to create a new PostgreSQL Cluster using a backup from another
PostgreSQL cluster. To do so, simply specify the cluster containing the backup
that you would like to utilize using the restore-from
option:
pgo create cluster hacluster2 --restore-from=hacluster1
When using this approach, a pgbackrest restore
will be performed using the pgBackRest
repository for the restore-from
cluster specified in order to populate the initial
PGDATA
directory for the new PostgreSQL cluster. By default, pgBackRest will restore
to the latest backup available and replay all WAL. However, a restore-opts
option
is also available that allows the restore
command to be further customized, e.g. to
perform a point-in-time restore and/or restore from an S3 storage bucket:
pgo create cluster hacluster2 \
--restore-from=hacluster1 \
--restore-opts="--repo-type=s3 --type=time --target='2020-07-02 20:19:36.13557+00'"
Tracking a Newly Provisioned Cluster
A new PostgreSQL cluster can take a few moments to provision. You may have
noticed that the pgo create cluster
command returns something called a
“workflow id”. This workflow ID allows you to track the progress of your new
PostgreSQL cluster while it is being provisioned using the pgo show workflow
command:
pgo show workflow ae714d12-f5d0-4fa9-910f-21944b41dec8
which can yield output similar to:
parameter value
--------- -----
pg-cluster hacluster
task completed 2019-12-27T02:10:14Z
task submitted 2019-12-27T02:09:46Z
workflowid ae714d12-f5d0-4fa9-910f-21944b41dec8
View PostgreSQL Cluster Details
To see details about your PostgreSQL cluster, you can use the pgo show cluster
command. These details include elements such as:
- The version of PostgreSQL that the cluster is using
- The PostgreSQL instances that comprise the cluster
- The Pods assigned to the cluster for all of the associated components, including the nodes that the pods are assigned to
- The Persistent Volume Claims (PVC) that are being consumed by the cluster
- The Kubernetes Deployments associated with the cluster
- The Kubernetes Services associated with the cluster
- The Kubernetes Labels that are assigned to the PostgreSQL instances
and more.
You can view the details of the cluster by executing the following command:
pgo show cluster hacluster
which will yield output similar to:
cluster : hacluster (crunchy-postgres-ha:centos7-12.4-4.4.1)
pod : hacluster-6dc6cfcfb9-f9knq (Running) on node01 (1/1) (primary)
pvc : hacluster
resources : CPU Limit= Memory Limit=, CPU Request= Memory Request=
storage : Primary=200M Replica=200M
deployment : hacluster
deployment : hacluster-backrest-shared-repo
service : hacluster - ClusterIP (10.102.20.42)
labels : pg-pod-anti-affinity= archive-timeout=60 crunchy-pgbadger=false crunchy_collect=false deployment-name=hacluster pg-cluster=hacluster crunchy-pgha-scope=hacluster autofail=true pgo-backrest=true pgo-version=4.4.1 current-primary=hacluster name=hacluster pgouser=admin workflowid=ae714d12-f5d0-4fa9-910f-21944b41dec8
Deleting a Cluster
You can delete a PostgreSQL cluster that is managed by the PostgreSQL Operator by executing the following command:
pgo delete cluster hacluster
This will remove the cluster from being managed by the PostgreSQL Operator, as well as delete the root data Persistent Volume Claim (PVC) and backup PVCs associated with the cluster.
If you wish to keep your PostgreSQL data PVC, you can delete the cluster with the following command:
pgo delete cluster hacluster --keep-data
You can then recreate the PostgreSQL cluster with the same data by using the
pgo create cluster
command with a cluster of the same name:
pgo create cluster hacluster
This technique is used when performing tasks such as upgrading the PostgreSQL Operator.
You can also keep the pgBackRest repository associated with the PostgreSQL
cluster by using the --keep-backups
flag with the pgo delete cluster
command:
pgo delete cluster hacluster --keep-backups
Testing PostgreSQL Cluster Availability
You can test the availability of your cluster by using the pgo test
command. The pgo test
command checks to see if the Kubernetes Services and
the Pods that comprise the PostgreSQL cluster are available to receive
connections. This includes:
- Testing that the Kubernetes Endpoints are available and able to route requests to healthy Pods
- Testing that each PostgreSQL instance is available and ready to accept client
connections by performing a connectivity check similar to the one performed by
pg_isready
To test the availability of a PostgreSQL cluster, you can run the following command:
pgo test hacluster
which will yield output similar to:
cluster : hacluster
Services
primary (10.102.20.42:5432): UP
Instances
primary (hacluster-6dc6cfcfb9-f9knq): UP
Disaster Recovery: Backups & Restores
The PostgreSQL Operator supports sophisticated functionality for managing your backups and restores. For more information for how this works, please see the disaster recovery guide.
Creating a Backup
The PostgreSQL Operator uses the open source pgBackRest backup and recovery utility for managing backups and PostgreSQL archives. These backups are also used as part of managing the overall health and high-availability of PostgreSQL clusters managed by the PostgreSQL Operator and used as part of the cloning process as well.
When a new PostgreSQL cluster is provisioned by the PostgreSQL Operator, a full
pgBackRest backup is taken by default. This is required in order to create new
replicas (via pgo scale
) for the PostgreSQL cluster as well as healing during
a failover scenario.
To create a backup, you can run the following command:
pgo backup hacluster
which by default, will create an incremental pgBackRest backup. The reason for this is that the PostgreSQL Operator initially creates a pgBackRest full backup when the cluster is initial provisioned, and pgBackRest will take incremental backups for each subsequent backup until a different backup type is specified.
Most pgBackRest options are supported and can be passed in by the PostgreSQL
Operator via the --backup-opts
flag. What follows are some examples for how
to utilize pgBackRest with the PostgreSQL Operator to help you create your
optimal disaster recovery setup.
Creating a Full Backup
You can create a full backup using the following command:
pgo backup hacluster --backup-opts="--type=full"
Creating a Differential Backup
You can create a differential backup using the following command:
pgo backup hacluster --backup-opts="--type=diff"
Creating an Incremental Backup
You can create a differential backup using the following command:
pgo backup hacluster --backup-opts="--type=incr"
An incremental backup is created without specifying any options after a full or differential backup is taken.
Creating Backups in S3
The PostgreSQL Operator supports creating backups in S3 or any object storage system that uses the S3 protocol. For more information, please read the section on PostgreSQL Operator Backups with S3 in the architecture section.
Displaying Backup Information
You can see information about the current state of backups in a PostgreSQL cluster managed by the PostgreSQL Operator by executing the following command:
pgo show backup hacluster
Setting Backup Retention
By default, pgBackRest will allow you to keep on creating backups until you run out of disk space. As such, it may be helpful to manage how many backups are retained.
pgBackRest comes with several flags for managing how backups can be retained:
--repo1-retention-full
: how many full backups to retain--repo1-retention-diff
: how many differential backups to retain--repo1-retention-archive
: how many sets of WAL archives to retain alongside the full and differential backups that are retained
For example, to create a full backup and retain the previous 7 full backups, you would execute the following command:
pgo backup hacluster --backup-opts="--type=full --repo1-retention-full=7"
Scheduling Backups
Any effective disaster recovery strategy includes having regularly scheduled backups. The PostgreSQL Operator enables this through its scheduling sidecar that is deployed alongside the Operator.
Creating a Scheduled Backup
For example, to schedule a full backup once a day at midnight, you can execute the following command:
pgo create schedule hacluster --schedule="0 1 * * *" \
--schedule-type=pgbackrest --pgbackrest-backup-type=full
To schedule an incremental backup once every 3 hours, you can execute the following command:
pgo create schedule hacluster --schedule="0 */3 * * *" \
--schedule-type=pgbackrest --pgbackrest-backup-type=incr
You can also create regularly scheduled backups and combine it with a retention policy. For example, using the above example of taking a nightly full backup, you can specify a policy of retaining 21 backups by executing the following command:
pgo create schedule hacluster --schedule="0 0 * * *" \
--schedule-type=pgbackrest --pgbackrest-backup-type=full \
--schedule-opts="--repo1-retention-full=21"
Restore a Cluster
The PostgreSQL Operator supports the ability to perform a full restore on a PostgreSQL cluster (i.e. a “clone” or “copy”) as well as a point-in-time-recovery. There are two types of ways to restore a cluster:
- Restore to a new cluster using the
--restore-from
flag in thepgo create cluster
command. This is effectively a clone or a copy. - Restore in-place using the
pgo restore
command. Note that this is destructive.
It is typically better to perform a restore to a new cluster, particularly when performing a point-in-time-recovery, as it can allow you to more effectively manage your downtime and avoid making undesired changes to your production data.
Additionally, the “restore to a new cluster” technique works so long as you have
a pgBackRest repository available: the pgBackRest repository does not need to be
attached to an active cluster! For example, if a cluster named hippo
was
deleted as such:
pgo delete cluster hippo --keep-backups
you can create a new cluster from the backups like so:
pgo create cluster datalake --restore-from=hippo
Below provides guidance on how to perform a restore to a new PostgreSQL cluster both as a full copy and to a specific point in time. Additionally, it also shows how to restore in place to a specific point in time.
Restore to a New Cluster (aka “copy” or “clone”)
Restoring to a new PostgreSQL cluster allows one to take a backup and create a new PostgreSQL cluster that can run alongside an existing PostgreSQL cluster. There are several scenarios where using this technique is helpful:
- Creating a copy of a PostgreSQL cluster that can be used for other purposes. Another way of putting this is “creating a clone.”
- Restore to a point-in-time and inspect the state of the data without affecting the current cluster
and more.
Full Restore
To create a new PostgreSQL cluster from a backup and restore it fully, you can execute the following command:
pgo create cluster newcluster --restore-from=oldcluster
Point-in-time-Recovery (PITR)
To create a new PostgreSQL cluster and restore it to specific point-in-time (e.g. before a key table was dropped), you can use the following command, substituting the time that you wish to restore to:
pgo create cluster newcluster \
--restore-from oldcluster \
--restore-opts "--type=time --target='2019-12-31 11:59:59.999999+00'"
When the restore is complete, the cluster is immediately available for reads and
writes. To inspect the data before allowing connections, add pgBackRest’s
--target-action=pause
option to the --restore-opts
parameter.
The PostgreSQL Operator supports the full set of pgBackRest restore options,
which can be passed into the --backup-opts
parameter. For more information,
please review the pgBackRest restore options
Restore in-place
Restoring a PostgreSQL cluster in-place is a destructive action that will
perform a recovery on your existing data directory. This is accomplished using
the pgo restore
command. The most common scenario is to restore the database to a specific point
in time.
Point-in-time-Recovery (PITR)
The more likely scenario when performing a PostgreSQL cluster restore is to recover to a particular point-in-time (e.g. before a key table was dropped). For example, to restore a cluster to December 31, 2019 at 11:59pm:
pgo restore hacluster --pitr-target="2019-12-31 11:59:59.999999+00" \
--backup-opts="--type=time"
When the restore is complete, the cluster is immediately available for reads and
writes. To inspect the data before allowing connections, add pgBackRest’s
--target-action=pause
option to the --backup-opts
parameter.
The PostgreSQL Operator supports the full set of pgBackRest restore options,
which can be passed into the --backup-opts
parameter. For more information,
please review the pgBackRest restore options
Using this technique, after a restore is complete, you will need to re-enable high availability on the PostgreSQL cluster manually. You can re-enable high availability by executing the following command:
pgo update cluster hacluster --autofail=true
Logical Backups (pg_dump
/ pg_dumpall
)
The PostgreSQL Operator supports taking logical backups with pg_dump
and
pg_dumpall
. While they do not provide the same performance and storage
optimizations as the physical backups provided by pgBackRest, logical backups
are helpful when one wants to upgrade between major PostgreSQL versions, or
provide only a subset of a database, such as a table.
Create a Logical Backup
To create a logical backup of the ‘postgres’ database, you can run the following command:
pgo backup hacluster --backup-type=pgdump
To create a logical backup of a specific database, you can use the --database
flag,
as in the following command:
pgo backup hacluster --backup-type=pgdump --database=mydb
You can pass in specific options to --backup-opts
, which can accept most of
the options that the pg_dump
command accepts. For example, to only dump the data from a specific table called
users
:
pgo backup hacluster --backup-type=pgdump --backup-opts="-t users"
To use pg_dumpall
to create a logical backup of all the data in a PostgreSQL
cluster, you must pass the --dump-all
flag in --backup-opts
, i.e.:
pgo backup hacluster --backup-type=pgdump --backup-opts="--dump-all"
Viewing Logical Backups
To view an available list of logical backups, you can use the pgo show backup
command:
pgo show backup --backup-type=pgdump
This provides information about the PVC that the logical backups are stored on as well as the timestamps required to perform a restore from a logical backup.
Restore from a Logical Backup
To restore from a logical backup, you need to reference the PVC that the logical backup is stored to, as well as the timestamp that was created by the logical backup.
You can restore a logical backup using the following command:
pgo restore hacluster --backup-type=pgdump --backup-pvc=hacluster-pgdump-pvc \
--pitr-target="2019-01-15-00-03-25" -n pgouser1
To restore to a specific database, add the --pgdump-database
flag to the
command from above:
pgo restore hacluster --backup-type=pgdump --backup-pvc=hacluster-pgdump-pvc \
--pgdump-database=mydb --pitr-target="2019-01-15-00-03-25" -n pgouser1
High-Availability: Scaling Up & Down
The PostgreSQL Operator supports a robust high-availability set up to ensure that your PostgreSQL clusters can stay up and running. For detailed information on how it works, please see the high-availability architecture section.
Creating a New Replica
To create a new replica, also known as “scaling up”, you can execute the following command:
pgo scale hacluster --replica-count=1
If you wanted to add two new replicas at the same time, you could execute the following command:
pgo scale hacluster --replica-count=2
Viewing Available Replicas
You can view the available replicas in a few ways. First, you can use pgo show cluster
to see the overall information about the PostgreSQL cluster:
pgo show cluster hacluster
You can also find specific replica names by using the --query
flag on the
pgo failover
and pgo scaledown
commands, e.g.:
pgo failover --query hacluster
Manual Failover
The PostgreSQL Operator is set up with an automated failover system based on distributed consensus, but there may be times where you wish to have your cluster manually failover. If you wish to have your cluster manually failover, first, query your cluster to determine which failover targets are available. The query command also provides information that may help your decision, such as replication lag:
pgo failover --query hacluster
Once you have selected the replica that is best for your to failover to, you can perform a failover with the following command:
pgo failover hacluster --target=hacluster-abcd
where hacluster-abcd
is the name of the PostgreSQL instance that you want to
promote to become the new primary
Destroying a Replica
To destroy a replica, first query the available replicas by using the --query
flag on the pgo scaledown
command, i.e.:
pgo scaledown hacluster --query
Once you have picked the replica you want to remove, you can remove it by executing the following command:
pgo scaledown hacluster --target=hacluster-abcd
where hacluster-abcd
is the name of the PostgreSQL replica that you want to
destroy.
Cluster Maintenance & Resource Management
There are several operations that you can perform to modify a PostgreSQL cluster over its lifetime.
Modify CPU / Memory for a PostgreSQL Cluster
As database workloads change, it may be necessary to modify the CPU and memory
allocation for your PostgreSQL cluster. The PostgreSQL Operator allows for this
via the --cpu
and --memory
flags on the pgo update cluster
command. Similar to the create command, both flags accept values that follow the
Kubernetes quantity format.
For example, to update a PostgreSQL cluster to use 8 CPU cores and has 32 gibibytes of memory, you can use the following command:
pgo update cluster hacluster --cpu=8 --memory=32Gi
The resource allocations apply to all instances in a PostgreSQL cluster: this means your primary and any replicas will have the same cluster resource allocations. Be sure to specify resource requests that your Kubernetes environment can support.
NOTE: This operation can cause downtime. Modifying the resource requests allocated to a Deployment requires that the Pods in a Deployment must be restarted. Each PostgreSQL instance is safely shutdown using the “fast” shutdown method to help ensure it will not enter crash recovery mode when a new Pod is created.
When the operation completes, each PostgreSQL instance will have the new resource allocations.
Adding a Tablespace to a Cluster
Based on your workload or volume of data, you may wish to add a tablespace to your PostgreSQL cluster.
You can add a tablespace to an existing PostgreSQL cluster with the
pgo update cluster
command.
Adding a tablespace to a cluster uses a similar syntax to
creating a cluster with a tablespace, for example:
pgo update cluster hacluster \
--tablespace=name=tablespace3:storageconfig=storageconfigname
NOTE: This operation can cause downtime. In order to add a tablespace to a PostgreSQL cluster, persistent volume claims (PVCs) need to be created and mounted to each PostgreSQL instance in the cluster. The act of mounting a new PVC to a Kubernetes Deployment causes the Pods in the deployment to restart.
Each PostgreSQL instance is safely shutdown using the “fast” shutdown method to help ensure it will not enter crash recovery mode when a new Pod is created.
When the operation completes, the tablespace will be set up and accessible to use within the PostgreSQL cluster.
For more information on tablespaces, please visit the tablespace section of the documentation.
Clone a PostgreSQL Cluster
You can create a copy of an existing PostgreSQL cluster in a new PostgreSQL
cluster by using the pgo create cluster
command with the --restore-from
flag (and, if needed, --restore-opts
).
The command copies the pgBackRest repository from either an active PostgreSQL
cluster, or a pgBackRest repository that exists from a former cluster that was
deleted using pgo delete cluster --keep-backups
.
You can clone a PostgreSQL cluster by running the following command:
pgo clone hacluster newhacluster
By leveraging pgo create cluster
, you are able to copy the data from a
PostgreSQL cluster while creating the topology of a new cluster the way you want
to. For instance, if you want to copy data from an existing cluster that does
not have metrics to a new cluster that does, you can accomplish that with the
following command:
pgo create cluster newcluster --restore-from=oldcluster --metrics
Clone a PostgreSQL Cluster to Different PVC Size
You can have a cloned PostgreSQL cluster use a different PVC size, which is useful when moving your PostgreSQL cluster to a larger PVC. For example, to clone a PostgreSQL cluster to a 256GiB PVC, you can execute the following command:
pgo create cluster bighippo --restore-from=hippo --pvc-size=256Gi
You can also have the cloned PostgreSQL cluster use a larger pgBackRest backup repository by setting its PVC size. For example, to have a cloned PostgreSQL cluster use a 1TiB pgBackRest repository, you can execute the following command:
pgo create cluster bighippo --restore-from=hippo --pgbackrest-pvc-size=1Ti
Enable TLS
TLS allows secure TCP connections to PostgreSQL, and the PostgreSQL Operator makes it easy to enable this PostgreSQL feature. The TLS support in the PostgreSQL Operator does not make an opinion about your PKI, but rather loads in your TLS key pair that you wish to use for the PostgreSQL server as well as its corresponding certificate authority (CA) certificate. Both of these Secrets are required to enable TLS support for your PostgreSQL cluster when using the PostgreSQL Operator, but it in turn allows seamless TLS support.
Setup
There are three items that are required to enable TLS in your PostgreSQL clusters:
- A CA certificate
- A TLS private key
- A TLS certificate
There are a variety of methods available to generate these items: in fact, Kubernetes comes with its own certificate management system! It is up to you to decide how you want to manage this for your cluster. The PostgreSQL documentation also provides an example for how to generate a TLS certificate as well.
To set up TLS for your PostgreSQL cluster, you have to create two Secrets: one that contains the CA certificate, and the other that contains the server TLS key pair.
First, create the Secret that contains your CA certificate. Create the Secret as a generic Secret, and note that the following requirements must be met:
- The Secret must be created in the same Namespace as where you are deploying your PostgreSQL cluster
- The
name
of the key that is holding the CA must beca.crt
There are optional settings for setting up the CA secret:
- You can pass in a certificate revocation list (CRL) for the CA secret by
passing in the CRL using the
ca.crl
key name in the Secret.
For example, to create a CA Secret with the trusted CA to use for the PostgreSQL clusters, you could execute the following command:
kubectl create secret generic postgresql-ca --from-file=ca.crt=/path/to/ca.crt
To create a CA Secret that includes a CRL, you could execute the following command:
kubectl create secret generic postgresql-ca \
--from-file=ca.crt=/path/to/ca.crt \
--from-file=ca.crl=/path/to/ca.crl
Note that you can reuse this CA Secret for other PostgreSQL clusters deployed by the PostgreSQL Operator.
Next, create the Secret that contains your TLS key pair. Create the Secret as a a TLS Secret, and note the following requirement must be met:
- The Secret must be created in the same Namespace as where you are deploying your PostgreSQL cluster
kubectl create secret tls hacluster-tls-keypair \
--cert=/path/to/server.crt \
--key=/path/to/server.key
Now you can create a TLS-enabled PostgreSQL cluster!
Create a TLS Enabled PostgreSQL Cluster
Using the above example, to create a TLS-enabled PostgreSQL cluster that can accept both TLS and non-TLS connections, execute the following command:
pgo create cluster hacluster-tls \
--server-ca-secret=postgresql-ca \
--server-tls-secret=hacluster-tls-keypair
Including the --server-ca-secret
and --server-tls-secret
flags automatically
enable TLS connections in the PostgreSQL cluster that is deployed. These flags
should reference the CA Secret and the TLS key pair Secret, respectively.
If deployed successfully, when you connect to the PostgreSQL cluster, assuming
your PGSSLMODE
is set to prefer
or higher, you will see something like this
in your psql
terminal:
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Force TLS in a PostgreSQL Cluster
There are many environments where you want to force all remote connections to
occur over TLS, for example, if you deploy your PostgreSQL cluster’s in a public
cloud or on an untrusted network. The PostgreSQL Operator lets you force all
remote connections to occur over TLS by using the --tls-only
flag.
For example, using the setup above, you can force TLS in a PostgreSQL cluster by executing the following command:
pgo create cluster hacluster-tls-only \
--tls-only \
--server-ca-secret=hacluster-tls-keypair --server-tls-secret=postgresql-ca
If deployed successfully, when you connect to the PostgreSQL cluster, assuming
your PGSSLMODE
is set to prefer
or higher, you will see something like this
in your psql
terminal:
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
If you try to connect to a PostgreSQL cluster that is deployed using the
--tls-only
with TLS disabled (i.e. PGSSLMODE=disable
), you will receive an
error that connections without TLS are unsupported.
TLS Authentication for PostgreSQL Replication
PostgreSQL supports certificate-based authentication, which allows for PostgreSQL to authenticate users based on the common name (CN) in a certificate. Using this feature, the PostgreSQL Operator allows you to configure PostgreSQL replicas in a cluster to authenticate using a certificate instead of a password.
To use this feature, first you will need to set up a Kubernetes TLS Secret that
has a CN of primaryuser
. If you do not wish to have this as your CN, you will
need to map the CN of this certificate to the value of primaryuser
using a
pg_ident
username map, which you can configure as part of a
custom PostgreSQL configuration.
You also need to ensure that the certificate is verifiable by the certificate
authority (CA) chain that you have provided for your PostgreSQL cluster. The CA
is provided as part of the --server-ca-secret
flag in the
pgo create cluster
command.
To create a PostgreSQL cluster that uses TLS authentication for replication,
first create Kubernetes Secrets for the server and the CA. For the purposes of
this example, we will use the ones that were created earlier: postgresql-ca
and hacluster-tls-keypair
. After generating a certificate that has a CN of
primaryuser
, create a Kubernetes Secret that references this TLS keypair
called hacluster-tls-replication-keypair
:
kubectl create secret tls hacluster-tls-replication-keypair \
--cert=/path/to/replication.crt \
--key=/path/to/replication.key
We can now create a PostgreSQL cluster and allow for it to use TLS authentication for its replicas! Let’s create a PostgreSQL cluster with two replicas that also requires TLS for any connection:
pgo create cluster hippo \
--tls-only \
--server-ca-secret=postgresql-ca \
--server-tls-secret=hacluster-tls-keypair \
--replication-tls-secret=hacluster-tls-replication-keypair \
--replica-count=2
By default, the PostgreSQL Operator has each replica connect to PostgreSQL using
a PostgreSQL TLS mode
of verify-ca
. If you wish to perform TLS mutual authentication between
PostgreSQL instances (i.e. certificate-based authentication with SSL mode of
verify-full
), you will need to create a
PostgreSQL custom configuration.
Custom PostgreSQL Configuration
Customizing PostgreSQL configuration is currently not subject to the pgo
client, but given it is a common question, we thought it may be helpful to link
to how to do it from here. To find out more about how to
customize your PostgreSQL configuration,
please refer to the Custom PostgreSQL Configuration
section of the documentation.
pgAdmin 4: PostgreSQL Administration
pgAdmin 4 is a popular graphical user interface that lets you work with PostgreSQL databases from both a desktop or web-based client. In the case of the PostgreSQL Operator, the pgAdmin 4 web client can be deployed and synchronized with PostgreSQL clusters so that users can administrate their databases with their PostgreSQL username and password.
For example, let’s work with a PostgreSQL cluster called hippo
that has a user named hippo
with password datalake
, e.g.:
pgo create cluster hippo --username=hippo --password=datalake
Once the hippo
PostgreSQL cluster is ready, create the pgAdmin 4 deployment
with the pgo create pgadmin
command:
pgo create pgadmin hippo
This creates a pgAdmin 4 deployment unique to this PostgreSQL cluster and
synchronizes the PostgreSQL user information into it. To access pgAdmin 4, you
can set up a port-forward to the Service, which follows the
pattern <clusterName>-pgadmin
, to port 5050
:
kubectl port-forward svc/hippo-pgadmin 5050:5050
Point your browser at http://localhost:5050
and use your database username
(e.g. hippo
) and password (e.g. datalake
) to log in.
(Note: if your password does not appear to work, you can retry setting up the
user with the pgo update user
command: pgo update user hippo --password=datalake
)
The pgo create user
, pgo update user
, and pgo delete user
commands are
synchronized with the pgAdmin 4 deployment. Any user with credentials to this
PostgreSQL cluster will be able to log in and use pgAdmin 4:
You can remove the pgAdmin 4 deployment with the pgo delete pgadmin
command.
For more information, please read the pgAdmin 4 Architecture section of the documentation.
Standby Clusters: Multi-Cluster Kubernetes Deployments
A standby PostgreSQL cluster can be used to create an advanced high-availability set with a PostgreSQL cluster running in a different Kubernetes cluster, or used for other operations such as migrating from one PostgreSQL cluster to another. Note: this is not high availability per se: a high-availability PostgreSQL cluster will automatically fail over upon a downtime event, whereas a standby PostgreSQL cluster must be explicitly promoted.
With that said, you can run multiple PostgreSQL Operators in different Kubernetes clusters, and the below functionality will work!
Below are some commands for setting up and using standby PostgreSQL clusters. For more details on how standby clusters work, please review the section on Kubernetes Multi-Cluster Deployments.
Creating a Standby Cluster
Before creating a standby cluster, you will need to ensure that your primary cluster is created properly. Standby clusters require the use of S3 or equivalent S3-compatible storage system that is accessible to both the primary and standby clusters. For example, to create a primary cluster to these specifications:
pgo create cluster hippo --pgbouncer --replica-count=2 \
--pgbackrest-storage-type=local,s3 \
--pgbackrest-s3-key=<redacted> \
--pgbackrest-s3-key-secret=<redacted> \
--pgbackrest-s3-bucket=watering-hole \
--pgbackrest-s3-endpoint=s3.amazonaws.com \
--pgbackrest-s3-region=us-east-1 \
--pgbackrest-s3-uri-style=host \
--pgbackrest-s3-verify-tls=true \
--password-superuser=supersecrethippo \
--password-replication=somewhatsecrethippo \
--password=opensourcehippo
Before setting up the standby PostgreSQL cluster, you will need to wait a few moments for the primary PostgreSQL cluster to be ready. Once your primary PostgreSQL cluster is available, you can create a standby cluster by using the following command:
pgo create cluster hippo-standby --standby --replica-count=2 \
--pgbackrest-storage-type=s3 \
--pgbackrest-s3-key=<redacted> \
--pgbackrest-s3-key-secret=<redacted> \
--pgbackrest-s3-bucket=watering-hole \
--pgbackrest-s3-endpoint=s3.amazonaws.com \
--pgbackrest-s3-region=us-east-1 \
--pgbackrest-s3-uri-style=host \
--pgbackrest-s3-verify-tls=true \
--pgbackrest-repo-path=/backrestrepo/hippo-backrest-shared-repo \
--password-superuser=supersecrethippo \
--password-replication=somewhatsecrethippo \
--password=opensourcehippo
The standby cluster will take a few moments to bootstrap, but it is now set up!
Promoting a Standby Cluster
Before promoting a standby cluster, it is first necessary to shut down the primary cluster, otherwise you can run into a potential “split-brain“ scenario (if your primary Kubernetes cluster is down, it may not be possible to do this).
To shutdown, run the following command:
pgo update cluster hippo --shutdown
Once it is shut down, you can promote the standby cluster:
pgo update cluster hippo-standby --promote-standby
The standby is now an active PostgreSQL cluster and can start to accept writes.
To convert the previous active cluster into a standby cluster, you can run the following command:
pgo update cluster hippo --enable-standby
This will take a few moments to make this PostgreSQL cluster into a standby cluster. When it is ready, you can start it up with the following command:
pgo update cluster hippo --startup
Monitoring
View Disk Utilization
You can see a comparison of Postgres data size versus the Persistent volume claim size by entering the following:
pgo df hacluster -n pgouser1
PostgreSQL Metrics via pgMonitor
You can view metrics about your PostgreSQL cluster using the pgMonitor stack by deploying the “crunchy-collect” sidecar with the PostgreSQL cluster:
pgo create cluster hacluster --metrics
Note: To store and visualize the metrics, you must deploy Prometheus and Grafana with yoru PostgreSQL cluster. For instructions on installing Grafana and Prometheus in your environment, please review the installation instructions for the metrics stack.
Labels
Labels are a helpful way to organize PostgreSQL clusters, such as by application type or environment. The PostgreSQL Operator supports managing Kubernetes Labels as a convenient way to group PostgreSQL clusters together.
You can view which labels are assigned to a PostgreSQL cluster using the
pgo show cluster
command. You are also
able to see these labels when using kubectl
or oc
.
Add a Label to a PostgreSQL Cluster
Labels can be added to PostgreSQL clusters using the pgo label
command. For example, to add a label with a key/value pair of env=production
,
you could execute the following command:
pgo label hacluster --label=env=production
Add a Label to Multiple PostgreSQL Clusters
You can add also add a label to multiple PostgreSQL clusters simultaneously
using the --selector
flag on the pgo label
command. For example, to add a
label with a key/value pair of env=production
to clusters that have a label
key/value pair of app=payment
, you could execute the following command:
pgo label --selector=app=payment --label=env=production
Policy Management
Create a Policy
To create a SQL policy, enter the following:
pgo create policy mypolicy --in-file=mypolicy.sql -n pgouser1
This examples creates a policy named mypolicy using the contents of the file mypolicy.sql which is assumed to be in the current directory.
You can view policies as following:
pgo show policy --all -n pgouser1
Apply a Policy
pgo apply mypolicy --selector=environment=prod
pgo apply mypolicy --selector=name=hacluster
Advanced Operations
Connection Pooling via pgBouncer
To add a pgbouncer Deployment to your Postgres cluster, enter:
pgo create cluster hacluster --pgbouncer -n pgouser1
You can add pgbouncer after a Postgres cluster is created as follows:
pgo create pgbouncer hacluster
pgo create pgbouncer --selector=name=hacluster
You can also specify a pgbouncer password as follows:
pgo create cluster hacluster --pgbouncer --pgbouncer-pass=somepass -n pgouser1
You can remove a pgbouncer from a cluster as follows:
pgo delete pgbouncer hacluster -n pgouser1
Query Analysis via pgBadger
You can create a pgbadger sidecar container in your Postgres cluster pod as follows:
pgo create cluster hacluster --pgbadger -n pgouser1
Create a Cluster using Specific Storage
pgo create cluster hacluster --storage-config=somestorageconfig -n pgouser1
Likewise, you can specify a storage configuration when creating a replica:
pgo scale hacluster --storage-config=someslowerstorage -n pgouser1
This example specifies the somestorageconfig storage configuration to be used by the Postgres cluster. This lets you specify a storage configuration that is defined in the pgo.yaml file specifically for a given Postgres cluster.
You can create a Cluster using a Preferred Node as follows:
pgo create cluster hacluster --node-label=speed=superfast -n pgouser1
That command will cause a node affinity rule to be added to the Postgres pod which will influence the node upon which Kubernetes will schedule the Pod.
Likewise, you can create a Replica using a Preferred Node as follows:
pgo scale hacluster --node-label=speed=slowerthannormal -n pgouser1
Create a Cluster with LoadBalancer ServiceType
pgo create cluster hacluster --service-type=LoadBalancer -n pgouser1
This command will cause the Postgres Service to be of a specific type instead of the default ClusterIP service type.
Namespace Operations
Create an Operator namespace where Postgres clusters can be created and managed by the Operator:
pgo create namespace mynamespace
Update a Namespace to be able to be used by the Operator:
pgo update namespace somenamespace
Delete a Namespace:
pgo delete namespace mynamespace
PostgreSQL Operator User Operations
PGO users are users defined for authenticating to the PGO REST API. You can manage those users with the following commands:
pgo create pgouser someuser --pgouser-namespaces="pgouser1,pgouser2" --pgouser-password="somepassword" --pgouser-roles="pgoadmin"
pgo create pgouser otheruser --all-namespaces --pgouser-password="somepassword" --pgouser-roles="pgoadmin"
Update a user:
pgo update pgouser someuser --pgouser-namespaces="pgouser1,pgouser2" --pgouser-password="somepassword" --pgouser-roles="pgoadmin"
pgo update pgouser otheruser --all-namespaces --pgouser-password="somepassword" --pgouser-roles="pgoadmin"
Delete a PGO user:
pgo delete pgouser someuser
PGO roles are also managed as follows:
pgo create pgorole somerole --permissions="Cat,Ls"
Delete a PGO role with:
pgo delete pgorole somerole
Update a PGO role with:
pgo update pgorole somerole --permissions="Cat,Ls"
PostgreSQL Cluster User Operations
Managed Postgres users can be viewed using the following command:
pgo show user hacluster
Postgres users can be created using the following command examples:
pgo create user hacluster --username=somepguser --password=somepassword --managed
pgo create user --selector=name=hacluster --username=somepguser --password=somepassword --managed
Those commands are identical in function, and create on the hacluster Postgres cluster, a user named somepguser, with a password of somepassword, the account is managed meaning that these credentials are stored as a Secret on the Kubernetes cluster in the Operator namespace.
Postgres users can be deleted using the following command:
pgo delete user hacluster --username=somepguser
That command deletes the user on the hacluster Postgres cluster.
Postgres users can be updated using the following command:
pgo update user hacluster --username=somepguser --password=frodo
That command changes the password for the user on the hacluster Postgres cluster.