Monitoring

While having high availability and disaster recovery systems in place helps in the event of something going wrong with your PostgreSQL cluster, monitoring helps you anticipate problems before they happen. Monitoring can also help you diagnose and resolve issues that may cause degraded performance.

The Crunchy Postgres for Kubernetes Monitoring stack is a fully integrated solution for monitoring and visualizing metrics captured from PostgreSQL clusters created using Crunchy Postgres for Kubernetes. It leverages pgMonitor to configure and integrate the tools, components, and metrics needed to monitor PostgreSQL clusters, providing a powerful and easy-to-use way to visualize PostgreSQL database and container metrics. The monitoring infrastructure includes the following components:

  • pgMonitor - Provides the configuration needed to enable the effective capture and visualization of PostgreSQL database metrics using the various tools comprising the Monitoring infrastructure
  • Grafana - Enables visual dashboard capabilities for monitoring PostgreSQL clusters, specifically using Crunchy PostgreSQL Exporter data stored within Prometheus
  • Prometheus - A multi-dimensional data model with time series data, which is used in collaboration with the Crunchy PostgreSQL Exporter to provide and store metrics
  • Alertmanager - Handles alerts sent by Prometheus by deduplicating, grouping, and routing them to receiver integrations

By leveraging the installation method described in this section, Crunchy Postgres for Kubernetes Monitoring can be deployed alongside Crunchy Postgres for Kubernetes.

Kustomize Install Crunchy Postgres for Kubernetes Monitoring

Examples of how to use Kustomize to install Crunchy Postgres for Kubernetes components can be found on GitHub in the Postgres Operator examples repository. Start by forking the repository.

Once you have forked the repo, you can download it to your working environment with a command similar to this:

YOUR_GITHUB_UN="$YOUR_GITHUB_USERNAME"
git clone --depth 1 "git@github.com:${YOUR_GITHUB_UN}/postgres-operator-examples.git"
cd postgres-operator-examples

You now have what you need to follow along with the installation steps.

Install the Crunchy Postgres Exporter Sidecar

The Crunchy Postgres for Kubernetes Monitoring stack uses the Crunchy Postgres Exporter sidecar to collect real-time metrics about a PostgreSQL database. Let's look at how we can add the sidecar to your cluster using the kustomize/postgres example in the Postgres Operator examples repository.

If you followed the Quickstart to create a Postgres cluster, go to the kustomize/postgres/postgres.yaml file and add the following YAML to the spec:

monitoring:
  pgmonitor:
    exporter:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.5.1-0

Monitoring tools are added using the spec.monitoring section of the custom resource. Currently, the only monitoring tool supported is the Crunchy PostgreSQL Exporter configured with pgMonitor. Save your changes and run:

kubectl apply -k kustomize/postgres

Crunchy Postgres for Kubernetes will detect the change and add the Exporter sidecar to all Postgres Pods that exist in your cluster. Crunchy Postgres for Kubernetes will also configure the Exporter to connect to the database and gather metrics. These metrics can be accessed using the Crunchy Postgres for Kubernetes Monitoring stack.
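Beyond the image, the exporter section of the spec accepts further settings. As a sketch, recent versions of the PostgresCluster CRD also allow a resources block to bound the sidecar's compute usage; the values below are illustrative, and you should verify the field names against the CRD for your installed version:

monitoring:
  pgmonitor:
    exporter:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.5.1-0
      resources:
        requests:
          cpu: 100m
          memory: 128Mi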

Locate a Kustomize installer for Monitoring

The Monitoring project is located in the kustomize/monitoring directory.

Configuration

While the default Kustomize install should work in most Kubernetes environments, it may be necessary to further customize the project according to your specific needs.

For instance, by default fsGroup is set to 26 for the securityContext defined for the various Deployments comprising the Monitoring stack:

securityContext:
  fsGroup: 26

In most Kubernetes environments this setting is needed to ensure processes within the container have the permissions needed to write to any volumes mounted to each of the Pods comprising the Monitoring stack. However, when installing in an OpenShift environment (and more specifically when using the restricted Security Context Constraint), the fsGroup setting should be removed since OpenShift will automatically handle setting the proper fsGroup within the Pod's securityContext.

Additionally, within this same section it may also be necessary to modify the supplementalGroups setting according to your specific storage configuration:

securityContext:
  supplementalGroups: 65534

Therefore, the following files (located under kustomize/monitoring) should be modified and/or patched (e.g. using additional overlays) as needed to ensure the securityContext is properly defined for your Kubernetes environment:

  • alertmanager/deployment.yaml
  • grafana/deployment.yaml
  • prometheus/deployment.yaml
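For example, rather than editing the base files, a separate Kustomize overlay can patch the securityContext. The overlay kustomization.yaml below is a sketch: it assumes the overlay directory sits alongside kustomize/monitoring, and the supplementalGroups value should match your storage configuration:

resources:
- ../monitoring

patches:
- patch: |-
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: crunchy-prometheus
    spec:
      template:
        spec:
          securityContext:
            supplementalGroups:
            - 65534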

Those files should also be modified to set appropriate constraints on compute resources for the Grafana, Prometheus, and Alertmanager Deployments. To modify the configuration for the various storage resources (i.e. PersistentVolumeClaims) created by the Monitoring installer, modify the following files:

  • alertmanager/pvc.yaml
  • grafana/pvc.yaml
  • prometheus/pvc.yaml
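As a sketch, increasing the storage request for Prometheus in prometheus/pvc.yaml might look like the following; the 16Gi value is illustrative, so size it according to your metrics retention needs:

spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 16Gi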

It is also possible to further customize the configuration of the various components comprising the Monitoring stack (Grafana, Prometheus, and Alertmanager) by modifying the following configuration resources:

  • alertmanager/config/alertmanager.yml
  • grafana/config/crunchy_grafana_datasource.yml
  • prometheus/config/crunchy-alert-rules-pg.yml
  • prometheus/config/prometheus.yml

Finally, please note that the default username and password for Grafana can be updated by modifying the Secret grafana-admin defined in kustomize/monitoring/grafana/kustomization.yaml:

secretGenerator:
- name: grafana-admin
  literals:
    - "password=admin"
    - "username=admin"

Install

Once the Kustomize project has been modified according to your specific needs, Monitoring can then be installed using kubectl and Kustomize:

kubectl apply -k kustomize/monitoring

Once installed, use the kubectl port-forward command to immediately access the various Monitoring stack components. For example, to access the Grafana dashboards, use a command similar to

kubectl -n postgres-operator port-forward service/crunchy-grafana 3000:3000

and then log in via a web browser pointed at localhost:3000.

If you are upgrading or altering a preexisting installation, see below for specific instructions for this use case.

Install using Older Kubectl

This installer is optimized for Kustomize v4.0.5 or later, which is included in kubectl v1.21. If you are using an earlier version of kubectl to manage your Kubernetes objects, the kubectl apply -k kustomize/monitoring command will produce an error:

Error: json: unknown field "labels"

To fix this error, download the most recent version of Kustomize. Once you have installed Kustomize v4.0.5 or later, you can use it to produce valid Kubernetes yaml:

kustomize build kustomize/monitoring

The output from the kustomize build command can be captured to a file or piped directly to kubectl:

kustomize build kustomize/monitoring | kubectl apply -f -

Uninstall

Similarly, once Monitoring has been installed, it can be uninstalled using kubectl and Kustomize:

kubectl delete -k kustomize/monitoring

Upgrading the Monitoring stack to v5.5.x

Several changes have been made to the kustomize installer for the Monitoring stack in order to make the project easier to read and modify:

  1. Project reorganization

The project has been reorganized so that each tranche of the Monitoring stack has its own folder. This should make it easier to find and modify the Kubernetes objects or configurations for each tranche. For example, if you want to modify the Prometheus configuration, you can find the source file in prometheus/config/prometheus.yml; if you want to modify the PVC used by Prometheus, you can find the source file in prometheus/pvc.yaml.

  2. Image and configuration updates in line with pgMonitor

Crunchy Postgres for Kubernetes Monitoring uses the Grafana dashboards and configuration set by the pgMonitor project. We have updated the installer to pgMonitor v4.9 settings, including updating the images for the Alertmanager, Grafana, and Prometheus Deployments.

  3. Regularize naming conventions

We have changed the following Kubernetes objects to regularize our installation:

  • the ServiceAccount prometheus-sa is renamed prometheus
  • the ClusterRole prometheus-cr is renamed prometheus
  • the ClusterRoleBinding prometheus-crb is renamed prometheus (and has been updated to take into account the ClusterRole and ServiceAccount renaming)
  • the ConfigMap alertmanager-rules-config is renamed alert-rules-config
  • the Secret grafana-secret is renamed grafana-admin

How to upgrade the Monitoring installation

First, verify that you are using a Monitoring installation from before these changes. To verify, check that the existing Monitoring Deployments lack a vendor label:

kubectl get deployments -L vendor
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE   VENDOR
crunchy-grafana        1/1     1            1           11s   
crunchy-prometheus     1/1     1            1           11s   
crunchy-alertmanager   1/1     1            1           11s 

If the vendor label shows crunchydata, then you are using an updated installer and do not need to follow the instructions here:

kubectl get deployments -L vendor
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE   VENDOR
crunchy-grafana        1/1     1            1           16s   crunchydata
crunchy-prometheus     1/1     1            1           16s   crunchydata
crunchy-alertmanager   1/1     1            1           16s   crunchydata

Second, if you have an older version of the Monitoring stack installed, remove its Deployments before upgrading to the new version:

kubectl delete deployments crunchy-grafana crunchy-prometheus crunchy-alertmanager

Now you can install as usual:

kubectl apply -k kustomize/monitoring

This will leave behind some orphaned Kubernetes objects, which can be cleaned up manually without impacting the Monitoring stack. The objects to clean up are those listed above in point 3 on regularizing naming conventions:

kubectl delete clusterrolebinding prometheus-crb
kubectl delete serviceaccount prometheus-sa
kubectl delete clusterrole prometheus-cr
kubectl delete configmap alertmanager-rules-config
kubectl delete secret grafana-secret

Alternatively, you can install the Monitoring stack with the --prune --all flags to remove the objects that are no longer managed by this manifest:

kubectl apply -k kustomize/monitoring --prune --all

This will remove those objects that are namespaced: the ConfigMap, Secret, and ServiceAccount. To prune cluster-wide objects, see the --prune-allowlist flag.

Pruning is an automated feature and should be used with caution.

Helm Install Crunchy Postgres for Kubernetes Monitoring

Examples of how to use Helm to install Crunchy Postgres for Kubernetes components can be found on GitHub in the Postgres Operator examples repository. Start by forking the repository.

Once you have forked the repo, you can download it to your working environment with a command similar to this:

YOUR_GITHUB_UN="$YOUR_GITHUB_USERNAME"
git clone --depth 1 "git@github.com:${YOUR_GITHUB_UN}/postgres-operator-examples.git"
cd postgres-operator-examples

You now have what you need to follow along with the installation steps.

Install the Crunchy Postgres Exporter Sidecar

The Crunchy Postgres for Kubernetes Monitoring stack uses the Crunchy Postgres Exporter sidecar to collect real-time metrics about a PostgreSQL database. Let's look at how we can add the sidecar to your cluster using the helm/postgres example in the Postgres Operator examples repository.

Under helm/postgres/values.yaml, you will find various options for configuring a Crunchy Postgres for Kubernetes cluster. Uncomment the section that enables monitoring and set it to true:

monitoring: true

Then, uncomment the section that installs the Exporter sidecar:

imageExporter: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-x.x.x

If your cluster is already running through a helm installation, use helm upgrade to update your cluster. Otherwise, use helm install, and you'll be ready to export metrics from your cluster.

Crunchy Postgres for Kubernetes will detect the change and add the Exporter sidecar to all Postgres Pods that exist in your cluster. Crunchy Postgres for Kubernetes will also configure the Exporter to connect to the database and gather metrics. These metrics can be accessed using the Crunchy Postgres for Kubernetes Monitoring stack.

Install directly from the registry

Crunchy Data hosts an OCI registry that helm can use directly. (Not all helm commands support OCI registries. For more information on which commands can be used, see the Helm documentation.)

You can install Crunchy Postgres for Kubernetes Monitoring directly from the registry using the helm install command:

helm install crunchy oci://registry.developers.crunchydata.com/crunchydata/crunchy-monitoring

Or to see what values are set in the default values.yaml before installing, you could run a helm show command just as you would with any other registry:

helm show values oci://registry.developers.crunchydata.com/crunchydata/crunchy-monitoring

Once installed, use the kubectl port-forward command to immediately access the various Monitoring stack components. For example, to access the Grafana dashboards, use a command similar to

kubectl -n postgres-operator port-forward service/crunchy-grafana 3000:3000

Downloading from the registry

Rather than deploying directly from the Crunchy registry, you can instead use the registry as the source for the Helm chart. You might do this in order to configure the Helm chart before installing.

To do so, download the Helm chart from the Crunchy Container Registry:

# To pull down the most recent Helm chart
helm pull oci://registry.developers.crunchydata.com/crunchydata/crunchy-monitoring

# To pull down a specific Helm chart
helm pull oci://registry.developers.crunchydata.com/crunchydata/crunchy-monitoring --version 0.1.0

Once the Helm chart has been downloaded, uncompress the bundle:

tar -xvf crunchy-monitoring-0.1.0.tgz

And from there, you can follow the instructions below on setting the Configuration and installing a local Helm chart.

Configuration

The values.yaml file for the Helm chart contains all of the available configuration settings for the Monitoring stack. The default values.yaml settings should work in most Kubernetes environments, but some customization may be required depending on your specific environment and needs.

For instance, it might be necessary to change the image versions for Alertmanager, Grafana, and/or Prometheus or to apply certain labels, etc. Each segment of the Monitoring stack has its own section. So if you needed to update only the Alertmanager image, you would update the alertmanager.image field.
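For example, overriding only the Alertmanager image in values.yaml might look like the following; the alertmanager.image field comes from the chart, while the image reference shown here is illustrative rather than a pinned release:

alertmanager:
  image: quay.io/prometheus/alertmanager:v0.26.0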

Security Configuration

By default, the Crunchy Postgres for Kubernetes Monitoring Helm chart sets the securityContext.fsGroup to 26 for the Deployments comprising the Monitoring stack (i.e., Alertmanager, Grafana, and Prometheus).

In most Kubernetes environments this setting is needed to ensure processes within the container have the permissions needed to write to any volumes mounted to each of the Pods comprising the Monitoring stack. However, when installing in an OpenShift environment (and more specifically when using the restricted Security Context Constraint), the fsGroup setting should be removed since OpenShift will automatically handle setting the proper fsGroup within the Pod's securityContext.

The fsGroup setting can be removed by setting the openShift value to true. This can be done either by changing the value in the values.yaml file or by setting the value on the command line during installation or upgrade:

helm install crunchy oci://registry.developers.crunchydata.com/crunchydata/crunchy-monitoring --set openShift=true

If you need to make additional changes to a Pod's securityContext, it may be necessary to download the Helm chart and alter the Deployments directly rather than setting values in values.yaml. For instance, if you need to modify the supplementalGroups setting according to your specific storage configuration, you will need to update the Deployment files:

  • templates/alertmanager/deployment.yaml
  • templates/grafana/deployment.yaml
  • templates/prometheus/deployment.yaml
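In each of those files, the change is a small addition to the Pod-level securityContext. This sketch assumes the default fsGroup of 26 is kept and that your storage requires the common nfsnobody group (65534); substitute the group IDs your storage actually needs:

securityContext:
  fsGroup: 26
  supplementalGroups:
    - 65534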

Compute and Storage Resources Configuration

To set appropriate constraints on compute resources for the Grafana, Prometheus, and Alertmanager Deployments, update the Deployment files:

  • templates/alertmanager/deployment.yaml
  • templates/grafana/deployment.yaml
  • templates/prometheus/deployment.yaml
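For instance, a resources stanza like the following could be added to the container definition in each Deployment; the values are illustrative starting points, not recommendations:

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    memory: 512Mi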

Similarly, to modify the configuration for the various storage resources (i.e. PersistentVolumeClaims) created by the Monitoring installer, the pvc.yaml file can also be modified for the Alertmanager, Grafana, and Prometheus segments of the Monitoring stack.

Additional Configuration

Like the Kustomize installation, the Crunchy Postgres for Kubernetes Monitoring stack installation includes ConfigMaps with configurations for the various Deployments. It is possible to further customize the configuration for the various components comprising the Monitoring stack (Grafana, Prometheus and/or AlertManager) by modifying the configuration resources, which are located in the config directory:

  • alertmanager.yml
  • crunchy-alert-rules-pg.yml
  • crunchy_grafana_datasource.yml
  • prometheus.yml

If you want to make changes to the Grafana dashboards, those configurations and dashboard json files are located in the dashboards directory. If you wish to add a new dashboard as part of your Helm chart, you can accomplish that by putting the json file in the dashboards directory. All the json files in that directory are imported by the Helm chart and loaded in the Grafana configuration.

Finally, please note that the default username and password for Grafana can be updated by modifying the values.yaml:

grafana:
  admin:
    password: admin
    username: admin

Uninstall

To uninstall the Monitoring stack, use the helm uninstall command:

helm uninstall crunchy -n $NAMESPACE

Next Steps

Now that we can monitor our cluster, it's a good time to see how we can customize Postgres cluster configuration. If your monitoring stack needs further configuration, see our docs on Exporter Configuration and Monitoring Architecture.