OpenTelemetry Metrics
For an overview of the full observability architecture within CPK, including details about the architecture for OpenTelemetry metrics, please see the Database Observability Architecture page. This section will provide steps for enabling OpenTelemetry metrics.
Enabling OpenTelemetry Metrics
In order to use OpenTelemetry metrics, the `OpenTelemetryMetrics` feature gate must first be enabled in your CPK installation. Please see the Feature Gate Installation Guide for guidance on how to properly enable this feature gate within your installation.
Once the feature gate is enabled, you will be able to create PostgresClusters with OpenTelemetry metrics. To do that, add an `instrumentation` block to your PostgresCluster spec, like so for a default-only configuration:

```yaml
spec:
  instrumentation: {}
```
You will see OpenTelemetry collector sidecars deployed alongside the various components comprising your PostgresCluster that are enabled to export metrics: currently Postgres and pgBouncer.
By default, CPK will set up a Prometheus exporter that exposes these metrics on port 9187 at the `/metrics` endpoint. If you are using the CPK Monitoring installer, you are good to go, as it is already set up to scrape this endpoint for metrics. If you are using a custom metric aggregation method, point it at the `/metrics` endpoint on port 9187 of the Postgres and/or pgBouncer pods.
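If you run your own Prometheus instead of the CPK Monitoring installer, a scrape job along the following lines could work. This is a sketch, not the installer's actual configuration: the job name is illustrative, and the pod label used to select PostgresCluster pods is an assumption you should adjust to match the labels on your Postgres and pgBouncer pods.

```yaml
scrape_configs:
  - job_name: postgrescluster-otel-metrics   # illustrative name
    # metrics_path defaults to /metrics, which matches the exporter endpoint.
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods that carry a PostgresCluster label (assumed selector;
      # adjust the label name to whatever your pods actually carry).
      - source_labels: [__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_cluster]
        action: keep
        regex: .+
      # Scrape each matched pod on the exporter port, 9187.
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: $1:9187
```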
Customizing Postgres Queries
While the metrics that are provided out of the box will cover most users' Postgres observability needs, some users will want to add their own metrics and/or remove some of the built-in metrics. Luckily, we have made both customizations very easy.
Adding Custom Metrics
Since we use the OTel SqlQueryReceiver to run our queries and collect the metrics, in order to add your own custom metrics you will first need to put them in a YAML file in a format that the SqlQueryReceiver accepts. This should be a list of entries that each have an `sql` field holding the SQL statement that will run on the database and a `metrics` section holding a list of the different metrics associated with that SQL statement. Each metric entry must have:

- a `metric_name`, which holds the desired name of the metric
- a `value_column`, which holds the column name in the returned dataset that is used to set the value of the metric's datapoint
As an example of formatting, see the following query and its respective metrics from our built-in queries:
```yaml
- sql: >
    SELECT datname AS dbname
    , checksum_failures AS count
    , coalesce(extract(epoch from (clock_timestamp() - checksum_last_failure)), 0) AS time_since_last_failure_seconds
    FROM pg_catalog.pg_stat_database
    WHERE pg_stat_database.datname IS NOT NULL;
  metrics:
    - metric_name: ccp_data_checksum_failure_count
      value_column: count
      attribute_columns: ["dbname"]
      description: Total number of checksum failures on this database
      static_attributes:
        server: "localhost:5432"
    - metric_name: ccp_data_checksum_failure_time_since_last_failure_seconds
      value_column: time_since_last_failure_seconds
      value_type: double
      attribute_columns: ["dbname"]
      description: Time interval in seconds since the last checksum failure was encountered
      static_attributes:
        server: "localhost:5432"
```
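Before loading custom queries into a ConfigMap, it can help to sanity-check that every entry has the required fields described above. The following is an illustrative, dependency-free checker (the entries are written as Python dicts mirroring the YAML, so no YAML parser is needed; the function name is ours, not part of CPK or OTel):

```python
# Minimal structural check for SqlQueryReceiver-style query entries.
# Each entry needs an "sql" string and a non-empty "metrics" list, and each
# metric needs "metric_name" and "value_column" (other fields are optional).

def validate_query_entries(entries):
    """Return a list of human-readable problems; an empty list means the entries look valid."""
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry.get("sql"), str) or not entry["sql"].strip():
            problems.append(f"entry {i}: missing or empty 'sql'")
        metrics = entry.get("metrics")
        if not isinstance(metrics, list) or not metrics:
            problems.append(f"entry {i}: 'metrics' must be a non-empty list")
            continue
        for j, metric in enumerate(metrics):
            for field in ("metric_name", "value_column"):
                if not metric.get(field):
                    problems.append(f"entry {i}, metric {j}: missing '{field}'")
    return problems

# A trimmed version of the built-in example above, as a Python dict:
example = [
    {
        "sql": "SELECT datname AS dbname, checksum_failures AS count "
               "FROM pg_catalog.pg_stat_database WHERE datname IS NOT NULL;",
        "metrics": [
            {"metric_name": "ccp_data_checksum_failure_count",
             "value_column": "count",
             "attribute_columns": ["dbname"]},
        ],
    },
]

print(validate_query_entries(example))  # -> [] (no problems found)
```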
As you can see, there are other optional fields that can be set on each metric. See the SqlQueryReceiver documentation for details on all of the various settings.
Once you have your custom queries and metrics in a SqlQueryReceiver-compliant YAML file, you will want to put it in a ConfigMap. If we had two queries files, one containing queries we plan to run very frequently and one containing queries we plan to run less frequently, we might create our ConfigMap using a command similar to the below:

```shell
kubectl create configmap my-custom-queries \
  --from-file=path/to/file/my-slow-custom-queries.yaml \
  --from-file=path/to/file/my-fast-custom-queries.yaml \
  -n postgres-operator
```
Once the ConfigMap is created, you need to edit your spec to tell the operator to use the new queries. The `instrumentation.metrics.customQueries.add` section can hold a list of entries, where each entry has a `name`, a `queries` section where you specify the ConfigMap and the file within it, and an optional `collectionInterval` that tells the receiver how often to run the queries. For example:
```yaml
spec:
  instrumentation:
    metrics:
      customQueries:
        add:
          - name: slow-custom-queries
            queries:
              name: my-custom-queries
              key: my-slow-custom-queries.yaml
            collectionInterval: 300s
          - name: 2fast2furious
            queries:
              name: my-custom-queries
              key: my-fast-custom-queries.yaml
```
In this example, we are adding two sets of queries, both of which come from files in our `my-custom-queries` ConfigMap. The first set uses the `my-slow-custom-queries.yaml` file and is named `slow-custom-queries`. This set of queries will be run every 5 minutes. The second set uses the `my-fast-custom-queries.yaml` file and is named `2fast2furious`. It doesn't have a `collectionInterval` set, so it will use the default setting, which is 5 seconds.
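The defaulting behavior described above can be sketched as follows. This is an illustrative helper, not operator code; the 5-second default comes from the docs above, and the interval strings use a trailing `s` for seconds to match the `300s` example:

```python
# Compute the effective collection interval for each custom query set,
# falling back to the default when the spec omits collectionInterval.

DEFAULT_INTERVAL_SECONDS = 5  # default per the docs above

def effective_intervals(query_sets):
    """Map each query set name to its collection interval in seconds."""
    result = {}
    for qs in query_sets:
        interval = qs.get("collectionInterval")
        if interval is None:
            result[qs["name"]] = DEFAULT_INTERVAL_SECONDS
        else:
            # Assumes a plain seconds value like "300s".
            result[qs["name"]] = int(interval.rstrip("s"))
    return result

spec_add = [
    {"name": "slow-custom-queries", "collectionInterval": "300s"},
    {"name": "2fast2furious"},  # no interval set -> default applies
]

print(effective_intervals(spec_add))
# -> {'slow-custom-queries': 300, '2fast2furious': 5}
```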
Removing Built-in Metrics
Removing default metrics is even more straightforward; simply add the names of the metrics that you no longer want to the `spec.instrumentation.metrics.customQueries.remove` list. For example:
```yaml
spec:
  instrumentation:
    metrics:
      customQueries:
        remove:
          - ccp_pg_stat_activity_count
          - ccp_stat_user_tables_autoanalyze_count
```
If all metrics for a given SQL query are removed, the SQL query will no longer be run.