Database Observability
Crunchy Postgres for Kubernetes (CPK) ensures your Postgres cluster deployments are fully observable, allowing you to easily view and analyze log and metric data for your Postgres databases, as well as any other components deployed alongside your Postgres database (pgBackRest, PgBouncer, pgAdmin and more). By leveraging the OpenTelemetry framework and standard, CPK seamlessly collects and exposes logs and metrics in a clean and consistent way that is interoperable with a variety of different observability backends. This means you can leverage a large ecosystem of different OpenTelemetry-compatible services, backends, and tooling to store, search, manage, and monitor any log or metric data generated by your Postgres databases.
Additionally, you can collect log and metric data across all of your Postgres cluster deployments (which may span multiple Kubernetes clusters, data centers, and regions) in a consistent and centralized manner, giving you deeper insight into those deployments. This streamlines monitoring the overall health of your Postgres clusters, and makes it easier to troubleshoot issues and answer questions about behavior and activity within your database clusters.
Observability Overview
Observability is the ability to analyze, measure, and better understand the internal state of a system using the external outputs (i.e., the telemetry data) provided by that system. These outputs come in a variety of different forms, including:
- Logs - A timestamped record or file that captures information about specific activities, changes or errors within a system.
- Metrics - A measurement of a service captured at runtime used to identify system performance, availability, and/or reliability.
- Traces - A recorded sequence of events that allows you to understand the full path of a request to a system as it traverses various services and components.
When a system is observable, system administrators, analysts, and engineers can easily answer questions around why the system behaved or responded in a certain way, without requiring detailed or direct knowledge about the internal workings of that system. This is the primary goal of the observability capabilities built into CPK: to ensure you can easily answer questions about the functionality and health of your database clusters, without requiring deep knowledge of the internal workings of each component comprising your Postgres cluster.
Observability for Databases
When running a database such as Postgres, you want to be able to closely monitor and analyze the key attributes of the system such as its overall health and performance, while also answering questions about who is accessing the database and how it is being accessed. Additionally, you want to be able to easily troubleshoot any issues that might occur, while also easily identifying the root cause for those issues.
Fortunately, Postgres creates a variety of different external outputs that can be leveraged to ensure the database is observable. This includes rich sets of logs (which can be further enhanced with various Postgres extensions), as well as key metric information that can be obtained by querying system tables within the database, and by looking at pertinent data within the environment and operating system the database is running within. For instance, CPU and memory usage can be obtained by analyzing cgroup v2 information for a container-based deployment of Postgres within Kubernetes. The same is true for the various components deployed alongside your Postgres database, such as those that provide High Availability, Disaster Recovery, Connection Pooling, and more, all of which also provide a rich set of observable outputs.
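To make the cgroup v2 example concrete, the following is a minimal Python sketch of reading memory and CPU usage from cgroup v2 files. It is not how CPK itself gathers these values. It assumes the container's cgroup is mounted at /sys/fs/cgroup, and the function names (parse_flat_keyed, read_usage) are hypothetical names chosen for illustration. The files memory.current and cpu.stat are standard cgroup v2 interface files.

```python
from pathlib import Path

# Assumption: the container's cgroup v2 hierarchy is mounted here.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_flat_keyed(text: str) -> dict:
    """Parse a cgroup v2 'flat keyed' file (e.g. cpu.stat) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def read_usage(root: Path = CGROUP_ROOT) -> dict:
    """Return current memory use (bytes) and cumulative CPU time (microseconds)."""
    cpu = parse_flat_keyed((root / "cpu.stat").read_text())
    memory_bytes = int((root / "memory.current").read_text().strip())
    return {"memory_bytes": memory_bytes, "cpu_usage_usec": cpu["usage_usec"]}


if __name__ == "__main__":
    # Only meaningful on a Linux host or container that uses cgroup v2.
    print(read_usage())
```

Because cpu.stat reports cumulative CPU time, a utilization figure would come from sampling read_usage twice and dividing the difference in cpu_usage_usec by the elapsed wall-clock time.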
Crunchy Postgres for Kubernetes therefore leverages these outputs to ensure all of the Postgres databases within Kubernetes are fully observable, equipping you with the tools to monitor and analyze key attributes of any database cluster within your environment in real time. This puts you in a position to answer questions about database performance and functionality, and gives you the information needed to keep your database deployments properly tuned and configured so that your applications and users get the most out of all your Postgres cluster deployments.
Observability in Kubernetes
As a cloud-native technology, Kubernetes requires a solution for observability that is able to handle the diverse application and system deployments that exist across complex and distributed cloud architectures. This includes a solution that is vendor agnostic, and provides a consistent framework and standards for collecting, processing, and exposing telemetry data. Fortunately, the OpenTelemetry framework was designed from the ground up to provide a cloud-native approach to observability, making OpenTelemetry a strong fit for enabling observability across all applications and systems within a Kubernetes environment.
OpenTelemetry Overview
OpenTelemetry is an open-source observability framework that is used to collect, analyze, and export telemetry data (logs, metrics, and traces) from a system in a consistent, standardized manner that is both tool and vendor agnostic. OpenTelemetry therefore plays a key role in helping you understand the behavior of your systems by making it easier to capture and transfer telemetry data. And because the OpenTelemetry standard is vendor and tool agnostic, you can send your telemetry data to a variety of different OpenTelemetry-compliant services or backends without requiring any changes to how that data is created, collected, or exported. This means you can plug into the observability backends that meet your needs, while avoiding vendor lock-in and costly changes to your telemetry implementation when you want to change backends or leverage new services.
The primary tool used to collect and process OpenTelemetry data is the OpenTelemetry collector. The collector receives telemetry data (e.g., logs and metrics) from various applications and services, then transforms, filters, and modifies that data (e.g., according to OpenTelemetry conventions and the OpenTelemetry logging model). Finally, it exports that data to a variety of different OpenTelemetry-compatible backends and services.
For detailed information about OpenTelemetry and the OpenTelemetry collector, please see the OpenTelemetry Documentation.
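The receive, process, and export flow described above can be pictured with a small toy model in Python. This is only a conceptual sketch, not the collector's implementation or API: the real collector is configured declaratively with receivers, processors, and exporters. The Pipeline class, the two processors, and the record fields are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Toy model: records flow through processors, then out to every exporter."""
    processors: list = field(default_factory=list)
    exporters: list = field(default_factory=list)

    def consume(self, records):
        for record in records:
            for process in self.processors:
                record = process(record)
                if record is None:  # a processor filtered the record out
                    break
            if record is not None:
                for export in self.exporters:
                    export(record)


def drop_debug(record):
    """Filter step: discard DEBUG records by returning None."""
    return None if record.get("severity") == "DEBUG" else record


def add_service_name(record):
    """Transform step: attach an attribute identifying the source."""
    return {**record, "service.name": "postgres"}


if __name__ == "__main__":
    sent = []  # stands in for a backend
    pipeline = Pipeline([drop_debug, add_service_name], [sent.append])
    pipeline.consume([
        {"severity": "DEBUG", "body": "checkpoint starting"},
        {"severity": "ERROR", "body": "connection refused"},
    ])
    print(sent)
```

The design point the sketch is meant to show is that exporters are decoupled from how records are received and processed, which is why a backend can be swapped without touching the rest of the pipeline.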
Observability & OpenTelemetry in CPK
By leveraging OpenTelemetry standards and tooling, CPK seamlessly collects metrics and logging data by attaching OpenTelemetry collector sidecars to all of the components comprising your Postgres cluster. For instance, not only is telemetry data collected and exported for your Postgres databases, it is also collected and exported for the High-Availability, Disaster Recovery, Connection Pooling, and User Interface components comprising your cluster. And since CPK does all of the heavy lifting to configure those components for metrics collecting, while also properly formatting those logs and metrics according to the OpenTelemetry conventions and standards, you can simply focus on deciding what OpenTelemetry-compatible service and tools you want to use to view and analyze telemetry data for your Postgres clusters, all via a simple YAML configuration within your PostgresCluster spec.
OpenTelemetry Logging in CPK
When OpenTelemetry logging is enabled, CPK automatically handles the setup and configuration needed to ensure all of the components comprising your Postgres cluster export pertinent logging information to a variety of different OpenTelemetry-compatible logging services and backends. CPK monitors and captures those logs using the OpenTelemetry collector, and transforms them according to the OpenTelemetry log data model. This results in a consistent set of logs across each of the components comprising your full Postgres cluster deployment. From there, your logs can be exported to a variety of different OpenTelemetry-compatible logging backends, based on the configuration you provide in your PostgresCluster spec.
For instance, to export your logs to Google Cloud, you would include an instrumentation section in your PostgresCluster spec similar to the following:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: otel-hippo
spec:
  instrumentation:
    config:
      detectors:
      - name: gcp
      exporters:
        googlecloud:
          log:
            default_log_name: "collector-exported-log"
            resource_filters:
            - prefix: "k8s"
            - prefix: "db"
    logs:
      exporters: ['googlecloud']
This means you can simply focus on where you want to send your logs, while CPK seamlessly and automatically handles everything else (e.g., capturing, processing, and transforming of your logs).
The various types of logs that are exported from your Postgres cluster using OpenTelemetry include:
- Database Logs - Logs from the Postgres database and the pgAudit extension
- High Availability Logs - Logs from Patroni, which is responsible for keeping your Postgres clusters highly available
- Disaster Recovery Logs - Logs produced by pgBackRest when backing-up and restoring your databases
- Connection Pooling Logs - Logs produced by PgBouncer when connection pooling is enabled within a Postgres cluster
- User Interface Logs - Logs produced by pgAdmin when a Postgres user interface is deployed to manage one or more Postgres clusters
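As a rough illustration of what transforming a log line into the OpenTelemetry log data model can look like, the Python sketch below maps a single Postgres log line onto the model's top-level fields (Timestamp, SeverityText, SeverityNumber, Body, Attributes, Resource). It is not CPK's actual transformation. It assumes a log line in the shape produced by a log_line_prefix of '%m [%p] ' with a UTC timezone. The severity mapping, the to_log_record function, and the pod name are assumptions made for the example.

```python
import re
from datetime import datetime, timezone

# Assumed mapping onto the lowest value of each OpenTelemetry severity range.
SEVERITY_NUMBERS = {"DEBUG": 5, "INFO": 9, "WARNING": 13, "ERROR": 17, "FATAL": 21}

# Assumed line shape: "<timestamp> UTC [<pid>] <LEVEL>:  <message>"
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC "
    r"\[(?P<pid>\d+)\] (?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)


def to_log_record(line: str, pod_name: str) -> dict:
    """Map one Postgres log line onto OpenTelemetry log data model fields."""
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    timestamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return {
        "Timestamp": timestamp.replace(tzinfo=timezone.utc).isoformat(),
        "SeverityText": match["level"],
        # 0 means "unspecified" in the log data model.
        "SeverityNumber": SEVERITY_NUMBERS.get(match["level"], 0),
        "Body": match["msg"],
        "Attributes": {"process.pid": int(match["pid"])},
        "Resource": {"k8s.pod.name": pod_name},
    }


if __name__ == "__main__":
    sample = '2025-01-15 10:30:00.123 UTC [42] ERROR:  relation "foo" does not exist'
    print(to_log_record(sample, pod_name="hippo-instance1-0"))
```

Putting logs from every component into the same set of fields is what allows a single backend query to span Postgres, Patroni, pgBackRest, PgBouncer, and pgAdmin logs.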
OpenTelemetry Metrics in CPK
When OpenTelemetry metrics are enabled, CPK automatically starts collecting metrics across the various components comprising your Postgres cluster. For a detailed overview of the metrics collected via OpenTelemetry, as well as the Grafana dashboards included in CPK for viewing those metrics, please see the Monitoring section of the documentation. For details on configuring OpenTelemetry metrics, such as how to add your own custom metrics, see the OpenTelemetry Metrics guide.