Crunchy Postgres Exporter

The crunchy-postgres-exporter container provides real time metrics about the PostgreSQL database via an API. These metrics are scraped and stored by a Prometheus time-series database and are then graphed and visualized through the open source data visualizer Grafana.

The crunchy-postgres-exporter container uses pgMonitor for advanced metric collection. It is required that the crunchy-postgres-ha container has the PGMONITOR_PASSWORD environment variable to create the appropriate user (ccp_monitoring) to collect metrics.

Custom queries to collect metrics can be specified by the user. By mounting a queries.yml file to /conf on the container, additional metrics can be specified for the API to collect. For an example of a queries.yml file, see here

Packages

The crunchy-postgres-exporter Docker image contains the following packages (versions vary depending on PostgreSQL version):

  • PostgreSQL (13.0, 12.4, 11.9, 10.14, 9.6.19 and 9.5.23)
  • CentOS7 - publicly available
  • UBI7 - customers only
  • PostgreSQL Exporter

Environment Variables

Required

Name Default Description
EXPORTER_PG_PASSWORD none Provides the password needed to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database. Should typically match the PGMONITOR_PASSWORD value set in the crunchy-postgres container.

Optional

Name Default Description
EXPORTER_PG_USER ccp_monitoring Provides the username needed to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database. Should typically be ccp_monitoring per the crunchy-postgres container specification (see environment varaible PGMONITOR_PASSWORD).
EXPORTER_PG_HOST 127.0.0.1 Provides the host needed to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database
EXPORTER_PG_PORT 5432 Provides the port needed to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database
EXPORTER_PG_DATABASE postgres Provides the name of the database used to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database
DATA_SOURCE_NAME None Explicitly defines the URL for connecting to the PostgreSQL database (must be in the form of postgresql://). If provided, overrides all other settings provided to generate the connection URL.
CRUNCHY_DEBUG FALSE Set this to true to enable debugging in logs. Note: this mode can reveal secrets in logs.
POSTGRES_EXPORTER_PORT 9187 Set the postgres-exporter port to listen on for web interface and telemetry.

Viewing Cluster Metrics

To view a particular cluster’s available metrics in a local browser window, port forwarding can be set up as follows. For a pgcluster, mycluster, deployed in the pgouser1 namespace, use

# If deployed to Kubernetes
kubectl port-forward -n pgouser1 svc/mycluster 9187:9187

# If deployed to OpenShift
oc port-forward -n pgouser1 svc/mycluster 9187:9187

Then, in your local browser, go to http://127.0.0.1:9187/metrics to view the available metrics for that cluster.

Crunchy Postgres Exporter Metrics Detail

Below are details on the various metrics available from the crunchy-postgres-exporter container. The name, SQL query and metric details are given for each available item.

queries_backrest

ccp_backrest_last_diff_backup

SQL Query:

WITH all_backups AS ( SELECT config_file , jsonb_array_elements(data) AS stanza_data FROM monitor.pgbackrest_info(#PGBACKREST_INFO_THROTTLE_MINUTES#) ) , per_stanza AS ( SELECT config_file , stanza_data->>'name' AS stanza , jsonb_array_elements(stanza_data->'backup') AS backup_data FROM all_backups ) SELECT config_file , stanza , extract(epoch from (CURRENT_TIMESTAMP - max(to_timestamp((backup_data->'timestamp'->>'stop')::bigint)))) AS time_since_completion_seconds FROM per_stanza WHERE backup_data->>'type' IN ('full', 'diff') GROUP BY config_file, stanza

Metrics:

config_file
Configuration file for this backup
stanza
PGBackrest Stanza Name
time_since_completion_seconds
Seconds since the last completed full or differential backup. Differential is always based off last full.

ccp_backrest_last_full_backup

SQL Query:

WITH all_backups AS ( SELECT config_file , jsonb_array_elements(data) AS stanza_data FROM monitor.pgbackrest_info(#PGBACKREST_INFO_THROTTLE_MINUTES#) ) , per_stanza AS ( SELECT config_file , stanza_data->>'name' AS stanza , jsonb_array_elements(stanza_data->'backup') AS backup_data FROM all_backups ) SELECT config_file , stanza , extract(epoch from (CURRENT_TIMESTAMP - max(to_timestamp((backup_data->'timestamp'->>'stop')::bigint)))) AS time_since_completion_seconds FROM per_stanza WHERE backup_data->>'type' IN ('full') GROUP BY config_file, stanza

Metrics:

config_file
Configuration file for this backup
stanza
PGBackrest Stanza Name
time_since_completion_seconds
Seconds since the last completed full backup

ccp_backrest_last_incr_backup

SQL Query:

WITH all_backups AS ( SELECT config_file , jsonb_array_elements(data) AS stanza_data FROM monitor.pgbackrest_info(#PGBACKREST_INFO_THROTTLE_MINUTES#) ) , per_stanza AS ( SELECT config_file , stanza_data->>'name' AS stanza , jsonb_array_elements(stanza_data->'backup') AS backup_data FROM all_backups ) SELECT config_file , stanza , extract(epoch from (CURRENT_TIMESTAMP - max(to_timestamp((backup_data->'timestamp'->>'stop')::bigint)))) AS time_since_completion_seconds FROM per_stanza WHERE backup_data->>'type' IN ('full', 'diff', 'incr') GROUP BY config_file, stanza

Metrics:

config_file
Configuration file for this backup
stanza
PGBackrest Stanza Name
time_since_completion_seconds
Seconds since the last completed full, differential or incremental backup. Incremental is always based off last full or differential.

ccp_backrest_last_info

SQL Query:

WITH all_backups AS ( SELECT config_file , jsonb_array_elements(data) AS stanza_data FROM monitor.pgbackrest_info(#PGBACKREST_INFO_THROTTLE_MINUTES#) ) , per_stanza AS ( SELECT config_file , stanza_data->>'name' AS stanza , jsonb_array_elements(stanza_data->'backup') AS backup_data FROM all_backups ) SELECT a.config_file , a.stanza , a.backup_data->>'type' AS backup_type , a.backup_data->'info'->'repository'->>'delta' AS repo_backup_size_bytes , a.backup_data->'info'->'repository'->>'size' AS repo_total_size_bytes , (a.backup_data->'timestamp'->>'stop')::bigint - (a.backup_data->'timestamp'->>'start')::bigint AS backup_runtime_seconds FROM per_stanza a JOIN ( SELECT config_file , stanza , backup_data->>'type' AS backup_type , max(backup_data->'timestamp'->>'start') AS max_backup_start , max(backup_data->'timestamp'->>'stop') AS max_backup_stop FROM per_stanza GROUP BY 1,2,3) b ON a.config_file = b.config_file AND a.stanza = b.stanza AND a.backup_data->>'type' = b.backup_type AND a.backup_data->'timestamp'->>'start' = b.max_backup_start AND a.backup_data->'timestamp'->>'stop' = b.max_backup_stop;

Metrics:

config_file
Configuration file for this backup
stanza
PGBackrest Stanza Name
backup_type
Backup type
repo_backup_size_bytes
Actual size of only this individual backup in the pgbackrest repository
repo_total_size_bytes
Total size of this backup in the pgbackrest repository, including all required previous backups and WAL
backup_runtime_seconds
Total runtime in seconds of this backup

queries_common

ccp_archive_command_status

SQL Query:

SELECT CASE WHEN EXTRACT(epoch from (last_failed_time - last_archived_time)) IS NULL THEN 0 WHEN EXTRACT(epoch from (last_failed_time - last_archived_time)) < 0 THEN 0 ELSE EXTRACT(epoch from (last_failed_time - last_archived_time)) END AS seconds_since_last_fail FROM pg_catalog.pg_stat_archiver

Metrics:

seconds_since_last_fail
Seconds since the last recorded failure of the archive_command

ccp_database_size

SQL Query:

SELECT datname as dbname, pg_database_size(datname) as bytes FROM pg_catalog.pg_database WHERE datistemplate = false

Metrics:

dbname
Database name
bytes
Database size in bytes

ccp_is_in_recovery

SQL Query:

SELECT CASE WHEN pg_is_in_recovery = true THEN 1 ELSE 2 END AS status from pg_is_in_recovery();

Metrics:

status
Return value of 1 means database is in recovery. Otherwise 2 it is a primary.

ccp_locks

SQL Query:

SELECT pg_database.datname as dbname, tmp.mode, COALESCE(count,0) as count FROM ( VALUES ('accesssharelock'), ('rowsharelock'), ('rowexclusivelock'), ('shareupdateexclusivelock'), ('sharelock'), ('sharerowexclusivelock'), ('exclusivelock'), ('accessexclusivelock') ) AS tmp(mode) CROSS JOIN pg_catalog.pg_database LEFT JOIN (SELECT database, lower(mode) AS mode,count(*) AS count FROM pg_catalog.pg_locks WHERE database IS NOT NULL GROUP BY database, lower(mode) ) AS tmp2 ON tmp.mode=tmp2.mode and pg_database.oid = tmp2.database

Metrics:

dbname
Database name
mode
Lock type
count
Number of locks

ccp_pg_settings_checksum

SQL Query:

SELECT monitor.pg_settings_checksum() AS status

Metrics:

status
Value of checksum monitioring status for pg_catalog.pg_settings (postgresql.conf). 0 = valid config. 1 = settings changed. To reset current config to valid after alert, run monitor.pg_settings_checksum_set_valid().

ccp_postgresql_version

SQL Query:

SELECT current_setting('server_version_num')::int AS current

Metrics:

current
The current version of PostgreSQL that this exporter is running on as a 6 digit integer (######).

ccp_postmaster_runtime

SQL Query:

SELECT extract('epoch' from pg_postmaster_start_time) as start_time_seconds from pg_catalog.pg_postmaster_start_time()

Metrics:

start_time_seconds
Time at which postmaster started. Note this metric has been deprecated as of pgMonitor 4.3 and will be removed in a future version. Use ccp_postmaster_uptime_seconds instead.

ccp_postmaster_uptime

SQL Query:

SELECT extract(epoch from (now() - pg_postmaster_start_time() )) AS seconds;

Metrics:

seconds
Time interval in seconds since PostgreSQL database was last restarted

ccp_sequence_exhaustion

SQL Query:

SELECT count FROM monitor.sequence_exhaustion(75)

Metrics:

count
Count of sequences that have reached greater than or equal to 75% of their max available numbers. Function monitor.sequence_status() can provide more details if run directly on system.

ccp_settings_gauge

SQL Query:

select (select setting::int from pg_catalog.pg_settings where name = 'checkpoint_timeout') as checkpoint_timeout , (select setting::float from pg_catalog.pg_settings where name = 'checkpoint_completion_target') as checkpoint_completion_target , (select 8192*setting::bigint as bytes from pg_catalog.pg_settings where name = 'shared_buffers') as shared_buffers

Metrics:

checkpoint_timeout
Checkpoint timeout in seconds
checkpoint_completion_target
Checkpoint completion target, ranging from 0 to 1
shared_buffers
Size of shared_buffers in bytes

ccp_settings_pending_restart

SQL Query:

SELECT count(*) AS count FROM pg_catalog.pg_settings WHERE pending_restart = true

Metrics:

count
Number of settings from pg_settings catalog in a pending_restart state

ccp_stat_bgwriter

SQL Query:

SELECT checkpoints_timed, checkpoints_req, checkpoint_write_time, checkpoint_sync_time, buffers_checkpoint, buffers_clean, maxwritten_clean, buffers_backend, buffers_backend_fsync, buffers_alloc, stats_reset FROM pg_catalog.pg_stat_bgwriter

Metrics:

checkpoints_timed
Number of scheduled checkpoints that have been performed
checkpoints_req
Number of requested checkpoints that have been performed
checkpoint_write_time
Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in milliseconds
checkpoint_sync_time
Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk, in milliseconds
buffers_checkpoint
Number of buffers written during checkpoints
buffers_clean
Number of buffers written by the background writer
maxwritten_clean
Number of times the background writer stopped a cleaning scan because it had written too many buffers
buffers_backend
Number of buffers written directly by a backend
buffers_backend_fsync
Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write)
buffers_alloc
Number of buffers allocated
stats_reset
Time at which these statistics were last reset

ccp_stat_database

SQL Query:

SELECT s.datname as dbname, xact_commit, xact_rollback, blks_read, blks_hit, tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted, conflicts, temp_files, temp_bytes, deadlocks FROM pg_catalog.pg_stat_database s JOIN pg_catalog.pg_database d on d.datname = s.datname WHERE d.datistemplate = false

Metrics:

dbname
Name of database
xact_commit
Number of transactions in this database that have been committed
xact_rollback
Number of transactions in this database that have been rolled back
blks_read
Number of disk blocks read in this database
blks_hit
Number of times disk blocks were found already in the buffer cache, so that a read was not necessary
tup_returned
Number of rows returned by queries in this database
tup_fetched
Number of rows fetched by queries in this database
tup_inserted
Number of rows inserted by queries in this database
tup_updated
Number of rows updated by queries in this database
tup_deleted
Number of rows deleted by queries in this database
conflicts
Number of queries canceled due to conflicts with recovery in this database
temp_files
Number of rows deleted by queries in this database
temp_bytes
Total amount of data written to temporary files by queries in this database
deadlocks
Number of deadlocks detected in this database

ccp_transaction_wraparound

SQL Query:

WITH max_age AS ( SELECT 2000000000 as max_old_xid, setting AS autovacuum_freeze_max_age FROM pg_catalog.pg_settings WHERE name = 'autovacuum_freeze_max_age'), per_database_stats AS ( SELECT datname , m.max_old_xid::int , m.autovacuum_freeze_max_age::int , age(d.datfrozenxid) AS oldest_current_xid FROM pg_catalog.pg_database d JOIN max_age m ON (true) WHERE d.datallowconn) SELECT max(oldest_current_xid) AS oldest_current_xid , max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound , max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac FROM per_database_stats

Metrics:

oldest_current_xid
Oldest current transaction ID in cluster
percent_towards_wraparound
Percentage towards transaction ID wraparound
percent_towards_emergency_autovac
Percentage towards emergency autovacuum process starting

queries_per_db

ccp_stat_user_tables

SQL Query:

SELECT current_database() as dbname, schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd, n_live_tup, n_dead_tup, vacuum_count, autovacuum_count, analyze_count, autoanalyze_count FROM pg_catalog.pg_stat_user_tables

Metrics:

dbname
Database name
schemaname
Name of the schema that this table is in
relname
Name of this table
seq_scan
Number of sequential scans initiated on this table
seq_tup_read
Number of live rows fetched by sequential scans
idx_scan
Number of index scans initiated on this table
idx_tup_fetch
Number of live rows fetched by index scans
n_tup_ins
Number of rows inserted
n_tup_upd
Number of rows updated
n_tup_del
Number of rows deleted
n_tup_hot_upd
Number of rows HOT updated (i.e., with no separate index update required)
n_live_tup
Estimated number of live rows
n_dead_tup
Estimated number of dead rows
vacuum_count
Number of times this table has been manually vacuumed (not counting VACUUM FULL)
autovacuum_count
Number of times this table has been vacuumed by the autovacuum daemon
analyze_count
Number of times this table has been manually analyzed
autoanalyze_count
Number of times this table has been analyzed by the autovacuum daemon

ccp_table_size

SQL Query:

SELECT current_database() as dbname, n.nspname as schemaname, c.relname, pg_total_relation_size(c.oid) as size_bytes FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid WHERE NOT pg_is_other_temp_schema(n.oid) AND relkind IN ('r', 'm', 'f')

Metrics:

dbname
Database name
schemaname
Schema name
relname
Table name
size_bytes
Table size including indexes

queries_pg10

ccp_connection_stats

SQL Query:

select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from pg_catalog.pg_stat_activity where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and state <> 'idle' ) as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and wait_event_type = 'Lock' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from pg_catalog.pg_stat_activity) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);

Metrics:

active
Total non-idle connections
total
Total idle and non-idle connections
idle
Total idle connections
idle_in_txn
Total idle in transaction connections
max_idle_in_txn_time
Length of time in seconds of the longest idle in transaction session
max_query_time
Length of time in seconds of the longest running query
max_blocked_query_time
Length of time in seconds of the longest running query that has been blocked by a heavyweight lock
max_connections
Value of max_connections for the monitored database

ccp_pg_hba_checksum

SQL Query:

SELECT monitor.pg_hba_checksum() AS status

Metrics:

status
Value of checksum monitioring status for pg_catalog.pg_hba_file_rules (pg_hba.conf). 0 = valid config. 1 = settings changed. To reset current config to valid after alert, run monitor.pg_hba_checksum_set_valid().

ccp_replication_lag

SQL Query:

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_time

Metrics:

replay_time
Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.

ccp_replication_lag_size

SQL Query:

SELECT client_addr as replica , client_hostname as replica_hostname , client_port as replica_port , pg_wal_lsn_diff(sent_lsn, replay_lsn) as bytes FROM pg_catalog.pg_stat_replication

Metrics:

replica
Replica address
replica_hostname
Replica hostname
replica_port
Replica port
bytes
Replication lag in bytes

ccp_replication_slots

SQL Query:

SELECT slot_name, active::int, pg_wal_lsn_diff(pg_current_wal_insert_lsn(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slots

Metrics:

slot_name
Name of replication slot
active
Active state of slot. 1 = true. 0 = false.
retained_bytes
The amount of WAL (in bytes) being retained for this slot

ccp_wal_activity

SQL Query:

SELECT last_5_min_size_bytes, (SELECT COALESCE(sum(size),0) FROM pg_catalog.pg_ls_waldir()) AS total_size_bytes FROM (SELECT COALESCE(sum(size),0) AS last_5_min_size_bytes FROM pg_catalog.pg_ls_waldir() WHERE modification > CURRENT_TIMESTAMP - '5 minutes'::interval) x;

Metrics:

last_5_min_size_bytes
Current size in bytes of the last 5 minutes of WAL generation. Includes recycled WALs.
total_size_bytes
Current size in bytes of the WAL directory

queries_pg11

ccp_connection_stats

SQL Query:

select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from pg_catalog.pg_stat_activity where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and state <> 'idle' ) as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and wait_event_type = 'Lock' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from pg_catalog.pg_stat_activity) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);

Metrics:

active
Total non-idle connections
total
Total idle and non-idle connections
idle
Total idle connections
idle_in_txn
Total idle in transaction connections
max_idle_in_txn_time
Length of time in seconds of the longest idle in transaction session
max_query_time
Length of time in seconds of the longest running query
max_blocked_query_time
Length of time in seconds of the longest running query that has been blocked by a heavyweight lock
max_connections
Value of max_connections for the monitored database

ccp_pg_hba_checksum

SQL Query:

SELECT monitor.pg_hba_checksum() AS status

Metrics:

status
Value of checksum monitioring status for pg_catalog.pg_hba_file_rules (pg_hba.conf). 0 = valid config. 1 = settings changed. To reset current config to valid after alert, run monitor.pg_hba_checksum_set_valid().

ccp_replication_lag

SQL Query:

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_time

Metrics:

replay_time
Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.

ccp_replication_lag_size

SQL Query:

SELECT client_addr as replica , client_hostname as replica_hostname , client_port as replica_port , pg_wal_lsn_diff(sent_lsn, replay_lsn) as bytes FROM pg_catalog.pg_stat_replication

Metrics:

replica
Replica address
replica_hostname
Replica hostname
replica_port
Replica port
bytes
Replication lag in bytes

ccp_replication_slots

SQL Query:

SELECT slot_name, active::int, pg_wal_lsn_diff(pg_current_wal_insert_lsn(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slots

Metrics:

slot_name
Name of replication slot
active
Active state of slot. 1 = true. 0 = false.
retained_bytes
The amount of WAL (in bytes) being retained for this slot

ccp_wal_activity

SQL Query:

SELECT last_5_min_size_bytes, (SELECT COALESCE(sum(size),0) FROM pg_catalog.pg_ls_waldir()) AS total_size_bytes FROM (SELECT COALESCE(sum(size),0) AS last_5_min_size_bytes FROM pg_catalog.pg_ls_waldir() WHERE modification > CURRENT_TIMESTAMP - '5 minutes'::interval) x;

Metrics:

last_5_min_size_bytes
Current size in bytes of the last 5 minutes of WAL generation. Includes recycled WALs.
total_size_bytes
Current size in bytes of the WAL directory

queries_pg12

ccp_connection_stats

SQL Query:

select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from pg_catalog.pg_stat_activity where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and state <> 'idle' ) as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and wait_event_type = 'Lock' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from pg_catalog.pg_stat_activity) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);

Metrics:

active
Total non-idle connections
total
Total idle and non-idle connections
idle
Total idle connections
idle_in_txn
Total idle in transaction connections
max_idle_in_txn_time
Length of time in seconds of the longest idle in transaction session
max_query_time
Length of time in seconds of the longest running query
max_blocked_query_time
Length of time in seconds of the longest running query that has been blocked by a heavyweight lock
max_connections
Value of max_connections for the monitored database

ccp_data_checksum_failure

SQL Query:

SELECT datname AS dbname , checksum_failures AS count , coalesce(extract(epoch from (now() - checksum_last_failure)), 0) AS time_since_last_failure_seconds FROM pg_catalog.pg_stat_database;

Metrics:

dbname
Database name
count
Total number of checksum failures on this database
time_since_last_failure_seconds
Time interval in seconds since the last checksum failure was encountered

ccp_pg_hba_checksum

SQL Query:

SELECT monitor.pg_hba_checksum() AS status

Metrics:

status
Value of checksum monitioring status for pg_catalog.pg_hba_file_rules (pg_hba.conf). 0 = valid config. 1 = settings changed. To reset current config to valid after alert, run monitor.pg_hba_checksum_set_valid().

ccp_replication_lag

SQL Query:

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_time

Metrics:

replay_time
Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.

ccp_replication_lag_size

SQL Query:

SELECT client_addr as replica , client_hostname as replica_hostname , client_port as replica_port , pg_wal_lsn_diff(sent_lsn, replay_lsn) as bytes FROM pg_catalog.pg_stat_replication

Metrics:

replica
Replica address
replica_hostname
Replica hostname
replica_port
Replica port
bytes
Replication lag in bytes

ccp_replication_slots

SQL Query:

SELECT slot_name, active::int, pg_wal_lsn_diff(pg_current_wal_insert_lsn(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slots

Metrics:

slot_name
Name of replication slot
active
Active state of slot. 1 = true. 0 = false.
retained_bytes
The amount of WAL (in bytes) being retained for this slot

ccp_wal_activity

SQL Query:

SELECT last_5_min_size_bytes, (SELECT COALESCE(sum(size),0) FROM pg_catalog.pg_ls_waldir()) AS total_size_bytes FROM (SELECT COALESCE(sum(size),0) AS last_5_min_size_bytes FROM pg_catalog.pg_ls_waldir() WHERE modification > CURRENT_TIMESTAMP - '5 minutes'::interval) x;

Metrics:

last_5_min_size_bytes
Current size in bytes of the last 5 minutes of WAL generation. Includes recycled WALs.
total_size_bytes
Current size in bytes of the WAL directory

queries_pg95

ccp_connection_stats

SQL Query:

select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from monitor.pg_stat_activity() where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from monitor.pg_stat_activity() where state <> 'idle') as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from monitor.pg_stat_activity() where waiting = 't' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from monitor.pg_stat_activity()) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);

Metrics:

active
Total non-idle connections
total
Total idle and non-idle connections
idle
Total idle connections
idle_in_txn
Total idle in transaction connections
max_idle_in_txn_time
Length of time in seconds of the longest idle in transaction session
max_query_time
Length of time in seconds of the longest running query
max_blocked_query_time
Length of time in seconds of the longest running query that has been blocked by a heavyweight lock
max_connections
Value of max_connections for the monitored database

ccp_replication_lag

SQL Query:

SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_time

Metrics:

replay_time
Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.

ccp_replication_lag_size

SQL Query:

SELECT replica_addr::text as replica , replica_hostname::text , replica_port::text , byte_lag::bigint as bytes FROM monitor.streaming_replica_check();

Metrics:

replica
Replica IP address
replica_hostname
Hostname of replica system (if available)
replica_port
Replica port
bytes
Replication lag in bytes between primary and replica

ccp_replication_slots

SQL Query:

SELECT slot_name, active::int, pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slots

Metrics:

slot_name
Name of replication slot
active
Active state of slot. 1 = true. 0 = false.
retained_bytes
The amount of WAL (in bytes) being retained for this slot

ccp_wal_activity

SQL Query:

SELECT (SELECT setting::int8 FROM pg_settings WHERE name = 'wal_segment_size') * (SELECT setting::int8 FROM pg_settings WHERE name = 'wal_block_size') * (SELECT count(*) FROM monitor.pg_ls_waldir()) AS total_size_bytes;

Metrics:

total_size_bytes
Current size in bytes of the WAL directory

queries_pg96

ccp_connection_stats

SQL Query:

select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from monitor.pg_stat_activity() where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from monitor.pg_stat_activity() where state <> 'idle') as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and wait_event_type = 'Lock' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from monitor.pg_stat_activity()) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);

Metrics:

active
Total non-idle connections
total
Total idle and non-idle connections
idle
Total idle connections
idle_in_txn
Total idle in transaction connections
max_idle_in_txn_time
Length of time in seconds of the longest idle in transaction session
max_query_time
Length of time in seconds of the longest running query
max_blocked_query_time
Length of time in seconds of the longest running query that has been blocked by a heavyweight lock
max_connections
Value of max_connections for the monitored database

ccp_replication_lag

SQL Query:

SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_time

Metrics:

replay_time
Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.

ccp_replication_lag_size

SQL Query:

SELECT replica_addr::text as replica , replica_hostname::text , replica_port::text , byte_lag::bigint as bytes FROM monitor.streaming_replica_check();

Metrics:

replica
Replica IP address
replica_hostname
Hostname of replica system (if available)
replica_port
Replica port
bytes
Replication lag in bytes between primary and replica

ccp_replication_slots

SQL Query:

SELECT slot_name, active::int, pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slots

Metrics:

slot_name
Name of replication slot
active
Active state of slot. 1 = true. 0 = false.
retained_bytes
The amount of WAL (in bytes) being retained for this slot

ccp_wal_activity

SQL Query:

SELECT (SELECT setting::int8 FROM pg_settings WHERE name = 'wal_segment_size') * (SELECT setting::int8 FROM pg_settings WHERE name = 'wal_block_size') * (SELECT count(*) FROM monitor.pg_ls_waldir()) AS total_size_bytes;

Metrics:

total_size_bytes
Current size in bytes of the WAL directory

pgnodemx

In addition to the metrics above, the pgnodemx PostgreSQL extension provides SQL functions to allow the capture of node OS metrics via SQL queries. For more information, please see the pgnodemx project page:

https://github.com/CrunchyData/pgnodemx

queries_nodemx

ccp_nodemx_cpu

SQL Query:

SELECT monitor.kdapi_scalar_bigint('cpu_request') as request, monitor.kdapi_scalar_bigint('cpu_limit') as limit

Metrics:

request
CPU request value in milli cores
limit
CPU limit value in milli cores

ccp_nodemx_cpuacct

SQL Query:

SELECT monitor.cgroup_scalar_bigint('cpuacct.usage') as usage

Metrics:

usage
CPU usage in nanoseconds

ccp_nodemx_cpucfs

SQL Query:

SELECT monitor.cgroup_scalar_bigint('cpu.cfs_period_us') as period_us, monitor.cgroup_scalar_bigint('cpu.cfs_quota_us') as quota_us

Metrics:

period_us
the total available run-time within a period (in microseconds)
quota_us
the length of a period (in microseconds)

ccp_nodemx_cpustat

SQL Query:

select (SELECT val as nr_periods FROM monitor.cgroup_setof_kv('cpu.stat') where key='nr_periods'), (SELECT val as nr_throttled FROM monitor.cgroup_setof_kv('cpu.stat') where key='nr_throttled'), (SELECT val as throttled_time FROM monitor.cgroup_setof_kv('cpu.stat') where key='throttled_time')

Metrics:

nr_threads
number of periods that any thread was runnable
nr_throttled
number of runnable periods in which the application used its entire quota and was throttled
throttled_time
sum total amount of time individual threads within the monitor.cgroup were throttled

ccp_nodemx_data_disk

SQL Query:

SELECT mount_point,fs_type,total_bytes,available_bytes,total_file_nodes,free_file_nodes FROM monitor.proc_mountinfo() m JOIN monitor.fsinfo(m.mount_point) f USING (major_number, minor_number) WHERE m.mount_point IN ('/pgdata', '/pgwal') OR m.mount_point like '/tablespaces/%'

Metrics:

mount_point
mount point
fs_type
File system type
total_bytes
Size in bytes
available_bytes
Available size in bytes
total_file_nodes
Total file nodes
free_file_nodes
Available file nodes

ccp_nodemx_disk_activity

SQL Query:

SELECT mount_point,sectors_read,sectors_written FROM monitor.proc_mountinfo() m JOIN monitor.proc_diskstats() d USING (major_number, minor_number) WHERE m.mount_point IN ('/pgdata', '/pgwal') OR m.mount_point like '/tablespaces/%'

Metrics:

mount_point
mount point
sectors_read
Total sectors read
sectors_written
Total sectors writen

ccp_nodemx_mem

SQL Query:

SELECT monitor.kdapi_scalar_bigint('mem_request') as request, monitor.kdapi_scalar_bigint('mem_limit') as limit, (select val from monitor.cgroup_setof_kv('memory.stat') where key='cache') as cache, (select val from monitor.cgroup_setof_kv('memory.stat') where key='rss') as rss, (select val from monitor.cgroup_setof_kv('memory.stat') where key='shmem') as shmem, (select val from monitor.cgroup_setof_kv('memory.stat') where key='mapped_file') as mapped_file, (select val from monitor.cgroup_setof_kv('memory.stat') where key='dirty') as dirty, (select val from monitor.cgroup_setof_kv('memory.stat') where key='active_anon') as active_anon, (select val from monitor.cgroup_setof_kv('memory.stat') where key='inactive_anon') as inactive_anon, (select val from monitor.cgroup_setof_kv('memory.stat') where key='active_file') as active_file, (select val from monitor.cgroup_setof_kv('memory.stat') where key='inactive_file') as inactive_file

Metrics:

request
Memory request value in bytes
limit
Memory limit value in bytes
cache
Total bytes of page cache memory
rss
Total bytes of anonymous and swap cache memory
shmem
Total bytes of shared memory
mapped_file
Total bytes of mapped file (includes tmpfs/shmem)
mapped_file
Total bytes of mapped file (includes tmpfs/shmem)
dirty
Total bytes that are waiting to get written back to the disk
active_anon
Total bytes of anonymous and swap cache memory on active LRU list
inactive_anon
Total bytes of anonymous and swap cache memory on inactive LRU list
active_file
Total bytes of file-backed memory on active LRU list
inactive_file
Total bytes of file-backed memory on inactive LRU list

ccp_nodemx_network

SQL Query:

SELECT interface,tx_bytes,tx_packets, rx_bytes,rx_packets from monitor.proc_network_stats()

Metrics:

interface
tx_bytes
Number of bytes transmitted
tx_packets
Number of packets transmitted
rx_bytes
Number of bytes received
rx_packets
Number of packets received

ccp_nodemx_process

SQL Query:

SELECT monitor.cgroup_process_count() as count

Metrics:

count
Total number of database processes