Crunchy Postgres Exporter
The crunchy-postgres-exporter container provides real time metrics about the PostgreSQL database via an API. These metrics are scraped and stored by a Prometheus time-series database and are then graphed and visualized through the open source data visualizer Grafana.
The crunchy-postgres-exporter container uses pgMonitor for advanced metric collection.
It is required that the crunchy-postgres-ha
container has the PGMONITOR_PASSWORD
environment
variable to create the appropriate user (ccp_monitoring
) to collect metrics.
Custom queries to collect metrics can be specified by the user. By mounting a queries.yml file to /conf on the container, additional metrics can be specified for the API to collect. For an example of a queries.yml file, see here
Packages
The crunchy-postgres-exporter Docker image contains the following packages (versions vary depending on PostgreSQL version):
- PostgreSQL (13.0, 12.4, 11.9, 10.14, 9.6.19 and 9.5.23)
- CentOS7 - publicly available
- UBI7 - customers only
- PostgreSQL Exporter
Environment Variables
Required
Name | Default | Description |
---|---|---|
EXPORTER_PG_PASSWORD | none | Provides the password needed to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database. Should typically match the PGMONITOR_PASSWORD value set in the crunchy-postgres container. |
Optional
Name | Default | Description |
---|---|---|
EXPORTER_PG_USER | ccp_monitoring | Provides the username needed to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database. Should typically be ccp_monitoring per the crunchy-postgres container specification (see environment varaible PGMONITOR_PASSWORD ). |
EXPORTER_PG_HOST | 127.0.0.1 | Provides the host needed to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database |
EXPORTER_PG_PORT | 5432 | Provides the port needed to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database |
EXPORTER_PG_DATABASE | postgres | Provides the name of the database used to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database |
DATA_SOURCE_NAME | None | Explicitly defines the URL for connecting to the PostgreSQL database (must be in the form of postgresql:// ). If provided, overrides all other settings provided to generate the connection URL. |
CRUNCHY_DEBUG | FALSE | Set this to true to enable debugging in logs. Note: this mode can reveal secrets in logs. |
POSTGRES_EXPORTER_PORT | 9187 | Set the postgres-exporter port to listen on for web interface and telemetry. |
Viewing Cluster Metrics
To view a particular cluster’s available metrics in a local browser window, port forwarding can be set up as follows.
For a pgcluster, mycluster
, deployed in the pgouser1
namespace, use
# If deployed to Kubernetes
kubectl port-forward -n pgouser1 svc/mycluster 9187:9187
# If deployed to OpenShift
oc port-forward -n pgouser1 svc/mycluster 9187:9187
Then, in your local browser, go to http://127.0.0.1:9187/metrics
to view the available metrics for that cluster.
Crunchy Postgres Exporter Metrics Detail
Below are details on the various metrics available from the crunchy-postgres-exporter container. The name, SQL query and metric details are given for each available item.
queries_backrest
ccp_backrest_last_diff_backup
SQL Query:
WITH all_backups AS ( SELECT config_file , jsonb_array_elements(data) AS stanza_data FROM monitor.pgbackrest_info(#PGBACKREST_INFO_THROTTLE_MINUTES#) ) , per_stanza AS ( SELECT config_file , stanza_data->>'name' AS stanza , jsonb_array_elements(stanza_data->'backup') AS backup_data FROM all_backups ) SELECT config_file , stanza , extract(epoch from (CURRENT_TIMESTAMP - max(to_timestamp((backup_data->'timestamp'->>'stop')::bigint)))) AS time_since_completion_seconds FROM per_stanza WHERE backup_data->>'type' IN ('full', 'diff') GROUP BY config_file, stanzaMetrics:
config_file
Configuration file for this backupstanza
PGBackrest Stanza Nametime_since_completion_seconds
Seconds since the last completed full or differential backup. Differential is always based off last full.ccp_backrest_last_full_backup
SQL Query:
WITH all_backups AS ( SELECT config_file , jsonb_array_elements(data) AS stanza_data FROM monitor.pgbackrest_info(#PGBACKREST_INFO_THROTTLE_MINUTES#) ) , per_stanza AS ( SELECT config_file , stanza_data->>'name' AS stanza , jsonb_array_elements(stanza_data->'backup') AS backup_data FROM all_backups ) SELECT config_file , stanza , extract(epoch from (CURRENT_TIMESTAMP - max(to_timestamp((backup_data->'timestamp'->>'stop')::bigint)))) AS time_since_completion_seconds FROM per_stanza WHERE backup_data->>'type' IN ('full') GROUP BY config_file, stanzaMetrics:
config_file
Configuration file for this backupstanza
PGBackrest Stanza Nametime_since_completion_seconds
Seconds since the last completed full backupccp_backrest_last_incr_backup
SQL Query:
WITH all_backups AS ( SELECT config_file , jsonb_array_elements(data) AS stanza_data FROM monitor.pgbackrest_info(#PGBACKREST_INFO_THROTTLE_MINUTES#) ) , per_stanza AS ( SELECT config_file , stanza_data->>'name' AS stanza , jsonb_array_elements(stanza_data->'backup') AS backup_data FROM all_backups ) SELECT config_file , stanza , extract(epoch from (CURRENT_TIMESTAMP - max(to_timestamp((backup_data->'timestamp'->>'stop')::bigint)))) AS time_since_completion_seconds FROM per_stanza WHERE backup_data->>'type' IN ('full', 'diff', 'incr') GROUP BY config_file, stanzaMetrics:
config_file
Configuration file for this backupstanza
PGBackrest Stanza Nametime_since_completion_seconds
Seconds since the last completed full, differential or incremental backup. Incremental is always based off last full or differential.ccp_backrest_last_info
SQL Query:
WITH all_backups AS ( SELECT config_file , jsonb_array_elements(data) AS stanza_data FROM monitor.pgbackrest_info(#PGBACKREST_INFO_THROTTLE_MINUTES#) ) , per_stanza AS ( SELECT config_file , stanza_data->>'name' AS stanza , jsonb_array_elements(stanza_data->'backup') AS backup_data FROM all_backups ) SELECT a.config_file , a.stanza , a.backup_data->>'type' AS backup_type , a.backup_data->'info'->'repository'->>'delta' AS repo_backup_size_bytes , a.backup_data->'info'->'repository'->>'size' AS repo_total_size_bytes , (a.backup_data->'timestamp'->>'stop')::bigint - (a.backup_data->'timestamp'->>'start')::bigint AS backup_runtime_seconds FROM per_stanza a JOIN ( SELECT config_file , stanza , backup_data->>'type' AS backup_type , max(backup_data->'timestamp'->>'start') AS max_backup_start , max(backup_data->'timestamp'->>'stop') AS max_backup_stop FROM per_stanza GROUP BY 1,2,3) b ON a.config_file = b.config_file AND a.stanza = b.stanza AND a.backup_data->>'type' = b.backup_type AND a.backup_data->'timestamp'->>'start' = b.max_backup_start AND a.backup_data->'timestamp'->>'stop' = b.max_backup_stop;Metrics:
config_file
Configuration file for this backupstanza
PGBackrest Stanza Namebackup_type
Backup typerepo_backup_size_bytes
Actual size of only this individual backup in the pgbackrest repositoryrepo_total_size_bytes
Total size of this backup in the pgbackrest repository, including all required previous backups and WALbackup_runtime_seconds
Total runtime in seconds of this backupqueries_common
ccp_archive_command_status
SQL Query:
SELECT CASE WHEN EXTRACT(epoch from (last_failed_time - last_archived_time)) IS NULL THEN 0 WHEN EXTRACT(epoch from (last_failed_time - last_archived_time)) < 0 THEN 0 ELSE EXTRACT(epoch from (last_failed_time - last_archived_time)) END AS seconds_since_last_fail FROM pg_catalog.pg_stat_archiverMetrics:
seconds_since_last_fail
Seconds since the last recorded failure of the archive_commandccp_database_size
SQL Query:
SELECT datname as dbname, pg_database_size(datname) as bytes FROM pg_catalog.pg_database WHERE datistemplate = falseMetrics:
dbname
Database namebytes
Database size in bytesccp_is_in_recovery
SQL Query:
SELECT CASE WHEN pg_is_in_recovery = true THEN 1 ELSE 2 END AS status from pg_is_in_recovery();Metrics:
status
Return value of 1 means database is in recovery. Otherwise 2 it is a primary.ccp_locks
SQL Query:
SELECT pg_database.datname as dbname, tmp.mode, COALESCE(count,0) as count FROM ( VALUES ('accesssharelock'), ('rowsharelock'), ('rowexclusivelock'), ('shareupdateexclusivelock'), ('sharelock'), ('sharerowexclusivelock'), ('exclusivelock'), ('accessexclusivelock') ) AS tmp(mode) CROSS JOIN pg_catalog.pg_database LEFT JOIN (SELECT database, lower(mode) AS mode,count(*) AS count FROM pg_catalog.pg_locks WHERE database IS NOT NULL GROUP BY database, lower(mode) ) AS tmp2 ON tmp.mode=tmp2.mode and pg_database.oid = tmp2.databaseMetrics:
dbname
Database namemode
Lock typecount
Number of locksccp_pg_settings_checksum
SQL Query:
SELECT monitor.pg_settings_checksum() AS statusMetrics:
status
Value of checksum monitioring status for pg_catalog.pg_settings (postgresql.conf). 0 = valid config. 1 = settings changed. To reset current config to valid after alert, run monitor.pg_settings_checksum_set_valid().ccp_postgresql_version
SQL Query:
SELECT current_setting('server_version_num')::int AS currentMetrics:
current
The current version of PostgreSQL that this exporter is running on as a 6 digit integer (######).ccp_postmaster_runtime
SQL Query:
SELECT extract('epoch' from pg_postmaster_start_time) as start_time_seconds from pg_catalog.pg_postmaster_start_time()Metrics:
start_time_seconds
Time at which postmaster started. Note this metric has been deprecated as of pgMonitor 4.3 and will be removed in a future version. Use ccp_postmaster_uptime_seconds instead.ccp_postmaster_uptime
SQL Query:
SELECT extract(epoch from (now() - pg_postmaster_start_time() )) AS seconds;Metrics:
seconds
Time interval in seconds since PostgreSQL database was last restartedccp_sequence_exhaustion
SQL Query:
SELECT count FROM monitor.sequence_exhaustion(75)Metrics:
count
Count of sequences that have reached greater than or equal to 75% of their max available numbers. Function monitor.sequence_status() can provide more details if run directly on system.ccp_settings_gauge
SQL Query:
select (select setting::int from pg_catalog.pg_settings where name = 'checkpoint_timeout') as checkpoint_timeout , (select setting::float from pg_catalog.pg_settings where name = 'checkpoint_completion_target') as checkpoint_completion_target , (select 8192*setting::bigint as bytes from pg_catalog.pg_settings where name = 'shared_buffers') as shared_buffersMetrics:
checkpoint_timeout
Checkpoint timeout in secondscheckpoint_completion_target
Checkpoint completion target, ranging from 0 to 1shared_buffers
Size of shared_buffers in bytesccp_settings_pending_restart
SQL Query:
SELECT count(*) AS count FROM pg_catalog.pg_settings WHERE pending_restart = trueMetrics:
count
Number of settings from pg_settings catalog in a pending_restart stateccp_stat_bgwriter
SQL Query:
SELECT checkpoints_timed, checkpoints_req, checkpoint_write_time, checkpoint_sync_time, buffers_checkpoint, buffers_clean, maxwritten_clean, buffers_backend, buffers_backend_fsync, buffers_alloc, stats_reset FROM pg_catalog.pg_stat_bgwriterMetrics:
checkpoints_timed
Number of scheduled checkpoints that have been performedcheckpoints_req
Number of requested checkpoints that have been performedcheckpoint_write_time
Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in millisecondscheckpoint_sync_time
Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk, in millisecondsbuffers_checkpoint
Number of buffers written during checkpointsbuffers_clean
Number of buffers written by the background writermaxwritten_clean
Number of times the background writer stopped a cleaning scan because it had written too many buffersbuffers_backend
Number of buffers written directly by a backendbuffers_backend_fsync
Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write)buffers_alloc
Number of buffers allocatedstats_reset
Time at which these statistics were last resetccp_stat_database
SQL Query:
SELECT s.datname as dbname, xact_commit, xact_rollback, blks_read, blks_hit, tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted, conflicts, temp_files, temp_bytes, deadlocks FROM pg_catalog.pg_stat_database s JOIN pg_catalog.pg_database d on d.datname = s.datname WHERE d.datistemplate = falseMetrics:
dbname
Name of databasexact_commit
Number of transactions in this database that have been committedxact_rollback
Number of transactions in this database that have been rolled backblks_read
Number of disk blocks read in this databaseblks_hit
Number of times disk blocks were found already in the buffer cache, so that a read was not necessarytup_returned
Number of rows returned by queries in this databasetup_fetched
Number of rows fetched by queries in this databasetup_inserted
Number of rows inserted by queries in this databasetup_updated
Number of rows updated by queries in this databasetup_deleted
Number of rows deleted by queries in this databaseconflicts
Number of queries canceled due to conflicts with recovery in this databasetemp_files
Number of rows deleted by queries in this databasetemp_bytes
Total amount of data written to temporary files by queries in this databasedeadlocks
Number of deadlocks detected in this databaseccp_transaction_wraparound
SQL Query:
WITH max_age AS ( SELECT 2000000000 as max_old_xid, setting AS autovacuum_freeze_max_age FROM pg_catalog.pg_settings WHERE name = 'autovacuum_freeze_max_age'), per_database_stats AS ( SELECT datname , m.max_old_xid::int , m.autovacuum_freeze_max_age::int , age(d.datfrozenxid) AS oldest_current_xid FROM pg_catalog.pg_database d JOIN max_age m ON (true) WHERE d.datallowconn) SELECT max(oldest_current_xid) AS oldest_current_xid , max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound , max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac FROM per_database_statsMetrics:
oldest_current_xid
Oldest current transaction ID in clusterpercent_towards_wraparound
Percentage towards transaction ID wraparoundpercent_towards_emergency_autovac
Percentage towards emergency autovacuum process startingqueries_per_db
ccp_stat_user_tables
SQL Query:
SELECT current_database() as dbname, schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd, n_live_tup, n_dead_tup, vacuum_count, autovacuum_count, analyze_count, autoanalyze_count FROM pg_catalog.pg_stat_user_tablesMetrics:
dbname
Database nameschemaname
Name of the schema that this table is inrelname
Name of this tableseq_scan
Number of sequential scans initiated on this tableseq_tup_read
Number of live rows fetched by sequential scansidx_scan
Number of index scans initiated on this tableidx_tup_fetch
Number of live rows fetched by index scansn_tup_ins
Number of rows insertedn_tup_upd
Number of rows updatedn_tup_del
Number of rows deletedn_tup_hot_upd
Number of rows HOT updated (i.e., with no separate index update required)n_live_tup
Estimated number of live rowsn_dead_tup
Estimated number of dead rowsvacuum_count
Number of times this table has been manually vacuumed (not counting VACUUM FULL)autovacuum_count
Number of times this table has been vacuumed by the autovacuum daemonanalyze_count
Number of times this table has been manually analyzedautoanalyze_count
Number of times this table has been analyzed by the autovacuum daemonccp_table_size
SQL Query:
SELECT current_database() as dbname, n.nspname as schemaname, c.relname, pg_total_relation_size(c.oid) as size_bytes FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid WHERE NOT pg_is_other_temp_schema(n.oid) AND relkind IN ('r', 'm', 'f')Metrics:
dbname
Database nameschemaname
Schema namerelname
Table namesize_bytes
Table size including indexesqueries_pg10
ccp_connection_stats
SQL Query:
select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from pg_catalog.pg_stat_activity where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and state <> 'idle' ) as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and wait_event_type = 'Lock' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from pg_catalog.pg_stat_activity) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);Metrics:
active
Total non-idle connectionstotal
Total idle and non-idle connectionsidle
Total idle connectionsidle_in_txn
Total idle in transaction connectionsmax_idle_in_txn_time
Length of time in seconds of the longest idle in transaction sessionmax_query_time
Length of time in seconds of the longest running querymax_blocked_query_time
Length of time in seconds of the longest running query that has been blocked by a heavyweight lockmax_connections
Value of max_connections for the monitored databaseccp_pg_hba_checksum
SQL Query:
SELECT monitor.pg_hba_checksum() AS statusMetrics:
status
Value of checksum monitioring status for pg_catalog.pg_hba_file_rules (pg_hba.conf). 0 = valid config. 1 = settings changed. To reset current config to valid after alert, run monitor.pg_hba_checksum_set_valid().ccp_replication_lag
SQL Query:
SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_timeMetrics:
replay_time
Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.ccp_replication_lag_size
SQL Query:
SELECT client_addr as replica , client_hostname as replica_hostname , client_port as replica_port , pg_wal_lsn_diff(sent_lsn, replay_lsn) as bytes FROM pg_catalog.pg_stat_replicationMetrics:
replica
Replica addressreplica_hostname
Replica hostnamereplica_port
Replica portbytes
Replication lag in bytesccp_replication_slots
SQL Query:
SELECT slot_name, active::int, pg_wal_lsn_diff(pg_current_wal_insert_lsn(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slotsMetrics:
slot_name
Name of replication slotactive
Active state of slot. 1 = true. 0 = false.retained_bytes
The amount of WAL (in bytes) being retained for this slotccp_wal_activity
SQL Query:
SELECT last_5_min_size_bytes, (SELECT COALESCE(sum(size),0) FROM pg_catalog.pg_ls_waldir()) AS total_size_bytes FROM (SELECT COALESCE(sum(size),0) AS last_5_min_size_bytes FROM pg_catalog.pg_ls_waldir() WHERE modification > CURRENT_TIMESTAMP - '5 minutes'::interval) x;Metrics:
last_5_min_size_bytes
Current size in bytes of the last 5 minutes of WAL generation. Includes recycled WALs.total_size_bytes
Current size in bytes of the WAL directoryqueries_pg11
ccp_connection_stats
SQL Query:
select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from pg_catalog.pg_stat_activity where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and state <> 'idle' ) as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and wait_event_type = 'Lock' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from pg_catalog.pg_stat_activity) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);Metrics:
active
Total non-idle connectionstotal
Total idle and non-idle connectionsidle
Total idle connectionsidle_in_txn
Total idle in transaction connectionsmax_idle_in_txn_time
Length of time in seconds of the longest idle in transaction sessionmax_query_time
Length of time in seconds of the longest running querymax_blocked_query_time
Length of time in seconds of the longest running query that has been blocked by a heavyweight lockmax_connections
Value of max_connections for the monitored databaseccp_pg_hba_checksum
SQL Query:
SELECT monitor.pg_hba_checksum() AS statusMetrics:
status
Value of checksum monitioring status for pg_catalog.pg_hba_file_rules (pg_hba.conf). 0 = valid config. 1 = settings changed. To reset current config to valid after alert, run monitor.pg_hba_checksum_set_valid().ccp_replication_lag
SQL Query:
SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_timeMetrics:
replay_time
Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.ccp_replication_lag_size
SQL Query:
SELECT client_addr as replica , client_hostname as replica_hostname , client_port as replica_port , pg_wal_lsn_diff(sent_lsn, replay_lsn) as bytes FROM pg_catalog.pg_stat_replicationMetrics:
replica
Replica addressreplica_hostname
Replica hostnamereplica_port
Replica portbytes
Replication lag in bytesccp_replication_slots
SQL Query:
SELECT slot_name, active::int, pg_wal_lsn_diff(pg_current_wal_insert_lsn(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slotsMetrics:
slot_name
Name of replication slotactive
Active state of slot. 1 = true. 0 = false.retained_bytes
The amount of WAL (in bytes) being retained for this slotccp_wal_activity
SQL Query:
SELECT last_5_min_size_bytes, (SELECT COALESCE(sum(size),0) FROM pg_catalog.pg_ls_waldir()) AS total_size_bytes FROM (SELECT COALESCE(sum(size),0) AS last_5_min_size_bytes FROM pg_catalog.pg_ls_waldir() WHERE modification > CURRENT_TIMESTAMP - '5 minutes'::interval) x;Metrics:
last_5_min_size_bytes
Current size in bytes of the last 5 minutes of WAL generation. Includes recycled WALs.total_size_bytes
Current size in bytes of the WAL directoryqueries_pg12
ccp_connection_stats
SQL Query:
select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from pg_catalog.pg_stat_activity where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and state <> 'idle' ) as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and wait_event_type = 'Lock' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from pg_catalog.pg_stat_activity) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);Metrics:
active
Total non-idle connectionstotal
Total idle and non-idle connectionsidle
Total idle connectionsidle_in_txn
Total idle in transaction connectionsmax_idle_in_txn_time
Length of time in seconds of the longest idle in transaction sessionmax_query_time
Length of time in seconds of the longest running querymax_blocked_query_time
Length of time in seconds of the longest running query that has been blocked by a heavyweight lockmax_connections
Value of max_connections for the monitored databaseccp_data_checksum_failure
SQL Query:
SELECT datname AS dbname , checksum_failures AS count , coalesce(extract(epoch from (now() - checksum_last_failure)), 0) AS time_since_last_failure_seconds FROM pg_catalog.pg_stat_database;Metrics:
dbname
Database namecount
Total number of checksum failures on this databasetime_since_last_failure_seconds
Time interval in seconds since the last checksum failure was encounteredccp_pg_hba_checksum
SQL Query:
SELECT monitor.pg_hba_checksum() AS statusMetrics:
status
Value of checksum monitioring status for pg_catalog.pg_hba_file_rules (pg_hba.conf). 0 = valid config. 1 = settings changed. To reset current config to valid after alert, run monitor.pg_hba_checksum_set_valid().ccp_replication_lag
SQL Query:
SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_timeMetrics:
replay_time
Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.ccp_replication_lag_size
SQL Query:
SELECT client_addr as replica , client_hostname as replica_hostname , client_port as replica_port , pg_wal_lsn_diff(sent_lsn, replay_lsn) as bytes FROM pg_catalog.pg_stat_replicationMetrics:
replica
Replica addressreplica_hostname
Replica hostnamereplica_port
Replica portbytes
Replication lag in bytesccp_replication_slots
SQL Query:
SELECT slot_name, active::int, pg_wal_lsn_diff(pg_current_wal_insert_lsn(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slotsMetrics:
slot_name
Name of replication slotactive
Active state of slot. 1 = true. 0 = false.retained_bytes
The amount of WAL (in bytes) being retained for this slotccp_wal_activity
SQL Query:
SELECT last_5_min_size_bytes, (SELECT COALESCE(sum(size),0) FROM pg_catalog.pg_ls_waldir()) AS total_size_bytes FROM (SELECT COALESCE(sum(size),0) AS last_5_min_size_bytes FROM pg_catalog.pg_ls_waldir() WHERE modification > CURRENT_TIMESTAMP - '5 minutes'::interval) x;Metrics:
last_5_min_size_bytes
Current size in bytes of the last 5 minutes of WAL generation. Includes recycled WALs.total_size_bytes
Current size in bytes of the WAL directoryqueries_pg95
ccp_connection_stats
SQL Query:
select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from monitor.pg_stat_activity() where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from monitor.pg_stat_activity() where state <> 'idle') as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from monitor.pg_stat_activity() where waiting = 't' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from monitor.pg_stat_activity()) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);Metrics:
active
Total non-idle connectionstotal
Total idle and non-idle connectionsidle
Total idle connectionsidle_in_txn
Total idle in transaction connectionsmax_idle_in_txn_time
Length of time in seconds of the longest idle in transaction sessionmax_query_time
Length of time in seconds of the longest running querymax_blocked_query_time
Length of time in seconds of the longest running query that has been blocked by a heavyweight lockmax_connections
Value of max_connections for the monitored databaseccp_replication_lag
SQL Query:
SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_timeMetrics:
replay_time
Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.ccp_replication_lag_size
SQL Query:
SELECT replica_addr::text as replica , replica_hostname::text , replica_port::text , byte_lag::bigint as bytes FROM monitor.streaming_replica_check();Metrics:
replica
Replica IP addressreplica_hostname
Hostname of replica system (if available)replica_port
Replica portbytes
Replication lag in bytes between primary and replicaccp_replication_slots
SQL Query:
SELECT slot_name, active::int, pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slotsMetrics:
slot_name
Name of replication slotactive
Active state of slot. 1 = true. 0 = false.retained_bytes
The amount of WAL (in bytes) being retained for this slotccp_wal_activity
SQL Query:
SELECT (SELECT setting::int8 FROM pg_settings WHERE name = 'wal_segment_size') * (SELECT setting::int8 FROM pg_settings WHERE name = 'wal_block_size') * (SELECT count(*) FROM monitor.pg_ls_waldir()) AS total_size_bytes;Metrics:
total_size_bytes
Current size in bytes of the WAL directoryqueries_pg96
ccp_connection_stats
SQL Query:
select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from monitor.pg_stat_activity() where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from monitor.pg_stat_activity() where state <> 'idle') as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and wait_event_type = 'Lock' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from monitor.pg_stat_activity()) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);Metrics:
active
Total non-idle connectionstotal
Total idle and non-idle connectionsidle
Total idle connectionsidle_in_txn
Total idle in transaction connectionsmax_idle_in_txn_time
Length of time in seconds of the longest idle in transaction sessionmax_query_time
Length of time in seconds of the longest running querymax_blocked_query_time
Length of time in seconds of the longest running query that has been blocked by a heavyweight lockmax_connections
Value of max_connections for the monitored databaseccp_replication_lag
SQL Query:
SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_timeMetrics:
replay_time
Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.ccp_replication_lag_size
SQL Query:
SELECT replica_addr::text as replica , replica_hostname::text , replica_port::text , byte_lag::bigint as bytes FROM monitor.streaming_replica_check();Metrics:
replica
Replica IP addressreplica_hostname
Hostname of replica system (if available)replica_port
Replica portbytes
Replication lag in bytes between primary and replicaccp_replication_slots
SQL Query:
SELECT slot_name, active::int, pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slotsMetrics:
slot_name
Name of replication slotactive
Active state of slot. 1 = true. 0 = false.retained_bytes
The amount of WAL (in bytes) being retained for this slotccp_wal_activity
SQL Query:
SELECT (SELECT setting::int8 FROM pg_settings WHERE name = 'wal_segment_size') * (SELECT setting::int8 FROM pg_settings WHERE name = 'wal_block_size') * (SELECT count(*) FROM monitor.pg_ls_waldir()) AS total_size_bytes;Metrics:
total_size_bytes
Current size in bytes of the WAL directorypgnodemx
In addition to the metrics above, the pgnodemx PostgreSQL extension provides SQL functions to allow the capture of node OS metrics via SQL queries. For more information, please see the pgnodemx project page:
https://github.com/CrunchyData/pgnodemx