Navigation :

Crunchy PostgreSQL Operator > Advanced Topics > Crunchy Postgres Exporter

Crunchy Postgres Exporter

The crunchy-postgres-exporter container provides real time metrics about the PostgreSQL database via an API. These metrics are scraped and stored by a Prometheus time-series database and are then graphed and visualized through the open source data visualizer Grafana.

The crunchy-postgres-exporter container uses pgMonitor for advanced metric collection. It is required that the crunchy-postgres-ha container has the PGMONITOR_PASSWORD environment variable to create the appropriate user (ccp_monitoring) to collect metrics.

Custom queries to collect metrics can be specified by the user. By mounting a queries.yml file to /conf on the container, additional metrics can be specified for the API to collect. For an example of a queries.yml file, see here

Packages

The crunchy-postgres-exporter Docker image contains the following packages (versions vary depending on PostgreSQL version):

PostgreSQL (13.0, 12.4, 11.9, 10.14, 9.6.19 and 9.5.23)
CentOS7 - publicly available
UBI7 - customers only
PostgreSQL Exporter

Environment Variables

Required

Name	Default	Description
EXPORTER_PG_PASSWORD	none	Provides the password needed to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database. Should typically match the `PGMONITOR_PASSWORD` value set in the `crunchy-postgres` container.

Optional

Name	Default	Description
EXPORTER_PG_USER	ccp_monitoring	Provides the username needed to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database. Should typically be `ccp_monitoring` per the crunchy-postgres container specification (see environment varaible `PGMONITOR_PASSWORD`).
EXPORTER_PG_HOST	127.0.0.1	Provides the host needed to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database
EXPORTER_PG_PORT	5432	Provides the port needed to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database
EXPORTER_PG_DATABASE	postgres	Provides the name of the database used to generate the PostgreSQL URL required by the PostgreSQL Exporter to connect to a PG database
DATA_SOURCE_NAME	None	Explicitly defines the URL for connecting to the PostgreSQL database (must be in the form of `postgresql://`). If provided, overrides all other settings provided to generate the connection URL.
CRUNCHY_DEBUG	FALSE	Set this to true to enable debugging in logs. Note: this mode can reveal secrets in logs.
POSTGRES_EXPORTER_PORT	9187	Set the postgres-exporter port to listen on for web interface and telemetry.

Viewing Cluster Metrics

To view a particular cluster’s available metrics in a local browser window, port forwarding can be set up as follows. For a pgcluster, mycluster, deployed in the pgouser1 namespace, use

# If deployed to Kubernetes
kubectl port-forward -n pgouser1 svc/mycluster 9187:9187

# If deployed to OpenShift
oc port-forward -n pgouser1 svc/mycluster 9187:9187

Then, in your local browser, go to http://127.0.0.1:9187/metrics to view the available metrics for that cluster.

Crunchy Postgres Exporter Metrics Detail

Below are details on the various metrics available from the crunchy-postgres-exporter container. The name, SQL query and metric details are given for each available item.

queries_backrest

ccp_backrest_last_diff_backup

SQL Query:

Metrics:

config_file

Configuration file for this backup

stanza

PGBackrest Stanza Name

time_since_completion_seconds

Seconds since the last completed full or differential backup. Differential is always based off last full.

ccp_backrest_last_full_backup

SQL Query:

Metrics:

config_file

Configuration file for this backup

stanza

PGBackrest Stanza Name

time_since_completion_seconds

Seconds since the last completed full backup

ccp_backrest_last_incr_backup

SQL Query:

Metrics:

config_file

Configuration file for this backup

stanza

PGBackrest Stanza Name

time_since_completion_seconds

Seconds since the last completed full, differential or incremental backup. Incremental is always based off last full or differential.

ccp_backrest_last_info

SQL Query:

WITH all_backups AS ( SELECT config_file , jsonb_array_elements(data) AS stanza_data FROM monitor.pgbackrest_info(#PGBACKREST_INFO_THROTTLE_MINUTES#) ) , per_stanza AS ( SELECT config_file , stanza_data->>'name' AS stanza , jsonb_array_elements(stanza_data->'backup') AS backup_data FROM all_backups ) SELECT a.config_file , a.stanza , a.backup_data->>'type' AS backup_type , a.backup_data->'info'->'repository'->>'delta' AS repo_backup_size_bytes , a.backup_data->'info'->'repository'->>'size' AS repo_total_size_bytes , (a.backup_data->'timestamp'->>'stop')::bigint - (a.backup_data->'timestamp'->>'start')::bigint AS backup_runtime_seconds FROM per_stanza a JOIN ( SELECT config_file , stanza , backup_data->>'type' AS backup_type , max(backup_data->'timestamp'->>'start') AS max_backup_start , max(backup_data->'timestamp'->>'stop') AS max_backup_stop FROM per_stanza GROUP BY 1,2,3) b ON a.config_file = b.config_file AND a.stanza = b.stanza AND a.backup_data->>'type' = b.backup_type AND a.backup_data->'timestamp'->>'start' = b.max_backup_start AND a.backup_data->'timestamp'->>'stop' = b.max_backup_stop;

Metrics:

config_file

Configuration file for this backup

stanza

PGBackrest Stanza Name

backup_type

Backup type

repo_backup_size_bytes

Actual size of only this individual backup in the pgbackrest repository

repo_total_size_bytes

Total size of this backup in the pgbackrest repository, including all required previous backups and WAL

backup_runtime_seconds

Total runtime in seconds of this backup

queries_common

ccp_archive_command_status

SQL Query:

SELECT CASE WHEN EXTRACT(epoch from (last_failed_time - last_archived_time)) IS NULL THEN 0 WHEN EXTRACT(epoch from (last_failed_time - last_archived_time)) < 0 THEN 0 ELSE EXTRACT(epoch from (last_failed_time - last_archived_time)) END AS seconds_since_last_fail FROM pg_catalog.pg_stat_archiver

Metrics:

seconds_since_last_fail

Seconds since the last recorded failure of the archive_command

ccp_database_size

SQL Query:

SELECT datname as dbname, pg_database_size(datname) as bytes FROM pg_catalog.pg_database WHERE datistemplate = false

Metrics:

dbname

Database name

bytes

Database size in bytes

ccp_is_in_recovery

SQL Query:

SELECT CASE WHEN pg_is_in_recovery = true THEN 1 ELSE 2 END AS status from pg_is_in_recovery();

Metrics:

status

Return value of 1 means database is in recovery. Otherwise 2 it is a primary.

ccp_locks

SQL Query:

SELECT pg_database.datname as dbname, tmp.mode, COALESCE(count,0) as count FROM ( VALUES ('accesssharelock'), ('rowsharelock'), ('rowexclusivelock'), ('shareupdateexclusivelock'), ('sharelock'), ('sharerowexclusivelock'), ('exclusivelock'), ('accessexclusivelock') ) AS tmp(mode) CROSS JOIN pg_catalog.pg_database LEFT JOIN (SELECT database, lower(mode) AS mode,count(*) AS count FROM pg_catalog.pg_locks WHERE database IS NOT NULL GROUP BY database, lower(mode) ) AS tmp2 ON tmp.mode=tmp2.mode and pg_database.oid = tmp2.database

Metrics:

dbname

Database name

mode

Lock type

count

Number of locks

ccp_pg_settings_checksum

SQL Query:

SELECT monitor.pg_settings_checksum() AS status

Metrics:

status

Value of checksum monitioring status for pg_catalog.pg_settings (postgresql.conf). 0 = valid config. 1 = settings changed. To reset current config to valid after alert, run monitor.pg_settings_checksum_set_valid().

ccp_postgresql_version

SQL Query:

SELECT current_setting('server_version_num')::int AS current

Metrics:

current

The current version of PostgreSQL that this exporter is running on as a 6 digit integer (######).

ccp_postmaster_runtime

SQL Query:

SELECT extract('epoch' from pg_postmaster_start_time) as start_time_seconds from pg_catalog.pg_postmaster_start_time()

Metrics:

start_time_seconds

Time at which postmaster started. Note this metric has been deprecated as of pgMonitor 4.3 and will be removed in a future version. Use ccp_postmaster_uptime_seconds instead.

ccp_postmaster_uptime

SQL Query:

SELECT extract(epoch from (now() - pg_postmaster_start_time() )) AS seconds;

Metrics:

seconds

Time interval in seconds since PostgreSQL database was last restarted

ccp_sequence_exhaustion

SQL Query:

SELECT count FROM monitor.sequence_exhaustion(75)

Metrics:

count

Count of sequences that have reached greater than or equal to 75% of their max available numbers. Function monitor.sequence_status() can provide more details if run directly on system.

ccp_settings_gauge

SQL Query:

select (select setting::int from pg_catalog.pg_settings where name = 'checkpoint_timeout') as checkpoint_timeout , (select setting::float from pg_catalog.pg_settings where name = 'checkpoint_completion_target') as checkpoint_completion_target , (select 8192*setting::bigint as bytes from pg_catalog.pg_settings where name = 'shared_buffers') as shared_buffers

Metrics:

checkpoint_timeout

Checkpoint timeout in seconds

checkpoint_completion_target

Checkpoint completion target, ranging from 0 to 1

shared_buffers

Size of shared_buffers in bytes

ccp_settings_pending_restart

SQL Query:

SELECT count(*) AS count FROM pg_catalog.pg_settings WHERE pending_restart = true

Metrics:

count

Number of settings from pg_settings catalog in a pending_restart state

ccp_stat_bgwriter

SQL Query:

SELECT checkpoints_timed, checkpoints_req, checkpoint_write_time, checkpoint_sync_time, buffers_checkpoint, buffers_clean, maxwritten_clean, buffers_backend, buffers_backend_fsync, buffers_alloc, stats_reset FROM pg_catalog.pg_stat_bgwriter

Metrics:

checkpoints_timed

Number of scheduled checkpoints that have been performed

checkpoints_req

Number of requested checkpoints that have been performed

checkpoint_write_time

Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in milliseconds

checkpoint_sync_time

Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk, in milliseconds

buffers_checkpoint

Number of buffers written during checkpoints

buffers_clean

Number of buffers written by the background writer

maxwritten_clean

Number of times the background writer stopped a cleaning scan because it had written too many buffers

buffers_backend

Number of buffers written directly by a backend

buffers_backend_fsync

Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write)

buffers_alloc

Number of buffers allocated

stats_reset

Time at which these statistics were last reset

ccp_stat_database

SQL Query:

SELECT s.datname as dbname, xact_commit, xact_rollback, blks_read, blks_hit, tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted, conflicts, temp_files, temp_bytes, deadlocks FROM pg_catalog.pg_stat_database s JOIN pg_catalog.pg_database d on d.datname = s.datname WHERE d.datistemplate = false

Metrics:

dbname

Name of database

xact_commit

Number of transactions in this database that have been committed

xact_rollback

Number of transactions in this database that have been rolled back

blks_read

Number of disk blocks read in this database

blks_hit

Number of times disk blocks were found already in the buffer cache, so that a read was not necessary

tup_returned

Number of rows returned by queries in this database

tup_fetched

Number of rows fetched by queries in this database

tup_inserted

Number of rows inserted by queries in this database

tup_updated

Number of rows updated by queries in this database

tup_deleted

Number of rows deleted by queries in this database

conflicts

Number of queries canceled due to conflicts with recovery in this database

temp_files

Number of rows deleted by queries in this database

temp_bytes

Total amount of data written to temporary files by queries in this database

deadlocks

Number of deadlocks detected in this database

ccp_transaction_wraparound

SQL Query:

WITH max_age AS ( SELECT 2000000000 as max_old_xid, setting AS autovacuum_freeze_max_age FROM pg_catalog.pg_settings WHERE name = 'autovacuum_freeze_max_age'), per_database_stats AS ( SELECT datname , m.max_old_xid::int , m.autovacuum_freeze_max_age::int , age(d.datfrozenxid) AS oldest_current_xid FROM pg_catalog.pg_database d JOIN max_age m ON (true) WHERE d.datallowconn) SELECT max(oldest_current_xid) AS oldest_current_xid , max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound , max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac FROM per_database_stats

Metrics:

oldest_current_xid

Oldest current transaction ID in cluster

percent_towards_wraparound

Percentage towards transaction ID wraparound

percent_towards_emergency_autovac

Percentage towards emergency autovacuum process starting

queries_per_db

ccp_stat_user_tables

SQL Query:

SELECT current_database() as dbname, schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd, n_live_tup, n_dead_tup, vacuum_count, autovacuum_count, analyze_count, autoanalyze_count FROM pg_catalog.pg_stat_user_tables

Metrics:

dbname

Database name

schemaname

Name of the schema that this table is in

relname

Name of this table

seq_scan

Number of sequential scans initiated on this table

seq_tup_read

Number of live rows fetched by sequential scans

idx_scan

Number of index scans initiated on this table

idx_tup_fetch

Number of live rows fetched by index scans

n_tup_ins

Number of rows inserted

n_tup_upd

Number of rows updated

n_tup_del

Number of rows deleted

n_tup_hot_upd

Number of rows HOT updated (i.e., with no separate index update required)

n_live_tup

Estimated number of live rows

n_dead_tup

Estimated number of dead rows

vacuum_count

Number of times this table has been manually vacuumed (not counting VACUUM FULL)

autovacuum_count

Number of times this table has been vacuumed by the autovacuum daemon

analyze_count

Number of times this table has been manually analyzed

autoanalyze_count

Number of times this table has been analyzed by the autovacuum daemon

ccp_table_size

SQL Query:

SELECT current_database() as dbname, n.nspname as schemaname, c.relname, pg_total_relation_size(c.oid) as size_bytes FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid WHERE NOT pg_is_other_temp_schema(n.oid) AND relkind IN ('r', 'm', 'f')

Metrics:

dbname

Database name

schemaname

Schema name

relname

Table name

size_bytes

Table size including indexes

queries_pg10

ccp_connection_stats

SQL Query:

select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from pg_catalog.pg_stat_activity where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and state <> 'idle' ) as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and wait_event_type = 'Lock' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from pg_catalog.pg_stat_activity) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);

Metrics:

active

Total non-idle connections

total

Total idle and non-idle connections

idle

Total idle connections

idle_in_txn

Total idle in transaction connections

max_idle_in_txn_time

Length of time in seconds of the longest idle in transaction session

max_query_time

Length of time in seconds of the longest running query

max_blocked_query_time

Length of time in seconds of the longest running query that has been blocked by a heavyweight lock

max_connections

Value of max_connections for the monitored database

ccp_pg_hba_checksum

SQL Query:

SELECT monitor.pg_hba_checksum() AS status

Metrics:

status

Value of checksum monitioring status for pg_catalog.pg_hba_file_rules (pg_hba.conf). 0 = valid config. 1 = settings changed. To reset current config to valid after alert, run monitor.pg_hba_checksum_set_valid().

ccp_replication_lag

SQL Query:

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_time

Metrics:

replay_time

Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.

ccp_replication_lag_size

SQL Query:

SELECT client_addr as replica , client_hostname as replica_hostname , client_port as replica_port , pg_wal_lsn_diff(sent_lsn, replay_lsn) as bytes FROM pg_catalog.pg_stat_replication

Metrics:

replica

Replica address

replica_hostname

Replica hostname

replica_port

Replica port

bytes

Replication lag in bytes

ccp_replication_slots

SQL Query:

SELECT slot_name, active::int, pg_wal_lsn_diff(pg_current_wal_insert_lsn(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slots

Metrics:

slot_name

Name of replication slot

active

Active state of slot. 1 = true. 0 = false.

retained_bytes

The amount of WAL (in bytes) being retained for this slot

ccp_wal_activity

SQL Query:

SELECT last_5_min_size_bytes, (SELECT COALESCE(sum(size),0) FROM pg_catalog.pg_ls_waldir()) AS total_size_bytes FROM (SELECT COALESCE(sum(size),0) AS last_5_min_size_bytes FROM pg_catalog.pg_ls_waldir() WHERE modification > CURRENT_TIMESTAMP - '5 minutes'::interval) x;

Metrics:

last_5_min_size_bytes

Current size in bytes of the last 5 minutes of WAL generation. Includes recycled WALs.

total_size_bytes

Current size in bytes of the WAL directory

queries_pg11

ccp_connection_stats

SQL Query:

Metrics:

active

Total non-idle connections

total

Total idle and non-idle connections

idle

Total idle connections

idle_in_txn

Total idle in transaction connections

max_idle_in_txn_time

Length of time in seconds of the longest idle in transaction session

max_query_time

Length of time in seconds of the longest running query

max_blocked_query_time

Length of time in seconds of the longest running query that has been blocked by a heavyweight lock

max_connections

Value of max_connections for the monitored database

ccp_pg_hba_checksum

SQL Query:

SELECT monitor.pg_hba_checksum() AS status

Metrics:

status

ccp_replication_lag

SQL Query:

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_time

Metrics:

replay_time

Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.

ccp_replication_lag_size

SQL Query:

SELECT client_addr as replica , client_hostname as replica_hostname , client_port as replica_port , pg_wal_lsn_diff(sent_lsn, replay_lsn) as bytes FROM pg_catalog.pg_stat_replication

Metrics:

replica

Replica address

replica_hostname

Replica hostname

replica_port

Replica port

bytes

Replication lag in bytes

ccp_replication_slots

SQL Query:

SELECT slot_name, active::int, pg_wal_lsn_diff(pg_current_wal_insert_lsn(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slots

Metrics:

slot_name

Name of replication slot

active

Active state of slot. 1 = true. 0 = false.

retained_bytes

The amount of WAL (in bytes) being retained for this slot

ccp_wal_activity

SQL Query:

Metrics:

last_5_min_size_bytes

Current size in bytes of the last 5 minutes of WAL generation. Includes recycled WALs.

total_size_bytes

Current size in bytes of the WAL directory

queries_pg12

ccp_connection_stats

SQL Query:

Metrics:

active

Total non-idle connections

total

Total idle and non-idle connections

idle

Total idle connections

idle_in_txn

Total idle in transaction connections

max_idle_in_txn_time

Length of time in seconds of the longest idle in transaction session

max_query_time

Length of time in seconds of the longest running query

max_blocked_query_time

Length of time in seconds of the longest running query that has been blocked by a heavyweight lock

max_connections

Value of max_connections for the monitored database

ccp_data_checksum_failure

SQL Query:

SELECT datname AS dbname , checksum_failures AS count , coalesce(extract(epoch from (now() - checksum_last_failure)), 0) AS time_since_last_failure_seconds FROM pg_catalog.pg_stat_database;

Metrics:

dbname

Database name

count

Total number of checksum failures on this database

time_since_last_failure_seconds

Time interval in seconds since the last checksum failure was encountered

ccp_pg_hba_checksum

SQL Query:

SELECT monitor.pg_hba_checksum() AS status

Metrics:

status

ccp_replication_lag

SQL Query:

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_time

Metrics:

replay_time

Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.

ccp_replication_lag_size

SQL Query:

SELECT client_addr as replica , client_hostname as replica_hostname , client_port as replica_port , pg_wal_lsn_diff(sent_lsn, replay_lsn) as bytes FROM pg_catalog.pg_stat_replication

Metrics:

replica

Replica address

replica_hostname

Replica hostname

replica_port

Replica port

bytes

Replication lag in bytes

ccp_replication_slots

SQL Query:

SELECT slot_name, active::int, pg_wal_lsn_diff(pg_current_wal_insert_lsn(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slots

Metrics:

slot_name

Name of replication slot

active

Active state of slot. 1 = true. 0 = false.

retained_bytes

The amount of WAL (in bytes) being retained for this slot

ccp_wal_activity

SQL Query:

Metrics:

last_5_min_size_bytes

Current size in bytes of the last 5 minutes of WAL generation. Includes recycled WALs.

total_size_bytes

Current size in bytes of the WAL directory

queries_pg95

ccp_connection_stats

SQL Query:

select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from monitor.pg_stat_activity() where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from monitor.pg_stat_activity() where state <> 'idle') as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from monitor.pg_stat_activity() where waiting = 't' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from monitor.pg_stat_activity()) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);

Metrics:

active

Total non-idle connections

total

Total idle and non-idle connections

idle

Total idle connections

idle_in_txn

Total idle in transaction connections

max_idle_in_txn_time

Length of time in seconds of the longest idle in transaction session

max_query_time

Length of time in seconds of the longest running query

max_blocked_query_time

Length of time in seconds of the longest running query that has been blocked by a heavyweight lock

max_connections

Value of max_connections for the monitored database

ccp_replication_lag

SQL Query:

SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_time

Metrics:

replay_time

Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.

ccp_replication_lag_size

SQL Query:

SELECT replica_addr::text as replica , replica_hostname::text , replica_port::text , byte_lag::bigint as bytes FROM monitor.streaming_replica_check();

Metrics:

replica

Replica IP address

replica_hostname

Hostname of replica system (if available)

replica_port

Replica port

bytes

Replication lag in bytes between primary and replica

ccp_replication_slots

SQL Query:

SELECT slot_name, active::int, pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slots

Metrics:

slot_name

Name of replication slot

active

Active state of slot. 1 = true. 0 = false.

retained_bytes

The amount of WAL (in bytes) being retained for this slot

ccp_wal_activity

SQL Query:

SELECT (SELECT setting::int8 FROM pg_settings WHERE name = 'wal_segment_size') * (SELECT setting::int8 FROM pg_settings WHERE name = 'wal_block_size') * (SELECT count(*) FROM monitor.pg_ls_waldir()) AS total_size_bytes;

Metrics:

total_size_bytes

Current size in bytes of the WAL directory

queries_pg96

ccp_connection_stats

SQL Query:

select ((total - idle) - idle_in_txn) as active , total , idle , idle_in_txn , (select coalesce(extract(epoch from (max(now() - state_change))),0) from monitor.pg_stat_activity() where state = 'idle in transaction') as max_idle_in_txn_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from monitor.pg_stat_activity() where state <> 'idle') as max_query_time , (select coalesce(extract(epoch from (max(now() - query_start))),0) from pg_catalog.pg_stat_activity where backend_type = 'client backend' and wait_event_type = 'Lock' ) as max_blocked_query_time , max_connections from ( select count(*) as total , coalesce(sum(case when state = 'idle' then 1 else 0 end),0) as idle , coalesce(sum(case when state = 'idle in transaction' then 1 else 0 end),0) as idle_in_txn from monitor.pg_stat_activity()) x join (select setting::float AS max_connections FROM pg_settings WHERE name = 'max_connections') xx ON (true);

Metrics:

active

Total non-idle connections

total

Total idle and non-idle connections

idle

Total idle connections

idle_in_txn

Total idle in transaction connections

max_idle_in_txn_time

Length of time in seconds of the longest idle in transaction session

max_query_time

Length of time in seconds of the longest running query

max_blocked_query_time

Length of time in seconds of the longest running query that has been blocked by a heavyweight lock

max_connections

Value of max_connections for the monitored database

ccp_replication_lag

SQL Query:

SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replay_time

Metrics:

replay_time

Length of time since the last transaction was replayed on replica. Will always increase if no writes on primary.

ccp_replication_lag_size

SQL Query:

SELECT replica_addr::text as replica , replica_hostname::text , replica_port::text , byte_lag::bigint as bytes FROM monitor.streaming_replica_check();

Metrics:

replica

Replica IP address

replica_hostname

Hostname of replica system (if available)

replica_port

Replica port

bytes

Replication lag in bytes between primary and replica

ccp_replication_slots

SQL Query:

SELECT slot_name, active::int, pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes FROM pg_catalog.pg_replication_slots

Metrics:

slot_name

Name of replication slot

active

Active state of slot. 1 = true. 0 = false.

retained_bytes

The amount of WAL (in bytes) being retained for this slot

ccp_wal_activity

SQL Query:

Metrics:

total_size_bytes

Current size in bytes of the WAL directory

pgnodemx

In addition to the metrics above, the pgnodemx PostgreSQL extension provides SQL functions to allow the capture of node OS metrics via SQL queries. For more information, please see the pgnodemx project page:

https://github.com/CrunchyData/pgnodemx

queries_nodemx

ccp_nodemx_cpu

SQL Query:

SELECT monitor.kdapi_scalar_bigint('cpu_request') as request, monitor.kdapi_scalar_bigint('cpu_limit') as limit

Metrics:

request

CPU request value in milli cores

limit

CPU limit value in milli cores

ccp_nodemx_cpuacct

SQL Query:

SELECT monitor.cgroup_scalar_bigint('cpuacct.usage') as usage

Metrics:

usage

CPU usage in nanoseconds

ccp_nodemx_cpucfs

SQL Query:

SELECT monitor.cgroup_scalar_bigint('cpu.cfs_period_us') as period_us, monitor.cgroup_scalar_bigint('cpu.cfs_quota_us') as quota_us

Metrics:

period_us

the total available run-time within a period (in microseconds)

quota_us

the length of a period (in microseconds)

ccp_nodemx_cpustat

SQL Query:

select (SELECT val as nr_periods FROM monitor.cgroup_setof_kv('cpu.stat') where key='nr_periods'), (SELECT val as nr_throttled FROM monitor.cgroup_setof_kv('cpu.stat') where key='nr_throttled'), (SELECT val as throttled_time FROM monitor.cgroup_setof_kv('cpu.stat') where key='throttled_time')

Metrics:

nr_threads

number of periods that any thread was runnable

nr_throttled

number of runnable periods in which the application used its entire quota and was throttled

throttled_time

sum total amount of time individual threads within the monitor.cgroup were throttled

ccp_nodemx_data_disk

SQL Query:

SELECT mount_point,fs_type,total_bytes,available_bytes,total_file_nodes,free_file_nodes FROM monitor.proc_mountinfo() m JOIN monitor.fsinfo(m.mount_point) f USING (major_number, minor_number) WHERE m.mount_point IN ('/pgdata', '/pgwal') OR m.mount_point like '/tablespaces/%'

Metrics:

mount_point

mount point

fs_type

File system type

total_bytes

Size in bytes

available_bytes

Available size in bytes

total_file_nodes

Total file nodes

free_file_nodes

Available file nodes

ccp_nodemx_disk_activity

SQL Query:

SELECT mount_point,sectors_read,sectors_written FROM monitor.proc_mountinfo() m JOIN monitor.proc_diskstats() d USING (major_number, minor_number) WHERE m.mount_point IN ('/pgdata', '/pgwal') OR m.mount_point like '/tablespaces/%'

Metrics:

mount_point

mount point

sectors_read

Total sectors read

sectors_written

Total sectors writen

ccp_nodemx_mem

SQL Query:

SELECT monitor.kdapi_scalar_bigint('mem_request') as request, monitor.kdapi_scalar_bigint('mem_limit') as limit, (select val from monitor.cgroup_setof_kv('memory.stat') where key='cache') as cache, (select val from monitor.cgroup_setof_kv('memory.stat') where key='rss') as rss, (select val from monitor.cgroup_setof_kv('memory.stat') where key='shmem') as shmem, (select val from monitor.cgroup_setof_kv('memory.stat') where key='mapped_file') as mapped_file, (select val from monitor.cgroup_setof_kv('memory.stat') where key='dirty') as dirty, (select val from monitor.cgroup_setof_kv('memory.stat') where key='active_anon') as active_anon, (select val from monitor.cgroup_setof_kv('memory.stat') where key='inactive_anon') as inactive_anon, (select val from monitor.cgroup_setof_kv('memory.stat') where key='active_file') as active_file, (select val from monitor.cgroup_setof_kv('memory.stat') where key='inactive_file') as inactive_file

Metrics:

request

Memory request value in bytes

limit

Memory limit value in bytes

cache

Total bytes of page cache memory

rss

Total bytes of anonymous and swap cache memory

shmem

Total bytes of shared memory

mapped_file

Total bytes of mapped file (includes tmpfs/shmem)

mapped_file

Total bytes of mapped file (includes tmpfs/shmem)

dirty

Total bytes that are waiting to get written back to the disk

active_anon

Total bytes of anonymous and swap cache memory on active LRU list

inactive_anon

Total bytes of anonymous and swap cache memory on inactive LRU list

active_file

Total bytes of file-backed memory on active LRU list

inactive_file

Total bytes of file-backed memory on inactive LRU list

ccp_nodemx_network

SQL Query:

SELECT interface,tx_bytes,tx_packets, rx_bytes,rx_packets from monitor.proc_network_stats()