Friday, 11 May 2018

PostgreSQL Monitoring

Database Level

Note: The stats returned by the queries below are cumulative, i.e. to measure the activity of a specific operation you have to compare the stats captured before and after performing it.
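
If you prefer a clean baseline instead of computing deltas by hand, the counters can also be reset; a minimal sketch (note that this wipes the history that the queries below rely on):

-- Reset all statistics counters for the current database
SELECT pg_stat_reset()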

Database - Transaction history / deadlocks detected

The following query gives the number of transaction commits, rollbacks and deadlocks detected in each database on the server.

SELECT pg_stat_database.datname AS "DBName",
       xact_commit AS "Transaction_Commits",
       xact_rollback AS "Transaction_Rollbacks",
       CASE (xact_commit + xact_rollback)
            WHEN 0 THEN NULL
            ELSE xact_commit::float / (xact_commit + xact_rollback)
       END AS "Successful_Transaction_ratio",
       deadlocks AS "Deadlocks_Detected",
       confl_deadlock AS "Queries_Cancelled_With_Deadlock",
       stats_reset AS "Stats_collection_starttime",
       ( clock_timestamp() - stats_reset ) AS "StatsDuration(HH_MM_SS_sss)"
FROM pg_stat_database
INNER JOIN pg_stat_database_conflicts
        ON pg_stat_database_conflicts.datname = pg_stat_database.datname
ORDER BY "Successful_Transaction_ratio" ASC


Analysis tips:
  1. Successful_Transaction_ratio should be close to 1.
  2. Check whether the number of transaction rollbacks is close to the number of deadlocks detected; rollbacks far beyond the deadlock count come from errors or explicit application rollbacks.
  3. If the stats show deadlocks, refer to Deadlocks in PostgreSQL for an explanation of locking and deadlocks; a query for watching live lock waits is sketched below these tips.
  4. For the difference between an index scan and a bitmap scan, refer to Index Scan VS Bitmap Scan.
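
Tip 3 covers deadlocks after the fact; to watch lock waits as they happen, a minimal sketch using pg_blocking_pids() (available on PostgreSQL 9.6 and later):

SELECT blocked.pid    AS "BlockedPID",
       blocked.query  AS "BlockedQuery",
       blocking.pid   AS "BlockingPID",
       blocking.query AS "BlockingQuery"
FROM pg_stat_activity blocked
INNER JOIN pg_stat_activity blocking
        ON blocking.pid = ANY ( pg_blocking_pids(blocked.pid) )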

Database - Disk & Row read Latency

The following query gives row-level activity, database size and total disk write time for each database.

SELECT datname AS "DBName",
       tup_inserted AS "Rows_Inserted",
       tup_updated AS "Rows_Updated",
       tup_deleted AS "Rows_Deleted",
       tup_returned AS "Rows_Fetched(FromSequentialScan)",
       tup_fetched AS "Rows_Fetched(FromBitmapScan)",
       pg_size_pretty( pg_database_size(datname) ) AS "Size",
       blk_write_time AS "TotalDiskWriteTime(ms)",
       stats_reset AS "Stats_collection_starttime",
       ( clock_timestamp() - stats_reset ) AS "StatsDuration(HH_MM_SS_sss)"
FROM pg_stat_database
ORDER BY "Rows_Inserted"

Database - IO Performance

The following query gives disk read/write activity for each database on the server.

SELECT datname AS "DBName",
       blks_read AS "TotalDiskReads(Blocks)",
       blk_read_time AS "TotalDiskReadTime(ms)",
       CASE blks_read
            WHEN 0 THEN 0
            ELSE ( blk_read_time / blks_read )
       END AS "AvgDiskReadTime(ms)FromStatsReset",
       blk_write_time AS "TotalDiskWriteTime(ms)",
       ( clock_timestamp() - stats_reset ) AS "StatsDuration(HH_MM_SS_sss)"
FROM pg_stat_database

Analysis tips: 
  1. Correlate machine-level disk activity with the database-level disk activity reported here.
  2. blk_read_time and blk_write_time stay at zero unless track_io_timing is enabled; see the sketch below and the reference at the end of the post.
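
A minimal sketch for checking and enabling track_io_timing; ALTER SYSTEM needs superuser-level privileges, and pg_reload_conf() applies the change without a restart:

-- Check whether I/O timing collection is enabled
SHOW track_io_timing;

-- Enable it cluster-wide and reload the configuration
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();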

Cache Usage

The following query gives cache hit details (the equivalent of logical reads in MS SQL Server).

SELECT datname AS "DBName",
       blks_read AS "DiskReads(Blocks)",
       blks_hit AS "CacheHits(Blocks)",
       CASE (blks_read + blks_hit)
            WHEN 0 THEN 0
            ELSE blks_hit::float / (blks_read + blks_hit)
       END AS "CacheHitRatio"
FROM pg_stat_database

Analysis tips: 
  1. CacheHitRatio should be close to 1.
  2. If the ratio stays low, check which relations are actually occupying the buffer cache; see the sketch below.
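
A sketch for that check, assuming the pg_buffercache extension is installed (CREATE EXTENSION pg_buffercache) and the default 8 kB block size:

SELECT c.relname AS "Relation",
       count(*) AS "Buffers",
       pg_size_pretty( count(*) * 8192 ) AS "Buffered"
FROM pg_buffercache b
INNER JOIN pg_class c
        ON b.relfilenode = pg_relation_filenode(c.oid)
INNER JOIN pg_database d
        ON d.oid = b.reldatabase
WHERE d.datname = current_database()
GROUP BY c.relname
ORDER BY count(*) DESC
LIMIT 10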

Temp Files Usage

The following query gives temporary file usage, i.e. files created when operations such as sorts spill to disk.

SELECT datname AS "DBName",
       temp_files AS "Temp_Files",
       temp_bytes AS "Temp_Files_Size(Bytes)"
FROM pg_stat_database

Analysis tips: 
  1. A drastic increase in temp file usage means a large number of sort operations are spilling to disk; suitable indexes may remove the need for those sorts. The sketch below helps to find the statements responsible.
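
To attribute temp file usage to individual statements, a sketch assuming the pg_stat_statements extension is installed and loaded via shared_preload_libraries:

SELECT query,
       temp_blks_written AS "TempBlocksWritten",
       pg_size_pretty( temp_blks_written * 8192 ) AS "TempWritten"
FROM pg_stat_statements
ORDER BY temp_blks_written DESC
LIMIT 10

Setting log_temp_files = 0 is another option: every temporary file created is then logged together with the statement that caused it.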

Table Level

Operations performed on tables & IO

The following query gives block-level IO and row-level activity for each user table.

SELECT pg_stat_user_tables.relname AS "TableName",
       ( COALESCE(heap_blks_read,0) + COALESCE(idx_blks_read,0) + COALESCE(toast_blks_read,0) + COALESCE(tidx_blks_read,0) ) AS "TotalDiskReads(Blocks)",
       ( COALESCE(heap_blks_hit,0) + COALESCE(idx_blks_hit,0) + COALESCE(toast_blks_hit,0) + COALESCE(tidx_blks_hit,0) ) AS "TotalCacheHits",
       heap_blks_read AS "DiskReads_Data(Blocks)",
       heap_blks_hit AS "CacheHits_Data",
       idx_blks_read AS "DiskReads_Index(Blocks)",
       idx_blks_hit AS "CacheHits_Index",
       toast_blks_read AS "DiskReads_ToastData(Blocks)",
       toast_blks_hit AS "CacheHits_ToastData(Blocks)",
       tidx_blks_read AS "DiskReads_ToastIndex(Blocks)",
       tidx_blks_hit AS "CacheHits_ToastIndex(Blocks)",
       n_tup_ins AS "RowsInserted",
       n_tup_upd AS "RowsUpdated",
       n_tup_del AS "RowsDeleted",
       n_tup_hot_upd AS "RowsUpdated(HOT)",
       n_live_tup AS "LiveRows",
       n_dead_tup AS "DeadRows"
FROM pg_stat_user_tables
INNER JOIN pg_statio_user_tables
        ON pg_statio_user_tables.relid = pg_stat_user_tables.relid

Analysis tips: 
  1. Identify tables with huge sequential scan counts (see the next query) and try to create indexes on the columns they filter on.
  2. The size sketch below helps relate the block counts to how large each table actually is.
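
To put the block counts above in context, a minimal sketch that reports on-disk sizes using the standard size functions:

SELECT relname AS "TableName",
       pg_size_pretty( pg_total_relation_size(relid) ) AS "TotalSize(TableIndexesToast)",
       pg_size_pretty( pg_relation_size(relid) ) AS "TableSize",
       pg_size_pretty( pg_indexes_size(relid) ) AS "IndexSize"
FROM pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10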

Index Usage in table

The following queries show how often each table is read via sequential scans versus index scans, and how often each individual index is used.

SELECT relname AS "TableName",
       seq_scan AS "SequentialScans",
       seq_tup_read AS "RowsFetched(SequentialScans)",
       idx_scan AS "IndexScans",
       idx_tup_fetch AS "RowsFetched(IndexScans)"
FROM pg_stat_user_tables
ORDER BY "SequentialScans" DESC


SELECT relname AS "Table",
       indexrelname AS "Index",
       idx_scan AS "Scans(WithIndex)",
       idx_tup_read AS "IndexRows_Read",
       idx_tup_fetch AS "TableRowsRead(UsingIndex)"
FROM pg_stat_user_indexes
ORDER BY "Scans(WithIndex)" ASC

Analysis tips: 
  1. Identify unused indexes (idx_scan = 0 or very low) with the second query; the sketch below also reports their size so the largest ones can be reviewed first.
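
A minimal sketch that extends the second query with index size, skipping unique indexes because they enforce constraints even if they are never scanned:

SELECT s.relname AS "Table",
       s.indexrelname AS "Index",
       pg_size_pretty( pg_relation_size(s.indexrelid) ) AS "IndexSize"
FROM pg_stat_user_indexes s
INNER JOIN pg_index i
        ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC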

Vacuum and Analyze

The following query gives vacuum and analyze activity for each table.

SELECT relname AS "TableName",
       vacuum_count AS "Vacuumed(Manual)",
       autovacuum_count AS "Vacuumed(Auto)",
       analyze_count AS "Analyzed(Manual)",
       autoanalyze_count AS "Analyzed(Auto)",
       last_vacuum AS "LastVacuum(Manual)",
       last_autovacuum AS "LastVacuum(Auto)",
       last_analyze AS "LastAnalyzed(Manual)",
       last_autoanalyze AS "LastAnalyzed(Auto)",
       n_mod_since_analyze AS "RowsModified(AfterAnalyze)"
FROM pg_stat_all_tables
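
To spot tables where autovacuum may be falling behind, a sketch that derives a dead-row ratio from the same statistics:

SELECT relname AS "TableName",
       n_live_tup AS "LiveRows",
       n_dead_tup AS "DeadRows",
       CASE (n_live_tup + n_dead_tup)
            WHEN 0 THEN 0
            ELSE n_dead_tup::float / (n_live_tup + n_dead_tup)
       END AS "DeadRowRatio"
FROM pg_stat_user_tables
ORDER BY "DeadRowRatio" DESC

A persistently high ratio on a busy table is a hint to tune that table's autovacuum settings (for example autovacuum_vacuum_scale_factor).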





References
Tracking IO stats: https://malisper.me/track_io_timing/




