Friday, 14 September 2018

MS SQL Server - Wait stats using extended events



Identify wait stats of a given workload



Benchmark wait statistics

We use the DMV sys.dm_os_wait_stats to extract wait stats in SQL Server. This DMV returns wait times aggregated by wait type.
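For example, a quick look at the top waits might use a query like the following sketch (the column names come from the DMV; the list of benign wait types filtered out here is illustrative, not exhaustive):

```sql
-- Top 10 wait types by cumulative wait time since the last restart/reset
SELECT TOP 10
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms,                               -- time spent waiting for CPU after being signaled
       wait_time_ms - signal_wait_time_ms AS resource_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',
                        'SQLTRACE_INCREMENTAL_FLUSH_SLEEP')  -- skip some benign waits
ORDER BY wait_time_ms DESC;
```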
The wait time retrieved here is cumulative since the last service restart or the last wait-stats reset.
To extract the wait stats for the workload we applied, we have to either:

  1. reset the wait stats, or
  2. baseline the wait stats before applying the workload.

Reset wait stats

Wait stats can be reset using the DBCC command below.

DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR)

If the database is a production database, clearing wait stats may not be the best option; prefer baselining over resetting.

Baseline wait statistics

Create a baseline of the existing statistics using the benchmark wait stats query. This creates the table WaitStatsBench, which we use later to extract wait times for the given workload.
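The benchmark wait stats query itself is linked rather than reproduced here; a minimal sketch of such a baseline snapshot (the WaitStatsBench name comes from the text, the column list is an assumption) could be:

```sql
-- Snapshot the current cumulative wait stats into a baseline table
IF OBJECT_ID('dbo.WaitStatsBench') IS NOT NULL
    DROP TABLE dbo.WaitStatsBench;

SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms,
       SYSDATETIME() AS captured_at
INTO dbo.WaitStatsBench
FROM sys.dm_os_wait_stats;
```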

Perform Operation

Perform the actual business transaction or run the workload.

Capture wait stats

Extract the wait stats for the workload using the get wait statistics query against the DMV sys.dm_os_wait_stats, or use the modified query to get aggregated wait stats.

The wait statistics captured above are cumulative since the last service restart or wait-stats clear. Run the difference from baseline query to get the deltas against the baseline.
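The difference-from-baseline query is linked rather than shown; assuming the baseline was captured into a table named dbo.WaitStatsBench (the table named earlier in this post), the idea looks roughly like:

```sql
-- Wait time accumulated between the baseline snapshot and now
SELECT cur.wait_type,
       cur.wait_time_ms        - base.wait_time_ms        AS workload_wait_ms,
       cur.waiting_tasks_count - base.waiting_tasks_count AS workload_wait_count
FROM sys.dm_os_wait_stats AS cur
JOIN dbo.WaitStatsBench   AS base
  ON base.wait_type = cur.wait_type
WHERE cur.wait_time_ms - base.wait_time_ms > 0
ORDER BY workload_wait_ms DESC;
```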

Identify the top wait type from the resultant list

Identify the wait type with the highest wait time. Here we take ASYNC_NETWORK_IO as the target wait type to investigate.

Capture queries corresponding to wait type

Prepare extended event session for identified wait type

Create the stored procedure that sets up the extended event session using the create resource waits query.
Create the stored procedure that extracts the captured wait stats using the extract events query.
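The stored procedures come from the linked queries; at their core such a session captures the sqlos.wait_info event. A hand-rolled sketch (the session name is arbitrary, and the numeric wait_type predicate value is a placeholder you must look up):

```sql
-- Capture statements that incur a specific wait type
CREATE EVENT SESSION [ResourceWaits] ON SERVER
ADD EVENT sqlos.wait_info (
    ACTION (sqlserver.session_id, sqlserver.sql_text)
    WHERE opcode = 1        -- end of the wait, so duration is populated
      AND duration > 0
      AND wait_type = 99    -- placeholder: map_key of 'NETWORK_IO' in
                            -- sys.dm_xe_map_values; keys differ across versions
)
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION [ResourceWaits] ON SERVER STATE = START;
```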

Start extended event session

Insert the wait types for which queries should be captured into the table-valued parameter. Note that extended events use map values that differ from the DMV wait-type names; the following table covers most of the wait types:

wait_type (sys.dm_os_wait_stats)    map_value (sys.dm_xe_map_values)
ASYNC_NETWORK_IO NETWORK_IO
BROKER_TASK_STOP SSB_TASK_STOP
CLR_JOIN CLR_TASK_JOIN
CLR_MEMORY_SPY CLR_MEMORY_SPY_ACCESS
CREATE_DATINISERVICE GET_DATINISERVICE
DBCC_SCALE_OUT_EXPR_CACHE CHECK_EXPRESSION_CACHE
DBSTATE DB_STATE
DLL_LOADING_MUTEX DLL_LOAD
ERROR_REPORTING_MANAGER ERROR_REPORTING_MGR
EXECUTION_PIPE_EVENT_INTERNAL TWO_THREAD_PIPE_EVENT
FS_FC_RWLOCK FS_GC_RWLOCK
FT_IFTS_RWLOCK FT_RWLOCK
FT_IFTS_SCHEDULER_IDLE_WAIT FT_SCHEDULER_IDLE_WAIT
FULLTEXT GATHERER FULLTEXT_GATHERER
HADR_ARCONTROLLER_NOTIFICATIONS_SUBSCRIBER_LIST HADR_ARPROXY_NOTIFICATION_SUBSCRIBER_LIST
HADR_DATABASE_FLOW_CONTROL HADR_PARTNER_FLOW
HADR_DATABASE_VERSIONING_STATE HADR_VERSIONING_STATE
HADR_DATABASE_WAIT_FOR_RESTART MUTEX_HADR_DATABASE_WAIT_FOR_RESTART
HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING HADR_WAIT_FOR_TRANSITION_TO_VERSIONING
HADR_FILESTREAM_BLOCK_FLUSH HADRFS_BLOCK_FLUSH
HADR_FILESTREAM_FILE_CLOSE HADRFS_FILE_CLOSE
HADR_FILESTREAM_FILE_REQUEST HADRFS_FILE_REQUEST
HADR_FILESTREAM_IOMGR HADRFS_IOMGR
HADR_FILESTREAM_IOMGR_IOCOMPLETION HADRFS_IOMGR_IOCOMPLETION
HADR_FILESTREAM_MANAGER HADRFS_MANAGER
HADR_RECOVERY_WAIT_FOR_CONNECTION MUTEX_HADR_RECOVERY_WAIT_FOR_CONNECTION
HADR_RECOVERY_WAIT_FOR_UNDO MUTEX_HADR_RECOVERY_WAIT_FOR_UNDO
HADR_TRANSPORT_FLOW_CONTROL HADR_TRANSPORT_FLOW
HTBUILD HASH_TABLE_BUILD
HTREPARTITION HASH_TABLE_REPARTITION
INTERNAL_TESTING
LAZYWRITER_SLEEP LZW_SLEEP
MD_AGENT_YIELD METADATA_AGENT_YIELD
MD_LAZYCACHE_RWLOCK METADATA_LAZYCACHE_RWLOCK
MISCELLANEOUS UNKNOWN
MSSEARCH MSSEARCH_COM
PREEMPTIVE_FSRECOVER_UNCONDITIONALUNDO PREEMPTIVE_FSRECOVER_CONDITIONALUNDO
PREEMPTIVE_OS_SQMLAUNCH PREEMPTIVE_SQMLAUNCH
PWAIT_ALL_COMPONENTS_INITIALIZED ALL_COMPONENTS_INITIALIZED
PWAIT_COOP_SCAN COOP_SCAN
PWAIT_EVENT_SESSION_INIT_MUTEX EVENT_SESSION_INIT_MUTEX
PWAIT_HADR_ACTION_COMPLETED HADR_ACTION_COMPLETED
PWAIT_HADR_CHANGE_NOTIFIER_TERMINATION_SYNC HADR_CHANGE_NOTIFIER_TERMINATION_SYNC
PWAIT_HADR_CLUSTER_INTEGRATION HADR_CLUSTER_INTEGRATION
PWAIT_HADR_FAILOVER_COMPLETED HADR_FAILOVER_COMPLETED
PWAIT_HADR_OFFLINE_COMPLETED HADR_OFFLINE_COMPLETED
PWAIT_HADR_ONLINE_COMPLETED HADR_ONLINE_COMPLETED
PWAIT_HADR_POST_ONLINE_COMPLETED HADR_POST_ONLINE_COMPLETED
PWAIT_HADR_SERVER_READY_CONNECTIONS HADR_SERVER_READY_CONNECTIONS
PWAIT_HADR_WORKITEM_COMPLETED HADR_WORKITEM_COMPLETED
PWAIT_MD_LOGIN_STATS MD_LOGIN_STATS
PWAIT_MD_RELATION_CACHE MD_RELATION_CACHE
PWAIT_MD_SERVER_CACHE MD_SERVER_CACHE
PWAIT_MD_UPGRADE_CONFIG MD_UPGRADE_CONFIG
PWAIT_PREEMPTIVE_AUDIT_ACCESS_WINDOWSLOG PREEMPTIVE_AUDIT_ACCESS_WINDOWSLOG
PWAIT_QRY_BPMEMORY QRY_BPMEMORY
PWAIT_REPLICA_ONLINE_INIT_MUTEX REPLICA_ONLINE_INIT_MUTEX
PWAIT_RESOURCE_SEMAPHORE_FT_PARALLEL_QUERY_SYNC RESOURCE_SEMAPHORE_FT_PARALLEL_QUERY_SYNC
PWAIT_SECURITY_CACHE_INVALIDATION SECURITY_CACHE_INVALIDATION
QUERY_EXECUTION_INDEX_SORT_EVENT_OPEN QUERY_EXEC_INDEXSORT_OPEN
REDO_THREAD_PENDING_WORK REDO_SIGNAL
REDO_THREAD_SYNC REDO_SYNC
RESOURCE_GOVERNOR_IDLE
SCAN_CHAR_HASH_ARRAY_INITIALIZATION SCAN_CHAR_HASH_ARRAY_INIT
SERVER_IDLE_CHECK SERVER_IDLE_LOCK
SNI_LISTENER_ACCESS LISTENER_UPDATE
SNI_TASK_COMPLETION SNI_WAIT_TASK_FINISH
SP_PREEMPTIVE_SERVER_DIAGNOSTICS_SLEEP PREEMPTIVE_SP_SERVER_DIAGNOSTICS_SLEEP
THREADPOOL SOS_WORKER
TRAN_MARKLATCH_DT TRANMARKLATCH_DT
TRAN_MARKLATCH_EX TRANMARKLATCH_EX
TRAN_MARKLATCH_KP TRANMARKLATCH_KP
TRAN_MARKLATCH_NL TRANMARKLATCH_NL
TRAN_MARKLATCH_SH TRANMARKLATCH_SH
TRAN_MARKLATCH_UP TRANMARKLATCH_UP
VIA_ACCEPT VIA_ACCEPT_DONE
WAIT_XTP_GUEST XTP_GUEST
WAIT_XTP_TASK_SHUTDOWN XTP_TASK_SHUTDOWN
WAIT_XTP_TRAN_COMMIT XTP_TRAN_COMMIT
WAITFOR_TASKSHUTDOWN TASKSHUTDOWN
XE_CALLBACK_LIST XE_CALLBACK
XTPPROC_PARTITIONED_STACK_CREATE XTPPROC_PARTITIONED_STACK_CREATE

DECLARE @WaitTypeTVP AS WaitType;
INSERT INTO @WaitTypeTVP (Name)
VALUES ('NETWORK_IO'), ('XE_CALLBACK');
EXEC Waits.CaptureStatementsCausingWaits @TVP = @WaitTypeTVP;
GO

Perform operation

Run the workload and wait until it completes.

Extract results from the session

exec  [Waits].[GetStatementsCausingWaits]

Stop extended events session

exec Waits.CaptureStatementsCausingWaits @StopXevent = 1

Performance overhead of using extended events

Ring buffer

The ring_buffer target keeps captured events in memory, so its overhead is low, but the history it retains is bounded by its memory limit.

File

The event_file target persists captured events to disk; it retains more history but adds file I/O on the server.

References

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/07/03/what-are-you-waiting-for-introducing-wait-stats-support-in-query-store/
https://www.sqlskills.com/blogs/jonathan/an-xevent-a-day-30-of-31-tracking-session-and-statement-level-waits/
https://www.sqlskills.com/blogs/jonathan/mapping-wait-types-in-dm_os_wait_stats-to-extended-events/
https://www.brentozar.com/extended-events/
https://johnsterrett.com/2013/10/08/benchmark-sql-server-wait-statistics/

















Saturday, 8 September 2018

PostgreSQL High Availability, Load Balancing and Replication

Introduction

  1. A secondary server taking over quickly when the primary server fails to serve is called high availability.
  2. Several database servers serving the same data is called load balancing.
Ideally, web servers serving static web pages and read-only databases can be load balanced easily by merely distributing the requests. Unfortunately, most database servers receive a mix of read and write requests, which makes it hard to get consistent data from any one of the available servers at a given point in time: data written to one server must be propagated to all the others.

Many solutions address this synchronization problem:
One server to modify data, other servers to perform read operations
Use one server (the master or primary) for writes, and the other servers (standbys or secondaries) as read-only servers that track changes on the primary.
  • Primary or master server
  • Secondary or standby server
    • Warm standby server (cannot accept connections until it is promoted to master)
    • Hot standby server (serves read-only queries)
A data modification is not considered committed until all the servers have committed the transaction.
This solution guarantees:
  • failover with no loss of data
  • consistent data from all load-balanced servers, no matter which server is queried

Comparison of different solutions

Shared Disk Failover (sharing the hardware functionality)
Uses a single disk array shared by multiple servers. As soon as the primary server fails, the secondary server can mount the disk and start.
Advantages:
  1. There is no overhead of synchronizing the primary server with the secondary server.
  2. It allows rapid failover with no data loss.
Serious limitations:
  1. If the shared disk array fails or is corrupted, all the servers become non-functional.
  2. The standby server must never access the shared storage while the primary server is using it.
File System Replication (file-system-level replication, e.g. DRBD)
Has the restriction that writes must be applied on the standby server in the same order as the data changes on the primary server, to keep the standby consistent.

Write-Ahead Log shipping
Warm and hot standby servers are kept up to date, synchronously or asynchronously, by reading a stream of WAL records. This is achieved by either:
  1. file-based log shipping,
  2. streaming replication,
  3. or a combination of both.
This operates on the entire database cluster.
Logical Replication
Logical replication constructs a stream of logical data modifications from the WAL (replication at the level of individual tables is also possible) and sends it to other servers (from master to standby, or from standby to master).
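In PostgreSQL 10 and later, built-in logical replication is configured with publications and subscriptions; a minimal sketch (the object names, table, and connection string are placeholders):

```sql
-- On the publishing (source) server: requires wal_level = logical
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the subscribing (target) server
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=primary.example.com dbname=shop user=repl'
    PUBLICATION orders_pub;
```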
Trigger-Based Master-Standby Replication (updates standby servers asynchronously in batches), e.g. Slony-I
This is a two-step process:

  1. The application sends all data-modification queries to the master server.
  2. The master server asynchronously sends the changed data to the standby servers.

Because changes are batched and sent asynchronously, standby servers can answer read-only queries while the primary server is running, but transactions committed on the master may be lost during failover.
Statement-Based Replication Middleware (each server operates independently)
A program intercepts every SQL query and broadcasts it to all the servers for execution. This can cause data inconsistency if the queries contain functions such as CURRENT_TIMESTAMP, random(), or sequence operations, which can return different values on different servers. Care must be taken that a transaction either commits or rolls back on all the servers, perhaps using two-phase commit (PREPARE TRANSACTION and COMMIT PREPARED).

  1. One solution is for the middleware to obtain server-dependent values from a single server and then use the same values on all the other servers.
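PostgreSQL's two-phase commit commands look like this (the server must have max_prepared_transactions > 0; the transaction id and the table are placeholders):

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
PREPARE TRANSACTION 'txn_42';  -- transaction is now on disk, detached from the session

-- Once every participating server has prepared successfully:
COMMIT PREPARED 'txn_42';
-- If any server failed to prepare:
-- ROLLBACK PREPARED 'txn_42';
```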
Asynchronous Multimaster Replication
Each server works independently and periodically communicates with the others to resolve conflicting transactions. Conflicts can be resolved by the user or by predefined rules, as in Bucardo.
Synchronous Multimaster Replication
Each server can accept write requests, and the modified data is sent to all the other servers. Because the data itself (not the statement) is replicated, non-deterministic functions such as random() cause no issues. Heavy write activity may lead to excessive locking and poor performance; read operations, however, benefit, since there is no need to partition the workload and any server can serve a read request. The write overhead can be reduced by using a shared disk.
This replication must be implemented in the application, using two-phase commit transactions, since PostgreSQL does not provide the functionality implicitly.

















References

https://www.postgresql.org/docs/current/static/high-availability.html

Sunday, 13 May 2018

Performance Engineering Tools & References



Eclipse memory analyzer tool quick guide: https://eclipsesource.com/blogs/2013/01/21/10-tips-for-using-the-eclipse-memory-analyzer/


Postgresql Client: https://blog.devart.com/meet-the-most-anticipated-client-ever-for-postgresql-developers.html?utm_source=groups&utm_medium=social&utm_campaign=buffer&utm_content=product-release-studio-postgresql

PostgreSQL Internals: http://www.interdb.jp/pg/index.html
User mode vs privileged mode processor usage: https://social.technet.microsoft.com/wiki/contents/articles/1515.perfguide-user-mode-versus-privileged-mode-processor-usage.aspx

Networking Concepts: https://hpbn.co/

Linux Commands: https://wiki.smartos.org/display/DOC/The+Linux-to-SmartOS+Cheat+Sheet

Saturday, 12 May 2018

PostgreSQL - pg_stat_statements


Enable pg_stat_statements


1. Open postgresql.conf.
2. Search for the 'shared_preload_libraries' parameter and set it to 'pg_stat_statements':

         shared_preload_libraries = 'pg_stat_statements'
         pg_stat_statements.track = all

3. Save the file and restart the services.
4. Connect using pgAdmin and check for the pg_stat_statements extension:

         SELECT *
         FROM pg_available_extensions
         WHERE name = 'pg_stat_statements'
           AND installed_version IS NOT NULL;

5. If the extension does not exist, create it:

         CREATE EXTENSION pg_stat_statements;

6. Check whether it is enabled:

         SELECT * FROM pg_stat_statements;

Reset statistics counters

Reset the pg_stat_statements counters:

SELECT pg_stat_statements_reset();

Reset the statistics counters for the current database:

SELECT pg_stat_reset();


All Statements




SELECT query AS "Query_Text",
calls AS "Execution_Count",
total_time AS "Total_Execution_Time(ms)",
round((100 * total_time /
    sum(total_time::numeric) OVER ())::numeric, 2) AS "Percentage_Time(%)", 
shared_blks_hit AS "Total_Cache_Hits(Shared_Blocks)",
shared_blks_read AS "Total_Reads(Shared_Blocks)",
CASE calls
WHEN 0 THEN 0
ELSE blk_read_time / calls 
END AS "ReadTime_Per_Query",
CASE rows 
WHEN 0 THEN 0
ELSE blk_read_time / rows 
END AS "ReadTime_Per_Row",
blk_read_time AS "Total_Read_Time(ms)",
shared_blks_written AS "Total_Writes(Shared_Blocks)",
CASE calls
WHEN 0 THEN 0
ELSE blk_write_time / calls 
END AS "WriteTime_Per_Query",
CASE rows 
WHEN 0 THEN 0
ELSE blk_write_time / rows 
END AS "WriteTime_Per_Row",
blk_write_time AS "Total_Write_Time(ms)",
shared_blks_dirtied AS "Total_Dirtied(Shared_Blocks)",
temp_blks_read AS "TempReads(Blocks)",
temp_blks_written AS "TempWrites(Blocks)",
local_blks_hit AS "Total_Cache_Hits(Local_Blocks)",
local_blks_read AS "Total_Reads(Local_Blocks)",
local_blks_written AS "Total_Writes(Local_Blocks)",
local_blks_dirtied AS "Total_Dirtied(Local_Blocks)"
FROM pg_stat_statements
ORDER BY "Percentage_Time(%)" DESC

SQL ordered by executions

SELECT  roles.rolname, 
statements.query AS "Query_Text",
statements.calls AS "Execution_Count",
statements.total_time AS "Total_Execution_Time(ms)"
FROM pg_stat_statements AS statements
INNER JOIN pg_stat_database AS db ON db.datid = statements.dbid
INNER JOIN pg_roles AS roles ON roles.oid = statements.userid
-- WHERE  db.datname = 'nimbus_0_production' AND roles.rolname NOT LIKE '%backup' AND statements.query NOT LIKE '%FROM pg_%' AND roles.rolname IN ('nimbus', 'wfs') AND statements.query NOT LIKE '%from pg_%'
ORDER BY statements.calls DESC
LIMIT 50



SQL ordered by elapsed time

SELECT  roles.rolname, 
statements.query AS "Query_Text",
statements.total_time AS "Total_Execution_Time(ms)",
CASE statements.calls 
WHEN 0 then 0
ELSE statements.total_time / statements.calls 
END AS "TimePerExecution(ms)",
round((100 * statements.total_time /
        sum(statements.total_time::numeric) OVER ())::numeric, 2) AS "Percentage_Time(%)",
        statements.calls AS "Execution_Count"
FROM pg_stat_statements AS statements
INNER JOIN pg_stat_database ON pg_stat_database.datid = statements.dbid
INNER JOIN pg_roles roles ON roles.oid = statements.userid
-- WHERE pg_stat_database.datname = 'nimbus_0_production' AND roles.rolname NOT LIKE '%backup' AND statements.query NOT LIKE '%FROM pg_%' AND roles.rolname IN ('nimbus', 'wfs') AND statements.query NOT LIKE '%from pg_%'
ORDER BY "Percentage_Time(%)" DESC
LIMIT 50











SQL with highest disk reads 

SELECT  rolname, 
query AS "Query_Text",
total_time  AS "Total_Execution_Time(ms)",
(COALESCE(shared_blks_read,0) + COALESCE(local_blks_read,0) + COALESCE(temp_blks_read,0)) AS "Total_Reads(Blocks)",
statements.blk_read_time AS "Total_Read_Time(ms)",
CASE calls
 WHEN 0 THEN 0
 ELSE (COALESCE(shared_blks_read,0) + COALESCE(local_blks_read,0) + COALESCE(temp_blks_read,0)) / calls
END AS "Reads_Per_Query(Blocks)"
FROM pg_stat_statements AS statements
INNER JOIN pg_stat_database ON pg_stat_database.datid = statements.dbid
INNER JOIN pg_roles roles ON roles.oid = userid
-- WHERE pg_stat_database.datname = 'nimbus_0_production' AND rolname NOT LIKE '%backup' AND query NOT LIKE '%_pg%'
ORDER BY "Total_Reads(Blocks)" DESC
LIMIT 50


SQL with highest disk writes


SELECT  rolname, 
query AS "Query_Text",
total_time AS "Total_Execution_Time(ms)",
(COALESCE(shared_blks_written,0) + COALESCE(local_blks_written,0) + COALESCE(temp_blks_written,0)) AS "Total_Writes(Blocks)",
statements.blk_write_time AS "Total_Write_Time(ms)",
CASE calls
WHEN 0 THEN 0
ELSE (COALESCE(shared_blks_written,0) + COALESCE(local_blks_written,0) + COALESCE(temp_blks_written,0)) / calls
END AS "Writes_Per_Query(Blocks)"
FROM pg_stat_statements AS statements
INNER JOIN pg_stat_database ON pg_stat_database.datid = statements.dbid
INNER JOIN pg_roles roles ON roles.oid = userid
--  WHERE pg_stat_database.datname = 'nimbus_0_production' AND rolname NOT LIKE '%backup' AND query NOT LIKE '%_pg%'
ORDER BY "Total_Writes(Blocks)" DESC
LIMIT 50






Friday, 11 May 2018

PostgreSQL Monitoring

Database level

Note: the stats collected by the queries below are cumulative, i.e. to measure the activity of a single operation you have to capture the stats before and after performing it and take the difference.

Database - Transaction history / deadlocks detected

The following query gives the number of transaction commits, rollbacks, and deadlocks detected for each database on the server.

SELECT 
    pg_stat_database.datname AS DBName,
 xact_commit AS "Transaction_Commits",
 xact_rollback AS "Transaction_Rollbacks",
 CASE (xact_commit + xact_rollback) 
  WHEN 0 THEN NULL
  ELSE xact_commit::float/(xact_commit + xact_rollback) 
 END AS "Successful_Transaction_ratio",
 deadlocks AS "Deadlocks_Detected",
 confl_deadlock AS "Queries_Cancelled_With_Deadlock",
 stats_reset AS "Stats_collection_starttime",
 ( clock_timestamp() - stats_reset ) AS "StatsDuration(HH_MM_SS_sss)"
FROM pg_stat_database
INNER JOIN pg_stat_database_conflicts
ON pg_stat_database_conflicts.datname = pg_stat_database.datname
ORDER BY "Successful_Transaction_ratio" ASC


Analysis tips:
  1. Successful_Transaction_ratio should be close to 1.
  2. Check whether the number of transaction rollbacks equals the number of deadlocks detected.
  3. If you encounter deadlocks in the stats, refer to Deadlocks in PostgreSQL to understand locking and deadlocks.
  4. For the difference between an index sequential scan and a bitmap scan, refer to Index Scan VS Bitmap Scan.

Database - Disk & Row read Latency

SELECT  datname AS "DBName",
 tup_inserted AS "Rows_Inserted",
 tup_updated AS "Rows_Updated",
 tup_deleted AS "Rows_Deleted",
 tup_returned AS "Rows_Returned",
 tup_fetched  AS "Rows_Fetched",
 pg_size_pretty( pg_database_size(datname)) AS "Size",
 blk_write_time AS "TotalDiskWriteTime(ms)",
 stats_reset AS "Stats_collection_starttime",
 ( clock_timestamp() - stats_reset ) AS "StatsDuration(HH_MM_SS_sss)"
FROM pg_stat_database
ORDER BY "Rows_Inserted"

Database - IO Performance

Following query gives information about disk activity at database level in the server.

SELECT datname AS DBName,
blks_read AS "TotalDiskReads(Blocks)",
blk_read_time AS "TotalDiskReadTime(ms)",
CASE blks_read
WHEN 0 THEN 0
ELSE (blk_read_time / blks_read )
END AS "AvgDiskReadTime(ms)FromStatsReset",
blk_write_time AS "TotalDiskWriteTime(ms)",
( clock_timestamp() - stats_reset ) AS "StatsDuration(HH_MM_SS_sss)"
FROM pg_stat_database

Analysis tips: 
  1. Correlate machine disk activity with database level disk activity

Cache Usage

The following query gives cache-hit details (the equivalent of MS SQL Server logical reads).

SELECT datname AS DBName, 
blks_read AS "DiskReads(Blocks)",
blks_hit AS "Cachehits(Blocks)",
CASE (blks_read + blks_hit)
WHEN 0 THEN 0
ELSE blks_hit::float/(blks_read + blks_hit)
END AS "CacheHitRatio"
FROM pg_stat_database
Analysis tips: 
  1. CacheHitRatio value should be close to 1

Temp Files Usage

The following query gives temp-file usage, i.e. files used by sort operations that spill to disk.

SELECT datname AS DBName,
temp_files AS "Temp_Files",
temp_bytes AS "Temp_Files_Size(Bytes)"
FROM pg_stat_database

Analysis tips: 
  1. If there is a drastic increase in temp-file usage, a large number of sort operations are probably spilling to disk, and an index may be required.
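One way to confirm that a sort is spilling to disk is EXPLAIN ANALYZE (the table and column here are hypothetical; the "external merge" line only appears when the sort exceeds work_mem):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY created_at;
-- Look for a line such as:  Sort Method: external merge  Disk: 10240kB
-- If present, either raise work_mem for the session or index the sort key:
-- SET work_mem = '64MB';
-- CREATE INDEX ON orders (created_at);
```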

Table Level

Operations performed on table & IO

SELECT pg_stat_user_tables.relname AS "TableName",
    ( COALESCE(heap_blks_read,0) + COALESCE(idx_blks_read,0) + COALESCE(toast_blks_read,0) + COALESCE(tidx_blks_read,0)) AS "TotalDiskReads(Blocks)",
( COALESCE(heap_blks_hit,0) + COALESCE(idx_blks_hit,0) + COALESCE(toast_blks_hit,0) + COALESCE(tidx_blks_hit,0) ) AS "TotalCacheHits",
    heap_blks_read AS "DiskReads_Data(Blocks)",
heap_blks_hit AS "CacheHits_Data",
idx_blks_read AS "DiskReads_Index(Blocks)",
idx_blks_hit AS "CacheHits_Index",
toast_blks_read AS "DiskReads_ToastData(Blocks)",
toast_blks_hit AS "CacheHits_ToastData(Blocks)",
tidx_blks_read AS "DiskReads_ToastIndex(Blocks)",
tidx_blks_hit AS "CacheHits_ToastIndex(Blocks)",
n_tup_ins AS "RowsInserted",
n_tup_upd AS "RowsUpdated",
n_tup_del AS "RowsDeleted",
n_tup_hot_upd AS "RowsUpdated(HOT)",
n_live_tup AS "LiveRows",
n_dead_tup AS "DeadRows"
FROM pg_stat_user_tables 
INNER JOIN pg_statio_user_tables 
ON pg_statio_user_tables.relid = pg_stat_user_tables.relid

Analysis tips: 
  1. Identify tables with heavy sequential scans and consider creating indexes on the columns used in filters.

Index Usage in table


SELECT relname AS "TableName",

seq_scan AS "SequentialScans",

seq_tup_read AS "RowsFetched(SequentialScans)",

idx_scan AS "IndexScans",
idx_tup_fetch AS "RowsFetched(IndexScans)"
FROM pg_stat_user_tables 
ORDER BY "SequentialScans" DESC


SELECT 
relname AS "Table",
indexrelname AS "Index",
idx_scan AS "Scans(WithIndex)",
idx_tup_read AS "IndexRows_Read",
idx_tup_fetch AS "TableRowsRead(UsingIndex)"
FROM pg_stat_user_indexes 
ORDER BY "Scans(WithIndex)" ASC
Analysis tips: 

  1. Identify unused indexes with the second query.

Vacuum and Analyze

SELECT relname AS "TableName",
vacuum_count AS "Vacuumed(Manual)",
autovacuum_count AS "Vacuumed(Auto)",
analyze_count AS "Analyzed(Manual)",
autoanalyze_count "Analyzed(Auto)",
last_vacuum AS "LastVacuum(Manual)",
last_autovacuum AS "LastAutoVacuum(Auto)",
last_analyze AS "LastAnalyzed(Manual)",
last_autoanalyze AS "LastAnalyzed(Auto)",
n_mod_since_analyze AS "RowsModified(AfterAnalyze)"
FROM pg_stat_all_tables 





References
Tracking IO stats: https://malisper.me/track_io_timing/