Friday, 17 November 2017

MS SQL Server Tips

Get List of Perfmon Counters

typeperf -q > counters.txt
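To go further and actually capture a subset of those counters to a log, typeperf can read counter paths from a file and write samples to CSV. The interval, sample count, and file names below are illustrative:

```shell
# Sample the counters listed in counters.txt every 15 seconds, for 60 samples,
# writing the results to a CSV file (trim counters.txt down to the counters you need first)
typeperf -cf counters.txt -si 15 -sc 60 -f CSV -o perf_log.csv
```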


--Display all indexes along with key columns, included columns and index type

use PERF11329PSP;

DECLARE @TempTable AS TABLE (
    SchemaName      VARCHAR(100),
    ObjectID        INT,
    TableName       VARCHAR(100),
    IndexID         INT,
    IndexName       VARCHAR(100),
    ColumnID        INT,
    column_index_id INT,
    ColumnNames     VARCHAR(500),
    IncludeColumns  VARCHAR(500),
    NumberOfColumns INT,
    IndexType       VARCHAR(20),
    LastColRecord   INT);

WITH CTE_Indexes (SchemaName, ObjectID, TableName, IndexID, IndexName, ColumnID, column_index_id, ColumnNames, IncludeColumns, NumberOfColumns, IndexType)
AS
(
SELECT s.name, t.object_id, t.name, i.index_id, i.name, c.column_id, ic.index_column_id,
              CASE ic.is_included_column WHEN 0 THEN CAST(c.name AS VARCHAR(5000)) ELSE '' END,
              CASE ic.is_included_column WHEN 1 THEN CAST(c.name AS VARCHAR(5000)) ELSE '' END, 1, i.type_desc
       FROM sys.schemas AS s
            JOIN sys.tables AS t ON s.schema_id = t.schema_id
            JOIN sys.indexes AS i ON i.object_id = t.object_id
            JOIN sys.index_columns AS ic ON ic.index_id = i.index_id AND ic.object_id = i.object_id
            JOIN sys.columns AS c ON c.column_id = ic.column_id AND c.object_id = ic.object_id
                 AND ic.index_column_id = 1
UNION ALL
SELECT s.name, t.object_id, t.name, i.index_id, i.name, c.column_id, ic.index_column_id,
              CASE ic.is_included_column WHEN 0 THEN CAST(cte.ColumnNames + ', ' + c.name AS VARCHAR(5000))  ELSE cte.ColumnNames END,
              CASE 
                     WHEN ic.is_included_column = 1 AND cte.IncludeColumns != '' THEN CAST(cte.IncludeColumns + ', ' + c.name AS VARCHAR(5000))
                     WHEN ic.is_included_column =1 AND cte.IncludeColumns = '' THEN CAST(c.name AS VARCHAR(5000))
                     ELSE ''
              END,
              cte.NumberOfColumns + 1, i.type_desc
       FROM sys.schemas AS s
            JOIN sys.tables AS t ON s.schema_id = t.schema_id
            JOIN sys.indexes AS i ON i.object_id = t.object_id
            JOIN sys.index_columns AS ic ON ic.index_id = i.index_id AND ic.object_id = i.object_id
            JOIN sys.columns AS c ON c.column_id = ic.column_id AND c.object_id = ic.object_id
            JOIN CTE_Indexes cte ON cte.column_index_id + 1 = ic.index_column_id
            --JOIN CTE_Indexes cte ON cte.ColumnID + 1 = ic.index_column_id
                 AND cte.IndexID = i.index_id AND cte.ObjectID = ic.object_id

)
INSERT INTO  @TempTable
SELECT *, RANK() OVER (PARTITION BY ObjectID, IndexID ORDER BY NumberOfColumns DESC) AS LastRecord FROM CTE_Indexes AS cte;

SELECT SchemaName, TableName, IndexName, ColumnNames, IncludeColumns, IndexType FROM @TempTable
WHERE LastColRecord = 1
ORDER BY ObjectID, TableName, IndexID, IndexName
--ORDER BY IncludeColumns desc




Friday, 25 August 2017

Index Organization

Indexing

Creating a clustered index on a table means sorting the rows on the given clustering key and storing them in that order. Rows are stored in pages; as the number of rows increases, more pages are required. For example, a table with 10K rows may be spread over 5 pages; even though the rows are distributed across multiple pages, together they logically represent one table. As the number of rows in a table (the group of pages forming the logical table) grows, say to 1 million rows, querying this huge list becomes time-consuming. To reduce the lookup time, we can further divide the table into partitions: the 1 million rows are spread across, say, 10 partitions, each containing 100K rows.
In short, creating a clustered index means ordering the rows of a table. The pages containing the rows are indexed using a balanced-tree (B-tree) structure, and traversal between pages at the same level is done via a doubly linked list maintained on the pages.

Creating a non-clustered index on a column builds a separate B-tree sorted on that column. Its leaf level does not contain the actual rows; instead, each entry holds either a pointer to the row (for a heap) or the clustered index key of the row.



Questions to get answers for:

  1. How are non-numeric columns sorted?

Sunday, 30 July 2017

MS SQL Server monitoring using DMVs

Reference Links: Column Ordering

The agenda of this presentation is to provide an overview of what dynamic management views are, the kinds of data they contain, and the types of problems they can solve.

Slide 1 : What are DMVs? An introduction

DMVs are views on SQL Server metadata:
As queries run on a SQL Server database, SQL Server automatically records information about the activity taking place into structures in memory; we can access this information via DMVs. In a nutshell, DMVs are views over SQL Server metadata that give us a window into the activities being performed on SQL Server instances, and the resources those activities are consuming. In other words, DMVs expose valuable information concerning connections, sessions, transactions, SQL statements, resource usage, etc.

History:
In releases prior to MS SQL Server 2005, metadata was stored and referenced in physical tables in the master database. Later, Microsoft introduced an abstraction layer between this metadata and the user by moving these tables into a hidden database, mssqlsystemresource, which is inaccessible to end users and even DBAs. In their place, Microsoft created a set of compatibility views based on these hidden resource tables, identical to the original system tables; only the schema name changed. At the same time, Microsoft created catalog views and DMOs (DMVs and DMFs), and took the opportunity to simplify and normalise the naming conventions.
For example, uid in the compatibility views is represented as schema_id in the catalog views and DMOs; similarly, spid is represented as session_id, and indid as index_id.

Aggregate Results:
The data shown via DMVs is cumulative since the last SQL Server restart. If you are interested only in the effect of a given query run, you can determine it by taking a snapshot of the DMV data, running your query, and then taking another snapshot. The delta between the two snapshots gives you the details of the effect of the query.
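As a sketch of that snapshot/delta technique, using sys.dm_os_wait_stats as the example DMV (the temp-table names are illustrative):

```sql
-- Snapshot 1: capture the DMV state before the query under test
SELECT wait_type, wait_time_ms
INTO #snap_before
FROM sys.dm_os_wait_stats;

-- ... run the query you want to measure here ...

-- Snapshot 2: capture the state again
SELECT wait_type, wait_time_ms
INTO #snap_after
FROM sys.dm_os_wait_stats;

-- The delta between the two snapshots shows the effect of the query
SELECT a.wait_type, a.wait_time_ms - b.wait_time_ms AS delta_ms
FROM #snap_after AS a
     JOIN #snap_before AS b ON b.wait_type = a.wait_type
WHERE a.wait_time_ms > b.wait_time_ms;
```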

Impact of running DMV's:
Typically, querying the DMVs to extract diagnostic information has minimal effect on the server and its resources. This is because the data is in memory and already calculated; you just need to retrieve it. To reduce the impact further, we can prefix our DMV queries with statements that ignore locks, such as SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. There are some cases where the DMV data is not readily available and needs to be calculated, for example sys.dm_db_index_physical_stats. In summary, compared to other methods of obtaining this information, like SQL Profiler, DMVs have little impact on the SQL Server.
There are a total of 233 DMVs in the 2016 edition (the underscore is escaped below, since _ is a single-character wildcard in LIKE):
SELECT * FROM sys.system_objects WHERE name LIKE 'dm[_]%'

Consider the example of query execution in SQL Server. When a query is run, an immense amount of information is recorded, including:
1. The query's cached plan (which describes, at a low level, how the query is executed)
2. What indexes were used
3. What indexes the query would like to use but can't, because they are missing
4. How much I/O occurred (both physical and logical)
5. How much time was spent executing the query
6. How much time was spent waiting on other resources
7. What resources the query was waiting on
Being able to retrieve and analyse this information allows us to understand how our queries work.

The major DMV groups that help troubleshoot performance problems are:
1. Index
2. Execution
3. SQL Server Operating System
4. Transaction
5. Input/Output
6. Database
In addition to these, there are about 10 more major DMV groups, such as Security, Extended Events, etc.

Slide : Connections, Sessions, Requests

The logical hierarchy of a user interacting with SQL Server breaks down as:
the 'connections' that are accessing our database instances, and who/what owns them;
the 'sessions' that are spawned inside these connections;
the 'requests' that are executed in these sessions. A request can consist of a single SQL query, a batch, or a remote procedure call.

As this hierarchy suggests, there is more to a user interaction than just query execution: an entity, be it an application or a user in SQL Server Management Studio, must first make a connection to SQL Server. Consequently, one or more sessions can be established in the context of that user connection. Only then does the user submit requests and fetch information.

Here we will examine how users are connected to the SQL Server instance, how many sessions each connection has and the attributes of these sessions, which requests are running on the server, and what SQL they are running. We will also gather some raw performance data in terms of CPU, I/O and memory to show which currently executing queries are most expensive. All of this is accomplished with the following DMVs, which belong to the "execution" category and therefore all start with sys.dm_exec_. sys.dm_exec_connections returns server-scoped information that is more "network centric". sys.dm_exec_sessions also returns server-scoped information, regarding session ownership, the state of each session, and the work it is performing.

Slide : sys.dm_exec_connections

The sys.dm_exec_connections DMV uses the following 3 columns to uniquely identify each connection and facilitate joins with other DMVs.
connection_id : uniquely identifies each connection at the instance level. We use this column when joining to the requests DMV.
session_id : identifies the session associated with the connection and is used to join with the sessions DMV; it is effectively a foreign key referencing the sessions DMV.
most_recent_session_id : the session_id of the most recent request associated with the connection.
This DMV also gives detailed information about the physical connections to SQL Server, including the protocols and security used and network packet information.
encrypt_option : boolean column that specifies whether encryption is used on the connection.
auth_scheme : authentication scheme used, whether Windows (NTLM, KERBEROS, DIGEST, BASIC, NEGOTIATE) or SQL.
num_reads : no. of packet reads that have occurred across this connection
num_writes : no. of packet writes that have occurred across this connection
last_read_time : timestamp of the last read over this connection
last_write_time : timestamp of the last write over this connection
client_net_address : IP address of the client connected to the SQL instance

Slide : sys.dm_exec_sessions

Returns a row for every session. One interesting point to note: values in the sessions DMV are updated only after each of the session's requests completes; the requests DMV, in contrast, provides a real-time view. A unique identifier is provided for each session through the session_id column. Keyed on this session_id, the DMV reveals information such as the client program that initiated the session (program_name), the status of the session (status), whether sleeping or running, the login under which the session is executing (login_name), and the login under which the session was initiated (original_login_name). It also provides a number of columns that reveal workload and activity information.
total_elapsed_time : time in milliseconds since the session was initiated
last_request_start_time : start time of the most recent request to execute in this session
last_request_end_time : end time of the most recent request to execute in this session
cpu_time : amount of CPU time in milliseconds used by all of the requests related to this session
memory_usage : no. of 8KB pages of memory used by all the requests in the session
total_scheduled_time : total time in milliseconds that requests associated with this session were scheduled for execution
logical_reads : no. of reads from the data cache performed by all the requests associated with this session
reads : total no. of reads from disk performed by all the requests in the session
writes : total no. of writes to disk performed by all the requests in the session

Slide : questions

Using the above 2 DMVs, we can now answer the following questions.
** Read Slide **
Slide : who is connected ?
Using these DMVs, we can see who is accessing the SQL Server instance, when, and how often. This query gives the IP address of each connected machine, the name of the program, and the no. of open sessions for each connection. It helps us know which middle-tier servers are accessing the database and how many sessions each of them is maintaining.
From these results, we would review the possibility of an issue relating to the Helpsofton application if we know it doesn't require such activity.
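A query along these lines (a sketch; the exact columns on the slide may differ) returns that information:

```sql
-- Who is connected, from where, and how many sessions each connection owns
SELECT ec.client_net_address,
       es.program_name,
       es.host_name,
       COUNT(ec.session_id) AS connection_count
FROM sys.dm_exec_sessions AS es
     JOIN sys.dm_exec_connections AS ec ON es.session_id = ec.session_id
GROUP BY ec.client_net_address, es.program_name, es.host_name
ORDER BY connection_count DESC;
```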

Slide : who is connected using Sql Server Management Studio ?
Closely related to the previous script, this one focuses on those who are directly connected to the SQL Server instance. Notice the use of the CROSS APPLY operator to join the result set with a function: the function is executed once per row of the outer result set.
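A sketch of such a query, filtering sessions by program name and applying sys.dm_exec_sql_text to each connection's most recent SQL handle:

```sql
-- Who is connected via SSMS, and the last SQL text they submitted
SELECT es.session_id, es.login_name, es.host_name, t.text AS last_sql
FROM sys.dm_exec_sessions AS es
     JOIN sys.dm_exec_connections AS ec ON es.session_id = ec.session_id
     CROSS APPLY sys.dm_exec_sql_text(ec.most_recent_sql_handle) AS t
WHERE es.program_name LIKE 'Microsoft SQL Server Management Studio%';
```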

Slide : Session-level settings
This query gives a dump of some useful columns related to the session; it is like a one-stop source for determining the session settings of the current connection. You could easily supply a session_id in place of @@SPID when diagnosing a performance issue on an identified session. These settings determine not only how requests and transactions are handled, but also how the current session interacts with other sessions, i.e. the case where a session with elevated deadlock priority causes other sessions to be chosen as deadlock victims and rolled back during a deadlock situation.

INFO
Slide : are there any logins with more than one session ?
It's especially useful for seeking out those logins that own more than a single session. This can also be useful when different applications use the same database, each under its own application login.
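A minimal version of that check might look like:

```sql
-- Logins owning more than one session
SELECT login_name, COUNT(session_id) AS session_count
FROM sys.dm_exec_sessions
GROUP BY login_name
HAVING COUNT(session_id) > 1
ORDER BY session_count DESC;
```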

Slide : Identify sessions with context switching 
Context switching is the act of executing T-SQL code under the security context of another user, in order to utilize their credentials and level of rights. This DMV helps identify such sessions.
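Sessions running under a switched context can be spotted where the executing login differs from the original login:

```sql
-- Sessions where the executing login differs from the original login
SELECT session_id, login_name, original_login_name
FROM sys.dm_exec_sessions
WHERE login_name <> original_login_name;
```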

Slide :  Identify inactive sessions
This query gives the sessions that have associated open transactions and have had no requests running for the last five days, based on last_request_end_time. I have included workload activity columns in order to determine whether an application may be holding a session open for a long time. If the activity workload of the session is high, as indicated by cpu_time and total_elapsed_time, then it may be the case that the application keeps a session open permanently, which may be acceptable.
Use Case :
From these results, we can assume that an event may have happened on July 28th 2009 that interrupted connectivity to the server and, as a result, left many orphaned sessions. Suppose we found a reason for this, say server downtime due to monthly server patching; these orphaned transactions should have been rolled back after the server restarted.

Slide : Identify idle sessions with orphaned transactions
This query inner joins the sessions and transactions DMVs to get all sessions with open transactions, then left joins the requests DMV and returns the open transactions that have no requests running.
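A sketch of that join pattern:

```sql
-- Idle sessions that still hold open transactions
SELECT es.session_id, es.login_name, es.last_request_end_time, es.cpu_time
FROM sys.dm_exec_sessions AS es
     JOIN sys.dm_tran_session_transactions AS st ON es.session_id = st.session_id
     LEFT JOIN sys.dm_exec_requests AS er ON es.session_id = er.session_id
WHERE er.session_id IS NULL;
```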

Slide : sys.dm_exec_requests
sys.dm_exec_requests returns one row for every request currently executing and provides a range of execution statistics such as elapsed time, CPU time, wait time and so on.
The sys.dm_exec_sql_text DMF returns the SQL text for a given sql handle, whether for a batch or a remote procedure call.

request_id is the column that uniquely identifies a request within a session; however, this value is rarely used. session_id is the identifier of the session in which the request runs. connection_id is the identifier of the connection associated with the request. transaction_id is the identifier of the transaction associated with the request.

The requests DMV exposes columns that describe the blocking state and wait status of active requests. The status column reveals the status of a given request within a session. If the request is currently executing, its status is running. If it is in the runnable queue, which simply means it is queued to get on the processor, its status is runnable; this is referred to as a signal wait. If it is waiting for another resource, for example a locked page, to become available in order to proceed, or if a running request needs to perform I/O, then it is moved to the waiter list; this is a resource wait, and the waiting request's status is recorded as suspended.
These columns reveal the nature of the wait:
blocking_session_id : gives the id of the session that is blocking, or NULL if no blocking exists. Apart from session ids and NULL, there are 3 other possible values: -2, -3 and -4.
-2 means the block is owned by an orphaned distributed transaction. One example of when this can happen is when MSDTC doesn't recognise the data source during an active transaction but the coordinator is still waiting for an acknowledgement. In this case, we have to kill the process using the request_owner_guid extracted from the sys.dm_tran_locks DMV.
reference : http://www.sqlservercentral.com/blogs/sqltechnet/2012/08/06/blocking-in-sql-server-by-spid-2-orphan-distributed-transactions/
-3 means the block is owned by a deferred recovery transaction.
-4 means the session_id of the blocking latch owner could not be identified.
wait_type : identifies the type of wait the request is currently waiting on.
wait_time : amount of time, in milliseconds, the request has been waiting; this is cumulative.
wait_resource : the last resource the session waited on.
There are also columns that report the work being performed and the resources used:
percent_complete : completion status of the operation
cpu_time : total amount of processing time spent on the request, in ms
row_count : no. of rows processed by the request
granted_query_memory : no. of 8KB pages allocated for execution of the request
reads : total physical reads for this request
writes : total physical writes for this request
logical_reads : no. of reads from the data cache for this request
sys.dm_exec_sql_text : function that returns the SQL statements for a given sql handle.

Slide: Isolating the executing statement within a sql handle
       Generally, the sql_handle column presents us with the complete T-SQL batch submitted to the query engine. This batch can contain hundreds of individual SQL statements. To isolate each statement we use the statement_start_offset and statement_end_offset columns, which give the starting and ending byte offsets of the currently active statement within the batch. If the value of statement_start_offset is 0, the active statement starts at the beginning of the batch script; if statement_end_offset is -1, the active statement runs to the end of the full batch script. There are 4 possible combinations of these 2 columns:
       statement_start_offset = 0 and statement_end_offset = -1 : the full query text returned by the sql_handle is currently being executed.
       statement_start_offset = 0 and statement_end_offset = n (<> -1) : the active statement is the first statement in the full query text returned by sys.dm_exec_sql_text, and it ends n bytes from the start of the text, where n is the value in the statement_end_offset column.
       statement_start_offset = m (<> 0) and statement_end_offset = -1 : the active statement starts m bytes into the SQL text returned by the sql_text function and runs to the end of the full text.
       statement_start_offset = m (<> 0) and statement_end_offset = n (<> -1) : the active statement starts m bytes in and ends n bytes into the full SQL text.
The text is Unicode, with each character represented as 2 bytes. So, to parse out the statement we use the SUBSTRING function, passing in the SQL text, the starting offset divided by 2, and the no. of characters to retrieve (obtained by subtracting the starting offset from the ending offset and dividing by 2).
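A commonly used sketch of that SUBSTRING arithmetic (the CASE handles the -1 end-of-batch marker via DATALENGTH):

```sql
-- Extract only the currently executing statement from each running batch
SELECT r.session_id,
       SUBSTRING(t.text,
                 (r.statement_start_offset / 2) + 1,
                 ((CASE r.statement_end_offset
                        WHEN -1 THEN DATALENGTH(t.text)
                        ELSE r.statement_end_offset
                   END - r.statement_start_offset) / 2) + 1) AS executing_statement
FROM sys.dm_exec_requests AS r
     CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t;
```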

Slide:
This query gives the executing statement from the given SQL batch request. In this example, I ran a simple query that performs insert operations. On executing the DMV query in another tab, the result set shows that the active statement for the batch is at the beginning of the SQL, and the associated end offset tells us that the active statement is only 25 characters long. This corresponds to the WAITFOR DELAY '00:01'; statement.

Slide : Investigating the work done by the requests
Using these DMVs we can isolate the statements that are consuming resources, instead of looking at the whole batch. By adapting this script we can get CPU usage, the no. of pages allocated to the request in memory, the amount of time spent waiting, the no. of physical reads, and the current execution time. We can uncomment the relevant lines to get the required results.

Slide: who is running what, right now ?
       The above three DMVs provide deep insight into user activity: from connections, to the sessions spawned in those connections, to the requests used to submit queries for transaction processing. This query breaks the disk I/O information down into reads and writes, and finally also includes wait metrics and isolation levels, to provide insight into how the current session is interacting with other sessions.

Slide: Query Plan metadata
       The DMVs described here provide the execution plan and statistics of previously executed queries, that is, queries that are in the plan cache. The DMVs provide statistics not only for SQL requests but also for the execution of stored procedures associated with a batch. Statistics such as elapsed time, CPU time, I/O activity etc. are provided. We will see how this query plan and statistics data helps answer the following questions:

Which are the top x most expensive queries in cache in terms of CPU/IO/Memory?
What are the most expensive stored procedures?
Are plans getting reused?
How many ad-hoc or single-use plans are taking up space in the plan cache?
We use the following DMVs for this:
sys.dm_exec_query_stats : returns aggregate performance statistics for cached query plans.
sys.dm_exec_procedure_stats : returns aggregate performance statistics for cached stored procedures.
In order to get the query plan for a given batch, along with some interesting attributes, we can pass the plan handle to one of the DMFs below:
sys.dm_exec_query_plan, sys.dm_exec_text_query_plan, sys.dm_exec_plan_attributes

INFO: The query_plan DMF returns the plan for the entire batch, where a batch may contain more than one query. The query_stats DMV returns statistics for individual statements. If we join query_stats and query_plan we get the plan for the entire batch; if instead we join query_stats and text_query_plan (passing the statement offsets), it is possible to extract from the batch the plan for just the individual query.

Slide : Query Plan Metadata
One thing to note with this DMV: we can only examine plans for queries that are in the cache, and the cache holds plans for the most active, costly queries; less costly and less-reused plans can fall out of the cache, and some queries with trivial plans are never cached. In short, some queries will be missed. Secondly, plans may remain in the cache from when they are first compiled until the object is dropped or recompiled, or the cache is flushed. This means some plans, especially frequently used ones, may remain in cache for a long time. If your SQL Server has been up and running for a long period, it will contain plans of many different ages, which makes the statistics hard to analyze; for example, if you are looking for the most CPU-intensive procedures, the results may be skewed towards procedures that have simply been in cache the longest, rather than those that actually are the most expensive. One way around this is to clear the cache, using DBCC FREEPROCCACHE for all databases on an instance, or DBCC FLUSHPROCINDB for a single database. This makes the statistics much easier to interpret.
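A sketch of those cache-clearing commands (the database name is illustrative; both clear compiled plans, so use them with care on production systems):

```sql
-- Clear the plan cache for the whole instance
DBCC FREEPROCCACHE;

-- Clear cached plans for a single database
DECLARE @db_id INT = DB_ID('MyDatabase');  -- hypothetical database name
DBCC FLUSHPROCINDB(@db_id);
```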

Slide: Viewing the text of a cached query plan
       The process of retrieving the query plan is similar to extracting the SQL text, as done before; these functions return similar columns to the sql_text DMF, the only difference being that the sql_text DMF returns SQL text while these return the SQL plan. We simply extract the plan_handle from the sys.dm_exec_query_stats DMV and pass it as a parameter to one of the following functions (sys.dm_exec_query_plan, sys.dm_exec_text_query_plan) to return the query plan. There are a few differences between these 2 functions:
1. sys.dm_exec_query_plan returns the plan in XML format, which can be saved in .SQLPLAN format and viewed in SSMS, while sys.dm_exec_text_query_plan, as the name suggests, returns the plan in text format.
2. since sys.dm_exec_query_plan returns XML, it is subject to the XML nesting limitation: the XML datatype doesn't support more than 128 levels. There is no such limitation with sys.dm_exec_text_query_plan.
3. the XML-format plan returned by sys.dm_exec_query_plan is not in a readable format unless opened with an appropriate tool.
4. sys.dm_exec_query_plan takes only a plan handle as input, while sys.dm_exec_text_query_plan takes 2 more parameters, statement_start_offset and statement_end_offset, which mark the start and end points of the individual statements in the given SQL batch.
One more thing to note: sys.dm_exec_query_stats is not the only source of plan_handle; we can also take it from the sys.dm_exec_requests, sys.dm_exec_query_memory_grants or sys.dm_exec_cached_plans DMVs.

Slide : Returning Query Plan 
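A minimal sketch of returning cached plans, joining the stats DMV to the text and plan DMFs:

```sql
-- Return the SQL text and cached XML plan for each cached query
SELECT t.text AS query_text,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
     CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
     CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp;
```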


Slide : Isolating SQL statements from a cached batch / stored procedure
Previously, we saw how to isolate a single SQL statement from a batch or stored procedure with the help of the offset columns in the requests DMV. We follow a similar procedure here to isolate SQL statements from a cached batch or stored procedure: we pass the plan_handle to the sql_text DMF, which returns the SQL text associated with the plan for that batch, and extract the text of the individual statements using the offsets.

Slide: Isolating the SQL statement plans from a cached batch / stored procedure
       Similarly, just as we got the single SQL statement, we can also get the corresponding sub-plan for each individual query. This can be achieved using the sys.dm_exec_text_query_plan DMF, which returns the plan in text form; as shown, we get an individual plan for each query.

Slide: sys.dm_exec_cached_plans
       sys.dm_exec_cached_plans returns a snapshot of the current cache contents: the size of each plan in bytes, the type of object it is attached to, the plan's use count, and its reference count (i.e. the no. of cache objects that reference the plan). The columns are useful for understanding how plans are being cached.
cacheobjtype: compiled plan; parse tree (a bare-bones plan for an object such as a view: it specifies the objects referenced by the view but is not a full execution plan, since views are compiled into the query at execution and do not have plans of their own); CLR compiled function; CLR compiled procedure; extended procedure
objtype:      proc, prepared statement, adhoc query, view

Slide: Investigating plan reuse:
If the text of a submitted query matches the text of a query in the cache, then the cached plan may be reused. It is not only the text of the query; there are various parameters to be considered when reusing a query plan. If we use stored procedures as the primary interface to the data server, we get a high probability of plan reuse. A database setting that can help plan reuse is forced parameterization. We need to be careful when interpreting plan reuse figures: after a restart, a cache flush, or procedure creation, plan use counts will be low.
Plan reuse distribution curve: this query allows us to view the use-count distribution. So, for example, we might have five plans that have been reused over 10,000 times, 359 plans that have been reused between 10 and 100 times, 1,988 plans that have only been used once, and so on. Clearly, this is only an overview, just to give you an overall feel for plan reuse.
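A simplified sketch of such a distribution (the actual slide query likely buckets the counts into ranges, e.g. 10-100, 100-1000):

```sql
-- How many cached plans exist at each use count
SELECT cp.usecounts, COUNT(*) AS plan_count
FROM sys.dm_exec_cached_plans AS cp
GROUP BY cp.usecounts
ORDER BY cp.usecounts DESC;
```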

Slide Examining frequently reused plans
Using sys.dm_exec_cached_plans in conjunction with sys.dm_exec_query_plan and sys.dm_exec_sql_text, we can construct a query that returns the plan, plus the text of the query associated with the plan, for the plans with the highest use counts.

Slide Examining the ad-hoc single-use plans
      This script provides the text and size of single-use ad-hoc queries, which can cause unusual memory bloat. It basically identifies the ad-hoc queries that have a use count of 1.
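A sketch of that check:

```sql
-- Single-use ad-hoc plans, largest first
SELECT t.text, cp.size_in_bytes
FROM sys.dm_exec_cached_plans AS cp
     CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS t
WHERE cp.cacheobjtype = 'Compiled Plan'
      AND cp.objtype = 'Adhoc'
      AND cp.usecounts = 1
ORDER BY cp.size_in_bytes DESC;
```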

Slide: Gathering Query Execution statistics
      Up to now, we have seen how to extract the query and query plan from the cached plans, where one plan is stored per batch or procedure. Here we are going to use the sys.dm_exec_query_stats DMV, which contains one row for each SQL statement and provides columns like total_worker_time (CPU), total_physical_reads and total_logical_reads, giving very useful information about the resources queries are eating up and the no. of times each query has been executed. The lifetime of these rows is tied to the plans in the cached plans DMV: as soon as we flush the cache, the corresponding statistics rows are also removed from this view. Overall, this gives us information about the resource-hungry queries whose plans are cached.

Slide: Overview of sys.dm_exec_query_stats
       The query and plan in the batch are uniquely identified using the sql_handle and plan_handle columns. The view also contains the offset columns needed to locate each SQL statement within the batch or procedure. In addition, it provides activity and workload information through the worker time, physical reads, logical reads and elapsed time columns.

Slide:
Putting sys.dm_exec_query_stats to work
       Using the previous DMV alone, we would only get the total amount of CPU time spent executing. There may be a case where one query has executed a million times and another only a thousand times. In that case, it is better to consider the execution count as well: this query shows which queries spent the most time in execution per run, by also taking the execution count into account.
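A sketch ordering by average (per-execution) CPU time:

```sql
-- Top queries by average CPU time per execution
SELECT TOP 10
       qs.execution_count,
       qs.total_worker_time / qs.execution_count AS avg_worker_time,
       t.text
FROM sys.dm_exec_query_stats AS qs
     CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
ORDER BY avg_worker_time DESC;
```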
  
Slide: Investigating expensive cached stored procedures
  This view returns aggregate statistics for the stored procedures present in the cache: one row per cached stored procedure, with the row removed as soon as the plan is flushed from the cache. Depending on which columns we sort on, the view surfaces the most expensive stored procedures by that measure. In this case, I have used total_logical_reads; this is useful when there is memory pressure in the system, which can be noticed from SQL Server perfmon counters like a persistently low Avg. Page Life Expectancy and persistently non-zero values for the Memory Grants Pending counter. Simply by selecting the total_physical_reads column instead, we can perform a similar analysis from the perspective of disk I/O pressure.
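A sketch of such a query (swap the ORDER BY column as described):

```sql
-- Most expensive cached stored procedures by logical reads
SELECT TOP 10
       DB_NAME(ps.database_id) AS database_name,
       OBJECT_NAME(ps.object_id, ps.database_id) AS proc_name,
       ps.execution_count,
       ps.total_logical_reads,
       ps.cached_time
FROM sys.dm_exec_procedure_stats AS ps
ORDER BY ps.total_logical_reads DESC;
```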
  
  INFO: There are some cases that can skew the result set, which can be spotted via the cached_time column: if there are multiple entries for a single stored procedure, its statistics are split across them. One (controversial) way to eliminate this during testing is to clear the plan cache beforehand using DBCC FREEPROCCACHE, possibly scheduled with SQL Agent. Also note that only cached stored procedures appear in this view: procedures created WITH RECOMPILE, or statements using OPTION (RECOMPILE), do not leave plans in cache.
  
INFO : Slide: Getting aggregate query optimization statistics
  SELECT [counter] ,
      [occurrence] ,
      [value]
  FROM sys.dm_exec_query_optimizer_info
  WHERE [counter] IN ( 'optimizations', 'elapsed time', 'final cost' ) ;
  

Transaction

A sequence of operations performed as a single logical unit of work is called a transaction. This logical unit of work must exhibit the four ACID properties:
For a transaction to pass the ACID test, it must either complete or roll back all of its data modifications (Atomic); the end result must leave all data and supporting structures in a consistent state (Consistent); the transaction must not be affected by other transactions running concurrently (Isolated); and once committed, the result must be persistent and permanently recorded by the RDBMS (Durable).

Every statement executed in SQL Server is transactional. When a single statement is run on the SQL Server instance, an implicit transaction runs under the covers, which auto-starts and auto-completes. In an explicit transaction, we group SQL statements between BEGIN TRAN and COMMIT/ROLLBACK TRAN commands.
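As a minimal illustration (dbo.Accounts is a hypothetical table), an explicit transaction groups two updates so that they commit or roll back as one unit:

-- Both updates succeed together or fail together
BEGIN TRY
    BEGIN TRANSACTION ;
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1 ;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2 ;
    COMMIT TRANSACTION ;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION ;
END CATCH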

SQL Server ensures that each statement, whether a single statement in an implicit transaction or one of multiple statements in an explicit transaction, satisfies the ACID properties.

Using DMVs we will observe how these units of work behave. Most transactions live for only milliseconds; our focus will be on those that have difficulty completing in time, whether due to resource contention, poor tuning or other issues.

Investigating Locking and Blocking

Locking is an integral part of any RDBMS and is what guarantees data consistency in the system.

Isolation Levels

Under the default READ COMMITTED isolation level, as soon as the statement that read a row completes, other transactions are allowed to modify that row. This can cause non-repeatable reads: if the same data is accessed again within the same transaction, the result may differ. To avoid non-repeatable reads we can use the REPEATABLE READ isolation level, which holds locks on every row accessed until the transaction ends, preventing other transactions from modifying those rows. Even then, other transactions may insert new rows that fall within a range already read; these are called phantom rows. Phantoms can be prevented by going one step further, to the SERIALIZABLE isolation level, the most restrictive level, under which a transaction is completely unaffected by concurrent transactions.
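As a small illustration, the isolation level is raised per session; under REPEATABLE READ the shared locks taken by the SELECT are held until the transaction ends, so a second session cannot modify those rows in the meantime:

-- Session 1: hold shared locks until commit
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ ;
BEGIN TRANSACTION ;
SELECT [Name]
FROM [Production].[ProductCategory]
WHERE [Name] = 'Components' ;
-- an UPDATE of this row from another session now blocks...
COMMIT TRANSACTION ; -- ...until here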

As the data-consistency requirement increases, the isolation level increases, and so does the cost: more locking, more blocking, and sometimes deadlocks, i.e. transaction A waits for a resource held by transaction B, which in turn waits for a resource held by transaction A.

This locking and the blocking issues can be investigated in 2 ways:

  1. Using Activity Monitor in SSMS (Ctrl+Alt+A).
  2. Using the sys.dm_tran_locks DMV.
Activity Monitor gives a graphical representation of the system processes (sysprocesses) and shows which sessions are blocked and which are blocking. However, it gives fairly limited information about the blocking itself. The sys.dm_tran_locks DMV provides all the information needed to debug a blocking issue in a single result set.

Overview of sys.dm_tran_locks

This DMV gives insight into the current state of locking across the entire SQL Server instance. It returns a row for every currently active request to the lock manager, with details about the request owner and the resource on which the lock is being requested or held.
The resource-related columns identify the resource being locked, the type of the resource, and the database it belongs to. The main columns in the DMV are:
  • Resource Columns 
    • resource_type: OBJECT (table locks), PAGE (page locks), RID (row locks on a heap), KEY (key locks)
    • resource_database_id: ID of the database on which the locked resource resides.
    • resource_associated_entity_id: depending on the resource type, this returns:
      • the object_id of the target object if the resource_type is OBJECT
      • the hobt_id of the parent partition if the resource_type is PAGE, KEY or RID
The remaining columns return information about the status, nature and owner of the lock request.
  • Request Columns
    • request_mode
      • Shared (S)
      • Update (U)
      • Exclusive (X)
      • Intent Exclusive (IX)
    • request_status
      • GRANT - the lock has been granted
      • CONVERT - an existing lock is being converted to another mode, and the conversion is waiting to be fulfilled
      • WAIT - the resource is not yet locked; the request is waiting for the lock
    • request_owner_type
      • TRANSACTION
      • CURSOR
      • SESSION
      • SHARED_TRANSACTION_WORKSPACE
      • EXCLUSIVE_TRANSACTION_WORKSPACE
  • Lock Address
    • lock_owner_address
      • a binary address used internally to track the lock. This column can be joined to resource_address in the sys.dm_os_waiting_tasks DMV to identify the tasks that are waiting on locks.
To interpret the data exposed by this DMV correctly, we need to review lock types (resource_type) and lock modes (request_mode) in a little more detail.

Lock Types

SQL Server can lock a number of different resource types, the most obvious being table, page, row and key, in order of increasing granularity. Locks are granted and released on these resources in order to satisfy the isolation levels of the concurrent sessions. The finer the lock granularity, the higher the concurrency, but also the higher the memory overhead, since many more locks must be maintained. SQL Server automatically chooses the finest-grained lock that is practical; however, if too many row locks accumulate on the same table and create memory pressure, it may escalate them to a single table lock. That lowers the overhead on SQL Server, at the cost of lower concurrency. If a running process is causing lock escalations, it is worth investigating the reason for the escalation.
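One quick way to see whether escalation is actually happening (a sketch; run it in the database of interest) is via the promotion counters in sys.dm_db_index_operational_stats:

-- Objects on which lock escalation has been attempted or granted
SELECT OBJECT_NAME(ddios.[object_id]) AS [object_name] ,
    ddios.index_id ,
    ddios.index_lock_promotion_attempt_count ,
    ddios.index_lock_promotion_count
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ddios
WHERE ddios.index_lock_promotion_attempt_count > 0
ORDER BY ddios.index_lock_promotion_count DESC ;
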
In addition to tables, pages, keys and rows, locks can also be acquired on the following resources:

  • Key ranges: occur only under the SERIALIZABLE isolation level, where a range of keys must be locked to prevent inserts into a range that has been scanned.
  • Extents: taken when tables and indexes grow and new extents are allocated.
  • Databases: every process acquires a shared lock on a database while accessing it.
  • Allocation units: locked when deallocating resources.
  • Metadata: taken when a transaction changes the definition of an object.
Lock Modes
  • Shared (S): Shared locks are issued for read-only operations, such as SELECT statements. They prevent other transactions from updating the same data while the query is processing. Multiple shared locks can be issued for the same resource, hence the term "shared."
  • Update (U): A U lock is taken while the data to be changed is being located, and is converted to an X lock before the update is performed.
  • Exclusive (X): An X lock is used for data modification requests (INSERT, UPDATE, DELETE). This prevents any other transaction from reading and updating data.
  • Intent (IX, IU, IS): As the name suggests, it indicates the intention of a lock at a lower level of the hierarchy. For example, just before taking a shared lock on a page or row, an intent shared lock is placed on the table, which prevents another transaction from obtaining an exclusive lock on the table. Intent locks are placed only on tables and pages.
  • Conversion: These locks are the result of conversion from one lock mode to another.
    • SIX (shared intent exclusive)
    • SIU (shared intent update)
    • UIX (update intent exclusive)
  • Schema: Schema locks are taken by operations that affect the schema of an object.
    • Sch-S: Schema stability locks are taken while compiling a query on the server.
    • Sch-M: Schema modification locks are taken while the underlying database schema is being modified by DDL.

Investigating Locking

A single query may cause locks to be granted on several resources.
I will run the following query against the AdventureWorks database to take update locks.

-- An uncommitted update of the Production table in AdventureWorks
BEGIN TRANSACTION
UPDATE [Production].[ProductCategory]
SET [Name] = 'Parts'
WHERE [Name] = 'Components';
--ROLLBACK TRANSACTION

The DMV query used to find the locked resource types and modes is:

-- Locking due to single UPDATE statement against a user table in SQL Server
SELECT [resource_type] ,
 DB_NAME([resource_database_id]) AS [Database Name] ,
 CASE WHEN DTL.resource_type IN ( 'DATABASE', 'FILE', 'METADATA' )
   THEN DTL.resource_type
   WHEN DTL.resource_type = 'OBJECT'
   THEN OBJECT_NAME(DTL.resource_associated_entity_id,
    DTL.[resource_database_id])
  WHEN DTL.resource_type IN ( 'KEY', 'PAGE', 'RID' )
   THEN ( SELECT OBJECT_NAME([object_id])
       FROM sys.partitions
       WHERE sys.partitions.hobt_id =
      DTL.resource_associated_entity_id
     )
  ELSE 'Unidentified'
 END AS requested_object_name ,
 [request_mode] ,
 [resource_description]
FROM sys.dm_tran_locks DTL
WHERE DTL.[resource_type] <> 'DATABASE' ;

This query is actually straightforward, but it looks intimidating due to the CASE expression, which is used to return the locked object's name for each resource type.

  • For locks of type DATABASE, FILE or METADATA, the lock type itself is simply returned.
  • For OBJECT locks, the parent object name is obtained directly from the resource_associated_entity_id column.
  • For PAGE, RID or KEY locks, we need to look up the object_id by joining the hobt_id (heap or b-tree ID) in the sys.partitions table to the resource_associated_entity_id column of the sys.dm_tran_locks DMV.
Each session also takes a shared lock on its database, which is filtered out by the WHERE clause.

The output of the query will be:

As seen in the result set:

  • there is an intent exclusive (IX) lock on the object (the ProductCategory table);
  • there are also IX locks on two pages; the resource_description value 1:759 means page 759 in data file 1, and similarly for the other page;
  • there are X locks on the KEYs, and the resource_description column gives hashes of the key values.


Now, I will run the following query in a separate session to create blocking in the database:

-- A simple query against the ProductCategory table, which will be blocked.
SELECT *
FROM [Production].[ProductCategory] ;

Running the following query will return the blocked session details and their SQL text:
-- Which sessions are causing blocking and what statement are they running?

SELECT DTL.[request_session_id] AS [session_id] ,
 DB_NAME(DTL.[resource_database_id]) AS [Database] ,
 DTL.resource_type ,
 CASE WHEN DTL.resource_type IN ( 'DATABASE', 'FILE', 'METADATA' )
    THEN DTL.resource_type
   WHEN DTL.resource_type = 'OBJECT'
    THEN OBJECT_NAME(DTL.resource_associated_entity_id,
    DTL.[resource_database_id])
   WHEN DTL.resource_type IN ( 'KEY', 'PAGE', 'RID' )
   THEN ( SELECT OBJECT_NAME([object_id])
       FROM sys.partitions
       WHERE sys.partitions.hobt_id =
      DTL.resource_associated_entity_id
     )
  ELSE 'Unidentified'
 END AS [Parent Object] ,
 DTL.request_mode AS [Lock Type] ,
 DTL.request_status AS [Request Status] ,
 DER.[blocking_session_id] ,
 DES.[login_name] ,
 CASE DTL.request_lifetime
  WHEN 0 THEN DEST_R.TEXT
  ELSE DEST_C.TEXT
 END AS [Statement]
FROM sys.dm_tran_locks DTL
LEFT JOIN sys.[dm_exec_requests] DER
 ON DTL.[request_session_id] = DER.[session_id]
INNER JOIN sys.dm_exec_sessions DES
 ON DTL.request_session_id = DES.[session_id]
INNER JOIN sys.dm_exec_connections DEC
 ON DTL.[request_session_id] = DEC.[most_recent_session_id]
OUTER APPLY sys.dm_exec_sql_text(DEC.[most_recent_sql_handle]) AS DEST_C
OUTER APPLY sys.dm_exec_sql_text(DER.sql_handle) AS DEST_R
WHERE DTL.[resource_database_id] = DB_ID()
 AND DTL.[resource_type] NOT IN ( 'DATABASE', 'METADATA' )
ORDER BY DTL.[request_session_id] ;

I have joined the locks DMV to the requests DMV to extract the blocking session ID. A LEFT JOIN is used because a request disappears from sys.dm_exec_requests as soon as it completes; an INNER JOIN would silently drop the locks of sessions that no longer have an active request. This is also part of the reason for the two OUTER APPLY clauses used to extract the SQL text of the request: one against sql_handle from sys.dm_exec_requests and one against most_recent_sql_handle from sys.dm_exec_connections. Since a completed request no longer exists in the requests DMV, its text must be supplied via sys.dm_exec_connections. Conversely, sys.dm_exec_connections is only updated after all of a connection's requests complete, so for a query that is still live, such as the blocked one, we cannot rely on sys.dm_exec_connections alone. The condition that decides which sys.dm_exec_sql_text call to use is the request_lifetime column from sys.dm_tran_locks: if request_lifetime is zero, the request is still active and its sql_handle comes from sys.dm_exec_requests; otherwise the text is taken from sys.dm_exec_connections via most_recent_sql_handle.

Blocking Analysis using sys.dm_tran_locks and sys.dm_os_waiting_tasks

The most common causes of blocking are:
Poorly written transactions: transactions containing an unnecessarily high number of statements, or statements that process large amounts of data because they lack proper WHERE clauses and filter predicates. The best query passes over the data as few times as possible, returns as few rows as necessary, and returns only the columns the end user requires.
Poorly designed databases: missing indexes, missing foreign keys, inadequate clustering keys, poorly chosen data types.
Poorly maintained databases: fragmented indexes and outdated statistics can result in sub-optimal plans, turning what could be seeks into table or index scans.
Poorly written applications: applications that use SELECT * everywhere, submit ad hoc queries, and avoid stored procedures.

To analyze a blocking issue we need, at a minimum, the transactions that are blocked, their SQL text, and the users who own those sessions.
To achieve this, we can start our investigation from the sys.dm_os_waiting_tasks DMV, whose resource_address column joins back to the sys.dm_tran_locks DMV on lock_owner_address.

-- Investigating locking and blocking based on waiting tasks

USE [AdventureWorks] ;
GO
SELECT DTL.[resource_type] AS [resource type] ,
 CASE WHEN DTL.[resource_type] IN ( 'DATABASE', 'FILE', 'METADATA' )
   THEN DTL.[resource_type]
   WHEN DTL.[resource_type] = 'OBJECT'
   THEN OBJECT_NAME(DTL.resource_associated_entity_id)
   WHEN DTL.[resource_type] IN ( 'KEY', 'PAGE', 'RID' )
   THEN ( SELECT OBJECT_NAME([object_id])
       FROM sys.partitions
       WHERE sys.partitions.[hobt_id] =
      DTL.[resource_associated_entity_id]
     )
   ELSE 'Unidentified'
  END AS [Parent Object] ,
  DTL.[request_mode] AS [Lock Type] ,
  DTL.[request_status] AS [Request Status] ,
  DOWT.[wait_duration_ms] AS [wait duration ms] ,
  DOWT.[wait_type] AS [wait type] ,
  DOWT.[session_id] AS [blocked session id] ,
  DES_blocked.[login_name] AS [blocked_user] ,
  SUBSTRING(DEST_Blocked.[text], DER.statement_start_offset / 2,
   ( CASE WHEN DER.statement_end_offset = -1
      THEN DATALENGTH(DEST_Blocked.[text])
       ELSE DER.statement_end_offset
       END - DER.statement_start_offset ) / 2)
  AS [blocked_command] ,
   DOWT.[blocking_session_id] AS [blocking session id] ,
   DES_blocking.[login_name] AS [blocking user] ,
   DEST_blocking.[text] AS [blocking command] ,
   DOWT.resource_description AS [blocking resource detail]
FROM sys.dm_tran_locks DTL
INNER JOIN sys.dm_os_waiting_tasks DOWT
 ON DTL.lock_owner_address = DOWT.resource_address
INNER JOIN sys.[dm_exec_requests] DER
 ON DOWT.[session_id] = DER.[session_id]
INNER JOIN sys.dm_exec_sessions DES_blocked
 ON DOWT.[session_id] = DES_Blocked.[session_id]
INNER JOIN sys.dm_exec_sessions DES_blocking
 ON DOWT.[blocking_session_id] = DES_Blocking.[session_id]
INNER JOIN sys.dm_exec_connections DEC
 ON DTL.[request_session_id] = DEC.[most_recent_session_id]
CROSS APPLY sys.dm_exec_sql_text(DEC.[most_recent_sql_handle])
 AS DEST_Blocking
CROSS APPLY sys.dm_exec_sql_text(DER.sql_handle) AS DEST_Blocked
WHERE DTL.[resource_database_id] = DB_ID()

To see it in action, we will set up some activity on the Production.Culture table.
-- An uncommitted UPDATE transaction on the Production.Culture table.

BEGIN TRANSACTION
UPDATE Production.Culture
SET Name = 'English-British'
WHERE Name = 'English' ;
--ROLLBACK TRANSACTION
In a separate session, run the following query to access the data:

-- A blocked query against the Production.Culture table

SELECT ModifiedDate
FROM Production.Culture
WHERE Name = 'English' ;
Finally, in a third session, perform an insert:
-- An INSERT against the Production.Culture table.

INSERT INTO Production.Culture
( CultureID, Name )
VALUES ( 'jp', 'Japanese' ) ;
SELECT * FROM Production.Culture ;
Having executed all three queries, run the DMO script above. What we expect to
see is that the UPDATE query blocks both subsequent SELECT queries from the other
sessions, but not the INSERT, as confirmed by the results.

Indexing Strategy and Maintenance

Well-designed SQL will touch the data in the base tables as few times as possible. However, regardless of how intelligently the SQL is written, it will still read more data than necessary, and perform poorly, unless it also makes proper use of indexes. Using the index-related DMOs, we will be able to answer the following questions:

  1. Are there any indexes that are no longer in use, or have never been used? (index_usage_stats)
  2. For the indexes that are in use, what is the usage pattern? (index_operational_stats)
  3. Which indexes are missing? (missing_index_details, missing_index_group_stats)
  4. Is there any index fragmentation? (We will also consider index maintenance, since as data is added, removed and modified, the ordering of data in the index is disturbed, gaps appear, and so on; this process is known as fragmentation.)
To get index names and index IDs, we use the sys.indexes catalog view. index_id is unique only within a particular table, so two indexes in the same database can share an index_id, and two tables in different databases on the same instance can share an object_id (since an instance can contain multiple databases). To correctly fetch the details of a specific index from sys.indexes, we therefore need the database ID, object ID and index ID together.

The following query joins sys.dm_db_index_usage_stats to sys.indexes on object_id and index_id, and filters on database_id.

-- Querying index use in the AdventureWorks database

SELECT DB_NAME(ddius.[database_id]) AS database_name 
 , OBJECT_NAME(ddius.[object_id]
 , DB_ID('AdventureWorks'))  AS [object_name]
 , asi.[name] AS index_name 
 , ddius.user_seeks + ddius.user_scans + ddius.user_lookups AS user_reads
FROM sys.dm_db_index_usage_stats ddius
INNER JOIN AdventureWorks.sys.indexes asi
 ON ddius.[object_id] = asi.[object_id]
  AND ddius.index_id = asi.index_id
WHERE ddius.database_id = DB_ID('AdventureWorks') ;


In addition to sys.indexes, we use the sys.objects, sys.partitions and sys.sysusers views.

Investigating index usage

We will use sys.dm_db_index_usage_stats along with other DMVs to answer the questions above. This DMV contains the following important columns:
database_id: ID of the database.
object_id: identifies the table to which the index belongs.
index_id: unique only at the object level, where:
  0 signifies a heap (no clustered index on the table)
  1 signifies the clustered index
  > 1 signifies a non-clustered index
user_seeks: number of seeks using the index.
user_scans: number of scans using the index.
user_lookups: number of lookups using the index.
user_updates: number of times the index has been modified by user queries.

Each of the user actions has a corresponding last-occurrence timestamp, e.g. last_user_seek, and there are system equivalents such as last_system_seek.
The following query returns information on indexes that have been used at least once to resolve a query. A high number of scans may indicate a need to update statistics. Sometimes, though, the optimizer chooses a scan over a seek even when statistics are current: on a very small table, a scan can be cheaper than a seek. So this data should be considered in conjunction with the index sizes returned by sys.dm_db_index_physical_stats.
-- Usage stats for indexes that have been used to resolve a query

SELECT OBJECT_NAME(ddius.[object_id]
 , ddius.database_id) AS [object_name]
 , ddius.index_id 
 , ddius.user_seeks 
 , ddius.user_scans 
 , ddius.user_lookups 
 , ddius.user_seeks + ddius.user_scans + ddius.user_lookups
  AS user_reads 
 , ddius.user_updates AS user_writes 
 , ddius.last_user_scan 
 , ddius.last_user_update
FROM sys.dm_db_index_usage_stats ddius
 WHERE ddius.database_id > 4 -- filter out system databases
  AND OBJECTPROPERTY(ddius.object_id, 'IsUserTable') = 1
  AND ddius.index_id > 0 -- filter out heaps
ORDER BY ddius.user_scans DESC


This query reports total index reads and total index writes, which helps answer the following:
  1. Which indexes are not being used?
  2. Which indexes are not being used but are frequently updated?
Finding unused indexes:
We simply select the indexes from sys.indexes that have no corresponding row in the usage-stats DMV, as shown below:

-- Finding unused indexes

-- List unused indexes
SELECT OBJECT_NAME(i.[object_id]) AS [Table Name] 
 , i.name
FROM sys.indexes AS i
INNER JOIN sys.objects AS o 
 ON i.[object_id] = o.[object_id]
WHERE i.index_id NOT IN 
 ( SELECT ddius.index_id
   FROM sys.dm_db_index_usage_stats AS ddius
   WHERE ddius.[object_id] = i.[object_id]
  AND i.index_id = ddius.index_id
  AND database_id = DB_ID() )
AND o.[type] = 'U'
ORDER BY OBJECT_NAME(i.[object_id]) ASC ;


Identify Indexes that have been maintained but not used:

The query below returns indexes that have been updated many times but never read, i.e. never used.

-- Querying sys.dm_db_index_usage_stats for indexes that are being
-- maintained but not used

SELECT '[' + DB_NAME() + '].[' + su.[name] + '].[' + o.[name] + ']' AS [statement] ,
 i.[name] AS [index_name] ,
 ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups] AS [user_reads] ,
 ddius.[user_updates] AS [user_writes] ,
 SUM(SP.rows) AS [total_rows] 
FROM sys.dm_db_index_usage_stats ddius
INNER JOIN sys.indexes i 
 ON ddius.[object_id] = i.[object_id]
 AND i.[index_id] = ddius.[index_id]
INNER JOIN sys.partitions SP 
 ON ddius.[object_id] = SP.[object_id]
 AND SP.[index_id] = ddius.[index_id]
INNER JOIN sys.objects o 
 ON ddius.[object_id] = o.[object_id]
INNER JOIN sys.sysusers su 
 ON o.[schema_id] = su.[UID]
WHERE ddius.[database_id] = DB_ID() -- current database only
 AND OBJECTPROPERTY(ddius.[object_id], 'IsUserTable') = 1
 AND ddius.[index_id] > 0
GROUP BY su.[name] ,
 o.[name] ,
 i.[name] ,
 ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups] ,
 ddius.[user_updates]
HAVING ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups] = 0
ORDER BY ddius.[user_updates] DESC ,
 su.[name] ,
 o.[name] ,
 i.[name]


After carefully investigating these statistics, we can generate a DROP command by re-running the previous SQL with the following expression added to the SELECT clause:
-- Drop Statement in the Drop Command Column

-- 'DROP INDEX [' + i.[name] + '] ON [' + su.[name] + '].[' + o.[name]
-- + '] WITH ( ONLINE = OFF )' AS [drop_command]

SELECT '[' + DB_NAME() + '].[' + su.[name] + '].[' + o.[name] + ']' AS [statement] ,
 i.[name] AS [index_name] ,
 ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups] AS [user_reads] ,
 ddius.[user_updates] AS [user_writes] ,
 SUM(SP.rows) AS [total_rows] ,
 'DROP INDEX [' + i.[name] + '] ON [' + su.[name] + '].[' + o.[name]
+ '] WITH ( ONLINE = OFF )' AS [drop_command]
FROM sys.dm_db_index_usage_stats ddius
INNER JOIN sys.indexes i 
 ON ddius.[object_id] = i.[object_id]
 AND i.[index_id] = ddius.[index_id]
INNER JOIN sys.partitions SP 
 ON ddius.[object_id] = SP.[object_id]
 AND SP.[index_id] = ddius.[index_id]
INNER JOIN sys.objects o 
 ON ddius.[object_id] = o.[object_id]
INNER JOIN sys.sysusers su 
 ON o.[schema_id] = su.[UID]
WHERE ddius.[database_id] = DB_ID() -- current database only
 AND OBJECTPROPERTY(ddius.[object_id], 'IsUserTable') = 1
 AND ddius.[index_id] > 0
GROUP BY su.[name] ,
 o.[name] ,
 i.[name] ,
 ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups] ,
 ddius.[user_updates]
HAVING ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups] = 0
ORDER BY ddius.[user_updates] DESC ,
 su.[name] ,
 o.[name] ,
 i.[name]

Now pick the generated DROP statement and execute it to delete the corresponding index. Note that the DROP command should only be executed after careful investigation. There are two cases where these statistics can mislead:

  • We may be looking at usage stats for only a small window of time in which the index simply was not used. We can at least check how recently the usage stats were cleared by using the following query:

    -- How old are the index usage stats?
    SELECT DATEDIFF(DAY, sd.crdate, GETDATE()) AS days_history
    FROM sys.sysdatabases sd
    WHERE sd.[name] = 'AdventureWorks2008' ;

  • The index may appear unused simply because the workload that uses it is cyclic in nature and did not occur during our test period.

Identify Inefficient Indexes

Now we will return those indexes where index writes exceed index reads, using the following query:
-- Finding rarely-used indexes

-- Potentially inefficient non-clustered indexes (writes > reads)
SELECT OBJECT_NAME(ddius.[object_id]) AS [Table Name] ,
 i.name AS [Index Name] ,
 i.index_id ,
 user_updates AS [Total Writes] ,
 user_seeks + user_scans + user_lookups AS [Total Reads] ,
 user_updates - ( user_seeks + user_scans + user_lookups )
AS [Difference]
FROM sys.dm_db_index_usage_stats AS ddius WITH ( NOLOCK )
INNER JOIN sys.indexes AS i WITH ( NOLOCK )
 ON ddius.[object_id] = i.[object_id]
 AND i.index_id = ddius.index_id
WHERE OBJECTPROPERTY(ddius.[object_id], 'IsUserTable') = 1
 AND ddius.database_id = DB_ID()
 AND user_updates > ( user_seeks + user_scans + user_lookups )
 AND i.index_id > 1
ORDER BY [Difference] DESC ,
 [Total Writes] DESC ,
 [Total Reads] ASC ;

Before drawing conclusions from these statistics, it is recommended to ensure the SQL Server instance has been running long enough to cover the entire workload. It is also important to account for workloads that occur only periodically, such as report generation: even though these workloads are infrequent, they can be performance critical.

Determine index usage patterns of current indexes:

This is achieved using the sys.dm_db_index_operational_stats DMF. It accepts database_id, object_id, index_id and partition_number as parameters to identify the object (heap, clustered index or non-clustered index) and returns detailed, per-partition index usage and operational statistics.
This DMF can provide the following information:
  1. evidence of potential lock or latch contention on the object
  2. excessive I/O being issued against the object
The data returned by sys.dm_db_index_operational_stats exists only as long as the metadata for the object is in cache, and the cache keeps the most recently used objects; for less recently used objects, operational stats may not be available. Also note that since the grain of this function is the partition, it returns one row per partition, whereas the usage-stats view returns one row per object.
Usage stats give a feel for how an index is being used to satisfy queries, while operational stats provide detailed information at the physical level via the leaf_insert_count, leaf_update_count and leaf_delete_count columns, and their non-leaf equivalents.
For diagnosis of resource contention on the object, the following columns are particularly useful:
row_lock_count: number of row locks requested against this index.
row_lock_wait_count: number of times a session waited for a row lock on this index.
row_lock_wait_in_ms: total time sessions waited to acquire row locks on this index.
page_lock_count, page_lock_wait_count, page_lock_wait_in_ms: the same information for page locks.
index_lock_promotion_attempt_count, index_lock_promotion_count: number of times escalation of row or page locks to an index (table) lock was attempted or granted.
page_latch_wait_count, page_latch_wait_in_ms: number of waits, and time waited, on latches for physical pages.
page_io_latch_wait_count, page_io_latch_wait_in_ms: number of waits, and time waited, while SQL Server loads pages from disk into memory for an index operation.
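Putting those columns to work, a query along the following lines ranks the objects in the current database by lock and page I/O latch waits; the exact thresholds worth worrying about depend on the workload:

-- Objects with the heaviest lock and page I/O latch waits
SELECT OBJECT_NAME(ddios.[object_id]) AS [object_name] ,
    ddios.index_id ,
    ddios.row_lock_wait_count ,
    ddios.row_lock_wait_in_ms ,
    ddios.page_lock_wait_count ,
    ddios.page_lock_wait_in_ms ,
    ddios.page_io_latch_wait_count ,
    ddios.page_io_latch_wait_in_ms
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ddios
ORDER BY ddios.row_lock_wait_in_ms + ddios.page_lock_wait_in_ms DESC ;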

Detailed activity information for indexes that take no part in user reads

Using the query below we can get detailed information about those indexes that have zero reads but frequent writes. It reports leaf-level activity, i.e. the number of leaf inserts, deletes, updates and so on:
-- Detailed write information for unused indexes
-- detailed write information for unused indexes

SELECT '[' + DB_NAME() + '].[' + su.[name] + '].[' + o.[name] + ']' AS [statement] ,
 i.[name] AS [index_name] ,
 ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups] AS [user_reads] ,
 ddius.[user_updates] AS [user_writes] ,
 ddios.[leaf_insert_count] ,
 ddios.[leaf_delete_count] ,
 ddios.[leaf_update_count] ,
 ddios.[nonleaf_insert_count] ,
 ddios.[nonleaf_delete_count] ,
 ddios.[nonleaf_update_count]
FROM sys.dm_db_index_usage_stats ddius
INNER JOIN sys.indexes i ON ddius.[object_id] = i.[object_id]
 AND i.[index_id] = ddius.[index_id]
INNER JOIN sys.partitions SP ON ddius.[object_id] = SP.[object_id]
 AND SP.[index_id] = ddius.[index_id]
INNER JOIN sys.objects o ON ddius.[object_id] = o.[object_id]
INNER JOIN sys.sysusers su ON o.[schema_id] = su.[UID]
INNER JOIN sys.[dm_db_index_operational_stats](DB_ID(), NULL, NULL, NULL) AS ddios
 ON ddius.[index_id] = ddios.[index_id]
 AND ddius.[object_id] = ddios.[object_id]
 AND SP.[partition_number] = ddios.[partition_number]
 AND ddius.[database_id] = ddios.[database_id]
WHERE OBJECTPROPERTY(ddius.[object_id], 'IsUserTable') = 1
 AND ddius.[index_id] > 0
 AND ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups] = 0
ORDER BY ddius.[user_updates] DESC ,
 su.[name] ,
 o.[name] ,
 i.[name]

From the DMV output above, it is clear that some indexes are being updated even though SQL Server has never benefited from them.

****************************** To Be Added ********************************

Finding Missing Indexes

The execution plan generated by the optimizer uses the most optimal data-access path it can find. SQL Server checks whether existing indexes offer this path; if the ideal index does not exist, it chooses the best available index or falls back to a scan, and records the details of the missing index in the sys.dm_db_missing_index_* DMOs.
Following are the dmv's used for this:

  1. sys.dm_db_missing_index_details
  2. sys.dm_db_missing_index_columns
  3. sys.dm_db_missing_index_group_stats
  4. sys.dm_db_missing_index_groups
None of the above DMOs has an index_id column. This is because the result sets they return are recommendations for indexes yet to be created, which are not materialized on the server. Rows in these DMOs are uniquely identified by the index_handle column.
On a SQL Server restart the query statistics are cleared, and from that point the information provided by these DMOs is no longer valid. It is important to ensure that a normal query workload has run before acting on missing-index recommendations.
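A quick way to judge how much workload the recommendations actually reflect is to check the instance start time:

```sql
-- How long has the instance been up, i.e. how much workload
-- do the missing-index DMOs currently reflect?
SELECT sqlserver_start_time,
       DATEDIFF(DAY, sqlserver_start_time, GETDATE()) AS days_of_accumulated_stats
FROM   sys.dm_os_sys_info;
```

If the instance restarted recently, treat the recommendations with caution.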


Missing index details : sys.dm_db_missing_index_details

Contains identification columns that uniquely identify an index recommendation on the SQL Server instance: index_handle, object_id and database_id. Other useful columns are:
equality_columns: comma-separated list of columns used with equality predicates in the actual workload query; recommended as key columns of the index.
inequality_columns: comma-separated list of columns used with inequality predicates in the actual workload query; recommended as key columns of the index.
included_columns: comma-separated list of columns recommended as included (covering) columns of the index.
statement: the fully qualified name (database, schema and table) of the table the index is missing on.

Missing index columns: sys.dm_db_missing_index_columns

This DMF takes an index_handle as input and returns how each column is used in the actual workload query. The function returns only three columns:

  1. column_id
  2. column_name
  3. column_usage (EQUALITY, INEQUALITY, INCLUDED)
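Because it is a function keyed by index_handle, the natural way to use it is with CROSS APPLY against sys.dm_db_missing_index_details, expanding each recommendation into one row per column:

```sql
-- Expand each missing-index recommendation into its column usage
SELECT mid.index_handle,
       mid.statement,
       mic.column_id,
       mic.column_name,
       mic.column_usage        -- EQUALITY, INEQUALITY or INCLUDED
FROM   sys.dm_db_missing_index_details AS mid
       CROSS APPLY sys.dm_db_missing_index_columns(mid.index_handle) AS mic
WHERE  mid.database_id = DB_ID()
ORDER  BY mid.index_handle, mic.column_usage;
```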
Missing Index Groups: sys.dm_db_missing_index_groups

This DMV resolves the many-to-many relationship between sys.dm_db_missing_index_details and sys.dm_db_missing_index_group_stats using the following two columns:
  1. index_group_handle
  2. index_handle

Missing Index Group Stats: sys.dm_db_missing_index_group_stats
This DMV estimates the benefit the server could gain by implementing the missing indexes. group_handle is the identifier column in this view.
unique_compiles: number of plans that could have been compiled using this missing index
user_seeks: number of seek operations that would have used this index
user_scans: number of scan operations that would have used this index
last_user_seek: time of the last user seek operation that would have used this index
last_user_scan: time of the last user scan operation that would have used this index
avg_total_user_cost: average cost saving that could have been achieved with the missing index
avg_user_impact: estimated percentage by which the average query cost would be reduced
The last_user_* columns help judge whether implementing an index would still be beneficial. For example, if last_user_seek is long ago, implementing the corresponding index might not bring any benefit.
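Putting that last point into practice, the sketch below (a judgment call on the 30-day window, not a fixed rule) restricts recommendations to those actually exercised recently:

```sql
-- Only consider recommendations a query would have used in the last 30 days
SELECT group_handle,
       user_seeks,
       last_user_seek,
       avg_total_user_cost,
       avg_user_impact
FROM   sys.dm_db_missing_index_group_stats
WHERE  last_user_seek > DATEADD(DAY, -30, GETDATE())
ORDER  BY avg_user_impact * avg_total_user_cost * user_seeks DESC;
```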

-- Finding beneficial missing indexes

use $(database);

select ddmigs.avg_user_impact * ddmigs.avg_total_user_cost * ddmigs.user_seeks as improvement_measure, 
    ddmigs.avg_user_impact as AvgUserImpact,
    ddmigs.avg_total_user_cost as TotalUserCost, 
    ddmigs.user_seeks as UserSeeks, 
    db_name(ddmid.database_id) as DBName, 
       OBJECT_NAME(ddmid.object_id, ddmid.database_id) as ObjectName, 
    ddmid.equality_columns, 
    ddmid.inequality_columns, 
    ddmid.included_columns, 
    ddmid.statement,
       'USE [' + DB_NAME(ddmid.database_id) + ']; 
       CREATE INDEX idx_' + replace(replace(replace(replace(ISNULL(equality_columns, '') + ISNULL(ddmid.inequality_columns, ''), ', ', '_'), '[', ''), ']', ''), ' ', '') + ' ON [' + schema_name(d.schema_id) + '].[' + OBJECT_NAME(ddmid.object_id, ddmid.database_id) + ']
       (' + ISNULL(equality_columns, '') + CASE WHEN ddmid.equality_columns IS NOT NULL AND ddmid.inequality_columns IS NOT NULL THEN ', ' ELSE '' END + ISNULL(ddmid.inequality_columns, '') + ')
       ' + CASE WHEN included_columns IS NOT NULL THEN 'INCLUDE (' + included_columns + ')' ELSE '' END + '
       WITH (FILLFACTOR=70, ONLINE=ON)' as create_index_statement
from sys.dm_db_missing_index_group_stats ddmigs
join sys.dm_db_missing_index_groups ddmig on ddmigs.group_handle = ddmig.index_group_handle 
join sys.dm_db_missing_index_details ddmid on ddmig.index_handle = ddmid.index_handle 
join sys.objects d on ddmid.object_id = d.object_id
where ddmid.database_id = db_id()
order by DBName, ISNULL(equality_columns, '') + ISNULL(ddmid.inequality_columns, ''), improvement_measure desc

The above query returns CREATE INDEX statements for the missing indexes, ordered by an estimated improvement measure.

Index Maintenance: sys.dm_db_index_physical_stats

Reference: Partitioning
Partitioning is dividing a single large table into logical partitions; it helps with index maintenance.
This DMO is used to identify fragmentation in clustered tables, non-clustered indexes and heaps. Data is stored in SQL Server in 8 KB pages. During inserts or updates, if SQL Server finds there is not enough space in the existing page, part of the data is moved to a new page; this is known as a page split. The new page may not be in the same physical order as the logical order of the rows. Similarly, deletes leave empty space in pages. As fragmentation grows, the number of pages scanned during retrieval increases, which can result in high I/O. To reduce this we rebuild or reorganize indexes.
The following DMF takes db_id, object_id, index_id and partition_number as parameters, plus a mode parameter: LIMITED (scans only the non-leaf levels), SAMPLED (scans a 1% sample of the pages) or DETAILED (scans every page). It returns fragmentation details.
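For a quick look at a single table without paying the full-scan cost, the DMF can be called in SAMPLED mode. dbo.Orders below is a hypothetical table name; substitute your own:

```sql
-- Sampled fragmentation check for one (hypothetical) table
SELECT index_id,
       index_type_desc,
       avg_fragmentation_in_percent,
       page_count
FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Orders'),
                                      NULL, NULL, 'SAMPLED');
```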

-- Index Fragementation

USE $(database); 

SELECT dbschemas.[name]                                               AS 'Schema',
       dbtables.[name]                                                AS 'Table',
       dbindexes.[name]                                               AS 'Index',
       dbindexes.index_id                                             AS IdxID,
       CAST(indexstats.avg_fragmentation_in_percent AS DECIMAL(6, 2)) AS Prcent,
       indexstats.page_count,
       REPLACE(index_type_desc, ' index', '')                         AS IndxType,
       fragment_count                                                 AS Fragments,
       index_depth                                                    AS IdxDeep,
       CAST(avg_fragment_size_in_pages AS DECIMAL(10, 2))             AS AvgFragSize
FROM   sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) AS indexstats
       INNER JOIN sys.tables dbtables
               ON dbtables.[object_id] = indexstats.[object_id]
       INNER JOIN sys.schemas dbschemas
               ON dbtables.[schema_id] = dbschemas.[schema_id]
       INNER JOIN sys.indexes AS dbindexes
               ON dbindexes.[object_id] = indexstats.[object_id]
                  AND indexstats.index_id = dbindexes.index_id
WHERE  indexstats.database_id = DB_ID()
       AND CAST(indexstats.avg_fragmentation_in_percent AS DECIMAL(6, 2)) > 0
ORDER  BY indexstats.avg_fragmentation_in_percent DESC
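Acting on the percentages returned above, a common rule of thumb (not a hard rule) is to REORGANIZE an index at roughly 5-30% fragmentation and REBUILD above 30%, provided the index has enough pages to matter. IX_Orders_OrderDate and dbo.Orders below are hypothetical names, and ONLINE = ON requires an edition that supports online rebuilds:

```sql
-- Light-weight defragmentation, always online (~5-30% fragmentation)
ALTER INDEX [IX_Orders_OrderDate] ON [dbo].[Orders] REORGANIZE;

-- Full rebuild (> ~30% fragmentation); fill factor and ONLINE are choices,
-- not requirements
ALTER INDEX [IX_Orders_OrderDate] ON [dbo].[Orders]
REBUILD WITH (FILLFACTOR = 80, ONLINE = ON);
```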