Database Design
Rule | Description | Value | Source | Problem Description |
1 | High Frequency queries having a high number of table joins. | >4 | Sys.dm_exec_sql_text Sys.dm_exec_cached_plans | High frequency queries with lots of joins can be too normalized for high OLTP scalability. |
2 | Frequently updated tables having # indexes. | >3 | Sys.indexes sys.dm_db_operational_index_stats | Excessive index maintenance for OLTP. |
3 | Big IOs Table Scans Range Scans | >1 | Perfmon object SQL Server Access Methods Sys.dm_exec_query_stats | A missing index flushes the cache. |
4 | Unused Indexes. | Index not in Sys.dm_db_index_usage_stats. If an index is NEVER used, it will not appear in the DMV sys.dm_db_index_usage_stats | Avoid Index maintenance for unused indexes. |
CPU
Rule | Description | Value | Source | Problem Description |
1 | Signal Waits | >25% | Sys.dm_os_wait_stats | Time in runnable queue is pure CPU wait. |
2 | Plan reuse | <90% | Perfmon object SQL Server Statistics | OLTP identical transactions should ideally have >95% plan reuse. |
3 | Parallelism: Cxpacket waits | >5% | Sys.dm_os_wait_stats | Parallelism reduces OLTP throughput. CXPACKET indicates that multiple CPUs are working in parallel, dividing up the query in smaller pieces. Ordinarily a well tuned OLTP application would not parallelize unless an index is missing, there is an incomplete WHERE clause, or the query is not a true OLTP transaction. |
Memory
Rule | Description | Value | Source | Problem Description |
1 | Page life expectancy | <300 sec | Perfmon object SQL Server Buffer Manager SQL Server Buffer Nodes | Page life expectancy is the average number of seconds a data page stays in cache. Low values could indicate a cache flush that is caused by a big read. Pure OLTP workloads do NOT issue big reads, thus possible missing index. |
2 | Page life expectancy | Drops by 50% | Perfmon object SQL Server Buffer Manager | Page life expectancy is the average number of seconds a data page stays in cache. Low values could indicate a cache flush that is caused by a big read. Pure OLTP workloads do NOT issue big reads, thus possible missing index. |
3 | Memory Grants Pending | >1 | Perfmon object SQL Server Memory Manager | Current number of processes waiting for a workspace memory grant. |
4 | SQL cache hit ratio | <90% | SQL cache hit ratio falls under 90% for sustained periods of time greater than 60 sec. | It is likely that large scans have to be performed, which in turn flushes out the buffer cache. |
IO
Rule | Description | Value | Source | Problem Description |
1 | Average Disk sec/read | >20 ms | Perfmon object Physical Disk | Reads should take 4-8 ms without any IO pressure. |
2 | Average Disk sec/write | >20 ms | Perfmon object Physical Disk | Writes (sequential) can be as fast as 1 ms for transaction log. |
3 | Big IOs Table Scans Range Scans | >1 | Perfmon object SQL Server Access Methods | A missing index flushes the cache. |
4 | If Top 2 values for wait stats are any of the following: ASYNCH_IO_COMPLETION IO_COMPLETION LOGMGR WRITELOG PAGEIOLATCH_x | Top 2 | Sys.dm_os_wait_stats | If top 2 wait_stats values include IO, there is an IO bottleneck. |
5 | Low bytes per sec. | Perfmon object Physical Disk |
Blocking
Rule | Description | Value | Source | Problem Description |
1 | Block percentage | >2% | Sys.dm_db_index_operational_stats | Frequency of blocks. |
2 | Block process report | 30 sec | Sp_configure profiler | Report of statements. |
3 | Average Row Lock Waits | >100ms | Sys.dm_db_index_operational_stats | Duration of blocks. |
4 | If Top 2 values for wait stats are any of the following: LCK_M_BU LCK_M_IS LCK_M_IU LCK_M_IX LCK_M_RIn_NL LCK_M_RIn_S LCK_M_RIn_U LCK_M_RIn_X LCK_M_RS_S LCK_M_RS_U LCK_M_RX_S LCK_M_RX_U LCK_M_RX_X LCK_M_S LCK_M_SCH_M LCK_M_SCH_S LCK_M_SIU LCK_M_SIX LCK_M_U LCK_M_UIX LCK_M_X | Top 2 | Sys.dm_os_wait_stats | If top 2 wait_stats values include locking, there is a blocking bottleneck. |
5 | High number of deadlocks | >5 per hour | Trace flag 1204 to display in the errorlog and or the profiler deadlock graph. | If the deadlock occurs with the same participant SQL commands or operations multiple times, it is likely that there is a locking problem. |
Network
Rule | Description | Value | Source | Problem Description |
1 | High network latency coupled with an application that has many round trips to the database. | Output queue length >2 | Perfmon object: Network Interface | Indicates that the latency between the application server and the database is high. Could be caused by significant network infrastructure between the application and the instance of SQL Server. |
2 | Network bandwidth is used up. | Packets Outbound Discarded Packets Outbound Errors Packets Received Discarded Packets Received Errors | Perfmon object: Network Interface | Dropped packets are detected. |
In summary, given the high volume of identical small transactions that characterize OLTP, transactions per second and resource usage can be improved as follows:
1. Database designs usually keep the number of indexes to a functional minimum as every insert, update, and delete incurs index maintenance.
2. CPU can be reduced with plan reuse and join reduction.
3. IO performance can be reduced with good indexing, join reduction, and high page life expectancy.
4. Memory is optimal when there are no sudden drops in Page Life Expectancy.
5. Sorts can be limited with index usage. That is, a certain sort order is supported by an index that is sorted the same way, either ascending or descending.
6. Blocking can be reduced with index design and short transactions.
...Happy SQLing
Thanks for Visiting and Sharing your Views
No comments:
Post a Comment