Friday, July 1, 2011

BLUEPRINT:- Common scenarios to avoid in OLTP

Database Design


Rule
Description
Value
Source
Problem Description
1
High Frequency queries having a high number of table joins.
>4
Sys.dm_exec_sql_text
Sys.dm_exec_cached_plans
High frequency queries with lots of joins can be too normalized for high OLTP scalability.
2
Frequently updated tables having # indexes.
>3
Sys.indexes
sys.dm_db_operational_index_stats
Excessive index maintenance for OLTP.
3
Big IOs
Table Scans
Range Scans
>1
Perfmon object
SQL Server Access Methods
Sys.dm_exec_query_stats
A missing index flushes the cache.
4
Unused Indexes.
Index not in Sys.dm_db_index_usage_stats.  If an index is NEVER used, it will not appear in the DMV sys.dm_db_index_usage_stats
Avoid Index maintenance for unused indexes.


CPU


Rule
Description
Value
Source
Problem Description
1
Signal Waits
>25%
Sys.dm_os_wait_stats
Time in runnable queue is pure CPU wait.
2
Plan reuse
<90%
Perfmon object
SQL Server Statistics
OLTP identical transactions should ideally have >95% plan reuse.
3
Parallelism: Cxpacket waits
>5%
Sys.dm_os_wait_stats
Parallelism reduces OLTP throughput. CXPACKET indicates that multiple CPUs are working in parallel, dividing up the query in smaller pieces. Ordinarily a well tuned OLTP application would not parallelize unless an index is missing, there is an incomplete WHERE clause, or the query is not a true OLTP transaction.


Memory


Rule
Description
Value
Source
Problem Description
1
Page life expectancy
<300 sec
Perfmon object
SQL Server Buffer Manager
SQL Server Buffer Nodes
Page life expectancy is the average number of seconds a data page stays in cache.  Low values could indicate a cache flush that is caused by a big read.  Pure OLTP workloads do NOT issue big reads, thus possible missing index.
2
Page life expectancy
Drops by 50%
Perfmon object
SQL Server Buffer Manager
Page life expectancy is the average number of seconds a data page stays in cache.  Low values could indicate a cache flush that is caused by a big read.  Pure OLTP workloads do NOT issue big reads, thus possible missing index.
3
Memory Grants Pending
>1
Perfmon object
SQL Server Memory Manager
Current number of processes waiting for a workspace memory grant.
4
SQL cache hit ratio
<90%
SQL cache hit ratio falls under 90% for sustained periods of time greater than 60 sec.
It is likely that large scans have to be performed, which in turn flushes out the buffer cache.


IO


Rule
Description
Value
Source
Problem Description
1
Average Disk sec/read
>20 ms
Perfmon object
Physical Disk
Reads should take 4-8 ms without any IO pressure.
2
Average Disk sec/write
>20 ms
Perfmon object
Physical Disk
Writes (sequential) can be as fast as 1 ms for transaction log.
3
Big IOs
Table Scans
Range Scans
>1
Perfmon object
SQL Server Access Methods
A missing index flushes the cache.
4
If Top 2 values for wait stats are any of the following:
ASYNCH_IO_COMPLETION
IO_COMPLETION
LOGMGR
WRITELOG
PAGEIOLATCH_x
Top 2
Sys.dm_os_wait_stats
If top 2 wait_stats values include IO, there is an IO bottleneck.
5
Low bytes per sec.

Perfmon object
Physical Disk



Blocking


Rule
Description
Value
Source
Problem Description
1
Block percentage
>2%
Sys.dm_db_index_operational_stats
Frequency of blocks.
2
Block process report
30 sec
Sp_configure profiler
Report of statements.
3
Average Row Lock Waits
>100ms
Sys.dm_db_index_operational_stats
Duration of blocks.
4
If Top 2 values for wait stats are any of the following:
LCK_M_BU
LCK_M_IS
LCK_M_IU
LCK_M_IX
LCK_M_RIn_NL
LCK_M_RIn_S
LCK_M_RIn_U
LCK_M_RIn_X
LCK_M_RS_S
LCK_M_RS_U
LCK_M_RX_S
LCK_M_RX_U
LCK_M_RX_X
LCK_M_S
LCK_M_SCH_M
LCK_M_SCH_S
LCK_M_SIU
LCK_M_SIX
LCK_M_U
LCK_M_UIX
LCK_M_X
Top 2
Sys.dm_os_wait_stats
If top 2 wait_stats values include locking, there is a blocking bottleneck.
5
High number of deadlocks
>5 per hour
Trace flag 1204 to display in the errorlog and or the profiler deadlock graph.
If the deadlock occurs with the same participant SQL commands or operations multiple times, it is likely that there is a locking problem.


Network


Rule
Description
Value
Source
Problem Description
1
High network latency coupled with an application that has many round trips to the database.
Output queue length >2
Perfmon object: Network Interface
Indicates that the latency between the application server and the database is high.
Could be caused by significant network infrastructure between the application and the instance of SQL Server.
2
Network bandwidth is used up.
Packets Outbound Discarded
Packets Outbound Errors
Packets Received Discarded
Packets Received Errors
Perfmon object: Network Interface
Dropped packets are detected.


In summary, given the high volume of identical small transactions that characterize OLTP, transactions per second and resource usage can be improved as follows: 
1.    Database designs usually keep the number of indexes to a functional minimum as every insert, update, and delete incurs index maintenance.
2.    CPU can be reduced with plan reuse and join reduction.
3.    IO performance can be reduced with good indexing, join reduction, and high page life expectancy.
4.    Memory is optimal when there are no sudden drops in Page Life Expectancy.
5.    Sorts can be limited with index usage. That is, a certain sort order is supported by an index that is sorted the same way, either ascending or descending.
6.    Blocking can be reduced with index design and short transactions.


...Happy SQLing
Thanks for Visiting and Sharing your Views

No comments:

Post a Comment