Friday, July 1, 2011

PERFMON Counters, correlation, possible conclusions and actions

Resource Component Disk
Perfmon Object: Physical Disk


Counters to Monitor
Description
Possible conclusions / actions
Current Disk Queue Length
Sustained high queues mean your IO subsystem is not keeping up.
Confirm IO issues with disk sec/read and disk sec/write.
Waitstats correlation:
1. IO_COMPLETION
2. ASYNC_IO_COMPLETION
3. WRITELOG
4. LOGMGR
Average Disk Queue Length
Average of disk queues over time. If this number is consistently high, disk sec/read and disk sec/write is also high indicating IO bandwidth issues.
Confirm IO issues with disk sec/read and disk sec/write. Waitstats correlation:
1. IO_COMPLETION
2. ASYNC_IO_COMPLETION
3. WRITELOG
4. LOGMGR
Disk Sec/Read
Under typical circumstances, reads should take 4-8 ms (confirm with hardware vendor for exact read time). Sustained queues skew this number higher because disk sec/read factors in the effects of disk queues. High numbers mean your IO subsystem is not keeping up with requests
Check individual drive performance if there are multiple drives. If it is a broad problem affecting all drives, the IO subsystem is not keeping up. More drives could be useful. If there is ONE very hot drive, examine disk activity such as location of paging file, database, transaction log, and other read/write activity.
If disk sec/read > normal read time (ask vendor for typical read time) you can consider the following options:
1. Resolve IO bottleneck by adding more drives; spreading IO across new drives if it is possible. For example, move files such as database, transaction log, other application files that are being written to or read from.
2. Check for memory pressure, see memory component.
3. Check for appropriate indexing of SQL tables. Correct indexing can save IO. Check SQL query plans looking for scans and sorts, and so on. Showplan identifies sorting steps.
4. Run SQL Profiler to identify Transact-SQL statements doing scans. In Profiler, select the scans event class and scan stopped event. Click the data column tab and add object Id. Run the trace. Save the profiler trace to a trace table, and then search for the scans event. Alternatively, you can search for high duration, reads, and writes.
Waitstats correlation:
1. IO_COMPLETION
2. ASYNC_IO_COMPLETION
3. WRITELOG
4. LOGMGR
Disk Sec/Write
Under typical circumstances, reads should take 4-8 ms (confirm with hardware vendor). Sustained queues skew this disk sec/write higher because this counter factors in the effects of disk queues. High numbers mean your IO subsystem is not keeping up with requests. In some SAN environments, writes can be as low as 1-2 ms.
See disk sec/read.
High performance (significant insert, update, and delete activity) requires the transaction log to be on a separate drive from the database.
Waitstats correlation:
1. IO_COMPLETION
2. ASYNC_IO_COMPLETION
3. WRITELOG
4. LOGMGR


Resource Component: Memory / Cache
Perfmon Object: Memory


Counters to Monitor
Description
Possible conclusions / actions
Page Faults/sec
Includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory.) Most processors can handle a large numbers of soft faults without significant consequences. However, hard faults that require disk access can cause significant delays. See the disk component for more information.
Check for memory pressure (see SQL Server buffer manager), low data page hit rates, and memory grants pending.
Pages/sec
Number of pages read from or written to disk to resolve hard page faults.
These are hard faults that require physical IO to fetch the page.
Compare with Page Faults/sec.
Check for memory pressure (see SQL Server buffer manager), low data page hit rates, and memory grants pending.


Resource Component: CPU
Perfmon Object: Processor


Counters to Monitor
Description
Possible conclusions / actions
% User Time
Percentage of time SQL Server runs in User mode. Privileged mode is designed for operating system components and enables direct access to hardware and all memory.
Make sure % user time >70%. Check task manager (taskmgr.exe) to see how much CPU sqlserver.exe is getting. If user time <70%, check on %Processor Time and % Privileged activity.
% Privileged Time
The operating system switches application threads to privileged mode to access operating system services
Should be <20%. Check task manager (taskmgr.exe) to see how much CPU sqlserver.exe is getting. If % privileged time >20%, check on % Processor Time and % User Time.
% Processor Time
Percentage of time the CPU is executing over sample interval.
Common uses of CPU resources:
1. Compilation and recompilation use CPU resources. Plan reuse and parameterization minimizes CPU consumption because of compilation. For more information about compilation, recompilation, parameterization, and plan reuse, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
Plan reuse is where usecounts are > 1
Select cacheobjtype, objtype, usecounts, or refcounts from sys.dm_exec_cached_plans and order by usecounts
Matches to PERFMON counters:
1. System: Processor Queue length
2. SQL Statistics: Compilations/sec
3. SQL Statistics: Re-Compilations/sec
4. SQL Statistics: Requests/sec
If both of the following are true, you are CPU bound:
1. Proc time >85% on average
2. Context switches (see system object) >20K / sec
Light weight pooling can provide a 15% boost. Light weight pooling (also known as fiber mode) divides a thread into 10 fibers. Overhead per fiber is less than that of individual threads.
% Idle Time
Percentage of time CPU is idle over sample interval

Interrupts/sec
Interrupts/sec is the average rate, in incidents per second, at which the processor received and serviced hardware interrupts.
Correlate with other perfmon counters such as IO, Network.


Resource Component Thread
Perfmon Object: Process


Counters to Monitor
Description
Possible conclusions / actions
Page Faults/sec
This counter includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory.) Most processors can handle large numbers of soft faults without significant consequences. However, hard faults, which require disk access, can cause significant delays. See the disk component for more information.
Check for memory pressure (see SQL Server buffer manager), low data page hit rates, and memory grants pending, page life expectancy.


Resource Component:  System
Perfmon Object: System


Counters to Monitor
Description
Possible conclusions / actions
Processor Queue Length

Number of threads waiting to be scheduled for CPU time. Some common uses of CPU resources that can be avoided:
1. Unnecessary compilation and recompilation. Parameterization and plan reuse would reduce CPU consumption. See http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
2. memory pressure
3. lack of appropriate indexing
Context Switches/sec




Resource Component: SQL Server
Perfmon Object: SQLServer:Access Method


Counters to Monitor
Description
Possible conclusions / actions
Forwarded Records/sec
Number of records fetched through forwarded record pointers.
Tables without a clustered index. If you start with a short row, and update the row creating a wider row, the row might not fit on the data page. A pointer is put in its location and the row is forwarded to another page.
Look at code to determine where the short row is inserted followed by an update.
Can be avoided by:
1. Using Default values (so that an update does not result in a longer row that is the root cause of forwarded records).
2. Using Char instead of varchar (fixes length so that an update does not result in a longer row
Full Scan/sec
The entire table or index is scanned. Scans can cause excessive IO if an index would be useful.
SQL Profiler can be used to identify which Transact-SQL statements do scan. Select the scans event class and events scan:started and scan:completed. Include the object Id data column. Save the profiler trace to a trace table, and then search for the scans event.
The scan:completed event provides associated IO so that you can also search for high reads, writes, and duration.
Index Searches/
sec
Number of index searches. Index searches are used to start range scans, single index record fetches, and to reposition in an index.
Compare to Full Scan/sec. You want to see high values for index searches.
Page Splits/sec
Number of page splits occurring as the result of index pages overflowing. Typically associated with leaf pages of clustered indexes and non-clustered indexes.
Page splits are extra IO overhead that results from random inserts.
When there is no room on a data page, and the row must be inserted on the page (because of index order), SQL splits the page by inserting a new row and moving the rests of the rows to a new page.
Compare to Disk: page sec/write. If this is very high, you might want to reorganize the indexes on the tables causing the page splits, to reduce page splits temporarily. Fillfactor can be used to leave space for inserts.


Resource Component: SQL Server
Perfmon Object: SQLServer:Memory Mgr


Counters to Monitor
Description
Possible conclusions / actions
Memory Grants Pending
Memory resources are required for each user request. If sufficient memory is not available, the user waits until there is adequate memory for the query to run.
Compare with Memory grants outstanding. If grants pending increases, you can do the following:
1. add more memory to SQL Server
2. add more physical memory to the box.
3. check for memory pressure. See and correct indexing if you experience “out of memory” conditions.
Correlate to Waittype
1. RESOURCE_SEMAPHORE


Resource Component: SQL Server
Perfmon Object: SQLServer:Buffer Manager


Counters to Monitor
Description
Possible conclusions / actions
Buffer cache hit ratio
Percentage of time that the pages requested are already in cache
Check for memory pressure. See Checkpoint pages/sec, Lazy writes/sec and Page life expectancy.
Checkpoint pages/sec
Pages written to disk during the checkpoint process. This frees the SQL cache
Memory pressure is indicated if this counter is high together with high lazy writes/sec and low page life expectancy (<300 seconds)
Lazy writes/sec
Pages written to disk by the lazy writer, This frees the SQL cache
Memory pressure is indicated if this counter is high together with high lazy writes/sec and low page life expectancy (<300 seconds)
Page life expectancy
Time in seconds the data pages, on average, stay in SQL cache. Low page life <300 can indicate (1) SQL cache is cold, (2) memory problems or (3) missing indexes. Correlate to Lazy writes/sec and Checkpoint pages/sec
Memory pressure is indicated if this counter is low (<300) together with high lazy writes/sec and checkpoint pages/sec.
Check for missing indexes and bad query plans (scans in profiler)
Check for high page faults/sec.
Readahead pages/sec
If there are memory shortages, a cold cache, or low hit rates, SQL Server might use worker threads to readahead (bring in pages beforehand) to raise hit rates. By itself readahead is not a problem unless users are flushing each other’s pages consistently.
Correlate to counters for SQL buffer mgr: buffer cache hit ratio, page life expectancy, lazy writes, and checkpoint pages for memory pressure.
Check for appropriate indexing and bad query plans (scans in profiler)


Resource Component: SQL Server
Perfmon Object: SQLServer:Plan Cache


Counters to Monitor
Description
Possible conclusions / actions
Cache Hit Ratio
Percentage of time that the procedure plan pages are already in cache. For example,. procedure cache hits. That is, how frequently a compiled procedure is found in the procedure cache (therefore avoiding the need to recompile).
Check for memory pressure. See Checkpoint pages/sec, Lazy writes/sec and Page life expectancy.
See SQL Profiler: Stored Procedure: CacheHit, CacheMiss, and CacheInsert to see what stored procedure query plans are already in cache (Hit) vs. those not in cache (Miss,Insert)
Check for appropriate plan reuse. See section “Query Plan Reuse”. It is frequently desirable for query plans to be reused for similar SQL although not always.
See SQL Statistics: Compilations/sec for discussion of plan reuse.
If there is memory pressure, plans are discarded to make room for other data or procedure plans.


Resource Component: SQL Server
Perfmon Object: SQLServer:Databases


Counters to Monitor
Description
Possible conclusions / actions
Log Flush Wait Time
Waiting for transaction log writes (ms)
See disk performance counters
Check transaction log file sys.dm_io_virtual_file_stats(dbid, file#) for Io_stall (waits in ms)
Log Flush Waits/sec
This is the number of commits waiting on a log flush.
See disk performance counters, sys.dm_io_virtual_file_stats for Io_stall.
Log Growths
Microsoft Windows® automatically extends the transaction log to accommodate insert, update, and delete activity.
Generally, growths of the transaction log temporarily freezes writes to the transaction log when Windows extends the transaction log file. Check to see that the extend increment is large enough. If not, performance will decrease when the log is extending frequently.
Transactions /sec
SQL Server transactions per second



Resource Component: SQL Server
Perfmon Object: SQLServer:General Statistics


Counters to Monitor
Description
Possible conclusions / actions
Logins/sec
Number of logins per second
User connections
Logout/sec
Number of logouts per second

User connections
Number of user connections



Resource Component: SQL Server
Perfmon Object: SQLServer:Latches


Counters to Monitor
Description
Possible conclusions / actions
Average Latch Wait Time(ms)
Latches are short term light weight synchronization object. Latches are not held for the duration of a transaction. Typical latching operations during row transfers to memory, controlling modifications to row offset table, and so on.
If high, check PERFMON DISK and MEMORY objects for
1. IO bottlenecks
2. memory pressure
Typically reduced with more memory or IO capacity
Latch Waits/sec
See Average Latch Wait Time(ms)

Total Latch Wait Time(ms)
Short term light weight synchronization object. Latches are not held for the duration of a transaction. Typical latching operations during row transfers to memory, controlling modifications to row offset table, and so on.
If high, check PERFMON DISK and MEMORY objects for
1. IO bottlenecks
2. memory pressure
Typically reduced with more memory or IO capacity


Resource Component: SQL Server
Perfmon Object: SQLServer:Locks


Counters to Monitor
Description
Possible conclusions / actions
Average Wait Time(ms)
Transactions should be as short as possible to limit the blocking of other users.
Hint: check for memory pressure, which causes more physical IO, therefore prolonging the duration of transactions and locks.
Lock Wait Time(ms)
Transactions should be as short as possible to limit the blocking of other users.
Hint: check for memory pressure, which causes more physical IO, therefore prolonging the duration of transactions and locks
Lock Waits/sec
Transactions should be as short as possible to limit the blocking of other users.
Hint: check for memory pressure, which causes more physical IO, therefore prolonging the duration of transactions and locks


Resource Component: SQL Server
Perfmon Object: SQLServer:SQL Statistics


Counters to Monitor / Description
Possible conclusions / actions
SQL Compilations/sec
Before a SQL statement can be executed, the query optimizer must have created a query plan.  A query plan consists of steps that return the results for a given SQL statement.
For more information about SQL Server 2005 Optimizer and Statistics refer to http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx
Compilations/sec Includes both initial compiles and subsequent recompiles. Compilation and recompilation are CPU intensive operations.
Unnecessary compilation can sometimes be avoided with query plan reuse. Check for appropriate plan reuse. See section “Query Plan Reuse”. It is frequently desirable for query plans to be reused for similar SQL although not always.
Parameterization is important for plan reuse. In addition, some types of recompilation can be avoided. See the SQL Server 2005 recompilation paper for more information: http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
To obtain initial compilations only, you must subtract recompilations/sec from compilations/sec.
Compare to batch requests/sec to see extent of compilation.
SQL Re-Compilations/sec
Only contains recompiles. SQL Profiler can provide information about what processors are recompiling, what statement, and the reason for recompilation. In Profiler, select the stored procedure event class and SP:recompilation event, and include the data column eventsubclass. Review the trace searching for eventsubclass values 1 through 6. The previous statements caused the recompilation. For more information about recompilation, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
Batch Requests/sec
Total batch requests should be compared with compilations/sec
Auto-Param Attempts/sec
Auto-param tries should be compared to failed auto-params/sec. Appropriate parameterization is important for plan reuse. In some cases, Sp_executeSQL could be used with ad hoc SQL. For more information about recompilation, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
Failed Auto-Params/sec
Auto-param tries should be compared to failed auto-params/sec. Appropriate parameterization is important for plan reuse. In some cases, Sp_executeSQL could be used with ad hoc SQL. For more information about recompilation, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx


...Happy SQLing
Thanks for Visiting and Sharing your Views

No comments:

Post a Comment