Resource Component Disk
Perfmon Object: Physical Disk
Counters to Monitor | Description | Possible conclusions / actions |
Current Disk Queue Length | Sustained high queues mean your IO subsystem is not keeping up. | Confirm IO issues with disk sec/read and disk sec/write. Waitstats correlation: 1. IO_COMPLETION 2. ASYNC_IO_COMPLETION 3. WRITELOG 4. LOGMGR |
Average Disk Queue Length | Average of disk queues over time. If this number is consistently high, disk sec/read and disk sec/write is also high indicating IO bandwidth issues. | Confirm IO issues with disk sec/read and disk sec/write. Waitstats correlation: 1. IO_COMPLETION 2. ASYNC_IO_COMPLETION 3. WRITELOG 4. LOGMGR |
Disk Sec/Read | Under typical circumstances, reads should take 4-8 ms (confirm with hardware vendor for exact read time). Sustained queues skew this number higher because disk sec/read factors in the effects of disk queues. High numbers mean your IO subsystem is not keeping up with requests Check individual drive performance if there are multiple drives. If it is a broad problem affecting all drives, the IO subsystem is not keeping up. More drives could be useful. If there is ONE very hot drive, examine disk activity such as location of paging file, database, transaction log, and other read/write activity. | If disk sec/read > normal read time (ask vendor for typical read time) you can consider the following options: 1. Resolve IO bottleneck by adding more drives; spreading IO across new drives if it is possible. For example, move files such as database, transaction log, other application files that are being written to or read from. 2. Check for memory pressure, see memory component. 3. Check for appropriate indexing of SQL tables. Correct indexing can save IO. Check SQL query plans looking for scans and sorts, and so on. Showplan identifies sorting steps. 4. Run SQL Profiler to identify Transact-SQL statements doing scans. In Profiler, select the scans event class and scan stopped event. Click the data column tab and add object Id. Run the trace. Save the profiler trace to a trace table, and then search for the scans event. Alternatively, you can search for high duration, reads, and writes. Waitstats correlation: 1. IO_COMPLETION 2. ASYNC_IO_COMPLETION 3. WRITELOG 4. LOGMGR |
Disk Sec/Write | Under typical circumstances, reads should take 4-8 ms (confirm with hardware vendor). Sustained queues skew this disk sec/write higher because this counter factors in the effects of disk queues. High numbers mean your IO subsystem is not keeping up with requests. In some SAN environments, writes can be as low as 1-2 ms. | See disk sec/read. High performance (significant insert, update, and delete activity) requires the transaction log to be on a separate drive from the database. Waitstats correlation: 1. IO_COMPLETION 2. ASYNC_IO_COMPLETION 3. WRITELOG 4. LOGMGR |
Resource Component: Memory / Cache
Perfmon Object: Memory
Counters to Monitor | Description | Possible conclusions / actions |
Page Faults/sec | Includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory.) Most processors can handle a large numbers of soft faults without significant consequences. However, hard faults that require disk access can cause significant delays. See the disk component for more information. | Check for memory pressure (see SQL Server buffer manager), low data page hit rates, and memory grants pending. |
Pages/sec | Number of pages read from or written to disk to resolve hard page faults. These are hard faults that require physical IO to fetch the page. | Compare with Page Faults/sec. Check for memory pressure (see SQL Server buffer manager), low data page hit rates, and memory grants pending. |
Resource Component: CPU
Perfmon Object: Processor
Counters to Monitor | Description | Possible conclusions / actions |
% User Time | Percentage of time SQL Server runs in User mode. Privileged mode is designed for operating system components and enables direct access to hardware and all memory. | Make sure % user time >70%. Check task manager (taskmgr.exe) to see how much CPU sqlserver.exe is getting. If user time <70%, check on %Processor Time and % Privileged activity. |
% Privileged Time | The operating system switches application threads to privileged mode to access operating system services | Should be <20%. Check task manager (taskmgr.exe) to see how much CPU sqlserver.exe is getting. If % privileged time >20%, check on % Processor Time and % User Time. |
% Processor Time | Percentage of time the CPU is executing over sample interval. | Common uses of CPU resources: 1. Compilation and recompilation use CPU resources. Plan reuse and parameterization minimizes CPU consumption because of compilation. For more information about compilation, recompilation, parameterization, and plan reuse, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx Plan reuse is where usecounts are > 1 Select cacheobjtype, objtype, usecounts, or refcounts from sys.dm_exec_cached_plans and order by usecounts Matches to PERFMON counters: 1. System: Processor Queue length 2. SQL Statistics: Compilations/sec 3. SQL Statistics: Re-Compilations/sec 4. SQL Statistics: Requests/sec If both of the following are true, you are CPU bound: 1. Proc time >85% on average 2. Context switches (see system object) >20K / sec Light weight pooling can provide a 15% boost. Light weight pooling (also known as fiber mode) divides a thread into 10 fibers. Overhead per fiber is less than that of individual threads. |
% Idle Time | Percentage of time CPU is idle over sample interval | |
Interrupts/sec | Interrupts/sec is the average rate, in incidents per second, at which the processor received and serviced hardware interrupts. | Correlate with other perfmon counters such as IO, Network. |
Resource Component Thread
Perfmon Object: Process
Counters to Monitor | Description | Possible conclusions / actions |
Page Faults/sec | This counter includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory.) Most processors can handle large numbers of soft faults without significant consequences. However, hard faults, which require disk access, can cause significant delays. See the disk component for more information. | Check for memory pressure (see SQL Server buffer manager), low data page hit rates, and memory grants pending, page life expectancy. |
Resource Component: System
Perfmon Object: System
Counters to Monitor | Description | Possible conclusions / actions |
Processor Queue Length | Number of threads waiting to be scheduled for CPU time. Some common uses of CPU resources that can be avoided: 1. Unnecessary compilation and recompilation. Parameterization and plan reuse would reduce CPU consumption. See http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx 2. memory pressure 3. lack of appropriate indexing | |
Context Switches/sec |
Resource Component: SQL Server
Perfmon Object: SQLServer:Access Method
Counters to Monitor | Description | Possible conclusions / actions |
Forwarded Records/sec | Number of records fetched through forwarded record pointers. Tables without a clustered index. If you start with a short row, and update the row creating a wider row, the row might not fit on the data page. A pointer is put in its location and the row is forwarded to another page. | Look at code to determine where the short row is inserted followed by an update. Can be avoided by: 1. Using Default values (so that an update does not result in a longer row that is the root cause of forwarded records). 2. Using Char instead of varchar (fixes length so that an update does not result in a longer row |
Full Scan/sec | The entire table or index is scanned. Scans can cause excessive IO if an index would be useful. | SQL Profiler can be used to identify which Transact-SQL statements do scan. Select the scans event class and events scan:started and scan:completed. Include the object Id data column. Save the profiler trace to a trace table, and then search for the scans event. The scan:completed event provides associated IO so that you can also search for high reads, writes, and duration. |
Index Searches/ sec | Number of index searches. Index searches are used to start range scans, single index record fetches, and to reposition in an index. | Compare to Full Scan/sec. You want to see high values for index searches. |
Page Splits/sec | Number of page splits occurring as the result of index pages overflowing. Typically associated with leaf pages of clustered indexes and non-clustered indexes. | Page splits are extra IO overhead that results from random inserts. When there is no room on a data page, and the row must be inserted on the page (because of index order), SQL splits the page by inserting a new row and moving the rests of the rows to a new page. Compare to Disk: page sec/write. If this is very high, you might want to reorganize the indexes on the tables causing the page splits, to reduce page splits temporarily. Fillfactor can be used to leave space for inserts. |
Resource Component: SQL Server
Perfmon Object: SQLServer:Memory Mgr
Counters to Monitor | Description | Possible conclusions / actions |
Memory Grants Pending | Memory resources are required for each user request. If sufficient memory is not available, the user waits until there is adequate memory for the query to run. | Compare with Memory grants outstanding. If grants pending increases, you can do the following: 1. add more memory to SQL Server 2. add more physical memory to the box. 3. check for memory pressure. See and correct indexing if you experience “out of memory” conditions. Correlate to Waittype 1. RESOURCE_SEMAPHORE |
Resource Component: SQL Server
Perfmon Object: SQLServer:Buffer Manager
Counters to Monitor | Description | Possible conclusions / actions |
Buffer cache hit ratio | Percentage of time that the pages requested are already in cache | Check for memory pressure. See Checkpoint pages/sec, Lazy writes/sec and Page life expectancy. |
Checkpoint pages/sec | Pages written to disk during the checkpoint process. This frees the SQL cache | Memory pressure is indicated if this counter is high together with high lazy writes/sec and low page life expectancy (<300 seconds) |
Lazy writes/sec | Pages written to disk by the lazy writer, This frees the SQL cache | Memory pressure is indicated if this counter is high together with high lazy writes/sec and low page life expectancy (<300 seconds) |
Page life expectancy | Time in seconds the data pages, on average, stay in SQL cache. Low page life <300 can indicate (1) SQL cache is cold, (2) memory problems or (3) missing indexes. Correlate to Lazy writes/sec and Checkpoint pages/sec | Memory pressure is indicated if this counter is low (<300) together with high lazy writes/sec and checkpoint pages/sec. Check for missing indexes and bad query plans (scans in profiler) Check for high page faults/sec. |
Readahead pages/sec | If there are memory shortages, a cold cache, or low hit rates, SQL Server might use worker threads to readahead (bring in pages beforehand) to raise hit rates. By itself readahead is not a problem unless users are flushing each other’s pages consistently. | Correlate to counters for SQL buffer mgr: buffer cache hit ratio, page life expectancy, lazy writes, and checkpoint pages for memory pressure. Check for appropriate indexing and bad query plans (scans in profiler) |
Resource Component: SQL Server
Perfmon Object: SQLServer:Plan Cache
Counters to Monitor | Description | Possible conclusions / actions |
Cache Hit Ratio | Percentage of time that the procedure plan pages are already in cache. For example,. procedure cache hits. That is, how frequently a compiled procedure is found in the procedure cache (therefore avoiding the need to recompile). | Check for memory pressure. See Checkpoint pages/sec, Lazy writes/sec and Page life expectancy. See SQL Profiler: Stored Procedure: CacheHit, CacheMiss, and CacheInsert to see what stored procedure query plans are already in cache (Hit) vs. those not in cache (Miss,Insert) Check for appropriate plan reuse. See section “Query Plan Reuse”. It is frequently desirable for query plans to be reused for similar SQL although not always. See SQL Statistics: Compilations/sec for discussion of plan reuse. If there is memory pressure, plans are discarded to make room for other data or procedure plans. |
Resource Component: SQL Server
Perfmon Object: SQLServer:Databases
Counters to Monitor | Description | Possible conclusions / actions |
Log Flush Wait Time | Waiting for transaction log writes (ms) | See disk performance counters Check transaction log file sys.dm_io_virtual_file_stats(dbid, file#) for Io_stall (waits in ms) |
Log Flush Waits/sec | This is the number of commits waiting on a log flush. | See disk performance counters, sys.dm_io_virtual_file_stats for Io_stall. |
Log Growths | Microsoft Windows® automatically extends the transaction log to accommodate insert, update, and delete activity. | Generally, growths of the transaction log temporarily freezes writes to the transaction log when Windows extends the transaction log file. Check to see that the extend increment is large enough. If not, performance will decrease when the log is extending frequently. |
Transactions /sec | SQL Server transactions per second |
Resource Component: SQL Server
Perfmon Object: SQLServer:General Statistics
Counters to Monitor | Description | Possible conclusions / actions |
Logins/sec | Number of logins per second | User connections |
Logout/sec | Number of logouts per second | |
User connections | Number of user connections |
Resource Component: SQL Server
Perfmon Object: SQLServer:Latches
Counters to Monitor | Description | Possible conclusions / actions |
Average Latch Wait Time(ms) | Latches are short term light weight synchronization object. Latches are not held for the duration of a transaction. Typical latching operations during row transfers to memory, controlling modifications to row offset table, and so on. | If high, check PERFMON DISK and MEMORY objects for 1. IO bottlenecks 2. memory pressure Typically reduced with more memory or IO capacity |
Latch Waits/sec | See Average Latch Wait Time(ms) | |
Total Latch Wait Time(ms) | Short term light weight synchronization object. Latches are not held for the duration of a transaction. Typical latching operations during row transfers to memory, controlling modifications to row offset table, and so on. | If high, check PERFMON DISK and MEMORY objects for 1. IO bottlenecks 2. memory pressure Typically reduced with more memory or IO capacity |
Resource Component: SQL Server
Perfmon Object: SQLServer:Locks
Counters to Monitor | Description | Possible conclusions / actions |
Average Wait Time(ms) | Transactions should be as short as possible to limit the blocking of other users. | Hint: check for memory pressure, which causes more physical IO, therefore prolonging the duration of transactions and locks. |
Lock Wait Time(ms) | Transactions should be as short as possible to limit the blocking of other users. | Hint: check for memory pressure, which causes more physical IO, therefore prolonging the duration of transactions and locks |
Lock Waits/sec | Transactions should be as short as possible to limit the blocking of other users. | Hint: check for memory pressure, which causes more physical IO, therefore prolonging the duration of transactions and locks |
Resource Component: SQL Server
Perfmon Object: SQLServer:SQL Statistics
Counters to Monitor / Description | Possible conclusions / actions |
SQL Compilations/sec | Before a SQL statement can be executed, the query optimizer must have created a query plan. A query plan consists of steps that return the results for a given SQL statement. For more information about SQL Server 2005 Optimizer and Statistics refer to http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx Compilations/sec Includes both initial compiles and subsequent recompiles. Compilation and recompilation are CPU intensive operations. Unnecessary compilation can sometimes be avoided with query plan reuse. Check for appropriate plan reuse. See section “Query Plan Reuse”. It is frequently desirable for query plans to be reused for similar SQL although not always. Parameterization is important for plan reuse. In addition, some types of recompilation can be avoided. See the SQL Server 2005 recompilation paper for more information: http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx To obtain initial compilations only, you must subtract recompilations/sec from compilations/sec. Compare to batch requests/sec to see extent of compilation. |
SQL Re-Compilations/sec | Only contains recompiles. SQL Profiler can provide information about what processors are recompiling, what statement, and the reason for recompilation. In Profiler, select the stored procedure event class and SP:recompilation event, and include the data column eventsubclass. Review the trace searching for eventsubclass values 1 through 6. The previous statements caused the recompilation. For more information about recompilation, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx |
Batch Requests/sec | Total batch requests should be compared with compilations/sec |
Auto-Param Attempts/sec | Auto-param tries should be compared to failed auto-params/sec. Appropriate parameterization is important for plan reuse. In some cases, Sp_executeSQL could be used with ad hoc SQL. For more information about recompilation, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx |
Failed Auto-Params/sec | Auto-param tries should be compared to failed auto-params/sec. Appropriate parameterization is important for plan reuse. In some cases, Sp_executeSQL could be used with ad hoc SQL. For more information about recompilation, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx |
...Happy SQLing
Thanks for Visiting and Sharing your Views
No comments:
Post a Comment