All About and For Microsoft SQL Server DBA: PERFMON Counters, correlation, possible conclusions and actions

Resource Component: Disk

Perfmon Object: Physical Disk

Counters to Monitor	Description	Possible conclusions / actions
Current Disk Queue Length	Sustained high queues mean your IO subsystem is not keeping up.	Confirm IO issues with disk sec/read and disk sec/write. Waitstats correlation: 1. IO_COMPLETION 2. ASYNC_IO_COMPLETION 3. WRITELOG 4. LOGMGR
Average Disk Queue Length	Average of disk queues over time. If this number is consistently high, disk sec/read and disk sec/write are usually high as well, indicating IO bandwidth issues.	Confirm IO issues with disk sec/read and disk sec/write. Waitstats correlation: 1. IO_COMPLETION 2. ASYNC_IO_COMPLETION 3. WRITELOG 4. LOGMGR
Disk Sec/Read	Under typical circumstances, reads should take 4-8 ms (confirm with hardware vendor for exact read time). Sustained queues skew this number higher because disk sec/read factors in the effects of disk queues. High numbers mean your IO subsystem is not keeping up with requests. Check individual drive performance if there are multiple drives. If it is a broad problem affecting all drives, the IO subsystem is not keeping up, and more drives could be useful. If there is ONE very hot drive, examine disk activity such as location of paging file, database, transaction log, and other read/write activity.	If disk sec/read > normal read time (ask vendor for typical read time), consider the following options: 1. Resolve the IO bottleneck by adding more drives and spreading IO across the new drives where possible. For example, move files such as database, transaction log, and other application files that are being written to or read from. 2. Check for memory pressure, see memory component. 3. Check for appropriate indexing of SQL tables. Correct indexing can save IO. Check SQL query plans for scans, sorts, and so on. Showplan identifies sorting steps. 4. Run SQL Profiler to identify Transact-SQL statements doing scans. In Profiler, select the scans event class and scan stopped event. Click the data column tab and add object Id. Run the trace. Save the profiler trace to a trace table, and then search for the scans event. Alternatively, you can search for high duration, reads, and writes. Waitstats correlation: 1. IO_COMPLETION 2. ASYNC_IO_COMPLETION 3. WRITELOG 4. LOGMGR
Disk Sec/Write	Under typical circumstances, writes should take 4-8 ms (confirm with hardware vendor). Sustained queues skew disk sec/write higher because this counter factors in the effects of disk queues. High numbers mean your IO subsystem is not keeping up with requests. In some SAN environments, writes can be as low as 1-2 ms.	See disk sec/read. High performance (significant insert, update, and delete activity) requires the transaction log to be on a separate drive from the database. Waitstats correlation: 1. IO_COMPLETION 2. ASYNC_IO_COMPLETION 3. WRITELOG 4. LOGMGR
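
The per-drive check described for disk sec/read can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_disk_latency`, the verdict labels, and the 8 ms default (the upper end of the 4-8 ms range quoted above) are hypothetical, and the samples are assumed to have been converted from seconds (as perfmon reports them) to milliseconds.

```python
from statistics import mean

# Assumption: upper end of the "typical" 4-8 ms range from the table.
# Confirm the real figure with your hardware vendor.
TYPICAL_MAX_MS = 8.0


def classify_disk_latency(samples_ms_by_drive, threshold_ms=TYPICAL_MAX_MS):
    """Classify per-drive latency samples (milliseconds).

    Returns (verdict, slow_drives) where verdict is:
      'ok'        - no drive averages above the threshold
      'hot_drive' - some, but not all, drives are slow
      'broad'     - every sampled drive is slow
    """
    # A drive counts as slow when its average sample exceeds the threshold.
    slow = sorted(
        drive
        for drive, samples in samples_ms_by_drive.items()
        if samples and mean(samples) > threshold_ms
    )
    if not slow:
        return "ok", slow
    if len(slow) == len(samples_ms_by_drive):
        return "broad", slow
    return "hot_drive", slow
```

A 'hot_drive' verdict points toward examining what lives on that drive (paging file, database, transaction log), while 'broad' suggests the IO subsystem as a whole is not keeping up.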

Resource Component: Memory / Cache

Perfmon Object: Memory

Counters to Monitor	Description	Possible conclusions / actions
Page Faults/sec	Includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory). Most processors can handle large numbers of soft faults without significant consequences. However, hard faults that require disk access can cause significant delays. See the disk component for more information.	Check for memory pressure (see SQL Server buffer manager), low data page hit rates, and memory grants pending.
Pages/sec	Number of pages read from or written to disk to resolve hard page faults. These are hard faults that require physical IO to fetch the page.	Compare with Page Faults/sec. Check for memory pressure (see SQL Server buffer manager), low data page hit rates, and memory grants pending.

Resource Component: CPU

Perfmon Object: Processor

Counters to Monitor	Description	Possible conclusions / actions
% User Time	Percentage of processor time spent in user mode, where applications such as SQL Server run. By contrast, privileged mode is designed for operating system components and enables direct access to hardware and all memory.	Make sure % user time >70%. Check task manager (taskmgr.exe) to see how much CPU sqlservr.exe is getting. If user time <70%, check on % Processor Time and % Privileged Time.
% Privileged Time	The operating system switches application threads to privileged mode to access operating system services.	Should be <20%. Check task manager (taskmgr.exe) to see how much CPU sqlservr.exe is getting. If % privileged time >20%, check on % Processor Time and % User Time.
% Processor Time	Percentage of time the CPU is executing over the sample interval.	Common uses of CPU resources: 1. Compilation and recompilation use CPU resources. Plan reuse and parameterization minimize CPU consumption caused by compilation. See http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx for more information about compilation, recompilation, parameterization, and plan reuse. Plan reuse is where usecounts are > 1: SELECT cacheobjtype, objtype, usecounts, refcounts FROM sys.dm_exec_cached_plans ORDER BY usecounts. Matches to PERFMON counters: 1. System: Processor Queue Length 2. SQL Statistics: SQL Compilations/sec 3. SQL Statistics: SQL Re-Compilations/sec 4. SQL Statistics: Batch Requests/sec. If both of the following are true, you are CPU bound: 1. Proc time >85% on average 2. Context switches (see system object) >20K / sec. Lightweight pooling can provide a 15% boost. Lightweight pooling (also known as fiber mode) divides a thread into 10 fibers. Overhead per fiber is less than that of individual threads.
% Idle Time	Percentage of time the CPU is idle over the sample interval.
Interrupts/sec	Interrupts/sec is the average rate, in incidents per second, at which the processor received and serviced hardware interrupts.	Correlate with other perfmon counters such as IO, Network.
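
The two-part CPU-bound rule given for % Processor Time (processor time above 85% on average together with more than 20K context switches per second) can be expressed as a small check. This is a minimal sketch: the function name `is_cpu_bound` is hypothetical, and averaging the context-switch samples is an assumption, since the table only says the processor time is taken "on average".

```python
from statistics import mean


def is_cpu_bound(processor_time_pct, context_switches_per_sec,
                 cpu_threshold=85.0, switch_threshold=20_000):
    """Apply the rule from the table: BOTH conditions must hold.

    processor_time_pct       - samples of Processor: % Processor Time
    context_switches_per_sec - samples of System: Context Switches/sec
    Thresholds default to the 85% and 20K/sec figures quoted in the text.
    """
    # No samples means no evidence either way; report "not CPU bound".
    if not processor_time_pct or not context_switches_per_sec:
        return False
    # Assumption: both counters are averaged over the same sample window.
    busy = mean(processor_time_pct) > cpu_threshold
    switching = mean(context_switches_per_sec) > switch_threshold
    return busy and switching
```

High processor time alone is not treated as CPU bound here, which matches the requirement that both conditions be true.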

Resource Component: Thread

Perfmon Object: Process

Counters to Monitor	Description	Possible conclusions / actions
Page Faults/sec	This counter includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory). Most processors can handle large numbers of soft faults without significant consequences. However, hard faults, which require disk access, can cause significant delays. See the disk component for more information.	Check for memory pressure (see SQL Server buffer manager), low data page hit rates, memory grants pending, and page life expectancy.

Resource Component: System

Perfmon Object: System

Counters to Monitor	Description	Possible conclusions / actions
Processor Queue Length	Number of threads waiting to be scheduled for CPU time.	Some common uses of CPU resources that can be avoided: 1. Unnecessary compilation and recompilation. Parameterization and plan reuse would reduce CPU consumption. See http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx 2. Memory pressure 3. Lack of appropriate indexing
Context Switches/sec
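
Since plan reuse comes up under both the Processor and System objects, here is a rough sketch of how the "usecounts > 1" rule could be turned into a single number. The query text uses the sys.dm_exec_cached_plans columns named earlier; the constant `PLAN_CACHE_QUERY` and the function `plan_reuse_ratio` are hypothetical names, and how the rows are fetched from the server is left out on purpose.

```python
# Query described in the Processor section; run it with whatever client you use.
PLAN_CACHE_QUERY = """
SELECT cacheobjtype, objtype, usecounts, refcounts
FROM sys.dm_exec_cached_plans
ORDER BY usecounts;
"""


def plan_reuse_ratio(usecounts):
    """Return the fraction of cached plans with usecounts > 1.

    usecounts - iterable of the usecounts column, one value per cached plan.
    An empty cache yields 0.0 rather than raising.
    """
    counts = list(usecounts)
    if not counts:
        return 0.0
    # The text defines plan reuse as usecounts greater than 1.
    reused = sum(1 for count in counts if count > 1)
    return reused / len(counts)
```

A low ratio alongside high compilation counters would suggest looking at parameterization, as the table recommends.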