Using wait stats on SQL Server

I wrote two posts about wait stats:

  1.  What wait means
  2. About SQL Server saving wait stats on DMVs

Now, going deeper on wait stats and see “why SQL Server is running slow?”. To answer that question I like to start with the DMV sys.dm_os_wait_stats, because this DMV provides a running total of all waits encontered by executing threads in SQL Server instance.

SQL Server categorizes waits across several different type and some of these types only indicate quit period on the instance where threads stay in waiting.

The script below shows the top wait types that have accumulated since SQL Server started or was cleared.

WITH Waits AS (
SELECT  wait_type
    , CAST(wait_time_ms / 1000. AS DECIMAL(12, 2)) AS [wait_time_s]
    , CAST(100. * wait_time_ms / SUM(wait_time_ms) OVER () AS DECIMAL(12, 2)) AS [pct]
    , ROW_NUMBER() OVER (ORDER BY wait_time_ms DESC) AS rn 
FROM sys.dm_os_wait_stats WITH (NOLOCK) 
SELECT  W1.wait_type
    , wait_time_s
    , pct
FROM   Waits AS W1)
    , wait_time_s
    , pct
    , running_pct
FROM      Running_Waits
WHERE     running_pct - pct = 99
ORDER BY  running_pct

This query I got from Paul Randal blog, his blog has a lot of information about waits. Since not all wait types are indicators of real issue, the where clause is removing unnecessary type.

The common wait types for me are in the table below, there are much more, but you can start with that list.

ASYNC_IO_COMPLETIONI/OUsed to indicate a worker is waiting on a asynchronous I/O operation to complete not associated with database pagesSince this is used for various reason you need to find out what query or task is associated with the wait. Two examples of where this wait type is used is to create files associated with a CREATE DATABASE and for “zeroing” out a transaction log file during log creation or growth.
CHECKPOINT_QUEUEBufferUsed by background worker that waits on events on queue to process checkpoint requests. This is an “optional” wait type see Important Notes section in blogYou should be able to safely ignore this one as it is just indicates the checkpoint background worker is waiting for work to do. I suppose if you thought you had issues with checkpoints not working or log truncation you might see if this worker ever “wakes up”. Expect higher wait times as this will only wake up when work to do
CHKPTBufferUsed to coordinate the checkpoint background worker thread with recovery of master so checkpoint won’t start accepting queue requests until master onlineYou should be able to safely ignore. You should see 1 wait of this type for the server unless the checkpoint worker crashed and had to be restarted.. If though this is technically a “sync” type of event I left its usage as Background
CXPACKETQueryUsed to synchronize threads involved in a parallel query. This wait type only means a  parallel query is executing.You may not need to take any action. If you see high wait times then it means you have a long running parallel query. I would first identify the query and determine if you need to tune it. Note sys.dm_exec_requests only shows the wait type of the request even if multiple tasks have different wait types. When you see CXPACKET here look at all tasks associated with the request. Find the task that doesn’t have this wait_type and see its status. It may be waiting on something else slowing down the query. wait_resource also has interesting details about the tasks and its parallel query operator
IO_COMPLETIONI/OUsed to indicate a wait for I/O for operation (typically synchronous)  like sorts and various situations where the engine needs to do a synchronous I/OIf wait times are high then you have a disk I/O bottleneck. The problem will be determining what type of operation and where the bottleneck exists. For sorts, it is on the storage system associated with tempdb. Note that database page I/O does not use this wait type. Instead look at PAGEIOLATCH waits.
LAZYWRITER_SLEEPBufferUsed by the Lazywriter background worker to indicate it is sleeping waiting to wake up and check for work to doYou should be able to safely ignore this one. The wait times will appear to “cycle” as LazyWriter is designed to sleep and wake-up every 1 second. Appears as LZW_SLEEP in Xevent
LOGBUFFERTransaction LogUsed to indicate a worker thread is waiting for a log buffer to write log blocks for a transactionThis is typically a symptom of I/O bottlenecks because other workers waiting on WRITELOG will hold on to log blocks. Look for WRITERLOG waiters and if found the overall problem is I/O bottleneck on the storage system associated with the transaction log
RESOURCE_SEMAPHOREQueryUsed to indicate a worker is waiting to be allowed to perform an operation requiring “query memory” such as hashes and sortsHigh wait times indicate too many queries are running concurrently that require query memory. Operations requiring query memory are hashes and sorts. Use DMVs such as dm_exec_query_resource_semaphores and dm_exec_query_memory_grants
SOS_SCHEDULER_YIELDSQLOSUsed to indicate a worker has yielded to let other workers run on a schedulerThis wait is simply an indication that a worker yielded for someone else to run. High wait counts with low wait times usually mean CPU bound queries. High wait times here could be non-yielding problems
THREADPOOLSQLOSIndicates a wait for a  task to be assigned to a worker threadLook for symptoms of high blocking or contention problems with many of the workers especially if the wait count and times are high. Don’t jump to increase max worker threads especially if you use default setting of 0. This wait type will not show up in sys.dm_exec_requests because it only occurs when the task is waiting on a worker thread. You must have a worker to become a request. Furthermore, you may not see this “live” since there may be no workers to process tasks for logins or for queries to look at DMVs.
WRITELOGI/OIndicates a worker thread is waiting for LogWriter to flush log blocks.High waits and wait times indicate an I/O bottleneck on the storage system associated with the transaction log

The next posts I will show an example for each wait in the list.


Difference between Lock and Latch

Locks everywhere is a good start to understand how SQL Server provides logical consistency. Every operation has a lock and latch.

What does Latch mean? Latch protects memory on Buffer Pool, is a method that provides physical consistency.

SQL Server does operations in memory, that means, it read the page from disk and put that page on buffer pool to work there. If someone tries to update the data, the page is changed in memory and SQL Server writes the change in the transaction log file. (More about logging)

Basically, Latches are physical locks and hold the lock only for the duration of the physical operation, while the Locks are logical and maintain the lock until the transaction finishes. Both types guarantee data consistency.


Wait Statistics

The first post about waits on SQL Server was regarding what SQL Server Wait means, a brief explanation and the concept might be difficult to catch at first. When the task needs to wait for a resource, it is placed on a list until the resource is available.

SQL Server keeps detailed internal records of how long the tasks spend wainting and the types of resources they are waiting. Wait statistics information can be useful resource for troubleshooting performance problems. I like to start with wait stats information to understand the environment I’m working.

To see the Wait stats details, you can query the following system views:

sys.dm_os_waiting_tasks -> Returns information about the wait queue 
of tasks that are waiting on some resource
sys_dm_os_wait_stats -> Returns information about all the waits 
encountered by threads that executed. 
You can use this aggregated view to diagnose performance issues with 
SQL Server and also with specific queries and batches. 
sys.dm_exec_session_wait_stats (Transact-SQL)

I have scripts on Git I gatter on internet to query wait stats in a easy way to read.

Wait wait wait…

Let’s talk about why we have to wait and how to understand the wait types.

Paul Randal in his post Wait statistics, or please tell me where it hurts said:

A thread is using the CPU (called RUNNING) until it needs to wait for a resource. It then moves to an unordered list of threads that are SUSPENDED. In the meantime, the next thread on the FIFO (first-in-first-out) queue of threads waiting for the CPU (called being RUNNABLE) is given the CPU and becomes RUNNING. If a thread on the SUSPENDED list is notified that it’s resource is available, it becomes RUNNABLE and is put on the bottom of the RUNNABLE queue. Threads continue this clockwise movement from RUNNING to SUSPENDED to RUNNABLE to RUNNING again until the task is completed

That’s explain a lot, because the SQL Server threads doesn’t run all in the same time. A good example is when our query is doing physical reads. The IO subsystem is the slowest part of our resources and probably will take some time if the query is reading gigabytes of data.


After the CPU request the data from the disk, the disk will run for it, but before send the data back. All data need to go to memory first and that may don’t have the necessary space.  The thread is going to wait until some resources been released first. There are many scenarios, for example, how many threads are running this query? How long will take to the application to show that data?

So, every time a thread needs to wait for a resource it will increase a wait time type, such as PAGEIOLATCH_XX , PAGELATCH_XX, ASYNC_NETWORK_IO, CXPACKET, RESOURCE_SEMAPHORE. I will talk more about waits in the next posts.