SQL Server threads architecture

execution-pipeline

SQL Server retrieves threads from Windows. The SQL Server configuration setting max worker threads (set at instance level) determines how many threads will be retrieved. SQL Server has its own internal scheduling system, independent of the scheduling performed by the operating system.

Instead of using Windows threads directly, SQL Server creates a pool of worker threads that are mapped to Windows threads whenever work needs to be performed.
When a SQL Server component needs to execute code, the component creates a task that represents the unit of work to be done.

For example, if you send a batch of Transact-SQL commands to the server, it’s
likely that the batch will be executed within a task.
When a SQL Server component creates a task, it is assigned the next available worker thread that is not in use. If no worker threads are available, SQL Server will try to retrieve another Windows thread, up to the point that the max worker threads configuration limit is reached.

At that point, the new task would need to wait to get a worker thread. THREADPOOL wait type is when there are no available threads in the server’s thread pool, during parallelism multiple worker threads are required. So, this wait type ir related with CXPACKET.

I suggest reducing parallelism not setting MAXDOP 1. Try to increase Cost threshold for parallelism if the setting is too low (default is 5) and analyze the queries with parallelism, see the indexes and statistics to understand if the query optimizer is estimating right.

 

Heap Tables

What’s a heap table? I would say it’s a table without clustered index.

What’s the characteristic of a heap table? The data isn’t ordered.

What’s the consequence having a heap table? There are a few:

  1. Specific data is not retrived quickly
  2. Data pages aren’t linked, that means sequential access needs to refer to the index allocation map (IAM) pages
  3. No cost to update indexes
  4. No additional space to store clustered index

How Update works?

Update data in SQL Server is a simple task. Using the command update table set column = value where column = value. Don’t forget the where clause :).

But, what SQL Server does internally? Delete plus Insert? Modify? Well, the answer depends.

Let’s see examples how that works.

First, create a database and table for the test.

createdatebase

As you can see, I didn’t create an index so I will update a heap table. Before update the row, I will drop the log register and then update the row I want.

firstupdate

Ok, now the result of the DBCC LOG

updateCol2

As we can see, the result is a modified row in the log, what about using an index? Creating a nonclustered index in the column I’m updating.

nonclusteredindexupdate

nonclusteredindexupdateresult

The result is very different, now we can see a modified row in the heap and in the nonclustered index I got a delete/insert row. Very nice!

What if was a clustered index?

clusteredindex

I dropped the nonclustered index and created a clustered and result is:

clusteredindexresult

Again, only the modified row in the index clustered. And, if we try to update our key in the clustered index?

clusteredindexkey

clusteredindexkeyresult

I got the result with delete/insert as I was expecting because I changed the key like the nonclustered index. Now, if I try to update the row with the same value, what SQL Server will do?

noupdate

Nothing, no change at all.

Don’t shrink the trees


2000px-N-ary_to_binary.svg

The most common and widely used index that we know are nonclustered index. These indexes are created using the famous b-tree (balanced tree). B-tree is a data-structured tree where we have the root and leaves.

A nonclustered index contains the index key values and row locators that point to the storage location of the table data. Generally, nonclustered indexes should be designed to improve the performance of frequently used queries that are not covered by the clustered index.

IC88960

 

These indexes need a regular maintenance because as data is updated, inserted or deleted these indexes are updated for the new data or delete the reference of a particular deleted data.

When this happen we called fragmentation and Brent Ozar can explain more about it.

The objective here is about maintenance the index, well not exactly who to do this, but what do not do. I’ve seen in same cases a good reindex job, everything ok, tables distributed by number of rows, cluster, then no cluster, but after all of it a step with shrink.

Do not use shrink database, this is not a good idea after a reindex. What is this? What will do in the database? In MSDN page we can see this:

Shrinking data files recovers space by moving pages of data from the end of the file to unoccupied space closer to the front of the file. When enough free space is created at the end of the file, data pages at end of the file can be deallocated and returned to the file system

The process SQL uses is ugly and results in index fragmentation again and all that work to reindex was thrown away. If don’t belive me, well Paul Randal explain something in his article.