Difference between Lock and Latch

Locks everywhere is a good start to understand how SQL Server provides logical consistency. Every operation has a lock and latch.

What does Latch mean? Latch protects memory on Buffer Pool, is a method that provides physical consistency.

SQL Server does operations in memory, that means, it read the page from disk and put that page on buffer pool to work there. If someone tries to update the data, the page is changed in memory and SQL Server writes the change in the transaction log file. (More about logging)

Basically, Latches are physical locks and hold the lock only for the duration of the physical operation, while the Locks are logical and maintain the lock until the transaction finishes. Both types guarantee data consistency.

LATCH

Sending files to AWS S3 using Powershell

Amazon has a Powershell module to manage the the principal services available. I’ve been working with EC2, RDS and S3 and I wrote a tip to fast copy data to S3 and I created the function bellow to help to send files to S3.

I’m using the function bellow to send my backups to S3. It’s configured to send all files in the paths passed by parameter

Import-Module AWSPowerShell

# Author: Douglas Correa

function Send-S3Files {
 <#
 .SYNOPSIS
 Send the files from a local path to AWS S3
 .DESCRIPTION
 The function will copy all files from a specific path to AWS S3
 .EXAMPLE
 Send-S3Files -BucketName 'backups' -Region 'sa-east-1' -AKey '####' -SKey '####' -LocalSource 'c:\temp' , 'd:\backups' 
 #>
 [CmdletBinding()]
 Param (
       [Parameter(Mandatory=$True, ValueFromPipeline=$False, HelpMessage='Name of the bucket in AWS')][string]$BucketName
     , [Parameter(Mandatory=$True, ValueFromPipeline=$False, HelpMessage='Region used in AWS')][string]$Region
     , [Parameter(Mandatory=$True, HelpMessage='AWS access key')][string]$Akey
     , [Parameter(Mandatory=$True, HelpMessage='AWS secret key')][string]$SKey
     , [Parameter(Mandatory=$True, ValueFromPipeline=$False, HelpMessage='Local machine paths')][string[]]$LocalSource
 )

    process {

        Initialize-AWSDefaultConfiguration -AccessKey $AKey -SecretKey $SKey -Region $region

        foreach($source in $sources) {
            Set-Location $source
            $files = Get-ChildItem '*.*' | Select-Object -Property Name #get all files in the folder

            try {
            if(Test-S3Bucket -BucketName $bucket) {
                foreach($file in $files) {
                    if(!(Get-S3Object -BucketName $bucket -Key $file.Name)) {
                        Write-Host "Copying file : $file "
                        Write-S3Object -BucketName $bucket -File $file.Name -Key $file.Name -CannedACLName private -region $region
                    } 
                }
            } Else {
                Write-Host "The bucket $bucket does not exist."
            }
        } catch {
            Write-Host "Error uploading file $file"
            $Error
        }
    }
}

}

 

SQL Server threads architecture

execution-pipeline

SQL Server retrieves threads from Windows. The SQL Server configuration setting max worker threads (set at instance level) determines how many threads will be retrieved. SQL Server has its own internal scheduling system, independent of the scheduling performed by the operating system.

Instead of using Windows threads directly, SQL Server creates a pool of worker threads that are mapped to Windows threads whenever work needs to be performed.
When a SQL Server component needs to execute code, the component creates a task that represents the unit of work to be done.

For example, if you send a batch of Transact-SQL commands to the server, it’s
likely that the batch will be executed within a task.
When a SQL Server component creates a task, it is assigned the next available worker thread that is not in use. If no worker threads are available, SQL Server will try to retrieve another Windows thread, up to the point that the max worker threads configuration limit is reached.

At that point, the new task would need to wait to get a worker thread. THREADPOOL wait type is when there are no available threads in the server’s thread pool, during parallelism multiple worker threads are required. So, this wait type ir related with CXPACKET.

I suggest reducing parallelism not setting MAXDOP 1. Try to increase Cost threshold for parallelism if the setting is too low (default is 5) and analyze the queries with parallelism, see the indexes and statistics to understand if the query optimizer is estimating right.

 

How SQL Server stores data?

SQL Server stores data in heaps or b-tree structures. Heaps are unordered set and balanced trees are ordered by their keys.

Heaps and b-tree use collection of pages within the structure and it’s called allocation units.

  • IN_ROW_DATA -> contains all data.
  • LOB_DATA -> structure for large objects used to stored in xml, varchar(max), nvarchar(max), varbinary(max) data types.
  • ROW_OVERFLOW_DATA -> when the data type is variable like varchar and nvarchar exceed 8,060 bytes size limit per row.

To see this information is used the system view allocation_units.

allocation_units

SELECT OBJECT_NAME(I.object_id)
 , I.name
 , AU.total_pages
 , AU.used_pages
 , AU.data_pages
 , P.rows
 , AU.type_desc
 FROM sys.allocation_units AS AU
 INNER JOIN sys.partitions AS P ON AU.container_id = P.partition_id
 INNER JOIN sys.indexes AS I ON I.index_id = P.index_id AND I.object_id = P.object_id
 WHERE P.object_id = OBJECT_ID('TabelaDeTamanhoVariavel')

 

Column name Data type Description
allocation_unit_id bigint ID of the allocation unit. Is unique within a database.
type tinyint Type of allocation unit:

0 = Dropped

1 = In-row data (all data types, except LOB data types)

2 = Large object (LOB) data (text, ntext, image, xml, large value types, and CLR user-defined types)

3 = Row-overflow data

type_desc nvarchar(60) Description of the allocation unit type:

  • DROPPED
  • IN_ROW_DATA
  • LOB_DATA
  • ROW_OVERFLOW_DATA
container_id bigint ID of the storage container associated with the allocation unit.

If type = 1 or 3, container_id = sys.partitions.hobt_id.

If type is 2, then container_id = sys.partitions.partition_id.

0 = Allocation unit marked for deferred drop

data_space_id int ID of the filegroup in which this allocation unit resides.
total_pages bigint Total number of pages allocated or reserved by this allocation unit.
used_pages bigint Number of total pages actually in use.
data_pages bigint Number of used pages that have:

  • In-row data
  • LOB data
  • Row-overflow data

Value returned excludes internal index pages and allocation-management pages.

 

Wait Statistics

The first post about waits on SQL Server was regarding what SQL Server Wait means, a brief explanation and the concept might be difficult to catch at first. When the task needs to wait for a resource, it is placed on a list until the resource is available.

SQL Server keeps detailed internal records of how long the tasks spend wainting and the types of resources they are waiting. Wait statistics information can be useful resource for troubleshooting performance problems. I like to start with wait stats information to understand the environment I’m working.

To see the Wait stats details, you can query the following system views:

sys.dm_os_waiting_tasks -> Returns information about the wait queue 
of tasks that are waiting on some resource
sys_dm_os_wait_stats -> Returns information about all the waits 
encountered by threads that executed. 
You can use this aggregated view to diagnose performance issues with 
SQL Server and also with specific queries and batches. 
sys.dm_exec_session_wait_stats (Transact-SQL)

I have scripts on Git I gatter on internet to query wait stats in a easy way to read.