Database migration methods

When thinking about migration, the most common concern is downtime. Some methods require downtime, like backup/restore, while others, like log shipping, need much less.

You can choose from several different methods to migrate your database. For example, methods that require downtime include:

  • Back up the database, manually copy the backup file to the target machine, and then restore the database there. This method is simple, and using compression minimizes the time required.
  • Perform a backup to an Azure Blob storage account, and then restore the database from that backup. This method removes the need to manually copy the backup file (see the sketch after this list).
  • Detach the database, copy the files to Azure Blob storage, and then attach them to the SQL Server instance. Use this method if you plan to store the database files in Azure Blob storage permanently instead of on a local disk.
  • Log shipping. This uses a backup/restore process, and the downtime depends mainly on the final transaction log backup, which is a small file to copy and restore. This method is simple, but it requires a connection between the machines and permission to copy the backup files.
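
For example, a minimal sketch of the backup-to-Azure-Blob method, assuming a SAS-based credential (the storage account, container, SAS token, and database name are placeholders):

-- Create a credential whose name matches the container URL
CREATE CREDENTIAL [https://<storageaccount>.blob.core.windows.net/<container>]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<SAS token>';

-- Back up straight to the Azure Blob container
BACKUP DATABASE MyDatabase
TO URL = 'https://<storageaccount>.blob.core.windows.net/<container>/MyDatabase.bak'
WITH COMPRESSION;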

Methods that do not require downtime:

  • AlwaysOn. Add a new secondary replica and, after it has synchronized, fail over to make that machine the primary replica. For an Azure virtual machine, use the Add Azure Replica Wizard.
  • Transactional replication. This minimizes downtime and is the option to use when you don't have an AlwaysOn deployment in your SQL Server environment.

If you need help to decide which method is better for your scenario, feel free to contact me.

SqlPackage: a tool to import/export SQL Server and Azure SQL DB

SqlPackage is a command-line utility that you can use for export and import operations against both on-premises SQL Server databases and cloud databases. SqlPackage supports the following operations:

  • Extract. Creates a database snapshot DACPAC file from a SQL Server database or from Azure SQL Database.
  • Publish. Updates the schema in a live database to match the schema in a DACPAC
    file. If the database does not exist on the destination server, the publish operation
    creates it.
  • Export. Exports both schema and data from a SQL Server database or from Azure SQL Database into a BACPAC file.
  • Import. Imports the schema and data from a BACPAC into a new database.
  • DeployReport. Creates an XML report that describes the changes that would be made by a publish operation.
  • DriftReport. Creates an XML report of the changes that have been made to a registered database.
  • Script. Creates a Transact-SQL script that you can use to update the schema of a target database to match the schema of a source database.

Use the /Action: or /a parameter to specify which action to execute.

SqlPackage example to import from a BACPAC file to Azure (the angle-bracket values are placeholders):
sqlpackage.exe /Action:Import /tsn:tcp:<server>.database.windows.net,1433 /tdn:<database> /tu:<user> /tp:<password> /sf:<path to .bacpac file> /p:DatabaseEdition=Premium /p:DatabaseServiceObjective=P4 /p:Storage=File
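
A corresponding export sketch, writing a BACPAC file from a source database (again, the angle-bracket values are placeholders):
sqlpackage.exe /Action:Export /ssn:tcp:<server>,1433 /sdn:<database> /su:<user> /sp:<password> /tf:<path to .bacpac file>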

More about SqlPackage parameters:
https://msdn.microsoft.com/en-us/library/hh550080%28v=vs.103%29.aspx?f=255&MSPPError=-2147217396

SQL Operations Studio Dashboards

SQL Operations Studio is a free tool that runs on Windows, macOS, and Linux, for managing SQL Server databases.

I have had a great experience running this tool on CentOS Linux, and you can build server and database management dashboards like the ones below.

To create custom dashboards, check out my tip: https://www.mssqltips.com/sqlservertip/5434/customize-sql-operations-studio-dashboards/

 

What is Stretch Database?

Stretch Database is a feature of SQL Server where data can be split between on-premises storage and cloud storage. With Stretch Database, cold, historical data is kept in the cloud and active data is kept on-premises for maximum performance.

Stretch Database requires no changes to client applications or existing Transact-SQL queries, so you can implement it seamlessly for existing applications. Stretch Database can reduce on-premises storage requirements both for data and associated backups. Backups of on-premises data are smaller and therefore run quicker than standard backups. Data in the cloud is backed up automatically.

With Stretch Database, cold historic data remains available for users to query, although there might be a small amount of additional latency associated with queries.


Implement a Stretch Database
You can implement Stretch Database entirely within SQL Server Management Studio; you do not need to pre-configure servers or storage within Microsoft Azure.
Implementing Stretch Database involves the following steps:

  1. Start Microsoft SQL Server Management Studio and connect to the instance of SQL Server.
  2. In Object Explorer, expand Databases.
  3. Right-click the database, point to Tasks, point to Stretch, and then click Enable.



Complete the steps in the Enable Database for Stretch wizard to create a database master key, identify the appropriate tables, and configure the Microsoft Azure deployment.
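
If you prefer scripting these steps, a minimal T-SQL sketch looks like this (the password, Azure server, credential, database, and table names are placeholders):

-- Enable the Stretch Database feature at the instance level
EXEC sp_configure 'remote data archive', 1;
RECONFIGURE;

-- In the user database: a master key plus a credential for the Azure server
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
CREATE DATABASE SCOPED CREDENTIAL AzureStretchCred
WITH IDENTITY = '<azure admin login>', SECRET = '<azure admin password>';

-- Enable Stretch for the database and for one table
ALTER DATABASE MyDatabase
SET REMOTE_DATA_ARCHIVE = ON (SERVER = '<server>.database.windows.net', CREDENTIAL = AzureStretchCred);
ALTER TABLE dbo.ColdHistoryTable
SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND));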

After implementing Stretch Database, you can monitor it from SQL Server Management Studio.

In Object Explorer, expand Databases, right-click the stretch-enabled database, point to Tasks, point to Stretch, and then click Monitor to open the Stretch Database Monitor. This monitor shows information about both the local and Azure SQL instances, along with data migration status.
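
You can also check migration progress with a query; here is a minimal sketch against the Stretch migration DMV:

-- Per-batch migration status for the stretch-enabled database
SELECT table_id, migrated_rows, start_time_utc, end_time_utc, error_number
FROM sys.dm_db_rda_migration_status;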

Moving the msdb, model, and tempdb database files

All system databases, except the resource database, can be moved to new locations to help balance I/O load.

To move the msdb, model, and tempdb databases, perform the following steps:

  • For each file to be moved, execute the ALTER DATABASE … MODIFY FILE statement (see the sketch after this list).
  • Stop the instance of SQL Server.
  • Move the files to the new location (this step is not necessary for tempdb, as its files are recreated automatically on startup).
  • Restart the instance of SQL Server.
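
For example, a minimal sketch for tempdb, assuming the default logical file names tempdev and templog (the paths are placeholders):

-- Point the tempdb files at their new location; takes effect on restart
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'E:\SQLData\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'E:\SQLLogs\templog.ldf');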

The process for moving the master database is different from the process for other databases. To move the master database, perform the following steps:

  • Open SQL Server Configuration Manager.
  • In the SQL Server Services node, right-click the instance of SQL Server, click Properties, and then click the Startup Parameters tab.
  • Edit the Startup Parameters values to point to the planned location for the master database data (-d parameter) and log (-l parameter) files (see the example after this list).
  • Stop the instance of SQL Server.
  • Move the master.mdf and mastlog.ldf files to the new location.
  • Restart the instance of SQL Server.
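
For example, the edited startup parameters might look like the following (the paths are placeholders; the -e error log entry stays as it is):

-dE:\SQLData\master.mdf
-lE:\SQLLogs\mastlog.ldf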

 

Understanding parallelism on SQL Server

Parallelism refers to multiple processors cooperating to execute a single query at the same time.

Parallel execution involves overhead to synchronize and monitor the tasks, which is why parallel plans are considered expensive operations.

The query optimizer determines whether a parallel plan should be used based on the configuration and the query cost.

  • A configuration value, max degree of parallelism, determines how many CPUs can be used to execute a query. You can also use the MAXDOP query hint to set how many CPUs a specific query can use.
  • Cost threshold for parallelism determines the cost that a query must reach before a parallel plan will even be considered. Query cost is based on the amount of data the query optimizer estimates must be read to complete the operation (see the sketch after this list for changing both settings).
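
A minimal sketch for checking and changing both settings (the values shown are only examples):

-- Both are advanced options, so expose them first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 2;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;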

In earlier versions of SQL Server, it was common to disable parallel queries on systems that were primarily used for transaction processing, and it's still common to see MAXDOP 1 in OLTP environments. I prefer to raise the cost threshold for parallelism so that a parallel plan is only considered for higher-cost queries.

Let’s see an example.

I executed the query below and got a parallel plan. The query cost was 374.148; remember, this cost is always an estimate.

SELECT * FROM dbo.FactInternetSales AS FIS
LEFT JOIN dbo.DimCustomer AS DC ON DC.CustomerKey = FIS.CustomerKey
LEFT JOIN dbo.DimProduct AS DP ON DP.ProductKey = FIS.ProductKey
LEFT JOIN dbo.DimProductSubcategory AS DPS ON DPS.ProductSubcategoryKey = DP.ProductSubcategoryKey
ORDER BY DP.StandardCost
OPTION(RECOMPILE)


If you increase the cost threshold for parallelism value to 400, you will not see any change in the plan, because the real cost isn't 374; that cost is for only one thread.

In my case, the max degree of parallelism value is 2, so multiplying: 374 x 2 = 748.

After setting the value to 800 and running the same query again, you can see there is no parallelism and the cost is 691.208, so it fits under the threshold value.

If your CXPACKET wait counter is high, try increasing the cost threshold for parallelism rather than setting MAXDOP to 1, because some queries, even in a transactional database, will benefit from running in parallel.

Sending files to AWS S3 using PowerShell

Amazon has a PowerShell module to manage the main services available. I've been working with EC2, RDS, and S3; I wrote a tip about quickly copying data to S3 and created the function below to help send files to S3.

I'm using the function below to send my backups to S3. It's configured to send all files in the paths passed by parameter.

Import-Module AWSPowerShell

# Author: Douglas Correa

function Send-S3Files {
 <#
 .SYNOPSIS
 Send the files from a local path to AWS S3
 .DESCRIPTION
 The function will copy all files from a specific path to AWS S3
 .EXAMPLE
 Send-S3Files -BucketName 'backups' -Region 'sa-east-1' -AKey '####' -SKey '####' -LocalSource 'c:\temp' , 'd:\backups'
 #>
 [CmdletBinding()]
 Param (
       [Parameter(Mandatory=$True, ValueFromPipeline=$False, HelpMessage='Name of the bucket in AWS')][string]$BucketName
     , [Parameter(Mandatory=$True, ValueFromPipeline=$False, HelpMessage='Region used in AWS')][string]$Region
     , [Parameter(Mandatory=$True, HelpMessage='AWS access key')][string]$AKey
     , [Parameter(Mandatory=$True, HelpMessage='AWS secret key')][string]$SKey
     , [Parameter(Mandatory=$True, ValueFromPipeline=$False, HelpMessage='Local machine paths')][string[]]$LocalSource
 )

    process {
        # Set the AWS credentials and default region for this session
        Initialize-AWSDefaultConfiguration -AccessKey $AKey -SecretKey $SKey -Region $Region

        foreach ($source in $LocalSource) {
            Set-Location $source
            # Get all files in the folder
            $files = Get-ChildItem '*.*' | Select-Object -Property Name

            try {
                if (Test-S3Bucket -BucketName $BucketName) {
                    foreach ($file in $files) {
                        # Upload only files that are not already in the bucket
                        if (!(Get-S3Object -BucketName $BucketName -Key $file.Name)) {
                            Write-Host "Copying file : $($file.Name)"
                            Write-S3Object -BucketName $BucketName -File $file.Name -Key $file.Name -CannedACLName private -Region $Region
                        }
                    }
                } else {
                    Write-Host "The bucket $BucketName does not exist."
                }
            } catch {
                Write-Host "Error uploading file $($file.Name)"
                $Error
            }
        }
    }
}

 

SQL Server threads architecture


SQL Server retrieves threads from Windows. The SQL Server configuration setting max worker threads (set at instance level) determines how many threads will be retrieved. SQL Server has its own internal scheduling system, independent of the scheduling performed by the operating system.

Instead of using Windows threads directly, SQL Server creates a pool of worker threads that are mapped to Windows threads whenever work needs to be performed.

When a SQL Server component needs to execute code, the component creates a task that represents the unit of work to be done. For example, if you send a batch of Transact-SQL commands to the server, it's likely that the batch will be executed within a task.

When a SQL Server component creates a task, it is assigned the next available worker thread that is not in use. If no worker threads are available, SQL Server will try to retrieve another Windows thread, up to the point that the max worker threads configuration limit is reached.

At that point, the new task must wait for a worker thread to become available. The THREADPOOL wait type occurs when there are no available threads in the server's thread pool; because parallelism requires multiple worker threads, this wait type is related to CXPACKET.

I suggest reducing parallelism rather than setting MAXDOP to 1. Try increasing the cost threshold for parallelism if the setting is too low (the default is 5), and analyze the queries that run in parallel, checking indexes and statistics to understand whether the query optimizer is estimating correctly.
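
To see whether these waits matter on your server, here is a minimal sketch against the wait statistics DMV:

-- Cumulative waits since the last restart (or since the counters were cleared)
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('THREADPOOL', 'CXPACKET')
ORDER BY wait_time_ms DESC;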

 

How does SQL Server store data?

SQL Server stores data in heaps or b-tree structures. Heaps are unordered sets of pages, while b-trees are ordered by their keys.

Heaps and b-trees use collections of pages within the structure, called allocation units:

  • IN_ROW_DATA -> contains all data, except large object (LOB) data.
  • LOB_DATA -> the structure for large objects, used to store the xml, varchar(max), nvarchar(max), and varbinary(max) data types.
  • ROW_OVERFLOW_DATA -> used when variable-length data types such as varchar and nvarchar exceed the 8,060-byte size limit per row.

To see this information, query the sys.allocation_units system view.


SELECT OBJECT_NAME(I.object_id)
 , I.name
 , AU.total_pages
 , AU.used_pages
 , AU.data_pages
 , P.rows
 , AU.type_desc
 FROM sys.allocation_units AS AU
 INNER JOIN sys.partitions AS P ON AU.container_id = P.partition_id
 INNER JOIN sys.indexes AS I ON I.index_id = P.index_id AND I.object_id = P.object_id
 WHERE P.object_id = OBJECT_ID('TabelaDeTamanhoVariavel')

 

Column name         Data type      Description
allocation_unit_id  bigint         ID of the allocation unit. Unique within a database.
type                tinyint        Type of allocation unit: 0 = Dropped; 1 = In-row data (all data types, except LOB data types); 2 = Large object (LOB) data (text, ntext, image, xml, large value types, and CLR user-defined types); 3 = Row-overflow data.
type_desc           nvarchar(60)   Description of the allocation unit type: DROPPED, IN_ROW_DATA, LOB_DATA, or ROW_OVERFLOW_DATA.
container_id        bigint         ID of the storage container associated with the allocation unit. If type = 1 or 3, container_id = sys.partitions.hobt_id. If type = 2, container_id = sys.partitions.partition_id. 0 = allocation unit marked for deferred drop.
data_space_id       int            ID of the filegroup in which this allocation unit resides.
total_pages         bigint         Total number of pages allocated or reserved by this allocation unit.
used_pages          bigint         Number of total pages actually in use.
data_pages          bigint         Number of used pages that have in-row data, LOB data, or row-overflow data. The value excludes internal index pages and allocation-management pages.

 

Locks everywhere

In this post I'm going to talk about locks in SQL Server. Locks are necessary; they are used in all operations in the database. Don't confuse locking with blocking: locking and blocking are totally different things.

When we talk about locking, doing something in the database, like an update or even a select, will cause some type of lock. The SELECT statement takes what is called a shared lock, which means you can share reads with someone else, and that may not cause blocking.

SQL Server has different kinds of lock modes, such as (S) Shared, (U) Update, (X) Exclusive, (I) Intent, (Sch) Schema, Bulk Update, and Key-Range; you can observe them with the query after the list below.

  • (S) Shared locks are used in read operations.
  • (U) Update locks are used to avoid a common deadlock problem.
  • (X) Exclusive locks prevent access to a resource by concurrent transactions.
  • (I) Intent locks prevent other transactions from modifying the higher-level resource and improve the efficiency of the Database Engine in detecting lock conflicts at the higher level.
  • (Sch) Schema locks: the Database Engine uses schema modification (Sch-M) locks during a table data definition language (DDL) operation.
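
Here is a minimal sketch to observe the locks currently held or requested in the database:

-- One row per lock request in the current database
SELECT resource_type, request_mode, request_status, request_session_id
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID();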

The following table shows the compatibility of the most commonly encountered lock modes.

Requested mode                       Existing granted mode
                                     IS    S     U     IX    SIX   X
Intent shared (IS)                   Yes   Yes   Yes   Yes   Yes   No
Shared (S)                           Yes   Yes   Yes   No    No    No
Update (U)                           Yes   Yes   No    No    No    No
Intent exclusive (IX)                Yes   No    No    Yes   No    No
Shared with intent exclusive (SIX)   Yes   No    No    No    No    No
Exclusive (X)                        No    No    No    No    No    No