How update works in transactional replication?

In a post about how update works I showed what happened when run an update with the same values. SQL Server is smart enough to see that and not changing anything and register minimum log.

Another day I saw a comment to avoid updating records when none of the values are changing.

Avoid updating records when none of the values are changing. This still does a write and, if the table is in replication or change tracking, still causes the row to be propagated out to other servers. If you are updating a potentially large number of records, make sure to only update the ones where the new value doesn’t equal the old value.

Let’s see how the update behavior when update 10 millions rows but without any change and see if will be any row propagation to another server.

objectexplorer

My Lab contains two servers and a demo database replicated from SQL01 to SQL02.

It’s configured a transaction replication to send the articles and I’m using a table created with the code:

CREATE TABLE tblMillionsRows (
    id BIGINT NOT NULL IDENTITY PRIMARY KEY,
    largeColumn NVARCHAR(MAX) DEFAULT REPLICATE('TESTE',100),
    smallColumn NVARCHAR(150) DEFAULT 'DEMO',
    dateColumn DATETIME
        DEFAULT GETDATE()
);
GO

Inserting rows:

INSERT INTO dbo.tblMillionsRows (
    largeColumn,
    smallColumn,
    dateColumn
)
DEFAULT VALUES
GO 2000000

It will take a while to insert all rows. The next step is add this table in the replication.

DECLARE @publication    AS sysname;
DECLARE @table AS sysname;
DECLARE @filterclause AS nvarchar(500);
DECLARE @filtername AS nvarchar(386);
DECLARE @schemaowner AS sysname;
SET @publication = N'SQL01_demo_tb01'; 
SET @table = N'tblMillionsRows';
SET @schemaowner = N'dbo';

EXEC sp_addarticle 
	@publication = @publication, 
	@article = @table, 
	@source_object = @table,
	@source_owner = @schemaowner, 
	@schema_option = 0x80030F3,
	@vertical_partition = N'true', 
	@type = N'logbased',
	@filter_clause = @filterclause;

EXEC sp_articlecolumn 
	@publication = @publication, 
	@article = @table;

EXEC sp_startpublication_snapshot 
	@publication = 'SQL01_demo_tb01', 
	@publisher =  'SQL01'   

With all set, let’s do some tests. First updating the heap table and seeing if something is replicated.

UPDATE tblMillionsRows SET smallColumn = N'DEMO' WHERE id < 10000
GO

After the update, I ran the script below to see the transactions and commands available to replicate and nothing changed.

SELECT * FROM distribution.dbo.MSrepl_Commands 
SELECT * FROM distribution.dbo.MSrepl_Transactions 
EXEC sp_browsereplcmds 

But, when I change one row running the update below I could see a new command to replicate in the another server.

UPDATE tblMillionsRows SET smallColumn = N'DEMO1' WHERE id < 10000
GO

replicationcommands

In conclusion, SQL Server won’t replicate updates that don’t change the value. You can see more about updates in my post.

Be careful when doing updates, because if the table is replicated and you change millions rows the transaction replication will create one command for each updated row.

 

Database migration methods

migration-logo-2389341When thinking about migration the most common issue is downtime. There are some methods that require downtime like backup/restore or less downtime using Log Shipping.

You can choose several different methods to migrate your database.  For example, methods that require downtime include:

  • Backup the database, manually copy the backup file to the machine and then restore the database there. This method is simple and the use of compression minimizes the time that is required.
  • Perform a backup to an Azure Blob storage account, and then restore the database from that backup. This method removes the necessity to manually copy the backup file.
  • Detach the database, copy the files to an Azure Blob storage and then attach them to the SQL Server instance. Use this method if you plan to store database files in Azure Blob storage permanently instead of on a hard disk.
  • Log Shipping, use a backup/restore process and the downtime depends on the transaction log file backup copying a small file in the process. This method is simple and is necessary a connection between the machines and permission to copy the backup files.

Methods that not require downtime:

  • AlwaysOn, adding a new secondary replica after replication has completed, you can failover to make the machine the primary replica. For Azure virtual machine use the Add Azure Replica Wizard.
  • Transactional Replication will minimize downtime, but don’t have an AlwaysOn deployment in your SQL Server system.

If you need help to decide which method is better for your scenario, feel free to contact me.

What is Stretch Database?

Stretch Database is a feature of SQL Server where data can be split between on-premises storage and cloud storage. With Stretch Database, cold, historical data is kept in the cloud and active data is kept on-premises for maximum performance.

Stretch Database requires no changes to client applications or existing Transact-SQL queries, so you can implement it seamlessly for existing applications. Stretch Database can reduce on-premises storage requirements both for data and associated backups. Backups of on-premises data are smaller and therefore run quicker than standard backups. Data in the cloud is backed up automatically.

With Stretch Database, cold historic data remains available for users to query, although there might be a small amount of additional latency associated with queries.

img2

Implement a Stretch Database
You can implement Stretch Database entirely within SQL Server Management Studio; you do not need to pre-configure servers or storage within Microsoft Azure.
Implementing Stretch Database involves the following steps:

  1. Start Microsoft SQL Server Management Studio and connect to the instance of SQL Server.
  2. In Object Explorer, expand Databases.
  3. Right-click the database, point to Tasks, point to Stretch, and then click Enable.

scretch

scretch2

scretch3scretch4scretch5Complete the steps in the Enable Database for Stretch wizard to create a  Database Master Key; identify the appropriate tables and configure the Microsoft Azure deployment.

After implementing Stretch Database, you can monitor it from SQL Server  Management Studio.

In Object Explorer, expand Databases, right-click the stretch-enabled database, point to Tasks, point to Stretch, and then click Monitor to open the Stretch Database Monitor. This monitor shows information about both the local and Azure SQL instances, along with data migration status.

Moving the msdb, model, and tempdb databases files

All system databases, except the resource database, can be moved to new locations to help balance I/O load.

To move the msdb, model, and tempdb databases, perform the following steps:

  • For each file to be moved, execute the ALTER DATABASE … MODIFY FILE statement.
  • Stop the instance of SQL Server.
  • Move the files to the new location (this step is not necessary for tempdb, as its files are recreated automatically on startup).
  • Restart the instance of SQL Server.

The process for moving the master database is different from the process for other databases. To move the master database, perform the following steps:

  • Open SQL Server Configuration Manager.
  • In the SQL Server Services node, right-click the instance of SQL Server, click Properties, and then click the Startup Parameters tab.
  • Edit the Startup Parameters values to point to the planned location for the master database data (-d parameter) and log (-l parameter) files.
  • Stop the instance of SQL Server.
  • Move the master.mdf and mastlog.ldf files to the new location.
  • Restart the instance of SQL Server

 

How SQL Server stores data?

SQL Server stores data in heaps or b-tree structures. Heaps are unordered set and balanced trees are ordered by their keys.

Heaps and b-tree use collection of pages within the structure and it’s called allocation units.

  • IN_ROW_DATA -> contains all data.
  • LOB_DATA -> structure for large objects used to stored in xml, varchar(max), nvarchar(max), varbinary(max) data types.
  • ROW_OVERFLOW_DATA -> when the data type is variable like varchar and nvarchar exceed 8,060 bytes size limit per row.

To see this information is used the system view allocation_units.

allocation_units

SELECT OBJECT_NAME(I.object_id)
 , I.name
 , AU.total_pages
 , AU.used_pages
 , AU.data_pages
 , P.rows
 , AU.type_desc
 FROM sys.allocation_units AS AU
 INNER JOIN sys.partitions AS P ON AU.container_id = P.partition_id
 INNER JOIN sys.indexes AS I ON I.index_id = P.index_id AND I.object_id = P.object_id
 WHERE P.object_id = OBJECT_ID('TabelaDeTamanhoVariavel')

 

Column name Data type Description
allocation_unit_id bigint ID of the allocation unit. Is unique within a database.
type tinyint Type of allocation unit:

0 = Dropped

1 = In-row data (all data types, except LOB data types)

2 = Large object (LOB) data (text, ntext, image, xml, large value types, and CLR user-defined types)

3 = Row-overflow data

type_desc nvarchar(60) Description of the allocation unit type:

  • DROPPED
  • IN_ROW_DATA
  • LOB_DATA
  • ROW_OVERFLOW_DATA
container_id bigint ID of the storage container associated with the allocation unit.

If type = 1 or 3, container_id = sys.partitions.hobt_id.

If type is 2, then container_id = sys.partitions.partition_id.

0 = Allocation unit marked for deferred drop

data_space_id int ID of the filegroup in which this allocation unit resides.
total_pages bigint Total number of pages allocated or reserved by this allocation unit.
used_pages bigint Number of total pages actually in use.
data_pages bigint Number of used pages that have:

  • In-row data
  • LOB data
  • Row-overflow data

Value returned excludes internal index pages and allocation-management pages.

 

Memory-optimized Logging

In a previous post I talked about transaction log works, and what about using a memory-optimized table?

SQL Server has the feature Memory-Optimized Objects to improve performance. In-memory nonclustered indexes are implemented using a data structure called a Bw-Tree. A Bw-Tree is a lock and latch-free variation of a B-Tree.

In-memory architecture:inmemory

To enable an application to use In-Memory OLTP, you need to complete the following tasks:

  • Create a memory-optimized data filegroup and add a container to the filegroup.
  • Create memory-optimized tables and indexes.
  • Load data into the memory-optimized table and update statistics after loading the data and before creating the compiled stored procedures.
  • Create natively compiled stored procedures to access data in memory-optimized tables. You can also use a traditional, interpreted Transact-SQL to access data in memory-optimized tables.

  • As needed, migrate data from existing tables to memory-optimized tables.

In this new architecture, let’s see what happens in the logging level.

I created a table using the code:

CREATE TABLE InMemoryTable (
 ID INT IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED
 , FIRST_NAME VARCHAR(100)
) WITH (MEMORY_OPTIMIZED=ON, DURABILITY=SCHEMA_ONLY)

As you can see, the first part is equal to a normal table and to create the in-memory table we use MEMORY_OPTIMIZED and the DURABILITY. In this case, I’d like you to pay attention to the DURABILITY configured to SCHEMA_ONLY.

Let’s take a look in the transaction log after create the table.inmemotable

We can see all the sys changes to create the table. Now, inserting a row in this table and see the log again:

INSERT INTO InMemoryTable VALUES('Douglas Correa')
GO

inmemotable

You can see nothing changed, but where’s my data? The data is there in the table but only in memory. As you can imagine, both the logging and saving the data to disk are expensive operations.

That means with DURABILITY schema_only the data won’t be there after a crash or restart the server. Changing that for SCHEMA_AND_DATA and look in the log file we are going to see the log operation when inserting data.

inmemotable2

As you can see, the log operation is LOP_HK, the row is in the log in case of crash SQL Server can redo.

Conclusion

The memory-optimized table is fast and can improve performance especially if don’t need to save the data, but there are limitations and one of limitation I didn’t like was I can’t detach and attach the database recreating the log file.

 

Data types

Data types and Precedence of convert types

SQL Server associates columns, expressions, variables, and parameters with data types. Data types determine what kind of data can be stored in the field: Integers, characters, dates, money, binary strings, etc.

SQL Server supplies several built-in data types but you can also define custom types

Built-in data types are categorized as shown in the table below, also you can see the precedence of convert to other data type. I mean when an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence.

IC254199

SQL Server uses the following precedence order for data types:

  1. user-defined data types (highest)
  2. sql_varian t
  3. xml
  4. datetimeoffset
  5. datetime2
  6. datetime
  7. smalldatetime
  8. date
  9. time
  10. float
  11. real
  12. decimal
  13. money
  14. smallmoney
  15. bigint
  16. int
  17. smallint
  18. tinyint
  19. bit
  20. ntext
  21. text
  22. image
  23. timestamp
  24. uniqueidentifier
  25. nvarchar (including nvarchar(max) )
  26. nchar
  27. varchar (including varchar(max) )
  28. char
  29. varbinary (including varbinary(max) )
  30. binary (lowest)

 

 

Storage – Part II

Logical Structures
How, where, when you need to think in data storage? Well, this is the first step after you modeling your database and most companies do not think about it, I have saw companies with a large data into one disk or small data separated in the wrong way, but what is the best to do?

We have some best practices and you can read more about logical structures in SQL Server called filegroups in MSDN page. Every DBA needs to know to create and maintain filegroups because they are part of every SQL Server database. Filegroups affect the performance, maintenance, and security of your data and they are logical structures to group files together.filegoups

At a minimum, every SQL Server database has two operating system files: a data file and a log file. Data files contain data and objects such as tables, indexes, stored procedures, and views. Log files contain the information that is required to recover all transactions in the database. Data files can be grouped together in filegroups for allocation and administration purposes.

Filegroups can be created when the database is first created or created later when more files are added to the database. However, you cannot move files to a different filegroup after the files have been added to the database.

A file cannot be a member of more than one filegroup. Tables, indexes, and large object (LOB) data can be associated with a specific filegroup. This means that all their pages are allocated from the files in that filegroup.

Why use filegroups?

  • you have one or more objects tha have heavy read/write activity
  • you’ve already tuned through indexes and query writing
  • you need a performance boost
  • you have additional storage you can utilize to separate the objects
  • you want to separate tha data so administration tasks such as backups take less time
  • you have a large database (>1TB)
  • disaster and recovery

Best Practices

  • separate data and log files onto separate disks
  • separate tempdb onto its own disk
  • at least two filegroups – primary and one user-defined (default)
  • files in a filegroup should be equally sized for equal proportion of writes
  • use filegroups to isolate objects with heavy read and write activity from each other

Storage – Part I

Pages and Extends Architecture

Bb497064.ors05_01_big
Data Structure SQL Server

The page is the fundamental unit of data storage in SQL Server. An extent is a collection of eight physically contiguous pages. Extents help efficiently manage pages.

Understanding the architecture of pages and extents is important for designing and developing databases that perform efficiently.

The fundamental unit of data storage in SQL Server is the page. The disk space allocated to a data file (.mdf or .ndf) in a database is logically divided into pages numbered contiguously from 0 to n. Disk I/O operations are performed at the page level. That is, SQL Server reads or writes whole data pages.

Extents are a collection of eight physically contiguous pages and are used to efficiently manage the pages. All pages are stored in extents.

Pages

Page size is 8 KB. This means SQL Server databases have 128 pages per megabyte. Each page begins with a 96-byte header that is used to store system information about the page.

Extends

Extents are the basic unit in which space is managed. An extent is eight physically contiguous pages, or 64 KB. This means SQL Server databases have 16 extents per megabyte.

To make its space allocation efficient, SQL Server does not allocate whole extents to tables with small amounts of data. SQL Server has two types of extents:

  • Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.
  • Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object.

A new table or index is generally allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents