Method to update LOB data with minimal logging

Use the .WRITE (expression, @Offset, @Length) clause to perform a partial or full update of varchar(max), nvarchar(max), and varbinary(max) data types.

For example, a partial update of a varchar(max) column might delete or modify only the first 200 characters of the column, whereas a full update would delete or modify all the data in the column. .WRITE updates that insert or append new data are minimally logged if the database recovery model is set to bulk-logged or simple.

UPDATE <table_or_view_name>
SET column_name.WRITE (expression, @Offset, @Length)
FROM <table_source>
WHERE <search_condition>

expression is the value that is copied to column_name. expression must evaluate to, or be implicitly castable to, the column_name type. If expression is set to NULL, @Length is ignored and the value in column_name is truncated at the specified @Offset.

@Offset is the starting point in the value of column_name at which expression is written. @Offset is a zero-based ordinal position, is bigint, and cannot be a negative number. If @Offset is NULL, the update operation appends expression at the end of the existing column_name value and @Length is ignored. If @Offset is greater than the length of the column_name value, the Database Engine returns an error. If @Offset plus @Length exceeds the end of the underlying value in the column, the deletion occurs up to the last character of the value. If @Offset plus LEN(expression) is greater than the underlying declared size, an error is raised.

@Length is the length of the section in the column, starting from @Offset, that is replaced by expression. @Length is bigint and cannot be a negative number. If @Length is NULL, the update operation removes all data from @Offset to the end of the column_name value.
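As a minimal sketch (the dbo.Documents table and its body column are assumptions for illustration):

-- Hypothetical table for illustration
CREATE TABLE dbo.Documents (id INT PRIMARY KEY, body VARCHAR(MAX));
INSERT INTO dbo.Documents VALUES (1, REPLICATE(CONVERT(VARCHAR(MAX), 'A'), 1000));

-- Partial update: replace the first 200 characters
UPDATE dbo.Documents
SET body.WRITE('new text', 0, 200)
WHERE id = 1;

-- Append (@Offset = NULL, @Length ignored): this is the insert/append case
-- that can be minimally logged under bulk-logged or simple recovery
UPDATE dbo.Documents
SET body.WRITE(' appended', NULL, 0)
WHERE id = 1;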

A regular UPDATE statement overwrites the entire string using full logging, which is very inefficient when dealing with large value updates.

Note: An update using the .WRITE clause will fail if the existing column value is NULL.

There is more information about performance in my tip on MSSQLTips.


Troubleshooting Transactional Replication – part 1

The common way to replicate data and objects from one server to another is transactional replication. As you know, SQL Server processes all actions within the database using Transact-SQL statements, and each completed unit of work is a transaction. In transactional replication, each committed transaction is replicated to the subscriber as it occurs.

[diagram: transactional replication architecture]
Sometimes a problem can occur and the process doesn't work as expected. The first step I like to take is to run sp_whoisactive on all the servers (Publisher, Distributor, and Subscriber) to see whether any process is blocking the replication.

Next, look at the publisher and subscriber agents; if there is any issue, such as a locked login, it will show up there.

To get more information about what's happening, I like to query the system table MSrepl_transactions in the distribution database. This table contains one row for each replicated transaction, and with that information you can run the procedure sp_browsereplcmds, specifying the starting sequence number, for example @xact_seqno_start = N'0x0000014F000034B0001C'.

The procedure sp_browsereplcmds returns a result set with the commands stored in the distribution database. The output looks like the table below.
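A minimal sketch of those two steps (the sequence number is just an example value):

USE distribution;
GO
-- One row per replicated transaction
SELECT TOP (10) publisher_database_id, xact_id, xact_seqno, entry_time
FROM dbo.MSrepl_transactions
ORDER BY entry_time DESC;

-- Browse the commands for a given transaction
EXEC sp_browsereplcmds
    @xact_seqno_start = N'0x0000014F000034B0001C',
    @xact_seqno_end = N'0x0000014F000034B0001C';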

[screenshot: sp_browsereplcmds output]

How does an update work in transactional replication?

In a post about how an update works, I showed what happens when you run an update with the same values: SQL Server is smart enough to notice that, change nothing, and generate minimal logging.

The other day I saw a comment advising readers to avoid updating records when none of the values are changing:

Avoid updating records when none of the values are changing. This still does a write and, if the table is in replication or change tracking, still causes the row to be propagated out to other servers. If you are updating a potentially large number of records, make sure to only update the ones where the new value doesn’t equal the old value.

Let's see how the update behaves when updating millions of rows without changing anything, and whether any rows are propagated to the other server.

[screenshot: Object Explorer showing servers SQL01 and SQL02]

My Lab contains two servers and a demo database replicated from SQL01 to SQL02.

Transactional replication is configured to send the articles, and I'm using a table created with this code:

CREATE TABLE tblMillionsRows (
    id BIGINT NOT NULL IDENTITY PRIMARY KEY,
    largeColumn NVARCHAR(MAX) DEFAULT REPLICATE('TESTE',100),
    smallColumn NVARCHAR(150) DEFAULT 'DEMO',
    dateColumn DATETIME
        DEFAULT GETDATE()
);
GO

Inserting rows:

-- DEFAULT VALUES cannot be combined with a column list
INSERT INTO dbo.tblMillionsRows
DEFAULT VALUES;
GO 2000000

It will take a while to insert all the rows. The next step is to add this table to the replication.

DECLARE @publication    AS sysname;
DECLARE @table AS sysname;
DECLARE @filterclause AS nvarchar(500);
DECLARE @filtername AS nvarchar(386);
DECLARE @schemaowner AS sysname;
SET @publication = N'SQL01_demo_tb01'; 
SET @table = N'tblMillionsRows';
SET @schemaowner = N'dbo';

EXEC sp_addarticle 
	@publication = @publication, 
	@article = @table, 
	@source_object = @table,
	@source_owner = @schemaowner, 
	@schema_option = 0x80030F3,
	@vertical_partition = N'true', 
	@type = N'logbased',
	@filter_clause = @filterclause;

EXEC sp_articlecolumn 
	@publication = @publication, 
	@article = @table;

EXEC sp_startpublication_snapshot 
	@publication = 'SQL01_demo_tb01', 
	@publisher =  'SQL01'   

With everything set, let's do some tests. First, update the table without changing any values and see whether anything is replicated.

UPDATE tblMillionsRows SET smallColumn = N'DEMO' WHERE id < 10000
GO

After the update, I ran the script below to see the transactions and commands available to replicate, and nothing had changed.

SELECT * FROM distribution.dbo.MSrepl_Commands 
SELECT * FROM distribution.dbo.MSrepl_Transactions 
EXEC sp_browsereplcmds 

But when I changed the rows by running the update below, I could see new commands to replicate to the other server.

UPDATE tblMillionsRows SET smallColumn = N'DEMO1' WHERE id < 10000
GO

[screenshot: replication commands generated for the update]

In conclusion, SQL Server won’t replicate updates that don’t change the value. You can see more about updates in my post.

Be careful when doing updates: if the table is replicated and you change millions of rows, transactional replication will create one command for each updated row.
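Following the advice quoted above, a minimal sketch is to filter out the rows that already hold the new value, so neither the write nor the replication command happens for them:

UPDATE dbo.tblMillionsRows
SET smallColumn = N'DEMO1'
WHERE id < 10000
  AND smallColumn <> N'DEMO1'; -- skip rows already holding the new value
                               -- (add an IS NULL check if the column is nullable)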


Moving the msdb, model, and tempdb database files

All system databases, except the resource database, can be moved to new locations to help balance I/O load.

To move the msdb, model, and tempdb databases, perform the following steps:

  • For each file to be moved, execute the ALTER DATABASE … MODIFY FILE statement (see the sketch after this list).
  • Stop the instance of SQL Server.
  • Move the files to the new location (this step is not necessary for tempdb, as its files are recreated automatically on startup).
  • Restart the instance of SQL Server.
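For example, a minimal sketch for tempdb (the target paths are assumptions; the logical names tempdev and templog are the defaults and can be checked in sys.master_files):

ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'D:\SQLData\tempdb.mdf');

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, FILENAME = 'D:\SQLLogs\templog.ldf');
-- Restart the instance; the tempdb files are recreated at the new location.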

The process for moving the master database is different from the process for other databases. To move the master database, perform the following steps:

  • Open SQL Server Configuration Manager.
  • In the SQL Server Services node, right-click the instance of SQL Server, click Properties, and then click the Startup Parameters tab.
  • Edit the Startup Parameters values to point to the planned location for the master database data (-d parameter) and log (-l parameter) files, as in the example after this list.
  • Stop the instance of SQL Server.
  • Move the master.mdf and mastlog.ldf files to the new location.
  • Restart the instance of SQL Server.
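After editing, the data and log parameters might look like this (the paths here are assumptions for illustration):

-dE:\SQLData\master.mdf
-lE:\SQLLogs\mastlog.ldf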


Memory-optimized Logging

In a previous post I talked about how the transaction log works, but what about when using a memory-optimized table?

SQL Server has the feature Memory-Optimized Objects to improve performance. In-memory nonclustered indexes are implemented using a data structure called a Bw-Tree. A Bw-Tree is a lock and latch-free variation of a B-Tree.

In-memory architecture:

[diagram: In-Memory OLTP architecture]

To enable an application to use In-Memory OLTP, you need to complete the following tasks:

  • Create a memory-optimized data filegroup and add a container to the filegroup.
  • Create memory-optimized tables and indexes.
  • Load data into the memory-optimized table and update statistics after loading the data and before creating the compiled stored procedures.
  • Create natively compiled stored procedures to access data in memory-optimized tables. You can also use traditional, interpreted Transact-SQL to access data in memory-optimized tables.
  • As needed, migrate data from existing tables to memory-optimized tables.

In this new architecture, let's see what happens at the logging level.

I created a table using the code:

CREATE TABLE InMemoryTable (
 ID INT IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED
 , FIRST_NAME VARCHAR(100)
) WITH (MEMORY_OPTIMIZED=ON, DURABILITY=SCHEMA_ONLY)

As you can see, the first part is the same as a normal table; to make it an in-memory table we use the MEMORY_OPTIMIZED and DURABILITY options. In this case, I'd like you to pay attention to the DURABILITY option, configured as SCHEMA_ONLY.

Let's take a look at the transaction log after creating the table.

[screenshot: transaction log records after creating the table]

We can see all the system-table changes made to create the table. Now, let's insert a row in this table and look at the log again:

INSERT INTO InMemoryTable VALUES('Douglas Correa')
GO

[screenshot: transaction log unchanged after the insert]

You can see nothing changed, but where's my data? The data is there in the table, but only in memory. As you can imagine, both logging and saving the data to disk are expensive operations.

That means that with DURABILITY = SCHEMA_ONLY the data won't be there after a crash or a server restart. Changing it to SCHEMA_AND_DATA and looking at the log file, we can see the log operations when inserting data.
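A minimal sketch of that change, recreating the table with the durable option (this assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup with a container):

DROP TABLE IF EXISTS InMemoryTable;
GO
CREATE TABLE InMemoryTable (
    ID INT IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED,
    FIRST_NAME VARCHAR(100)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
INSERT INTO InMemoryTable VALUES ('Douglas Correa');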

[screenshot: LOP_HK log operations after the insert]

As you can see, the log operation is LOP_HK; the row is in the log so that, in case of a crash, SQL Server can redo it.

Conclusion

Memory-optimized tables are fast and can improve performance, especially if you don't need to persist the data, but there are limitations. One limitation I didn't like is that I can't detach and attach the database to recreate the log file.


What does the Checkpoint do?

Checkpoints flush dirty data pages from the buffer cache of the current database to disk. This minimizes the active portion of the log that must be processed during a full recovery of a database. During a full recovery, the following types of actions are performed:

  • The log records of modifications not flushed to disk before the system stopped are rolled forward.
  • All modifications associated with incomplete transactions, such as transactions for which there is no COMMIT or ROLLBACK log record, are rolled back.

Checkpoints occur in the following situations:

  • A CHECKPOINT statement is explicitly executed (see the example after this list). A checkpoint occurs in the current database for the connection.
  • A minimally logged operation is performed in the database; for example, a bulk-copy operation is performed on a database that is using the Bulk-Logged recovery model.
  • Database files have been added or removed by using ALTER DATABASE.
  • An instance of SQL Server is stopped by a SHUTDOWN statement or by stopping the SQL Server (MSSQLSERVER) service. Either action causes a checkpoint in each database in the instance of SQL Server.
  • An instance of SQL Server periodically generates automatic checkpoints in each database to reduce the time that the instance would take to recover the database.
  • A database backup is taken.
  • An activity requiring a database shutdown is performed. For example, AUTO_CLOSE is ON and the last user connection to the database is closed, or a database option change is made that requires a restart of the database.
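For the first situation in the list above, the statement optionally accepts a desired duration in seconds:

-- Checkpoint the current database
CHECKPOINT;

-- Request that the checkpoint complete within about 10 seconds
CHECKPOINT 10;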

[diagram: recovery process]

VLF (Virtual Log Files)

[diagram: transaction log divided into VLFs]

To see how many VLFs you have, simply look at the number of rows returned by DBCC LOGINFO.
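A minimal sketch (sys.dm_db_log_info is a documented alternative on newer versions):

-- Each row returned is one VLF
DBCC LOGINFO;

-- Documented DMV alternative (SQL Server 2016 SP2 and later)
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID());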

The size and number of VLFs you'll have depend largely on the size of the chunks added to your transaction log as it grows.

There is no general rule for determining the best values for the auto-growth option, as these vary from case to case. Having too many or too few virtual log files causes bad performance.

Having an excessive number of VLFs can negatively impact all transaction log related activities and you may even see degradation in performance when transaction log backups occur.

Most of the time, excessive VLF fragmentation is caused by excessive file growth in small increments. For example, a database that is set to grow its transaction log file by 5 MB at a time is going to end up with a large number of VLFs whenever the log grows.

Growth increment       Number of VLFs created
<= 64 MB               4
> 64 MB and <= 1 GB    8
> 1 GB                 16
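So a minimal sketch to limit VLF creation is to grow the log in larger, fixed increments (the database and logical file names here are assumptions):

ALTER DATABASE demo
MODIFY FILE (NAME = demo_log, FILEGROWTH = 512MB);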

There is insufficient system memory in resource pool

While doing crash and recovery tests on my local machine, the SQL Server instance wouldn't come online. After trying for the third time to bring the instance online, thinking the problem was something else, I looked at the SQL Server errorlog file and could see the problem.

Not enough memory; but it wasn't on my machine, it was in the resource pool. So, what is a resource pool?

A resource pool represents a subset of the physical resources of an instance of the Database Engine, and in my case its memory was insufficient. Let's look at the errorlog file:

2018-03-14 16:19:58.09 spid56s     [ERROR] Recovery failed with error 0x83000000 on database 18. This error will be mapped to 'HK_E_RESTORE_INSUFFICIENT_MEMORY' (0x8200002e). (sql\ntdbms\hekaton\runtime\src\hkruntime.cpp : 4805 - 'HkRtRestoreDatabase')
2018-03-14 16:19:58.09 spid34s     [INFO] HkCkptCtrlUninitialize(): Database ID: [18]. Cleaning up StorageArray. LastClosedCheckpointEndTs: '158'
2018-03-14 16:19:58.09 Server      Error: 17300, Severity: 16, State: 1. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.
2018-03-14 16:19:58.09 Server      Error: 17312, Severity: 16, State: 1. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.
2018-03-14 16:19:58.09 Server      Error: 28709, Severity: 16, State: 19. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.
2018-03-14 16:19:58.09 spid37s     Error: 701, Severity: 17, State: 137.
2018-03-14 16:19:58.09 spid37s     There is insufficient system memory in resource pool 'default' to run this query.
2018-03-14 16:19:58.09 spid39s     Error: 701, Severity: 17, State: 137.
2018-03-14 16:19:58.09 spid39s     There is insufficient system memory in resource pool 'default' to run this query.
2018-03-14 16:19:58.11 spid55s     [ERROR] Recovery failed with error 0x83000000 on database 15. This error will be mapped to 'HK_E_RESTORE_INSUFFICIENT_MEMORY' (0x8200002e). (sql\ntdbms\hekaton\runtime\src\hkruntime.cpp : 4805 - 'HkRtRestoreDatabase')
2018-03-14 16:19:58.11 spid31s     [INFO] HkCkptCtrlUninitialize(): Database ID: [15]. Cleaning up StorageArray. LastClosedCheckpointEndTs: '155'
2018-03-14 16:19:58.20 spid31s     SQL Server shutdown has been initiated
2018-03-14 16:19:58.21 spid31s     Error: 19032, Severity: 10, State: 1. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.
2018-03-14 16:19:58.28 spid34s     SQL Server shutdown has been initiated

After starting the service, SQL Server was doing the redo and undo process; this means it was reading the log files and creating compensation log records for any uncommitted transactions it found.

SQL Server needs memory in the buffer pool to complete the redo and undo process, and I didn't remember changing any SQL Server memory configuration.

So, my approach was to connect to SQL Server via the command line while the instance was still up and run sp_configure to see how much memory was configured. I got only 512 MB set for Max Server Memory, and that was the problem. (Max server memory controls the SQL Server memory allocation, compile memory, all caches (including the buffer pool), query execution memory grants, lock manager memory, and CLR memory.)

In my environment, with 26 databases and my crash recovery tests, 512 MB for my pool memory wasn't enough, and when I changed the configuration to 4096 MB I could bring the instance online again.
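A minimal sketch of that fix, run after connecting with sqlcmd (4096 is the value that worked for my environment):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 4096;
RECONFIGURE;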

Conclusion

First, read the errorlog file to get more information about what SQL Server is doing; also, knowing how transaction log operations, log records, checkpoints, and crash recovery work is fundamental.

SQL Server as a process acquires more memory than specified by the max server memory option. Both internal and external components can allocate memory outside of the buffer pool, which consumes additional memory, but the memory allocated to the buffer pool usually still represents the largest portion of memory consumed by SQL Server.


How does an Update work?

Updating data in SQL Server is a simple task: use the command UPDATE table SET column = value WHERE column = value. Don't forget the WHERE clause :).

But what does SQL Server do internally? A delete plus an insert? A modify? Well, the answer is: it depends.

Let's see some examples of how that works.

First, create a database and table for the test.

[screenshot: CREATE DATABASE and CREATE TABLE script]
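The setup lived in the screenshot; a minimal reconstruction might look like this (the database, table, and column names are assumptions):

CREATE DATABASE demoUpdate;
GO
USE demoUpdate;
GO
-- A heap: no indexes yet
CREATE TABLE dbo.tblUpdate (
    col1 INT,
    col2 VARCHAR(50)
);
INSERT INTO dbo.tblUpdate VALUES (1, 'SQL Server');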

As you can see, I didn't create an index, so I will be updating a heap table. Before updating the row, I will clear the log records already written and then update the row I want.
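A sketch of that step, assuming the demoUpdate database above uses the SIMPLE recovery model (so CHECKPOINT lets the earlier log records be cleared) and reading the log with the undocumented fn_dblog function (the DBCC LOG command used in the screenshots is likewise undocumented):

CHECKPOINT; -- allow the previous log records to be truncated
UPDATE dbo.tblUpdate SET col2 = 'SQL Server 2017' WHERE col1 = 1;

-- Inspect the log records generated by the update
SELECT [Current LSN], Operation, Context, AllocUnitName
FROM sys.fn_dblog(NULL, NULL);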

[screenshot: update statement against the heap]

OK, now the result of DBCC LOG:

[screenshot: DBCC LOG output showing a modify operation on the heap]

As we can see, the result is a modify operation for the row in the log. What about using an index? Let's create a nonclustered index on the column I'm updating.
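A sketch of that step (the index name is an assumption):

CREATE NONCLUSTERED INDEX IX_tblUpdate_col2 ON dbo.tblUpdate (col2);
CHECKPOINT;
UPDATE dbo.tblUpdate SET col2 = 'SQL Server 2019' WHERE col1 = 1;
SELECT Operation, Context, AllocUnitName FROM sys.fn_dblog(NULL, NULL);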

[screenshot: nonclustered index creation and update]

[screenshot: DBCC LOG output with delete/insert rows in the nonclustered index]

The result is very different: we can still see a modified row in the heap, but in the nonclustered index I got a delete/insert pair. Very nice!

What if it was a clustered index?

[screenshot: clustered index creation]
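A reconstruction of this step and the key update that follows (names assumed; the comments reflect the results shown in the screenshots):

DROP INDEX IX_tblUpdate_col2 ON dbo.tblUpdate;
CREATE CLUSTERED INDEX CIX_tblUpdate_col1 ON dbo.tblUpdate (col1);

CHECKPOINT;
-- Updating a non-key column: the log shows only a modify operation
UPDATE dbo.tblUpdate SET col2 = 'SQL Server 2022' WHERE col1 = 1;

CHECKPOINT;
-- Updating the clustered key itself: logged as a delete/insert pair
UPDATE dbo.tblUpdate SET col1 = 2 WHERE col1 = 1;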

I dropped the nonclustered index, created a clustered one, and the result is:

[screenshot: DBCC LOG output with a modify operation in the clustered index]

Again, only a modified row in the clustered index. And what if we try to update our key in the clustered index?

[screenshot: update of the clustered index key]

[screenshot: DBCC LOG output with delete/insert rows for the key update]

I got a delete/insert result, as I was expecting, because I changed the key, just like with the nonclustered index. Now, if I try to update the row with the same value, what will SQL Server do?

[screenshot: DBCC LOG output showing no new log records]

Nothing, no change at all.