What does the Checkpoint do?

Checkpoints flush dirty data pages from the buffer cache of the current database to disk. This minimizes the active portion of the log that must be processed during a full recovery of a database. During a full recovery, the following types of actions are performed:

  • The log records of modifications not flushed to disk before the system stopped are rolled forward.
  • All modifications associated with incomplete transactions, such as transactions for which there is no COMMIT or ROLLBACK log record, are rolled back.

Checkpoints occur in the following situations:

  • A CHECKPOINT statement is explicitly executed. A checkpoint occurs in the current database for the connection.
  • A minimally logged operation is performed in the database; for example, a bulk-copy operation is performed on a database that is using the Bulk-Logged recovery model.
  • Database files have been added or removed by using ALTER DATABASE.
  • An instance of SQL Server is stopped by a SHUTDOWN statement or by stopping the SQL Server (MSSQLSERVER) service. Either action causes a checkpoint in each database in the instance of SQL Server.
  • An instance of SQL Server periodically generates automatic checkpoints in each database to reduce the time that the instance would take to recover the database.
  • A database backup is taken.
  • An activity requiring a database shutdown is performed. For example, AUTO_CLOSE is ON and the last user connection to the database is closed, or a database option change is made that requires a restart of the database.
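The first case is the simplest to demonstrate. A minimal sketch, run in the database you want to checkpoint (the optional number is a requested checkpoint duration in seconds):

CHECKPOINT;

-- Or ask SQL Server to try to complete the checkpoint within 10 seconds:
CHECKPOINT 10;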

[Figure: the recovery process]

VLF (Virtual Log Files)

[Figure: transaction log structure]

To see how many VLFs you have, simply look at the number of rows returned by DBCC LOGINFO.
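For example (DBCC LOGINFO is undocumented; the sys.dm_db_log_info DMV is its documented replacement from SQL Server 2016 SP2 onwards):

DBCC LOGINFO;

-- Or count the VLFs of the current database with the DMV:
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info (DB_ID());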

The size and number of VLFs you'll have depend largely on the size of each chunk as it is added to your transaction log.

There is no general rule for determining the best value for the auto-growth option, as it varies from case to case. Having too many or too few virtual log files causes bad performance.

Having an excessive number of VLFs can negatively impact all transaction-log-related activities, and you may even see performance degradation when transaction log backups occur.

Most of the time, excessive VLF fragmentation is brought about by frequent file growth in small increments. For example, a database that is set to grow its transaction log file by 5 MB at a time is going to end up with a large number of VLFs as the log grows.

Growth                 Number of VLFs created
<= 64 MB               4
> 64 MB and <= 1 GB    8
> 1 GB                 16
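One practical mitigation, therefore, is to grow the log in fewer, larger increments. A minimal sketch, assuming a hypothetical database MyDB with a log file named MyDB_log:

-- Grow the log 512 MB at a time instead of in tiny increments:
ALTER DATABASE MyDB
MODIFY FILE (NAME = MyDB_log, FILEGROWTH = 512MB);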

There is insufficient system memory in resource pool

While doing crash and recovery tests on my local machine, the SQL Server instance would not come online. After the third attempt to bring the instance online, thinking the problem was something else, I looked at the SQL Server errorlog file and there I could see the problem.

Not enough memory; but it wasn't on my machine, it was in the resource pool. So, what is a resource pool?

A resource pool represents a subset of the physical resources of an instance of the Database Engine, and in my case the pool's memory was insufficient. Let's look at the errorlog file:

2018-03-14 16:19:58.09 spid56s     [ERROR] Recovery failed with error 0x83000000 on database 18. This error will be mapped to 'HK_E_RESTORE_INSUFFICIENT_MEMORY' (0x8200002e). (sql\ntdbms\hekaton\runtime\src\hkruntime.cpp : 4805 - 'HkRtRestoreDatabase')
2018-03-14 16:19:58.09 spid34s     [INFO] HkCkptCtrlUninitialize(): Database ID: [18]. Cleaning up StorageArray. LastClosedCheckpointEndTs: '158'
2018-03-14 16:19:58.09 Server      Error: 17300, Severity: 16, State: 1. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.
2018-03-14 16:19:58.09 Server      Error: 17312, Severity: 16, State: 1. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.
2018-03-14 16:19:58.09 Server      Error: 28709, Severity: 16, State: 19. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.
2018-03-14 16:19:58.09 spid37s     Error: 701, Severity: 17, State: 137.
2018-03-14 16:19:58.09 spid37s     There is insufficient system memory in resource pool 'default' to run this query.
2018-03-14 16:19:58.09 spid39s     Error: 701, Severity: 17, State: 137.
2018-03-14 16:19:58.09 spid39s     There is insufficient system memory in resource pool 'default' to run this query.
2018-03-14 16:19:58.11 spid55s     [ERROR] Recovery failed with error 0x83000000 on database 15. This error will be mapped to 'HK_E_RESTORE_INSUFFICIENT_MEMORY' (0x8200002e). (sql\ntdbms\hekaton\runtime\src\hkruntime.cpp : 4805 - 'HkRtRestoreDatabase')
2018-03-14 16:19:58.11 spid31s     [INFO] HkCkptCtrlUninitialize(): Database ID: [15]. Cleaning up StorageArray. LastClosedCheckpointEndTs: '155'
2018-03-14 16:19:58.20 spid31s     SQL Server shutdown has been initiated
2018-03-14 16:19:58.21 spid31s     Error: 19032, Severity: 10, State: 1. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.
2018-03-14 16:19:58.28 spid34s     SQL Server shutdown has been initiated

After the service started, SQL Server ran the redo and undo phases; that means it read the log files and created compensation log records for any uncommitted transactions it found.

SQL Server needs memory in the buffer pool to complete the redo and undo phases, and I didn't remember changing any SQL Server memory configuration.

So, my approach was to connect to SQL Server via the command line while the instance was still up and run sp_configure to see how much memory was configured. Only 512 MB was set for max server memory, and that was the problem. (Max server memory controls SQL Server's memory allocation: compile memory, all caches (including the buffer pool), query execution memory grants, lock manager memory, and CLR memory.)

In my environment, with 26 databases and my crash recovery tests, 512 MB for the pool's memory wasn't enough; when I changed the configuration to 4096 MB, I could bring the instance online again.
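For reference, a minimal sketch of the check and the fix described above (4096 MB is just the value that worked in my environment):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Check the current setting:
EXEC sp_configure 'max server memory (MB)';

-- Raise the limit and apply it:
EXEC sp_configure 'max server memory (MB)', 4096;
RECONFIGURE;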

Conclusion

First, read the errorlog file to get more information about what SQL Server is doing; also, knowing about transaction log operations, log records, checkpoints, and how crash recovery works is fundamental.

As a process, SQL Server can acquire more memory than specified by the max server memory option. Both internal and external components can allocate memory outside of the buffer pool, which consumes additional memory, but the memory allocated to the buffer pool usually still represents the largest portion of memory consumed by SQL Server.


Query Store Usage Scenarios

What is Query Store?

Query Store is a tool to troubleshoot query performance by capturing a range of information about query usage, CPU, memory consumption, I/O and execution time, and retaining every Execution Plan for analysis.

[Figure: Query Store process]
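Query Store is configured per database. A minimal sketch of turning it on, assuming a hypothetical database MyDB:

ALTER DATABASE MyDB SET QUERY_STORE = ON;
ALTER DATABASE MyDB SET QUERY_STORE (OPERATION_MODE = READ_WRITE);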

So, if Query Store keeps my query information, in what kinds of scenarios could I use it?

  1. Fixing queries with plan choice regressions. The Query Optimizer may decide to choose a different plan, and without Query Store it would be difficult to identify and fix the regression.
  2. Tuning the top resource-consuming queries. This is much easier with Query Store, because it shows the plan being used by a query with high reads or duration, for example.
  3. Comparing workload performance before and after an application change, new hardware, new indexes, or (the next scenario) a SQL Server upgrade.
  4. Keeping performance stable during an upgrade to a newer version of SQL Server.
  5. Improving ad hoc workloads. Run the Transact-SQL script below to get the total number of query texts, queries, and plans in the system, and determine how different they are by comparing their query_hash and plan_hash values.

/*Do cardinality analysis when suspect on ad hoc workloads*/
SELECT COUNT(*) AS CountQueryTextRows FROM sys.query_store_query_text;
SELECT COUNT(*) AS CountQueryRows FROM sys.query_store_query;
SELECT COUNT(DISTINCT query_hash) AS CountDifferentQueryRows FROM sys.query_store_query;
SELECT COUNT(*) AS CountPlanRows FROM sys.query_store_plan;
SELECT COUNT(DISTINCT query_plan_hash) AS CountDifferentPlanRows FROM sys.query_store_plan; 


SQL Server Undocumented 2

One more undocumented function/command: the fn_dump_dblog function is used to read transaction log records from a log backup file.

This is an example of using the undocumented function to read a log backup:

SELECT [Current LSN], [Operation], [Transaction Name], [Transaction ID], SUSER_SNAME ([Transaction SID]) AS DBUser 
FROM fn_dump_dblog (
            NULL, NULL, N'DISK', 1, N'C:\temp\Backup_Logfile.trn',
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT)

A good example of when to use it is when I tried to restore up to a specific LSN using the command

RESTORE LOG  WITH STOPBEFOREMARK = 'lsn:';

To read your current log file, the fn_dblog function is used.
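A minimal example of fn_dblog reading the entire active log of the current database:

SELECT [Current LSN], [Operation], [Transaction Name], [Transaction ID]
FROM fn_dblog (NULL, NULL); -- start and end LSN; NULL, NULL returns everything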

The operations returned by the function are listed below:

OPERATION              DESCRIPTION
LOP_ABORT_XACT         Indicates that a transaction was aborted and rolled back.
LOP_BEGIN_CKPT         A checkpoint has begun.
LOP_BEGIN_XACT         Indicates the start of a transaction.
LOP_BUF_WRITE          Writing to buffer.
LOP_COMMIT_XACT        Indicates that a transaction has committed.
LOP_COUNT_DELTA        ?
LOP_CREATE_ALLOCCHAIN  New allocation chain.
LOP_CREATE_INDEX       Creating an index.
LOP_DELETE_ROWS        Rows were deleted from a table.
LOP_DELETE_SPLIT       A page split has occurred. Rows have moved physically.
LOP_DELTA_SYSIND       SYSINDEXES table has been modified.
LOP_DROP_INDEX         Dropping an index.
LOP_END_CKPT           Checkpoint has finished.
LOP_EXPUNGE_ROWS       Row physically expunged from a page, now free for new rows.
LOP_FILE_HDR_MODIF     SQL Server has grown a database file.
LOP_FORGET_XACT        Shows that a two-phase commit transaction was rolled back.
LOP_FORMAT_PAGE        Writes the header of a newly allocated database page.
LOP_HOBT_DDL           ?
LOP_HOBT_DELTA         ?
LOP_IDENT_NEWVAL       Identity's new reseed value.
LOP_INSERT_ROWS        Insert a row into a user or system table.
LOP_LOCK_XACT          ?
LOP_MARK_DDL           Data Definition Language change: table schema was modified.
LOP_MARK_SAVEPOINT     Designates that an application has issued a 'SAVE TRANSACTION' command.
LOP_MIGRATE_LOCKS      ?
LOP_MODIFY_COLUMNS     Designates that a row was modified as the result of an UPDATE command.
LOP_MODIFY_HEADER      A new data page was created and its header initialized.
LOP_MODIFY_ROW         Row modification as a result of an UPDATE command.
LOP_PREP_XACT          Transaction is in a two-phase commit protocol.
LOP_SET_BITS           Designates that the DBMS modified space allocation bits as the result of allocating a new extent.
LOP_SET_FREE_SPACE     Designates that a previously allocated extent has been returned to the free pool.
LOP_SORT_BEGIN         A sort begins with index creation (LOP_SORT_END marks the end of the sorting).
LOP_SORT_EXTENT        Sorting extents as part of building an index.
LOP_UNDO_DELETE_SPLIT  A page split has been undone (rolled back).
LOP_XACT_CKPT          During the checkpoint, open transactions were detected.

Logging

Every database has a transaction log that records all transactions and the database modifications made by each transaction.

The transaction log is a critical component of the database. If there is a system failure, you will need that log to bring your database back to a consistent state.

How does that work? What steps does SQL Server take to keep a consistent state?

As an example, say I want to update 10 rows in my table, and those rows sit on 2 pages; let's see, step by step, what SQL Server does.

  1. Read the pages from the buffer pool; if the pages aren't there, SQL Server fetches the data from disk and puts it in memory.
  2. Start the locking process: SQL Server acquires intent-exclusive locks on the table and pages, and update locks on the rows.
  3. After all the rows to be updated are locked, the update locks are converted to exclusive locks and the changes are made to the pages in memory. This process creates log records describing the changes.
  4. Next, the transaction is ready to commit. This step writes the log records to the transaction log on disk and acknowledges the commit to the user.

The changed data is still in the buffer pool, and the change is recorded in the transaction log on disk. The data will stay in memory until a checkpoint runs and writes the changes to the data files.
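To see these steps reflected in the log, here is a minimal sketch using fn_dblog, assuming a hypothetical table dbo.MyTable with an int column col:

BEGIN TRANSACTION;
UPDATE dbo.MyTable SET col = col + 1 WHERE col <= 10;
COMMIT;

-- The update's records are now in the active portion of the log:
SELECT [Current LSN], [Operation], [Transaction ID]
FROM fn_dblog (NULL, NULL)
WHERE [Operation] IN ('LOP_BEGIN_XACT', 'LOP_MODIFY_ROW', 'LOP_COMMIT_XACT');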


So, that's why transaction logs are critical and why we have to take care of them.

How to manage the transaction log

How to read transaction logs


Wait wait wait…

Let’s talk about why we have to wait and how to understand the wait types.

Paul Randal, in his post "Wait statistics, or please tell me where it hurts", said:

A thread is using the CPU (called RUNNING) until it needs to wait for a resource. It then moves to an unordered list of threads that are SUSPENDED. In the meantime, the next thread on the FIFO (first-in-first-out) queue of threads waiting for the CPU (called being RUNNABLE) is given the CPU and becomes RUNNING. If a thread on the SUSPENDED list is notified that its resource is available, it becomes RUNNABLE and is put on the bottom of the RUNNABLE queue. Threads continue this clockwise movement from RUNNING to SUSPENDED to RUNNABLE to RUNNING again until the task is completed.

That explains a lot, because SQL Server threads don't all run at the same time. A good example is when a query is doing physical reads: the I/O subsystem is the slowest of our resources and will probably take some time if the query is reading gigabytes of data.


After the CPU requests the data from disk, the disk goes to work, but before it can send the data back, everything needs to go through memory first, and memory may not have the necessary space available. The thread will have to wait until some resources are released. And there are many other variables: for example, how many threads are running this query? How long will the application take to consume the data?

So, every time a thread needs to wait for a resource, the time is recorded against a wait type, such as PAGEIOLATCH_XX, PAGELATCH_XX, ASYNC_NETWORK_IO, CXPACKET, or RESOURCE_SEMAPHORE. I will talk more about waits in the next posts.
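A minimal sketch to see where an instance is hurting, ordered by total wait time, using the sys.dm_os_wait_stats DMV:

SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;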

Heap Tables

What's a heap table? I would say it's a table without a clustered index.

What’s the characteristic of a heap table? The data isn’t ordered.

What's the consequence of having a heap table? There are a few:

  1. Specific data is not retrieved quickly
  2. Data pages aren't linked, which means sequential access needs to refer to the index allocation map (IAM) pages
  3. No cost to update indexes
  4. No additional space to store a clustered index
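A minimal sketch to find the heap tables in the current database (a heap is the row set with index_id = 0 in sys.indexes):

SELECT OBJECT_SCHEMA_NAME(t.object_id) AS schema_name,
       t.name AS table_name
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON i.object_id = t.object_id
WHERE i.index_id = 0; -- 0 = heap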

SQL Operations Studio Linux

Great news: a visual way to manage SQL Server from Linux.

SQL Operations Studio fills a gap we had been waiting to see filled: developing and managing SQL Server databases on Linux. At https://github.com/Microsoft/sqlopsstudio we can see more information, like the features below.

Feature Highlights

  • Cross-Platform DB management for Windows, macOS and Linux with simple XCopy deployment
  • SQL Server Connection Management with Connection Dialog, Server Groups, and Registered Servers
  • Object Explorer supporting schema browsing and contextual command execution
  • T-SQL Query Editor with advanced coding features such as autosuggestions, error diagnostics, tooltips, formatting and peek definition
  • Query Results Viewer with advanced data grid supporting large result sets, export to JSON\CSV\Excel, query plan and charting
  • Management Dashboard supporting customizable widgets with drill-through actionable insights
  • Visual Data Editor that enables direct row insertion, update and deletion into tables
  • Backup and Restore dialogs that enables advanced customization and remote filesystem browsing, configured tasks can be executed or scripted
  • Task History window to view current task execution status, completion results with error messages and task T-SQL scripting
  • Scripting support to generate CREATE, SELECT and DROP statements for database objects
  • Workspaces with full Git integration and Find In Files support for managing T-SQL script libraries
  • Modern light-weight shell with theming, user settings, full screen support, integrated terminal and numerous other features

Let’s see how it works:

The process is very simple. The first step is to download the file from this link: https://go.microsoft.com/fwlink/?linkid=862646

After the download, the next step is to extract the files into a folder of your choice and run the sqlops file.

My first impression is quite good: a solid visual tool to work with SQL Server on Linux. I like the execution plan view and the way it manages databases.

[Screenshot: SQL Operations Studio]



Comparing it with SQL Server Management Studio, I miss some options, such as:

  • LinkedServer Objects
  • Maintenance Plans
  • Replication
  • Database Mail
  • Extended Events

The conclusion is that I really can work with it, running all my scripts and creating objects like linked servers, extended events, or replication. That helps a lot when trying out a new platform's concepts.