How SQL Server stores data?

SQL Server stores data in heaps or b-tree structures. Heaps are unordered set and balanced trees are ordered by their keys.

Heaps and b-tree use collection of pages within the structure and it’s called allocation units.

  • IN_ROW_DATA -> contains all data.
  • LOB_DATA -> structure for large objects used to stored in xml, varchar(max), nvarchar(max), varbinary(max) data types.
  • ROW_OVERFLOW_DATA -> when the data type is variable like varchar and nvarchar exceed 8,060 bytes size limit per row.

To see this information is used the system view allocation_units.

allocation_units

SELECT OBJECT_NAME(I.object_id)
 , I.name
 , AU.total_pages
 , AU.used_pages
 , AU.data_pages
 , P.rows
 , AU.type_desc
 FROM sys.allocation_units AS AU
 INNER JOIN sys.partitions AS P ON AU.container_id = P.partition_id
 INNER JOIN sys.indexes AS I ON I.index_id = P.index_id AND I.object_id = P.object_id
 WHERE P.object_id = OBJECT_ID('TabelaDeTamanhoVariavel')

 

Column name Data type Description
allocation_unit_id bigint ID of the allocation unit. Is unique within a database.
type tinyint Type of allocation unit:

0 = Dropped

1 = In-row data (all data types, except LOB data types)

2 = Large object (LOB) data (text, ntext, image, xml, large value types, and CLR user-defined types)

3 = Row-overflow data

type_desc nvarchar(60) Description of the allocation unit type:

  • DROPPED
  • IN_ROW_DATA
  • LOB_DATA
  • ROW_OVERFLOW_DATA
container_id bigint ID of the storage container associated with the allocation unit.

If type = 1 or 3, container_id = sys.partitions.hobt_id.

If type is 2, then container_id = sys.partitions.partition_id.

0 = Allocation unit marked for deferred drop

data_space_id int ID of the filegroup in which this allocation unit resides.
total_pages bigint Total number of pages allocated or reserved by this allocation unit.
used_pages bigint Number of total pages actually in use.
data_pages bigint Number of used pages that have:

  • In-row data
  • LOB data
  • Row-overflow data

Value returned excludes internal index pages and allocation-management pages.

 

Don’t shrink the trees


2000px-N-ary_to_binary.svg

The most common and widely used index that we know are nonclustered index. These indexes are created using the famous b-tree (balanced tree). B-tree is a data-structured tree where we have the root and leaves.

A nonclustered index contains the index key values and row locators that point to the storage location of the table data. Generally, nonclustered indexes should be designed to improve the performance of frequently used queries that are not covered by the clustered index.

IC88960

 

These indexes need a regular maintenance because as data is updated, inserted or deleted these indexes are updated for the new data or delete the reference of a particular deleted data.

When this happen we called fragmentation and Brent Ozar can explain more about it.

The objective here is about maintenance the index, well not exactly who to do this, but what do not do. I’ve seen in same cases a good reindex job, everything ok, tables distributed by number of rows, cluster, then no cluster, but after all of it a step with shrink.

Do not use shrink database, this is not a good idea after a reindex. What is this? What will do in the database? In MSDN page we can see this:

Shrinking data files recovers space by moving pages of data from the end of the file to unoccupied space closer to the front of the file. When enough free space is created at the end of the file, data pages at end of the file can be deallocated and returned to the file system

The process SQL uses is ugly and results in index fragmentation again and all that work to reindex was thrown away. If don’t belive me, well Paul Randal explain something in his article.