OH: A: “What’s the scariest datatype?”
A: “a BOOlean” …
C: “It’s funny, because it’s TRUE” Lol
The most common and widely used index that we know are nonclustered index. These indexes are created using the famous b-tree (binary tree). B-tree is a data-structured tree where we have the root and leaves.
A nonclustered index contains the index key values and row locators that point to the storage location of the table data. Generally, nonclustered indexes should be designed to improve the performance of frequently used queries that are not covered by the clustered index.
These indexes need a regular maintenance because as data is updated, inserted or deleted these indexes are updated for the new data or delete the reference of a particular deleted data.
When this happen we called fragmentation and Brent Ozar can explain more about it.
The objective here is about maintenance the index, well not exactly who to do this, but what do not do. I’ve seen in same cases a good reindex job, everything ok, tables distributed by number of rows, cluster, then no cluster, but after all of it a step with shrink.
Do not use shrink database, this is not a good idea after a reindex. What is this? What will do in the database? In MSDN page we can see this:
Shrinking data files recovers space by moving pages of data from the end of the file to unoccupied space closer to the front of the file. When enough free space is created at the end of the file, data pages at end of the file can be deallocated and returned to the file system
The process SQL uses is ugly and results in index fragmentation again and all that work to reindex was thrown away. If don’t belive me, well Paul Randal explain something in his article.
Data types and Precedence of convert types
SQL Server associates columns, expressions, variables, and parameters with data types. Data types determine what kind of data can be stored in the field: Integers, characters, dates, money, binary strings, etc.
SQL Server supplies several built-in data types but you can also define custom types
Built-in data types are categorized as shown in the table below, also you can see the precedence of convert to other data type. I mean when an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence.
SQL Server uses the following precedence order for data types:
- user-defined data types (highest)
- sql_varian t
- nvarchar (including nvarchar(max) )
- varchar (including varchar(max) )
- varbinary (including varbinary(max) )
- binary (lowest)
How, where, when you need to think in data storage? Well, this is the first step after you modeling your database and most companies do not think about it, I have saw companies with a large data into one disk or small data separated in the wrong way, but what is the best to do?
We have some best practices and you can read more about logical structures in SQL Server called filegroups in MSDN page. Every DBA needs to know to create and maintain filegroups because they are part of every SQL Server database. Filegroups affect the performance, maintenance, and security of your data and they are logical structures to group files together.
At a minimum, every SQL Server database has two operating system files: a data file and a log file. Data files contain data and objects such as tables, indexes, stored procedures, and views. Log files contain the information that is required to recover all transactions in the database. Data files can be grouped together in filegroups for allocation and administration purposes.
Filegroups can be created when the database is first created or created later when more files are added to the database. However, you cannot move files to a different filegroup after the files have been added to the database.
A file cannot be a member of more than one filegroup. Tables, indexes, and large object (LOB) data can be associated with a specific filegroup. This means that all their pages are allocated from the files in that filegroup.
Why use filegroups?
- you have one or more objects tha have heavy read/write activity
- you’ve already tuned through indexes and query writing
- you need a performance boost
- you have additional storage you can utilize to separate the objects
- you want to separate tha data so administration tasks such as backups take less time
- you have a large database (>1TB)
- disaster and recovery
- separate data and log files onto separate disks
- separate tempdb onto its own disk
- at least two filegroups – primary and one user-defined (default)
- files in a filegroup should be equally sized for equal proportion of writes
- use filegroups to isolate objects with heavy read and write activity from each other
- Files and Filegroups Architecture (MSDN Library)
- How SQL Server Stuff Works: Filegroups (Video) (Brent Ozar blog)
Pages and Extends Architecture
The page is the fundamental unit of data storage in SQL Server. An extent is a collection of eight physically contiguous pages. Extents help efficiently manage pages.
Understanding the architecture of pages and extents is important for designing and developing databases that perform efficiently.
The fundamental unit of data storage in SQL Server is the page. The disk space allocated to a data file (.mdf or .ndf) in a database is logically divided into pages numbered contiguously from 0 to n. Disk I/O operations are performed at the page level. That is, SQL Server reads or writes whole data pages.
Extents are a collection of eight physically contiguous pages and are used to efficiently manage the pages. All pages are stored in extents.
Page size is 8 KB. This means SQL Server databases have 128 pages per megabyte. Each page begins with a 96-byte header that is used to store system information about the page.
Extents are the basic unit in which space is managed. An extent is eight physically contiguous pages, or 64 KB. This means SQL Server databases have 16 extents per megabyte.
To make its space allocation efficient, SQL Server does not allocate whole extents to tables with small amounts of data. SQL Server has two types of extents:
- Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.
- Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object.
A new table or index is generally allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents
SARG or Search Arguments are very important for a good query, even if the predicate is very selective and supported by an index, does not guarantee that SQL Server is going to use an index. You need to write an appropriate predicate to allow the Query Optimizer to take advantage of the indexes. The Query Optimizer is not omnipotent. It can decide to use an index only when the arguments in the predicate are searchable. You have to learn how to write appropriate search arguments (SARGs).
To write an appropriate SARG, you must ensure that a column that has an index on it appears in the predicate alone, not as a function parameter. SARGs must take the form of column inclusive_operator <value> or <value> inclusive_operator column. The column name is alone on one side of the expression, and the constant or calculated value appears on the other side. Inclusive operators include the operators =, >, <, =>, <=, BETWEEN, and LIKE.
However, the LIKE operator is inclusive only if you do not use a wildcard % or _ at the beginning of the string you are comparing the column to.
Above the query returns two rows only; therefore, the WHERE predicate is very selective. There is a nonclustered index on the orderdate column. However, SQL Server did not use the index, as you can see.
The orderdate in the predicate does not appear alone; it is instead an argument of a function. You can rewrite such a query many times. The following query produces the same result, but this time the predicate is a SARG and we can see the different cost for both querys.
Using the AND operator in the WHERE clause predicate means that each part of the predicate limits the result set even more than the previous part. For example, if the first condition limits a query to five rows, then the next condition connected to the first one with the logical AND operator limits the query to five rows at most. The Query Optimizer understands how the logical AND operator works, and can use appropriate indexes.
However, the logical OR operator is inclusive. For example, if the first condition in a predicate would limit the query to 5 rows and the next condition connected to the first condition with the logical OR operator would limit the query to 6 rows, then the result set could have anything between 6 and 11 rows. If the two conditions use two different columns, then SQL Server conservatively takes the worst case and estimates that the query would return 11 rows.
Having multiple conditions in a predicate connected with the OR operator lowers the possibility for SQL Server to use indexes. You should consider rewriting the predicate to a logically equivalent predicate that uses the AND operato.
Another thing you have in mind is that SQL Server executing the IN and the OR operators in the same way. For example the following two queries return the same two rows and are internally treated as equal.
Sometimes we need to use temporary tables to do aggregations, joins or to simplify the job. SQL Server supports a number of options that you can use to store data temporarily. You can use temporary tables and table variables, before we start with codes we need know some concepts.
Local temporary tables are named with a single number sign as a prefix; for example, #T1.
They are visible only to the session that created them. Different sessions can actually create temporary tables with the same name, and each session will see only its own table. Behind the scenes, SQL Server adds unique suffixes to make the names unique in the database, but this is transparent to the sessions.
Local temporary tables are visible throughout the level that created them, across batches, and in all inner levels of the call stack. So if you create a temporary table in a specific level in your code and then execute a dynamic batch or a stored procedure, the inner batch can access the temporary table. If you don’t drop the temporary table explicitly, it is destroyed when the level that created it terminates.
Global temporary tables are named with two number signs as a prefix; for example, ##T1.
They are visible to all sessions. They are destroyed when the session that created them terminates and there are no active references to them.
Table variables are declared, as opposed to being created. They are named with the at sign (@) as a prefix; for example, @T1. They are visible only to the batch that declared them and are destroyed automatically at the end of the batch. They are not visible across batches in the same level, and are also not visible to inner levels in the call stack.
Temporary tables are created in tempdb in the dbo schema. As already mentioned, you can create two temporary tables with the same name in different sessions, because SQL Server internally adds a unique suffix to each. But if you create temporary tables in different sessions with the same constraint name, only one will be created and the other attempts will fail.
There’s a common misconception that only temporary tables have a physical representation in tempdb and that table variables reside only in memory. This isn’t true. Both temporary tables and table variables have a physical representation in tempdb.
You can find entries in the sys.objects view for the internal tables that SQL Server creates in tempdb to implement your temporary tables and table variables. As an example, the following code creates a temporary table called #T1 and then queries the sys.objects view in tempdb looking for table names starting with #.
As we can see, both the variable as a temporary table are created implicitly and explicitly in tempdb.
So when should I use one or the other?
When it comes to performance, there’s a very important difference between temporary tables and table variables. SQL Server maintains distribution statistics (histograms) for temporary tables but not for table variables. This means that, generally speaking, you tend to get more optimal plans for temporary tables. This comes at the cost of maintaining histograms, and at the cost of recompilations that are associated with histogram refreshes.
This plan is very efficient. The optimizer examined the histogram on col2 and estimated that a very small number of rows are supposed to be filtered. The plan decided to use the index on col2 because the filter is very selective. Only a small number of key lookups are required to obtain the respective data rows. For such a selective filter, this plan is preferable to one that does a full clustered index scan.
Unlike with temporary tables, SQL Server doesn’t maintain histograms for table variables. Not being able to accurately estimate the selectivity of the filter, the optimizer relies on hardcoded estimates that assume fairly low selectivity (30 percent). It just doesn’t realize that the filter is actually very selective, and that a plan more similar to the one shown earlier in the temporary table would have been much more efficient.
A common question is whether table expressions such as common table expressions (CTEs) also get persisted like temporary tables and table variables. The answer is no. When SQL Server optimizes a query against a table expression, it unnests the inner query’s logic and interacts directly with the underlying tables. This means that unlike temporary tables and table variables, table expressions have no physical side to them.
The conclusion from this example is that when the plan efficiency depends on existence of histograms, you should use temporary tables. Table variables are fine to use in two general cases. One is when the volume of data is so small, like a page or two, that the efficiency of the plan isn’t important. The other case is when the plan is trivial. A trivial plan means that there’s only one sensible plan and the optimizer doesn’t really need histograms to come up with this conclusion. An example for such a plan is a range scan in a clustered index or a covering index. Such a plan is not dependent on selectivity of the filter—it’s simply always a better option
than a full scan.