Data types

Data types and data type conversion precedence

SQL Server associates columns, expressions, variables, and parameters with data types. Data types determine what kind of data can be stored in a column: integers, characters, dates, money, binary strings, and so on.

SQL Server supplies several built-in data types, but you can also define custom types.

Built-in data types are categorized as shown in the figure below. You can also see the precedence used when converting between types: when an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence (see the example after the precedence list).

[Figure: categories of built-in data types]

SQL Server uses the following precedence order for data types:

  1. user-defined data types (highest)
  2. sql_variant
  3. xml
  4. datetimeoffset
  5. datetime2
  6. datetime
  7. smalldatetime
  8. date
  9. time
  10. float
  11. real
  12. decimal
  13. money
  14. smallmoney
  15. bigint
  16. int
  17. smallint
  18. tinyint
  19. bit
  20. ntext
  21. text
  22. image
  23. timestamp
  24. uniqueidentifier
  25. nvarchar (including nvarchar(max) )
  26. nchar
  27. varchar (including varchar(max) )
  28. char
  29. varbinary (including varbinary(max) )
  30. binary (lowest)
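
To see the precedence rules in action, here is a minimal sketch (the literals are arbitrary). Because int has a higher precedence than varchar, the character operand is converted to int, and not the other way around:

SELECT 1 + '2';    -- the varchar '2' is implicitly converted to int; returns the int 3

-- The conversion always goes toward the higher-precedence type,
-- even when it cannot succeed:
SELECT 1 + 'abc';  -- fails with a conversion error instead of producing '1abc'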

 

 


Temporary Tables

Sometimes we need temporary tables to do aggregations or joins, or simply to make the job easier. SQL Server supports a number of options that you can use to store data temporarily: temporary tables and table variables. Before we start with code, we need to know some concepts.

Local temporary tables are named with a single number sign as a prefix; for example, #T1.

They are visible only to the session that created them. Different sessions can actually create temporary tables with the same name, and each session will see only its own table. Behind the scenes, SQL Server adds unique suffixes to make the names unique in the database, but this is transparent to the sessions.

Local temporary tables are visible throughout the level that created them, across batches, and in all inner levels of the call stack. So if you create a temporary table in a specific level in your code and then execute a dynamic batch or a stored procedure, the inner batch can access the temporary table. If you don’t drop the temporary table explicitly, it is destroyed when the level that created it terminates.

Global temporary tables are named with two number signs as a prefix; for example, ##T1.
They are visible to all sessions. They are destroyed when the session that created them terminates and there are no active references to them.

Table variables are declared, as opposed to being created. They are named with the at sign (@) as a prefix; for example, @T1. They are visible only to the batch that declared them and are destroyed automatically at the end of the batch. They are not visible across batches in the same level, and are also not visible to inner levels in the call stack.
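
Here is a minimal sketch of the three kinds of objects and their scopes (the names are just examples):

CREATE TABLE #T1 (id INT);    -- local temporary table: visible to this session
CREATE TABLE ##T1 (id INT);   -- global temporary table: visible to all sessions
DECLARE @T1 TABLE (id INT);   -- table variable: visible to this batch only
GO
-- a new batch in the same session:
SELECT id FROM #T1;           -- still accessible
SELECT id FROM ##T1;          -- still accessible
-- SELECT id FROM @T1;        -- would fail: @T1 died with the previous batch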

Temporary tables are created in tempdb in the dbo schema. As already mentioned, you can create two temporary tables with the same name in different sessions, because SQL Server internally adds a unique suffix to each. But constraint names do not get that treatment: if you create temporary tables in different sessions using the same explicit constraint name, only the first will be created and the other attempts will fail (see the sketch below).
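
The following sketch illustrates the collision; run each CREATE TABLE in a separate session (PK_T1 is an arbitrary name):

-- session 1: succeeds
CREATE TABLE #T1 (col1 INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY);

-- session 2: fails, because the constraint name, unlike the table name,
-- does not get a unique suffix in tempdb
CREATE TABLE #T1 (col1 INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY);

-- letting SQL Server name the constraint avoids the problem
CREATE TABLE #T1 (col1 INT NOT NULL PRIMARY KEY);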

Physical Representation in tempdb

There’s a common misconception that only temporary tables have a physical representation in tempdb and that table variables reside only in memory. This isn’t true. Both temporary tables and table variables have a physical representation in tempdb.

You can find entries in the sys.objects view for the internal tables that SQL Server creates in tempdb to implement your temporary tables and table variables. As an example, the following code creates a temporary table called #T1 and then queries the sys.objects view in tempdb looking for table names starting with #.
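
The exact listing isn't reproduced here, but a minimal sketch of it could look like this:

CREATE TABLE #T1 (col1 INT);
DECLARE @T1 AS TABLE (col1 INT);

SELECT name, type_desc, create_date
FROM tempdb.sys.objects
WHERE name LIKE '#%';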

As we can see, both the table variable and the temporary table end up in tempdb: the temporary table under its name padded with a unique suffix, and the table variable as an internal table with a hex-based name.

So when should I use one or the other?

When it comes to performance, there’s a very important difference between temporary tables and table variables. SQL Server maintains distribution statistics (histograms) for temporary tables but not for table variables. This means that, generally speaking, you tend to get more optimal plans for temporary tables. This comes at the cost of maintaining histograms, and at the cost of recompilations that are associated with histogram refreshes.
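
The figures below come from a demo along these lines; the exact listing isn't shown in this post, so treat the following as a sketch (dbo.GetNums is an assumed helper that returns a sequence of integers):

CREATE TABLE #T1
(
  col1 INT NOT NULL PRIMARY KEY,   -- clustered index
  col2 INT NOT NULL,
  filler CHAR(200) NOT NULL DEFAULT('A')
);
CREATE INDEX idx_col2 ON #T1(col2);

-- 100,000 rows, of which only a handful have a nonzero col2
INSERT INTO #T1 (col1, col2)
SELECT n, CASE WHEN n <= 5 THEN n ELSE 0 END
FROM dbo.GetNums(1, 100000);

-- a very selective filter
SELECT col1, col2, filler
FROM #T1
WHERE col2 = 3;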

[Figure temporary1: graphical execution plan for the query against the temporary table, showing an index seek on col2 plus key lookups]

This plan is very efficient. The optimizer examined the histogram on col2 and estimated that a very small number of rows would match the filter. Because the filter is very selective, it chose the index on col2, and only a small number of key lookups are required to obtain the respective data rows. For such a selective filter, this plan is preferable to one that does a full clustered index scan.
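
The table-variable counterpart of the demo is sketched below; note that the inline INDEX syntax used here requires SQL Server 2014 or later (earlier versions only allow indexes created through PRIMARY KEY or UNIQUE constraints):

DECLARE @T1 TABLE
(
  col1 INT NOT NULL PRIMARY KEY,
  col2 INT NOT NULL,
  filler CHAR(200) NOT NULL DEFAULT('A'),
  INDEX idx_col2 (col2)
);

INSERT INTO @T1 (col1, col2, filler)
SELECT col1, col2, filler
FROM #T1;

SELECT col1, col2, filler
FROM @T1
WHERE col2 = 3;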

[Figure temporary2: graphical execution plan for the query against the table variable, showing a full clustered index scan]

Unlike with temporary tables, SQL Server doesn't maintain histograms for table variables. Not being able to accurately estimate the selectivity of the filter, the optimizer relies on hardcoded estimates that assume fairly low selectivity (30 percent). It just doesn't realize that the filter is actually very selective, and that a plan more like the one shown earlier for the temporary table would have been much more efficient.

A common question is whether table expressions such as common table expressions (CTEs) also get persisted like temporary tables and table variables. The answer is no. When SQL Server optimizes a query against a table expression, it unnests the inner query’s logic and interacts directly with the underlying tables. This means that unlike temporary tables and table variables, table expressions have no physical side to them.
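
For example, in the following sketch nothing named C is ever materialized; the optimizer expands the CTE and works directly against dbo.Transactions:

WITH C AS
(
  SELECT actid, val
  FROM dbo.Transactions
  WHERE val > 0
)
SELECT actid, SUM(val) AS total
FROM C
GROUP BY actid;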

The conclusion from this example is that when plan efficiency depends on the existence of histograms, you should use temporary tables. Table variables are fine to use in two general cases. One is when the volume of data is so small, like a page or two, that the efficiency of the plan isn't important. The other is when the plan is trivial, meaning there's only one sensible plan and the optimizer doesn't really need histograms to reach that conclusion. An example of such a plan is a range scan in a clustered index or a covering index. Such a plan is not dependent on the selectivity of the filter; it's simply always a better option than a full scan.

The Meaning of “Set-Based”

The term set-based is used to describe an approach to handle querying tasks and is based on principles from the relational model. Remember that the relational model is based in part on mathematical set theory. Set-based solutions use T-SQL queries, which operate on the input tables as sets of rows. Such solutions are contrasted with iterative solutions that use cursors or other iterative constructs to handle one row at a time.

According to set theory, a set should be considered as a whole. This means that your attention should be focused on the set and not on its individual elements. With iterative solutions, you break this principle by operating on one element (row) at a time. Also, a set has no particular order to its elements. So when you use set-based solutions, you cannot make any assumptions about the order of the data. Similarly, unless you add an ORDER BY clause to the query, you’re not guaranteed that the data will be returned in any particular order. With iterative solutions, you process one row at a time and you can do so in a specific order.

When you use set-based solutions, you provide your request as a declarative plain language query. In your request, you focus on the “what” part of the request and let the database engine worry about the “how” part. With iterative solutions, you need to implement both the what and the how parts in your code. As a result, iterative solutions tend to be much longer than set-based ones and harder to follow and maintain.

Another reason why you should stick to set-based solutions is a very pragmatic one: performance. Iterative constructs in T-SQL are very slow. For one thing, loops in T-SQL are much slower than those in other programming languages, such as Microsoft .NET code. For another, each record fetched from a cursor with the FETCH NEXT command carries quite a high overhead. There's no such overhead when SQL Server processes a set-based solution, even if internally the execution plan for the query involves iterations. As a result, if you know how to tune queries, you can often achieve much better performance compared to using iterative solutions.

For example, I have seen something like this:
SET NOCOUNT ON;

DECLARE @Result AS TABLE (
  actid INT,
  mx    MONEY );

DECLARE
  @actid     AS INT,
  @val       AS MONEY,
  @prevactid AS INT,
  @prevval   AS MONEY;

-- the temporary table must exist before it can be filled
CREATE TABLE #tbTemp (actid INT, val MONEY);

INSERT INTO #tbTemp (actid, val)
SELECT actid, val
FROM dbo.Transactions;

-- fetch the first row in actid, val order
SELECT TOP (1) @actid = actid, @val = val
FROM #tbTemp
ORDER BY actid, val;

WHILE @actid IS NOT NULL
BEGIN
  -- rows arrive in actid, val order, so when the account changes,
  -- @prevval holds the highest val of the previous account
  IF @actid <> @prevactid
    INSERT INTO @Result(actid, mx) VALUES(@prevactid, @prevval);

  SELECT @prevactid = @actid, @prevval = @val;

  -- remove the row just processed and fetch the next one
  DELETE TOP (1) FROM #tbTemp
  WHERE actid = @actid AND val = @val;

  SET @actid = NULL;
  SELECT TOP (1) @actid = actid, @val = val
  FROM #tbTemp
  ORDER BY actid, val;
END;

-- emit the last account
IF @prevactid IS NOT NULL
  INSERT INTO @Result(actid, mx) VALUES(@prevactid, @prevval);

SELECT actid, mx
FROM @Result;

DROP TABLE #tbTemp;
GO

Some people think this solution is set-based because it doesn't explicitly declare and use a cursor object (we'll talk about this later, or never 🙂). However, recall that one of the principles of set-based solutions is treating the set as a whole, as opposed to handling one element at a time. This principle is violated here. Also, set-based solutions do not rely on the order of the data, and this one does.

How can we do this in a set-based way? Well, let's look at the following code:
SELECT T1.actid, T1.tranid, T1.val,
       SUM(T2.val) AS balance
FROM dbo.Transactions AS T1
  JOIN dbo.Transactions AS T2
    ON T2.actid = T1.actid
   AND T2.tranid <= T1.tranid
GROUP BY T1.actid, T1.tranid, T1.val;

The code above uses a self join of the table to itself (not a recursive query; recursive queries are commonly used with CTEs, and we'll talk about those later). For this case it works well, but in SQL Server 2012 the best way is to use window functions.

SELECT actid, tranid, val,
       SUM(val) OVER( PARTITION BY actid
                      ORDER BY tranid
                      ROWS UNBOUNDED PRECEDING ) AS balance
FROM dbo.Transactions;

The conclusion is: always look for a way to work with sets. The first attempt wasn't set-based because it works on one element at a time.
I used a temporary table (#tbTemp) in that code because I will discuss this subject in the next post.

 

Structured Query Language

By a “set” we mean any collection M into a whole of definite, distinct objects m (which are called the “elements” of M) of our perception or of our thought.

—Georg Cantor, as quoted in "Georg Cantor" by Joseph W. Dauben (Princeton University Press, 1990)

Standard SQL is based on the relational model, which is a mathematical model for data management and manipulation. The relational model was initially created and proposed by Edgar F. Codd in 1969. Since then, it has been explained and developed by Chris Date, Hugh Darwen, and others.

Transact-SQL (T-SQL) is the main language used to manage and manipulate data in Microsoft SQL Server.

[Image: Microsoft SQL Server Management Studio showing a query, the results of the query, and the Object Explorer pane while connected to a SQL Server database engine instance. Photo credit: Wikipedia]

In fact, T-SQL is based more on multiset theory than on set theory. A multiset (also known as a bag or a superset) in many respects is similar to a set, but can have duplicates. The T-SQL language does give you enough tools so that if you want to follow relational theory, you can do so. For example, the language provides you with a DISTINCT clause to remove duplicates.
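
For example, using the dbo.Transactions table from earlier:

SELECT actid
FROM dbo.Transactions;           -- a multiset: an account appears once per transaction

SELECT DISTINCT actid
FROM dbo.Transactions;           -- a set: each account appears once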

One of the most important concepts you need to know about the language is logical query processing.

Thinking in Sets

Thinking in Sets is a great book by Joe Celko. It tells us about the concepts behind how we need to think when working with databases.
Columns are not fields, rows are not records, and tables are not files; hence, SQL is declarative, not procedural. There is no sequential access or ordering in a table, so "first," "next," and "last" rows are totally meaningless.
I agree when he writes, "One of the first things that a newbie does is use a proprietary autonumbering feature in their SQL product as a PRIMARY KEY." This is completely wrong, and it violates the definition of a relational key.
An attribute has to belong to an entity in the real world being modeled by the RDBMS. Autonumbering does not exist in an entity in the real world being modeled by the RDBMS. Thus, it is not an attribute and cannot be in a table, by definition.
That's how I think now; I was totally wrong before.

So let's do something with sets!
