Search Arguments

SARG or Search Arguments are very important for a good query, even if the predicate is very selective and supported by an index, does not guarantee that SQL Server is going to use an index. You need to write an appropriate predicate to allow the Query Optimizer to take advantage of the indexes. The Query Optimizer is not omnipotent. It can decide to use an index only when the arguments in the predicate are searchable. You have to learn how to write appropriate search arguments (SARGs).

sarg

To write an appropriate SARG, you must ensure that a column that has an index on it appears in the predicate alone, not as a function parameter. SARGs must take the form of column inclusive_operator <value> or <value> inclusive_operator column. The column name is alone on one side of the expression, and the constant or calculated value appears on the other side. Inclusive operators include the operators =, >, <, =>, <=, BETWEEN, and LIKE.
However, the LIKE operator is inclusive only if you do not use a wildcard % or _ at the beginning of the string you are comparing the column to.

Above the query returns two rows only; therefore, the WHERE predicate is very selective. There is a nonclustered index on the orderdate column. However, SQL Server did not use the index, as you can see.

The orderdate in the predicate does not appear alone; it is instead an argument of a function. You can rewrite such a query many times. The following query produces the same result, but this time the predicate is a SARG and we can see the different cost for both querys.

difference sarg

Using the AND operator in the WHERE clause predicate means that each part of the predicate limits the result set even more than the previous part. For example, if the first condition limits a query to five rows, then the next condition connected to the first one with the logical AND operator limits the query to five rows at most. The Query Optimizer understands how the logical AND operator works, and can use appropriate indexes.

However, the logical OR operator is inclusive. For example, if the first condition in a predicate would limit the query to 5 rows and the next condition connected to the first condition with the logical OR operator would limit the query to 6 rows, then the result set could have anything between 6 and 11 rows. If the two conditions use two different columns, then SQL Server conservatively takes the worst case and estimates that the query would return 11 rows.

Having multiple conditions in a predicate connected with the OR operator lowers the possibility for SQL Server to use indexes. You should consider rewriting the predicate to a logically equivalent predicate that uses the AND operato.

Another thing you have in mind is that SQL Server executing the IN and the OR operators in the same way. For example the following two queries return the same two rows and are internally treated as equal.

sarg or

The Meaning of “Set-Based”

The term set-based is used to describe an approach to handle querying tasks and is based on principles from the relational model. Remember that the relational model is based in part on mathematical set theory. Set-based solutions use T-SQL queries, which operate on the input tables as sets of rows. Such solutions are contrasted with iterative solutions that use cursors or other iterative constructs to handle one row at a time.

According to set theory, a set should be considered as a whole. This means that your attention should be focused on the set and not on its individual elements. With iterative solutions, you break this principle by operating on one element (row) at a time. Also, a set has no particular order to its elements. So when you use set-based solutions, you cannot make any assumptions about the order of the data. Similarly, unless you add an ORDER BY clause to the query, you’re not guaranteed that the data will be returned in any particular order. With iterative solutions, you process one row at a time and you can do so in a specific order.

When you use set-based solutions, you provide your request as a declarative plain language query. In your request, you focus on the “what” part of the request and let the database engine worry about the “how” part. With iterative solutions, you need to implement both the what and the how parts in your code. As a result, iterative solutions tend to be much longer than set-based ones and harder to follow and maintain.

Another reason why you should stick to set-based solutions is a very pragmatic one—performance. Iterative constructs in T-SQL are very slow. For one thing, loops in T-SQL are much slower than those in other programming languages such as in Microsoft .NET code. Secondly, each record fetch from a cursor by using the FETCH NEXT command has quite a high overhead associated with it. There’s no such overhead when SQL Server processes a set-based solution, even if internally the execution plan for the query involves iterations. As a result, if you know how to tune queries, you are often able to achieve much better performance compared to using iterative solutions.

For example, I have seen something like this:
SET NOCOUNT ON;
DECLARE @Result AS TABLE (
actid INT,
mx MONEY );

DECLARE
@actid AS INT,
@val AS MONEY,
@prevactid AS INT,
@prevval AS MONEY;

INSERT INTO #tbTemp
SELECT actid, val
  FROM dbo.Transactions
  ORDER BY actid, val;

SELECT TOP(1) @prevactid = actid, @prevval = val
FROM #tbTemp;

WHILE @prevactid NOT IS NULL
BEGIN
IF @actid @prevactid
     INSERT INTO @Result(actid, mx) VALUES(@prevactid, @prevval);
 

  DELETE #tbTemp
  WHERE actid = @actid;
 

  SELECT TOP(1) @prevactid = @actid, @prevval = @val
  FROM #tbTemp;
END

IF @prevactid  IS NOT NULL
INSERT INTO @Result(actid, mx) VALUES(@prevactid, @prevval);

SELECT actid, mx
FROM @Result;
GO

Some people think this solution is set-based because it doesn’t explicitly declare and use a cursor object (we talk about this later or never 🙂 ). However, recall that one of the principles implemented by set-based solutions is that they treat the set as a whole as opposed to handling one element at a time. This principle is violated here. Also, set-based solutions do not rely on the order of the data, and this one does.

How we can do this with a set-based rule. Well, lets see the following code:
SELECT
T1.actid, T1.tranid, T1.val,
SUM(T2.val) AS balance
FROM dbo.Transactions AS T1
JOIN dbo.Transactions AS T2 ON T2.actid = T1.actid AND T2.tranid <= T1.tranid
GROUP BY T1.actid, T1.tranid, T1.val;

The code above use a recursive join, recursive querys are commonly used with CTE (we talk later) and for this time this works well, but the best way in SQL Server 2012 is using Window Functions.

SELECT actid, tranid, val,
SUM(val) OVER( PARTITION BY actid
               ORDER BY tranid
               ROWS UNBOUNDED PRECEDING)
AS balance
FROM dbo.Transactions;

The conclusion is, always have a way to work with sets, the first intent wasn’t set-based because works one by one element at time.
I used that code with a temporary table (#tbTemp) because I will discuss this subject in the next post.

 

Execution Plan

books

Why I started my blog with some posts about query optimizer, execution plans and logical processing? Well, before you learn about codes and more codes we need to know how this works in SQL Server. For example, the developer know how to write a single select, insert or update statement or  a batch with conditions clauses like IF, but what happens when you execute the query? We need to know how the whole thing works and the concepts to work with sets.

So, SQL Server and others databases has a lot of information for your simple or complex batch to work well and others tools to help our job. The most important is to write the best query as possible. If you write a simple select, the first thing your have to see before you put in your application is the execution plan, IO and time of query response.

People are not very keen on waiting. They get nervous in a traffic jam. They are not too satisfied if they have to sit without a drink for a while in a bar. Similarly, they want their applications to be as responsive as possible. End users perceive performance problems through waiting.

Except for very simple queries, a query can be executed in many different ways. How many ways? Well, the number of different ways of execution or execution plans grows exponentially with query complexity.

For example, analyze the following pseudo-query very superficially:
SELECT
    A.col5
    , SUM (C.col6) AS col6sum
FROM
    TableA AS A
INNER JOIN TableB AS ON A.col1 B.col1
INNER JOIN TableC AS C ON B.col2 = C.col2
WHERE
    A.col3constant1 AND B.col4 constant2
GROUP BY
    A.col5;

Start with the FROM part. Which tables should SQL Server join first, TableA and TableB or TableB and TableC? And in each join, which of the two tables joined should be the left and which one the right table? The number of all possibilities is six, if the two joins are evaluated linearly, one after another. With evaluation of multiple joins at the same time, the number of
all possible combinations for processing the joins is already 12. The actual formula for possible combinations of join evaluation is n!, or n factorial, for linear evaluation, and (2n -2)!/ (n-1)! for parallel evaluation of possible joins.

This already gives four options for each join. So far, there are 6 x 4 = 24 different options for only the FROM part of this query. But the real situation is even worse. SQL Server can execute a hash join in three different ways. As mentioned, this is just a quick superficial analysis of pseudo-query execution, and for this introduction to query optimization problems, such details are not needed.

In the WHERE clause, two expressions are connected with a logical AND operator. The logical AND operator is commutative, so SQL Server can evaluate the second expression first.

Again, there are two choices. Altogether, there are already 6 x 4 x 2 = 48 choices. And again, the real situation is much worse. Because in the pseudo-query all joins are inner joins and because expressions in the WHERE clause are commutative, SQL Server can even start executing the query with any of the expressions of the WHERE clause, then switch to the FROM clause and perform first a join, evaluate the second expression from the WHERE clause, and so on.

So the number of possible plans is already much higher than 48. For this superficial overview, continue with the GROUP BY clause. SQL Server can execute this part in two ways, as an ordered group or as a hash group. Therefore, the number of options for executing the pseudo-query is already 6 x 4 x 2 x 2 = 96.

The important conclusion is that you can see that the number of different possible execution plans for a query grows factorially with query complexity. You can quickly get billions of possible execution plans. SQL Server has to decide which one to use in a very short time. You wouldn’t want to wait, for example, for a whole day for SQL Server to find out the best possible plan and then execute your queries in 5 seconds instead of in 15 seconds. Now you can imagine the complexity of the problems the SQL Server Query Optimizer has to solve with any single query.

 Writing queries requires basic T-SQL knowledge; writing well-performing queries needs much more advanced knowledge.