Storage – Part II

Logical Structures
How, where, when you need to think in data storage? Well, this is the first step after you modeling your database and most companies do not think about it, I have saw companies with a large data into one disk or small data separated in the wrong way, but what is the best to do?

We have some best practices and you can read more about logical structures in SQL Server called filegroups in MSDN page. Every DBA needs to know to create and maintain filegroups because they are part of every SQL Server database. Filegroups affect the performance, maintenance, and security of your data and they are logical structures to group files together.filegoups

At a minimum, every SQL Server database has two operating system files: a data file and a log file. Data files contain data and objects such as tables, indexes, stored procedures, and views. Log files contain the information that is required to recover all transactions in the database. Data files can be grouped together in filegroups for allocation and administration purposes.

Filegroups can be created when the database is first created or created later when more files are added to the database. However, you cannot move files to a different filegroup after the files have been added to the database.

A file cannot be a member of more than one filegroup. Tables, indexes, and large object (LOB) data can be associated with a specific filegroup. This means that all their pages are allocated from the files in that filegroup.

Why use filegroups?

  • you have one or more objects tha have heavy read/write activity
  • you’ve already tuned through indexes and query writing
  • you need a performance boost
  • you have additional storage you can utilize to separate the objects
  • you want to separate tha data so administration tasks such as backups take less time
  • you have a large database (>1TB)
  • disaster and recovery

Best Practices

  • separate data and log files onto separate disks
  • separate tempdb onto its own disk
  • at least two filegroups – primary and one user-defined (default)
  • files in a filegroup should be equally sized for equal proportion of writes
  • use filegroups to isolate objects with heavy read and write activity from each other

Execution Plan

books

Why I started my blog with some posts about query optimizer, execution plans and logical processing? Well, before you learn about codes and more codes we need to know how this works in SQL Server. For example, the developer know how to write a single select, insert or update statement or  a batch with conditions clauses like IF, but what happens when you execute the query? We need to know how the whole thing works and the concepts to work with sets.

So, SQL Server and others databases has a lot of information for your simple or complex batch to work well and others tools to help our job. The most important is to write the best query as possible. If you write a simple select, the first thing your have to see before you put in your application is the execution plan, IO and time of query response.

People are not very keen on waiting. They get nervous in a traffic jam. They are not too satisfied if they have to sit without a drink for a while in a bar. Similarly, they want their applications to be as responsive as possible. End users perceive performance problems through waiting.

Except for very simple queries, a query can be executed in many different ways. How many ways? Well, the number of different ways of execution or execution plans grows exponentially with query complexity.

For example, analyze the following pseudo-query very superficially:
SELECT
    A.col5
    , SUM (C.col6) AS col6sum
FROM
    TableA AS A
INNER JOIN TableB AS ON A.col1 B.col1
INNER JOIN TableC AS C ON B.col2 = C.col2
WHERE
    A.col3constant1 AND B.col4 constant2
GROUP BY
    A.col5;

Start with the FROM part. Which tables should SQL Server join first, TableA and TableB or TableB and TableC? And in each join, which of the two tables joined should be the left and which one the right table? The number of all possibilities is six, if the two joins are evaluated linearly, one after another. With evaluation of multiple joins at the same time, the number of
all possible combinations for processing the joins is already 12. The actual formula for possible combinations of join evaluation is n!, or n factorial, for linear evaluation, and (2n -2)!/ (n-1)! for parallel evaluation of possible joins.

This already gives four options for each join. So far, there are 6 x 4 = 24 different options for only the FROM part of this query. But the real situation is even worse. SQL Server can execute a hash join in three different ways. As mentioned, this is just a quick superficial analysis of pseudo-query execution, and for this introduction to query optimization problems, such details are not needed.

In the WHERE clause, two expressions are connected with a logical AND operator. The logical AND operator is commutative, so SQL Server can evaluate the second expression first.

Again, there are two choices. Altogether, there are already 6 x 4 x 2 = 48 choices. And again, the real situation is much worse. Because in the pseudo-query all joins are inner joins and because expressions in the WHERE clause are commutative, SQL Server can even start executing the query with any of the expressions of the WHERE clause, then switch to the FROM clause and perform first a join, evaluate the second expression from the WHERE clause, and so on.

So the number of possible plans is already much higher than 48. For this superficial overview, continue with the GROUP BY clause. SQL Server can execute this part in two ways, as an ordered group or as a hash group. Therefore, the number of options for executing the pseudo-query is already 6 x 4 x 2 x 2 = 96.

The important conclusion is that you can see that the number of different possible execution plans for a query grows factorially with query complexity. You can quickly get billions of possible execution plans. SQL Server has to decide which one to use in a very short time. You wouldn’t want to wait, for example, for a whole day for SQL Server to find out the best possible plan and then execute your queries in 5 seconds instead of in 15 seconds. Now you can imagine the complexity of the problems the SQL Server Query Optimizer has to solve with any single query.

 Writing queries requires basic T-SQL knowledge; writing well-performing queries needs much more advanced knowledge.

Structured Query Language

By a “set” we mean any collection M into a whole of definite, distinct objects m (which are called the “elements” of M) of our perception or of our thought.

—George Cantor, in “Georg Cantorby Joseph W. Dauben (Princeton University Press, 1990)

Standard SQL is based on the relational model, which is a mathematical model for data management and manipulation. The relational model was initially created and proposed by Edgar F. Codd in 1969. Since then, it has been explained and developed by Chris Date, Hugh Darwen, and others.

Transact-SQL (T-SQL) is the main language used to manage and manipulate data in Microsoft SQL Server.

Microsoft SQL Server Management Studio showing...
Microsoft SQL Server Management Studio showing a query, the results of the query, and the Object Explorer pane while connected to a SQL Server database engine instance. (Photo credit: Wikipedia)

In fact, T-SQL is based more on multiset theory than on set theory. A multiset (also known as a bag or a superset) in many respects is

similar to a set, but can have duplicates. The T-SQL language does give you enough tools so that if you want to follow relational theory, you can do so. For example, the language provides you with a DISTINCT clause to remove duplic

ates.

It then moves on to describe one of the most important concepts you need to know about the language—logical query processing.