Friday, 13 June 2014

Understanding of Execution Plan [What happened When SQL statement Execute]

Introduction

Performance of query is big factor for every developer. I always put it to higher priority and try to understand the factors behind it. When a T-SQL Query takes a long time to execute, we all say that we must see the execution plan to understand the lack behind it.

As I saw that lot of junior profession cannot understand the execution plan of MS SQL Server properly. So they are unable to find the root cause of the performance lack of a T-SQL query. Here in my blog post we are trying to understand the execution plan properly so that we can understand our query well.

To understand the execution plan is not an easy task and cannot be completed within a single article, so we are thinking to publish it by step by step way with multiple article.
Today is our first session related to understanding of execution plan. In this article we are trying to discuss what happened when we execute a SQL statement.

What happened When SQL statement Execute

To meet the requirement of the query there are two processes that we have to discuss.
1.    Process that occurs in the Relational Engine
2.    Process that occurs in the Storage Engine

Process that occurs in the Relational Engine

Query Parsing
When the T-SQL query is submitted it checks first that it written correctly or not. This type of syntax checking is called Query parsing. For example if we write a query with SELECTE instead of SELECT the parsing process stop and SQL server returns an Error to the query source.

After successful parsing of a query the output is called the Parse Tree or Query Tree or Sequence Tree.

 Here we want to mention that the DDL statement is not be Optimized. For example CREATE TABLE/ DROP TABLE etc… So there is only one way to create and drop table and nothing to optimize it.




 Algebrizer

The parse tree is passes to a process called Algebrizer. Here it resolves the name of the various object putted in the T-SQL statement. It resolves all the names of the objects, tables and columns, referred to within the query string.  It identified Individual columns level, its data type for the objects being accessed. It also determines the location of aggregates (such as GROUP BY, and MAX) within the query, a process called aggregate binding.

This algebrizer process is important because the query may have aliases or synonyms, names that don't exist in the database, that need to be resolved, or the query may refer to objects not in the database. When objects don't exist in the database, SQL Server returns an error from this step, defining the invalid object name.
The algebrizer outputs a binary called the query processor tree, which is then passed on to the query optimizer.

Query Optimizer

The most important pieces of data used by the Query Optimizer are statistics, which MS SQL Server generates and maintains against indexes and columns, explicitly for use by the optimizer.

Using the query processor tree and the statistics, the optimizer applies the model in order to work out what it thinks will be the optimal way to execute the query – that is, it generates an execution plan.

The Query Optimizer decides if it can access the data through indexes, what types of joins to use and etc. The decisions made by the optimizer are based on what it calculates to be the cost of a given execution plan, in terms of the required CPU processing and I/O. Hence, this is a cost-based plan.

The Query Optimizer generate multiple execution plan and depends on cost basis it choose the best one from them.

The calculation of the Execution cost is the most important calculation for the Query Optimizer and the optimizer will use a process that is more CPU-intensive if it returns results that much faster. If the optimizer thinks that the costing is so high that it cannot give enough time to process and choose the best one, then it can be select less efficient plan also.
If we use a simple select statement from a single table objects without any aggregation or grouping (SELECT * FROM tbl_employee), rather to span time to calculate cost and select the most efficient plan the optimizer use a Trivial plan on it.

SELECT a.empid, a.empname, b.sal
FROM    tbl_employee AS a
              INNER JOIN tbl_empsal AS b ON a.empid=b.empid

If we add a JOIN clause with this SELECT make the plan non-Trivial and in this case Query optimizer used the cost based calculation to select the plan.
Once the optimizer arrives at an execution plan, the estimated plan is created and stored in a memory space known as the plan cache – although this is all different if a plan already exists in cache. As stated earlier, if the optimizer finds a plan in the cache that matches the currently executing query, this whole process is short-circuited.

Process that occurs in the Storage Engine

Once the Optimizer generates an Execution plan or retrieve it from case, the action switched to the storage engine.
After all the heavy duty work done by optimizer to generate the execution plan, this carefully generated execution plan may be subject to change depends on some criteria

1.    MS SQL Server determines that the plan exceeds the threshold for a parallel execution (an execution that takes advantage of multiple processors on the machine).
2.     The Statistic that is used to generate the execution plan has been changed.
3.    Processes or objects within the query, such as data inserts to a temporary table, result in a recompilation of the execution plan.




Hope you like it.




Posted by: MR. JOYDEEP DAS