Friday 24 October 2014

Understanding the Execution Plan [ When Table JOIN occurs Part-I ]

Introduction

In my previous article we are just using a single table as per our example is concern, Now in this article we are going to see the different table join each other’s and make a  single data set by using JOIN. When a JOIN command is issued in T-SQL, it will be resolved through a Join operator. Hope it will be interesting.

How it Works

SELECT e.JobTitle, a.City,
       p.LastName + ', ' + p.FirstName AS EmployeeName
FROM   HumanResources.Employee AS e
       INNER JOIN Person.BusinessEntityAddress AS bea
                       ON e.BusinessEntityID = bea.BusinessEntityID
       INNER JOIN Person.Address a ON bea.AddressID = a.AddressID
       INNER JOIN Person.Person AS p
             ON e.BusinessEntityID = p.BusinessEntityID;


If we look at the above query the First Name and Last Name is combining together as a meaningful manners. Now we look at the Execution plan.



To read this Execution plan we move from Right to Left. Here we have to identify the most costly operation occurs in the Execution Plan.

1. Index Scan against the Person.Address table (31%).
2. Hash Match join operation between the Person.Person table and the output from the
    first Hash Match (21%).
3. Other Hash Match join operator between the Person.Address table and the output from
     the Nested Loop operator (20%).

There are multiple problematic operators in this query since, in general, we're better off with Seek operations instead of Scan operations.



The query optimizer needed to get at the AddressId and the City columns, as shown by the Output List at the bottom of the ToolTip. Please look at the output List of the tooltip of above figure.  

The Query optimizer calculates the cost based on Index and statistics of the table objects. Here it takes Actual Number of Rows 19614. That means data was to scan the Index row by row and 19641 rows takes the estimated cost 31% and the total Estimated Operation Cost is 0.158681 (31%).

Here the Estimated Operation Cost means the cost to the Query Optimizer for Executing the specific operation. If Estimated Operation Cost is lower is more efficient the Operation. That does not mean that only seeing the Estimated Operation Cost we decide that it is most expensive operation. We have to calculate other factors also.

What is Has Match Join
In this execution plan we also find the Hash Match Join Operators. First we have to understand what it does.

Hash Match just put two data set into a Temporary Table called Hash Table and use this structure to compare data arrive at the matching set.

Here in our Execution Plan we find two Hash Match join operators. First we find just before SELECT operators (Must read from Right to Left). It just JOIN the output of INDEX SCAN with output of the rest of the operators in the query. This the second most expensive operation in our execution plans. Now we see the tooltip property of this Hash Match Join.



So we have to understand Hash Match, but before that we have to go throw with two concepts called Hashing and Hash Table.

Hashing
It is a programmatic technique where data is converted into symbolic forms for much more efficient searching of data. , SQL Server programmatically converts a row of data in a table into a unique value that represents the contents of the row. We can say it like encryption, a hash value converted to the original data.

Hash Table
It is a data structure and allows quick access to the element. The SQL server take a row from table  and hash it into a hash value and store the hash value into a hash table in temp db.

Note: for more information please search on google.

Now move it our main flow, the Hash Match Join operators when SQL server has to join two large datasets. It decides to do so by Hashing the row from the smaller of the two data sets and inserting them into hash table. It then processes the larger data set, one row at a time, against the hash table, looking for matches, indicating the rows to be joined.
If the hash table is relatively small this can be quick in process. If the both table are very large, the hash Match join can be very ineffective in compare to the other type of join. As all the data is stored in the Temp DB, so excessive use of Hash Join in our query provide heavier load of Temp DB.

In our example data from HumanResources.Employee is matched with the Person.Person table. The Hash Match join occurs well on a Table that are not sorted on JOIN columns means where there is no usable Index. Here in this case MERGE JOIN to work better.

Sending a Hash Match join into execution plan sometimes indicate

1.    Missing or Unused Index
2.    Missing Where clause
3.    A Where clause with Calculation or Conversion

Hare for Hash Match Join we have to investigate our query. That means we have to tune our query by adding Index to make the Join operation more efficient.

Related Reference

Understanding of Execution Plan – III - C [ The OPERATORS]

Understanding of Execution Plan – III - B [ The OPERATORS ]

Understanding of Execution Plan [What happened When SQL statement Execute]

Understanding of Execution Plan – II [Reuse of the Execution Plan]


Summary
In the next level we have to more discuss about our execution plan. So this series will be continued for some more articles. Please be with us.


Hope you like it.





Posted by: MR. JOYDEEP DAS 

No comments:

Post a Comment