Introduction
In my
previous article we are just using a single table as per our example is concern,
Now in this article we are going to see the different table join each other’s
and make a single data set by using
JOIN. When a JOIN command is issued in T-SQL, it will be resolved through a Join
operator. Hope it will be interesting.
How it Works
SELECT e.JobTitle, a.City,
p.LastName + ', ' + p.FirstName AS EmployeeName
p.LastName + ', ' + p.FirstName AS EmployeeName
FROM
HumanResources.Employee AS e
INNER JOIN Person.BusinessEntityAddress
AS bea
ON
e.BusinessEntityID = bea.BusinessEntityID
INNER JOIN Person.Address a ON bea.AddressID = a.AddressID
INNER JOIN Person.Person AS p
ON e.BusinessEntityID = p.BusinessEntityID;
ON e.BusinessEntityID = p.BusinessEntityID;
If we look at
the above query the First Name and Last Name is combining together as a
meaningful manners. Now we look at the Execution plan.
To read this
Execution plan we move from Right to Left. Here we have to identify the most
costly operation occurs in the Execution Plan.
1.
Index Scan against the Person.Address table (31%).
2.
Hash Match join operation between the Person.Person table and the output from
the
first Hash Match (21%).
first Hash Match (21%).
3. Other Hash Match join
operator between the Person.Address table and the output from
the Nested Loop operator (20%).
the Nested Loop operator (20%).
There are
multiple problematic operators in this query since, in general, we're better
off with Seek operations instead of Scan operations.
The query
optimizer needed to get at the AddressId and the City columns, as shown by the Output
List at the bottom of the ToolTip. Please look at the output List of the
tooltip of above figure.
The Query optimizer
calculates the cost based on Index and statistics of the table objects. Here it
takes Actual Number of Rows 19614. That means data was to scan the Index
row by row and 19641 rows takes the estimated cost 31% and the total Estimated
Operation Cost is 0.158681 (31%).
Here the Estimated
Operation Cost means the cost to the Query Optimizer for Executing the
specific operation. If Estimated Operation Cost is lower is more efficient the
Operation. That does not mean that only seeing the Estimated Operation Cost we
decide that it is most expensive operation. We have to calculate other factors
also.
What is Has Match Join
In this
execution plan we also find the Hash Match Join Operators. First we have
to understand what it does.
Hash Match just put two data set into a Temporary
Table called Hash Table and use this structure to compare data arrive at the
matching set.
Here in our
Execution Plan we find two Hash Match join operators. First we find just before
SELECT operators (Must read from Right to Left). It just JOIN the output of
INDEX SCAN with output of the rest of the operators in the query. This the
second most expensive operation in our execution plans. Now we see the tooltip
property of this Hash Match Join.
So we have to
understand Hash Match, but before that we have to go throw with two concepts
called Hashing and Hash Table.
Hashing
It is a programmatic technique where data is
converted into symbolic forms for much more efficient searching of data. , SQL
Server programmatically converts a row of data in a table into a unique value
that represents the contents of the row. We can say it like encryption, a hash
value converted to the original data.
Hash Table
It is a data
structure and allows quick access to the element. The SQL server take a row
from table and hash it into a hash value
and store the hash value into a hash table in temp db.
Note: for
more information please search on google.
Now move it our
main flow, the Hash Match Join operators when SQL server has to join two large
datasets. It decides to do so by Hashing the row from the smaller of the two
data sets and inserting them into hash table. It then processes the larger data
set, one row at a time, against the hash table, looking for matches, indicating
the rows to be joined.
If
the hash table is relatively small this can be quick in process. If the both
table are very large, the hash Match join can be very ineffective in compare to
the other type of join. As all the data is stored in the Temp DB, so excessive use
of Hash Join in our query provide heavier load of Temp DB.
In
our example data from HumanResources.Employee is matched with the Person.Person
table. The Hash Match join occurs well on a Table that are not sorted on
JOIN columns means where there is no usable Index. Here in this case MERGE JOIN
to work better.
Sending
a Hash Match join into execution plan sometimes indicate
1.
Missing
or Unused Index
2.
Missing
Where clause
3.
A
Where clause with Calculation or Conversion
Hare for Hash Match Join we have to investigate our
query. That means we have to tune our query by adding Index to make the Join
operation more efficient.
Related Reference
Understanding of
Execution Plan – III - C [ The OPERATORS]
Understanding of
Execution Plan – III - B [ The OPERATORS ]
Understanding of
Execution Plan [What happened When SQL statement Execute]
Understanding of
Execution Plan – II [Reuse of the Execution Plan]
Summary
In the next level we have to more discuss about our
execution plan. So this series will be continued for some more articles. Please
be with us.
Hope you like it.
Posted
by: MR. JOYDEEP DAS
No comments:
Post a Comment