Microsoft SQL Server Knowledge Bank: November 2012

Tuesday, 27 November 2012

Improving Performance Using the Execution Plan

Introduction

Developers often ask a common question: "How can I improve the performance of a SQL query?" It is not easy to answer, because many factors are involved. There are some general guidelines that we can follow to improve the overall performance of a query.

But I recommend using the execution plan to understand the performance of a query. I prefer to check the execution plan as I build the query block step by step.

In this article I show a basic strategy for improving a query by observing its execution plan.

Prerequisite

To understand this article, you need a good knowledge of indexes, Index Scan, Index Seek, Table Scan, etc. Please read the related topics before continuing.

Improving Query

To understand it properly, let us take an example.

Step-1 [ Creating The Base Table ]

-- Creating the Base Table

IF OBJECT_ID('Emp_Dtls') IS NOT NULL

BEGIN

DROP TABLE Emp_Dtls;

END

CREATE TABLE Emp_Dtls

(EMPID INT NOT NULL IDENTITY,

EMPNAME VARCHAR(50) NOT NULL,

EMPGRADE VARCHAR(1) NOT NULL,

EMPDEPT VARCHAR(30) NOT NULL);

Step-2 [ Inserting the Records ]

DECLARE @i INT=1;

BEGIN TRY

BEGIN TRAN

WHILE (@i <= 50000)

BEGIN

INSERT INTO Emp_Dtls

(EMPNAME, EMPGRADE, EMPDEPT)

VALUES('Developer-'+CONVERT(VARCHAR, @i),'C','DEV');

SET @i=@i+1;

END

SET @i=1;

WHILE (@i <= 50000)

BEGIN

INSERT INTO Emp_Dtls

(EMPNAME, EMPGRADE, EMPDEPT)

VALUES('Delivery Mgr-'+CONVERT(VARCHAR, @i),'B','DM');

SET @i=@i+1;

END

SET @i=1;

WHILE (@i <= 50000)

BEGIN

INSERT INTO Emp_Dtls

(EMPNAME, EMPGRADE, EMPDEPT)

VALUES('Manager-'+CONVERT(VARCHAR, @i),'A','MGR');

SET @i=@i+1;

END

COMMIT TRAN

END TRY

BEGIN CATCH

ROLLBACK TRAN

END CATCH

Step-3 [ See the Actual Execution Plan ]

-- Execution Plan-1 [ Table Scan ]

SELECT * FROM Emp_Dtls;

As there is no index defined, the plan shows a Table Scan: SQL Server reads every row of the table. This is the worst case for performance, so we have to improve the query.
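One way to put a number on the scan, as a minimal sketch: `SET STATISTICS IO ON` makes SQL Server report the pages read by each statement in the Messages tab. The graphical plan itself is switched on in Management Studio with "Include Actual Execution Plan" (Ctrl+M).

```sql
-- Report logical reads for the scan (output appears in the Messages tab)
SET STATISTICS IO ON;

SELECT * FROM Emp_Dtls;

SET STATISTICS IO OFF;
```

Running the same three statements after each of the following steps gives a simple before-and-after comparison of logical reads.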

Step-4 [ Create Clustered Index ]

As there is no index on the table, and the column "EMPID" of the table "Emp_Dtls" is a narrow INTEGER IDENTITY column whose values keep increasing, it is a good candidate for the CLUSTERED INDEX key. Now we are going to create the CLUSTERED INDEX on it.

-- Create Clustered Index

IF EXISTS(SELECT *

FROM sys.sysindexes

WHERE id = OBJECT_ID('Emp_Dtls')

AND name ='IX_CLUS_Emp_Dtls')

BEGIN

DROP INDEX Emp_Dtls.IX_CLUS_Emp_Dtls;

END

CREATE CLUSTERED INDEX IX_CLUS_Emp_Dtls

ON Emp_Dtls(EMPID);

After creating the CLUSTERED INDEX, we look at the execution plan again to see whether it has improved.

-- Execution Plan-2 [ Clustered Index Scan ]

SELECT * FROM Emp_Dtls;

Now we can see a Clustered Index Scan. The query still reads every row, so the improvement is small, but at least it uses the CLUSTERED INDEX.

Step-5 [ Putting WHERE conditions in Query ]
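To confirm that the table is no longer a heap, the catalog view `sys.indexes` can be queried. This is a small sketch: a heap appears as index_id 0 with type_desc HEAP, and a clustered index as index_id 1 with type_desc CLUSTERED.

```sql
-- List the indexes that currently exist on Emp_Dtls
SELECT i.name, i.index_id, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('Emp_Dtls');
```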

-- Execution Plan-3 [ Using WHERE Conditions ]

SELECT EMPID, EMPNAME FROM Emp_Dtls WHERE EMPGRADE='A';

As "EMPGRADE" is used in the WHERE condition, we are going to create a NON CLUSTERED index on it.

-- Now Create Non Clustered Index on EMPGRADE

IF EXISTS(SELECT *

FROM sys.sysindexes

WHERE id = OBJECT_ID('Emp_Dtls')

AND name ='IX_NONCLUS_EMPGRADE')

BEGIN

DROP INDEX Emp_Dtls.IX_NONCLUS_EMPGRADE;

END

CREATE NONCLUSTERED INDEX IX_NONCLUS_EMPGRADE

ON Emp_Dtls(EMPGRADE);

Now again see the execution plan.

-- Execution Plan-4

SELECT EMPID, EMPNAME FROM Emp_Dtls WHERE EMPGRADE='A';

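Here again the clustered index is used. The nonclustered index that we created is not used. Why?

The index on "EMPGRADE" does not contain "EMPNAME", so for every matching row SQL Server would have to go back to the clustered index to fetch it. One third of the rows have grade 'A', so the optimizer decides that scanning the clustered index is cheaper than that many lookups.

You can extend the functionality of nonclustered indexes by adding nonkey columns to the leaf level of the nonclustered index. By including nonkey columns, you can create nonclustered indexes that cover more queries.

In this example "EMPNAME" is the nonkey column.

Step-6 [ Solve the Problem ]

-- Non clustered Index with Include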
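To see what the optimizer is avoiding, the narrow index can be forced with a table hint and the reads compared. This is only a sketch for experimenting. The forced plan has to fetch EMPNAME from the clustered index for each matching row, so it should report noticeably more logical reads than the scan.

```sql
SET STATISTICS IO ON;

-- Plan chosen by the optimizer (a clustered index scan at this point)
SELECT EMPID, EMPNAME FROM Emp_Dtls WHERE EMPGRADE='A';

-- Same query forced onto the narrow index: a seek plus a lookup per matching row
SELECT EMPID, EMPNAME
FROM Emp_Dtls WITH (INDEX(IX_NONCLUS_EMPGRADE))
WHERE EMPGRADE='A';

SET STATISTICS IO OFF;
```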

IF EXISTS(SELECT *

FROM sys.sysindexes

WHERE id = OBJECT_ID('Emp_Dtls')

AND name ='IX_NONCLUS_EMPGRADE_EMPNAME')

BEGIN

DROP INDEX Emp_Dtls.IX_NONCLUS_EMPGRADE_EMPNAME;

END

CREATE NONCLUSTERED INDEX IX_NONCLUS_EMPGRADE_EMPNAME

ON Emp_Dtls(EMPGRADE) INCLUDE(EMPNAME);

Now again see the Execution Plan.

-- Execution Plan-5

SELECT EMPID, EMPNAME FROM Emp_Dtls WHERE EMPGRADE='A';

Now we get the desired result: the plan shows an INDEX SEEK.
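Besides reading the plan, index usage can be checked from the dynamic management view `sys.dm_db_index_usage_stats`. This is a hedged sketch: the counters are reset when the instance restarts, and reading the view requires the VIEW SERVER STATE permission.

```sql
-- Seeks, scans and lookups recorded for each index on Emp_Dtls
SELECT i.name AS index_name,
       s.user_seeks,
       s.user_scans,
       s.user_lookups
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('Emp_Dtls');
```

After Execution Plan-5 has been run, the user_seeks count for IX_NONCLUS_EMPGRADE_EMPNAME should have gone up.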

Hope you like it.

Posted by: MR. JOYDEEP DAS

Monday, 26 November 2012

Function in WHERE clause

Introduction

When we use a scalar function in the SELECT list, it is evaluated for each row that the statement retrieves, so its cost depends on the number of rows and on the values passed to it. This degrades the performance of the query a little, but developers often use scalar functions to encapsulate frequently performed logic.

But functions can also be used in WHERE conditions. Is using a scalar or user-defined function in the WHERE clause a good idea? Does it hamper performance?

This article is related to it.

Is it Bad?

Using a function on a column in the WHERE clause causes an Index Scan. The reason is that the function has to be evaluated for each row to determine whether it matches our criteria, so SQL Server cannot seek directly to the matching index entries.

How to understand it

To understand it properly, I am taking an example here. We compare the actual execution plans of the queries and see how a function in the WHERE clause affects the query by forcing an Index Scan.

Step-1 [ Creating the Base Table ]

-- Base Table

IF OBJECT_ID('tbl_EMPDTLS') IS NOT NULL

BEGIN

DROP TABLE tbl_EMPDTLS;

END

CREATE TABLE tbl_EMPDTLS

(EMPID INT NOT NULL IDENTITY PRIMARY KEY,

EMPFNAME VARCHAR(50) NOT NULL,

EMPLNAME VARCHAR(50) NOT NULL,

EMPGRADE VARCHAR(1) NOT NULL,

EMPEMAIL VARCHAR(50) NOT NULL,

DOJ DATETIME NOT NULL);

Step-2 [ Creating the Index Information ]

-- Creating Non clustered Index

IF EXISTS(SELECT *

FROM sys.indexes

WHERE object_id=OBJECT_ID('tbl_EMPDTLS')

AND name ='IX_NONC_EMPFNAME')

BEGIN

DROP INDEX tbl_EMPDTLS.IX_NONC_EMPFNAME;

END

CREATE NONCLUSTERED INDEX IX_NONC_EMPFNAME

ON tbl_EMPDTLS (EMPFNAME);

IF EXISTS(SELECT *

FROM sys.indexes

WHERE object_id=OBJECT_ID('tbl_EMPDTLS')

AND name ='IX_NONC_EMPGRADE')

BEGIN

DROP INDEX tbl_EMPDTLS.IX_NONC_EMPGRADE;

END

-- Note: despite its name, this index is created on EMPLNAME

CREATE NONCLUSTERED INDEX IX_NONC_EMPGRADE

ON tbl_EMPDTLS (EMPLNAME);

IF EXISTS(SELECT *

FROM sys.indexes

WHERE object_id=OBJECT_ID('tbl_EMPDTLS')

AND name ='IX_NONC_EMPEMAIL')

BEGIN

DROP INDEX tbl_EMPDTLS.IX_NONC_EMPEMAIL;

END

CREATE NONCLUSTERED INDEX IX_NONC_EMPEMAIL

ON tbl_EMPDTLS (EMPEMAIL);

IF EXISTS(SELECT *

FROM sys.indexes

WHERE object_id=OBJECT_ID('tbl_EMPDTLS')

AND name ='IX_NONC_DOJ')

BEGIN

DROP INDEX tbl_EMPDTLS.IX_NONC_DOJ;

END

CREATE NONCLUSTERED INDEX IX_NONC_DOJ

ON tbl_EMPDTLS (DOJ);

Step-3 [ Inserting some records in the Table ]

-- Inserting Records

INSERT INTO tbl_EMPDTLS

(EMPFNAME, EMPLNAME, EMPGRADE, EMPEMAIL, DOJ)

VALUES ('JOYDEEP', 'DAS', 'B', 'joydeep@abc.com','03-12-2006'),

('RAJECH', 'DAS', 'C', 'rajesh@abc.com', '01-12-2006'),

('SUKAMAL', 'JANA', 'B', 'suku@abc.com', '03-12-2004'),

('TUHIN', 'SHINAH', 'B', 'tuhin@abc.com', '07-12-2001'),

('SANGRAM', 'JIT', 'B', 'sangram@abc.com','01-10-2011'),

('SUDIP', 'DAS', 'A', 'sudip@abc.com', '07-11-1990'),

('RANI', 'LAL', 'B', 'rani@abc.com', '03-12-2006'),

('JOHN', 'IBRAHAM','C', 'john@abc.com', '01-05-2007'),

('BHUPEN', 'SINGH', 'A', 'bhapu@abc.com', '03-12-2006'),

('SAIKAT', 'SREE', 'B', 'saikat@abc.com', '01-12-1906'),

('SUJATA', 'LALA', 'B', 'sujata@abc.com', '03-12-2012'),

('RAJU', 'ROSTOGU','C', 'raju@abc.com', '03-12-2006'),

('ROHIT', 'KUMAR', 'C', 'rohit@abc.com', '01-10-2012'),

('VIPIN', 'PAUL', 'B', 'vipin@abc.com', '01-11-2006'),

('VINODH', 'CHOPRA', 'C', 'vinodh@abc.com', '03-12-2006'),

('KALLU', 'SHEK', 'B', 'joydeep@abc.com','01-11-2011')

Step-4 [ Executing the Query ]

-- Example Set-1

SELECT *

FROM tbl_EMPDTLS WITH(INDEX(IX_NONC_EMPFNAME))

WHERE EMPFNAME LIKE 'J%';

SELECT *

FROM tbl_EMPDTLS WITH(INDEX(IX_NONC_EMPFNAME))

WHERE LEFT(EMPFNAME,1) = 'J';

-- Example Set-2

SELECT *

FROM tbl_EMPDTLS WITH(INDEX(IX_NONC_EMPFNAME))

WHERE EMPFNAME='JOYDEEP'

AND EMPLNAME='DAS'

SELECT *

FROM tbl_EMPDTLS WITH(INDEX(IX_NONC_EMPFNAME))

WHERE EMPFNAME+EMPLNAME='JOYDEEPDAS'

-- Example Set-3

SELECT *

FROM tbl_EMPDTLS WITH(INDEX(IX_NONC_DOJ))

WHERE DOJ = '03-12-2004'

SELECT *

FROM tbl_EMPDTLS WITH(INDEX(IX_NONC_DOJ))

WHERE DOJ < GETDATE()

SELECT *

FROM tbl_EMPDTLS WITH(INDEX(IX_NONC_DOJ))

WHERE DATEDIFF(day, DOJ, '03-12-2004') = 0

Conclusion

So from the execution plans we find that applying a function to an indexed column in the WHERE clause is a bad idea: it turns an Index Seek into an Index Scan.
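The usual fix is to rewrite the predicate so that the column stands alone on one side of the comparison. The sketch below rewrites the DATEDIFF query from Example Set-3 as a range. The variable @d is introduced only for illustration, and on a table this small the optimizer may still choose a scan.

```sql
-- Same literal as Example Set-3, parsed under the session's date format
DECLARE @d DATETIME = '03-12-2004';

-- Function on the column: every DOJ value has to be evaluated
SELECT *
FROM tbl_EMPDTLS
WHERE DATEDIFF(day, DOJ, @d) = 0;

-- Range on the bare column: eligible for a seek on IX_NONC_DOJ
SELECT *
FROM tbl_EMPDTLS
WHERE DOJ >= @d
  AND DOJ < DATEADD(day, 1, @d);

-- LEFT(EMPFNAME,1) = 'J' has the same kind of rewrite
SELECT *
FROM tbl_EMPDTLS
WHERE EMPFNAME LIKE 'J%';
```

Both date queries return the rows whose DOJ falls on the same calendar day as @d. Only the second form lets SQL Server use the index order on DOJ.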

Hope you like it.

Posted by: MR. JOYDEEP DAS

Thursday, 22 November 2012

“IN”, “EXISTS” clause and their performance

Introduction

To improve the performance of a query, the general guideline is to avoid the "IN" clause. The MS SQL query performance guideline says that where we need an "IN" clause with a sub query, we should use the "EXISTS" clause instead, as the "EXISTS" clause improves the performance of the query.

This article is about the "IN" and "EXISTS" clauses and their performance factors.

Are the "IN" and "EXISTS" clauses the same?

IN Clause

Returns true if a specified value matches any value in a sub query or a list.

EXISTS Clause

Returns true if a sub query contains any rows.

So we see that the "IN" and "EXISTS" clauses are not the same. To support the above definitions, let us take an example.

-- Base Table

IF OBJECT_ID('emp_DtlTbl') IS NOT NULL

BEGIN

DROP TABLE emp_DtlTbl;

END

IF OBJECT_ID('emp_GradeTbl') IS NOT NULL

BEGIN

DROP TABLE emp_GradeTbl;

END

CREATE TABLE emp_DtlTbl

(EMPID INT NOT NULL IDENTITY PRIMARY KEY,

EMPNAME VARCHAR(50)NOT NULL);

CREATE TABLE emp_GradeTbl

(EMPID INT NOT NULL IDENTITY PRIMARY KEY,

GRADE VARCHAR(1) NOT NULL);

-- Insert Records

INSERT INTO emp_DtlTbl

(EMPNAME)

VALUES ('Joydeep Das'), ('Sukamal Jana'), ('Sudip Das');

INSERT INTO emp_GradeTbl

(GRADE)

VALUES ('B'), ('B'), ('A');

-- [ IN ] Clause Example-1

SELECT *

FROM emp_DtlTbl

WHERE EMPID IN(SELECT EMPID FROM emp_DtlTbl);

-- [ IN ] Clause Example-2

SELECT *

FROM emp_DtlTbl

WHERE EMPID IN(1,2,3);

-- [ EXISTS ] Clause Example

SELECT a.*

FROM emp_DtlTbl a

WHERE EXISTS(SELECT b.*

FROM emp_DtlTbl b

WHERE b.EMPID = a.EMPID);
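The same comparison can be made across the two tables. This is a sketch that assumes the EMPID values of emp_DtlTbl and emp_GradeTbl line up, since both are IDENTITY columns filled in the same order. Both statements ask for the employees whose grade is 'B' and should return the same rows.

```sql
-- Employees whose grade is 'B', written with IN
SELECT d.EMPID, d.EMPNAME
FROM emp_DtlTbl AS d
WHERE d.EMPID IN (SELECT g.EMPID
                  FROM emp_GradeTbl AS g
                  WHERE g.GRADE = 'B');

-- The same question written with a correlated EXISTS
SELECT d.EMPID, d.EMPNAME
FROM emp_DtlTbl AS d
WHERE EXISTS (SELECT 1
              FROM emp_GradeTbl AS g
              WHERE g.EMPID = d.EMPID
                AND g.GRADE = 'B');
```

IN compares a value against the list that the sub query returns, while EXISTS only checks whether the correlated sub query returns any row at all.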

Performance Factors

To understand the performance factors, let us look at the actual execution plans for the "IN" and "EXISTS" clauses.

Take this example:

-- [ IN ] Clause Example

SELECT *

FROM emp_DtlTbl

WHERE EMPID IN (SELECT EMPID

FROM emp_DtlTbl

WHERE EMPID = 2);

-- [ EXISTS ] Clause Example

SELECT a.*

FROM emp_DtlTbl a

WHERE EXISTS(SELECT b.*

FROM emp_DtlTbl b

WHERE b.EMPID = 2

AND b.EMPID = a.EMPID);

If we compare the total query costs of the two queries, we see that the cost of the IN clause query is higher than that of the EXISTS clause query.

Special notes

Please note: the tables here hold very little data, so we cannot really measure the performance difference.
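To get a more meaningful comparison, the table can be loaded with more rows first. The sketch below is an assumption, not part of the original test: it generates up to 100,000 names from a cross join of `sys.all_objects` and then runs both forms with I/O and time statistics switched on.

```sql
-- Hypothetical bulk load: add up to 100,000 generated rows
INSERT INTO emp_DtlTbl (EMPNAME)
SELECT TOP (100000)
       'Emp-' + CONVERT(VARCHAR(10), ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- [ IN ] version
SELECT *
FROM emp_DtlTbl
WHERE EMPID IN (SELECT EMPID FROM emp_DtlTbl WHERE EMPID = 2);

-- [ EXISTS ] version
SELECT a.*
FROM emp_DtlTbl a
WHERE EXISTS (SELECT 1 FROM emp_DtlTbl b WHERE b.EMPID = 2 AND b.EMPID = a.EMPID);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The logical reads and elapsed times in the Messages tab, together with the two actual execution plans, show whether the two forms really differ on your data.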

Hope you like it.

Posted by: MR. JOYDEEP DAS