Tuesday, 27 November 2012

Improve the performance by Execution Plan

Introduction
Developers ask a common quest "How to improve the performance of a SQL Query". It is not so easy to answer as lot of factors is related to it. There are some general guidelines that we can follow to improve the overall performance of a query.
But I recommended the execution plan to understand the performance of the query. I preferred execution plan when I am building the query block step by step.
In this article I am trying to show a basic strategy, how to improve a query by observing the query plan.

Prerequisite
To understand this article, we have a very good knowledge of Index, Index Scan, Index Seek, Table scan etc. Please follow the related tropics of this article, to complete this.

Improving Query
To understand it properly, I am taken an example.
Step-1 [ Creating The Base Table ]
-- Creating the Base Table
IF OBJECT_ID('Emp_Dtls') IS NOT NULL
   BEGIN
      DROP TABLE Emp_Dtls;
   END
GO  
CREATE TABLE Emp_Dtls
       (EMPID    INT          NOT NULL IDENTITY,
        EMPNAME  VARCHAR(50)  NOT NULL,
        EMPGRADE VARCHAR(1)   NOT NULL,
        EMPDEPT  VARCHAR(30)  NOT NULL);
GO
Step-2 [ Inserting the Records ]
DECLARE @i INT=1;
BEGIN TRY
BEGIN TRAN
WHILE (@i <= 50000)
BEGIN
      INSERT INTO Emp_Dtls
             (EMPNAME, EMPGRADE, EMPDEPT) 
      VALUES('Developer-'+CONVERT(VARCHAR, @i),'C','DEV');
      SET @i=@i+1;
END

SET @i=1;  
WHILE (@i <= 50000)
BEGIN
      INSERT INTO Emp_Dtls
             (EMPNAME, EMPGRADE, EMPDEPT) 
      VALUES('Devlivery Mgr-'+CONVERT(VARCHAR, @i),'B','DM');
      SET @i=@i+1;
END  

SET @i=1;  
WHILE (@i <= 50000)
BEGIN
      INSERT INTO Emp_Dtls
             (EMPNAME, EMPGRADE, EMPDEPT) 
      VALUES('Manager-'+CONVERT(VARCHAR, @i),'A','MGR');
      SET @i=@i+1;
END  
COMMIT TRAN
END TRY
BEGIN CATCH
ROLLBACK TRAN
END CATCH
Step-3 [ See the Actual Execution Plan ]
-- Execution Plan-1 [ Table Scan ]
SELECT * FROM Emp_Dtls; 



As there is NO INDEX defined it going to TABLE SCAN. So the performance of the SQL Query is worst. We have to improve the performance of the Query.
Step-4 [ Create Clustered Index ]
As there is no index over here and the attribute "EMPID" of Table objects "EMP_DTLS" has INTEGER data type, so it is a good candidate key for CLUSTERED INDEX. Now we are going to create the CLUSTERED INDEX on it.
-- Create Custered Index 
IF EXISTS(SELECT *
          FROM   sys.sysindexes
          WHERE  id = OBJECT_ID('Emp_Dtls')
                 AND name ='IX_CLUS_Emp_Dtls')
   BEGIN
      DROP INDEX Emp_Dtls.IX_CLUS_Emp_Dtls;
   END      
GO
CREATE CLUSTERED INDEX IX_CLUS_Emp_Dtls
ON  Emp_Dtls(EMPID);
Aster creating the CLUSTERED INDEX we are going to see the EXECUTION plan again that it Improves or NOT.
-- Execution Plan-2 [ Clustered Index Scan ]
SELECT * FROM Emp_Dtls;



Now we can see that there is Clustered Index Scan. So the performance is little bit improve. At least it uses the CLSUTERED INDEX.
 Step-5[ Putting WHERE conditions in Query ]
-- Execution Plan-3 [ Using WHERE Conditions ]
SELECT EMPID, EMPNAME FROM Emp_Dtls WHERE EMPGRADE='A'; 



 As the "EMPGRADE" is used in the WHERE conditions we are going to make a NON CLUSTERED Index on it.
-- Now Create Non Clustered Index on EMPGRADE
IF EXISTS(SELECT *
          FROM   sys.sysindexes
          WHERE  id = OBJECT_ID('Emp_Dtls')
                 AND name ='IX_NONCLUS_EMPGRADE')
   BEGIN
      DROP INDEX Emp_Dtls.IX_NONCLUS_EMPGRADE;
   END      
GO
CREATE NONCLUSTERED INDEX IX_NONCLUS_EMPGRADE
ON  Emp_Dtls(EMPGRADE); 
GO

Now again see the execution plan.
-- Execution Plan-4
SELECT EMPID, EMPNAME FROM Emp_Dtls WHERE EMPGRADE='A'; 



Here again the clustered Index is used. The non clustered index that we created is not used here. Why?
You can extend the functionality of nonclustered indexes by adding nonkey columns to the leaf level of the nonclustered index. By including nonkey columns, you can create nonclustered indexes that cover more queries.
In this Example the "EMPNAME" is a NONKEY Columns.
Step-6[ Solve the Problem ]
-- Non clustered Index with Incluse
IF EXISTS(SELECT *
          FROM   sys.sysindexes
          WHERE  id = OBJECT_ID('Emp_Dtls')
                 AND name ='IX_NONCLUS_EMPGRADE_EMPNAME')
   BEGIN
      DROP INDEX Emp_Dtls.IX_NONCLUS_EMPGRADE_EMPNAME;
   END      
GO
CREATE NONCLUSTERED INDEX IX_NONCLUS_EMPGRADE_EMPNAME
ON Emp_Dtls(EMPGRADE) INCLUDE(EMPNAME);
Now again see the Execution Plan.
-- Execution Plan-5
SELECT EMPID, EMPNAME FROM Emp_Dtls WHERE EMPGRADE='A';



Now the desired output came and it is INDEX SEEK.

Related Tropics

Hope you like it.


Posted by: MR. JOYDEEP DAS

Monday, 26 November 2012

Function in WHERE clause

Introduction
When we are using the function in the SELECT statement it returns data depends on the number of records retrieve by the select statement and what type of value is passed in the scalar function. The performance of the MS SQL Query is little bit degraded using the scalar function. But developer often used it for encapsulate frequently performed logic.
But if we saw some SELECT statement, we can see that the functions can be used in WHERE conditions also. Using scalar or User defines function in WHERE clause is a good idea? Is this hampering the performance?
This article is related to it.
Is it Bad?
Using function on WHERE clause causes Index scan.   The reason for this is that the function value has to be evaluated for each row of data to determine it matches our criteria.
How we understand it
To understand it properly, here I am taking an example. We compare the Actual Execution plan of the Query and understand how the function in the WHERE clause effects the query by Index Scanning.
Step-1 [ Creating the Base Table ]
 -- Base Table
IF OBJECT_ID('tbl_EMPDTLS') IS NOT NULL
   BEGIN
      DROP TABLE tbl_EMPDTLS;
   END
GO
CREATE TABLE tbl_EMPDTLS
       (EMPID      INT         NOT NULL IDENTITY PRIMARY KEY,
        EMPFNAME   VARCHAR(50) NOT NULL,
        EMPLNAME   VARCHAR(50) NOT NULL,  
        EMPGRADE   VARCHAR(1)  NOT NULL,
        EMPEMAIL   VARCHAR(50) NOT NULL,
        DOJ        DATETIME    NOT NULL);
GO

Step-2 [ Creating the Index Information ]

-- Creating Non clustered Index
IF EXISTS(SELECT *
          FROM   sys.indexes
          WHERE  object_id=OBJECT_ID('tbl_EMPDTLS')
                 AND name ='IX_NONC_EMPFNAME')
    BEGIN
      DROP INDEX tbl_EMPDTLS.IX_NONC_EMPFNAME;
    END             
GO   
CREATE NONCLUSTERED INDEX IX_NONC_EMPFNAME
ON  tbl_EMPDTLS (EMPFNAME)    
GO
IF EXISTS(SELECT *
          FROM   sys.indexes
          WHERE  object_id=OBJECT_ID('tbl_EMPDTLS')
                 AND name ='IX_NONC_EMPGRADE')
    BEGIN
      DROP INDEX tbl_EMPDTLS.IX_NONC_EMPGRADE;
    END             
GO   

CREATE NONCLUSTERED INDEX IX_NONC_EMPGRADE
ON  tbl_EMPDTLS (EMPLNAME);
GO

IF EXISTS(SELECT *
          FROM   sys.indexes
          WHERE  object_id=OBJECT_ID('tbl_EMPDTLS')
                 AND name ='IX_NONC_EMPEMAIL')
    BEGIN
      DROP INDEX tbl_EMPDTLS.IX_NONC_EMPEMAIL;
    END             
GO   

CREATE NONCLUSTERED INDEX IX_NONC_EMPEMAIL
ON  tbl_EMPDTLS (EMPEMAIL);    
GO

IF EXISTS(SELECT *
          FROM   sys.indexes
          WHERE  object_id=OBJECT_ID('tbl_EMPDTLS')
                 AND name ='IX_NONC_DOJ')
    BEGIN
      DROP INDEX tbl_EMPDTLS.IX_NONC_DOJ;
    END             
GO   

CREATE NONCLUSTERED INDEX IX_NONC_DOJ
ON  tbl_EMPDTLS (DOJ);    
GO

Step-3 [ Inserting some records in the Table ]

-- Inserting Records
INSERT INTO tbl_EMPDTLS 
       (EMPFNAME, EMPLNAME, EMPGRADE, EMPEMAIL, DOJ)
VALUES ('JOYDEEP', 'DAS',    'B', 'joydeep@abc.com','03-12-2006'),
       ('RAJECH',  'DAS',    'C', 'rajesh@abc.com', '01-12-2006'),
       ('SUKAMAL', 'JANA',   'B', 'suku@abc.com',   '03-12-2004'),
       ('TUHIN',   'SHINAH', 'B', 'tuhin@abc.com',  '07-12-2001'),
       ('SANGRAM', 'JIT',    'B', 'sangram@abc.com','01-10-2011'),
       ('SUDIP',   'DAS',    'A', 'sudip@abc.com',  '07-11-1990'),
       ('RANI',    'LAL',    'B', 'rani@abc.com',   '03-12-2006'),
       ('JOHN',    'IBRAHAM','C', 'john@abc.com',   '01-05-2007'),
       ('BHUPEN',  'SINGH',  'A', 'bhapu@abc.com',  '03-12-2006'),
       ('SAIKAT',  'SREE',   'B', 'saikat@abc.com', '01-12-1906'),
       ('SUJATA',  'LALA',   'B', 'sujata@abc.com', '03-12-2012'),
       ('RAJU',    'ROSTOGU','C', 'raju@abc.com',   '03-12-2006'),
       ('ROHIT',   'KUMAR',  'C', 'rohit@abc.com',  '01-10-2012'),
       ('VIPIN',   'PAUL',   'B', 'vipin@abc.com',  '01-11-2006'),
       ('VINODH',  'CHOPRA', 'C', 'vinodh@abc.com', '03-12-2006'),
       ('KALLU',   'SHEK',   'B', 'joydeep@abc.com','01-11-2011')
GO

Step-4 [ Exciting the Query ]

-- Example Set-1
SELECT *
FROM   tbl_EMPDTLS WITH(INDEX(IX_NONC_EMPFNAME))
WHERE EMPFNAME  LIKE 'J%';   
GO

SELECT *
FROM   tbl_EMPDTLS WITH(INDEX(IX_NONC_EMPFNAME))
WHERE LEFT(EMPFNAME,1)  = 'J'; 



-- Example Set-2
SELECT *
FROM   tbl_EMPDTLS WITH(INDEX(IX_NONC_EMPFNAME))
WHERE  EMPFNAME='JOYDEEP'
       AND EMPLNAME='DAS'
GO

SELECT *
FROM   tbl_EMPDTLS WITH(INDEX(IX_NONC_EMPFNAME))
WHERE  EMPFNAME+EMPLNAME='JOYDEEPDAS'       


  
-- Example Set-3      
SELECT *
FROM   tbl_EMPDTLS WITH(INDEX(IX_NONC_DOJ))
WHERE  DOJ = '03-12-2004' 
GO

SELECT *
FROM   tbl_EMPDTLS WITH(INDEX(IX_NONC_DOJ))
WHERE  DOJ < GETDATE()
GO

SELECT *
FROM   tbl_EMPDTLS WITH(INDEX(IX_NONC_DOJ))
WHERE  DATEDIFF(day, DOJ, '03-12-2004') =
GO



Conclusion
So from execution plan we find that using function in WHERE clause is a bad idea.

Hope you like it.

Posted by: MR. JOYDEEP DAS

Thursday, 22 November 2012

“IN”, “EXISTS” clause and their performance

Introduction
To improve the performance of the Query, the general guideline is not to prefer the "IN" Clause. The guideline of the MS SQL query performance says that if we needs "IN" clause, instead of using "IN" clause we must use the "EXISTS" clause.   As the "EXISTS" clause improve the performance of the query.
This article is related to "IN", "EXISTS" clause and their performance factors.

Is "IN" and "EXISTS" clause are same
IN Clause
Returns true if a specified value matches any value in a sub query or a list.
EXISTS Clause
Returns true if a sub query contains any rows.
So we see that the "IN" and the "EXISTS" cluse are not same. To support the above definition lets takes an example.
-- Base Tabe
IF OBJECT_ID('emp_DtlTbl') IS NOT NULL
   BEGIN
     DROP TABLE emp_DtlTbl;
   END
GO
IF OBJECT_ID('emp_GradeTbl') IS NOT NULL
   BEGIN
     DROP TABLE emp_GradeTbl;
   END
GO     

CREATE TABLE emp_DtlTbl
       (EMPID   INT        NOT NULL IDENTITY PRIMARY KEY, 
        EMPNAME VARCHAR(50)NOT NULL);
GO

CREATE TABLE emp_GradeTbl
       (EMPID   INT        NOT NULL IDENTITY PRIMARY KEY, 
        GRADE   VARCHAR(1) NOT NULL);
GO

-- Insert Records
INSERT INTO  emp_DtlTbl
       (EMPNAME)
VALUES ('Joydeep Das'), ('Sukamal Jana'), ('Sudip Das');
GO

INSERT INTO  emp_GradeTbl
       (GRADE)
VALUES ('B'), ('B'), ('A');
GO

-- [ IN ] Clause Example-1
SELECT *
FROM   emp_DtlTbl;
WHERE  EMPID IN(SELECT EMPID FROM  emp_DtlTbl);           


-- [ IN ] Clause Example-2
SELECT *
FROM   emp_DtlTbl
WHERE  EMPID IN(1,2,3);


-- [ EXISTS ] Clause Example
SELECT a.*
FROM   emp_DtlTbl a
WHERE  EXISTS(SELECT b.*
              FROM   emp_DtlTbl b
              WHERE  b.EMPID = a.EMPID);
  

Performance Factors
To understand the performance factors let see the actual execution plan for "IN" and "EXISTS" clauses.
Take this example:
-- [ IN ] Clause Exampl
SELECT *
FROM   emp_DtlTbl
WHERE  EMPID =(SELECT EMPID
               FROM   emp_DtlTbl
               WHERE  EMPID = 2);           

-- [ EXISTS ] Clause Example
SELECT a.*
FROM   emp_DtlTbl a
WHERE  EXISTS(SELECT b.*
              FROM   emp_DtlTbl b
              WHERE  b.EMPID = 2
                     AND b.EMPID = a.EMPID);

If we compare the total query costs of the both MS SQL query, we see that the IN clause query cost is higher than the EXISTS clause query costs.

Special notes
Please note that: Here the data of the table is limited, so we cannot measure the performance factors.


Hope you like it.


Posted by: MR. JOYDEEP DAS