Monday, 14 March 2016

Incremental Load Made Easy in Case of UPDATE Records

Introduction

Here i am taking about the Incremental Load of Data in SSIS. All developer use there own methodology to complete the process. This artilce is related to the UPDATE the existing records of Destination in case of Incremental Load.
Hope it will be informative and put some values in your professional work.

What we Actually Do in Incremental Load

Here we are just mentioned the Alogitham behind the Incremental Load.

  1. We check if the Records Exists in the Destination or Not by comparing with Primary key and if the records doesn't Exists  it just insert the records in destination table.
  2. If the Records EXIST then it checks the other columns. If the other columns changes then it goes for UPDATION of records.

Why the UPDATION Process is so Cumbersome

Think we have a table and the table has more than 60 columns and one Primary key. If we compare both Source and Destination Table columns each other’s like this

[Source].[Column-A] == [Destination].[Column-A]
OR [Source].[Column-B] == [Destination].[Column-B]
OR [Source].[Column-C] == [Destination].[Column-C]
OR [Source].[Column-D] == [Destination].[Column-D]
…. 60 columns like this

Think what happens if we are going to maintain such kind of code.

So what the Solutions

To solve this we have to introduce Columns both is Source and Destination Table. This column contains some vale based on Checksum/Binary Checksum / Hash bytes values. We just compare those values and decide whether UPDATION is needed or NOT.

So it is quite easy to compare then compare columns by columns.

Take an Example of CHECKSUM

SELECT EmpId,
EmpName,
EmpGrade,
CHECKSUM(EmpName, EmpGrade) AS [CheckSum_Val]
FROM tbl_Employee;

EmpId    EmpName            EmpGrade       CheckSum_Val
1             Joydeep Das         A                      -1141869962
2            Deepasree Das     B                        1652695494
3            Sukamal Jana        A                        2067120924
4            Santi Ranjan          C                       -1747286364


Take an Example of BINARY_CHECKSUM

SELECT EmpId,
EmpName,
EmpGrade,
BINARY_CHECKSUM(EmpName, EmpGrade) AS [BinaryCheckSum_Val]
FROM tbl_Employee;


EmpId    EmpName        EmpGrade    BinaryCheckSum_Val
1            Joydeep Das      A                    841019011
2            Deepasree Das  B                    1196883990
3           Sukamal Jana     A                     2115519786
4           Santi Ranjan       C                    1412284040


Take an Example of HashBytes


SELECT EmpId,
EmpName,
EmpGrade,
CHECKSUM(HASHBYTES('MD5',EmpName), HASHBYTES('MD5',EmpGrade)) AS [HashBytes_Val]
FROM tbl_Employee;

EmpId                 EmpName          EmpGrade          HashBytes_Val
1                         Joydeep Das       A                        -641922184
2                         Deepasree Das   B                       -1681185931
3                         Sukamal Jana     A                         750240776
4                        Santi Ranjan        C                         1481016726



So which one We Prefer


We have to choose between CHECKSUM, BINARY_CHECKSUM and HASHBYTES

When we go to the CHECKSUM method the MS Documentation gives us some sort of Shocking news.

However, there is a small chance that the checksum will not change. For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change”


BINARY_CHECKSUM is also not good reputation.




So HASHBYTES is the safest one that we can use.






Hope you like it.


Posted By: MR. JOYDEEP DAS





No comments:

Post a Comment