Saturday, 4 February 2017

NEW Incremental Process strategy for SSIS

Introduction
We are always trying to provide some useful solution for SSIS developer. Here in this article we are trying to make some incremental process of loading data where source have huge amount of data. It’s a strategy only not the complete solution. You can get a new idea how to handle with huge amount records when you’re dealing with data migration.

Case Scenario
We have a source and destination table object in different DB and different server. Environment of source DB may be very. It may be flat file, Oracle, DB2 etc and the Destination is SQL server in our case (Although the Destination may be very).
Our main job is to move all data from source table to our destination table.

We have to understand certain thing over here
1.    The source data is huge
2.    The source is ever increasing. That means NEW records is always added in the source table.
3.    The data of the source may be modified any point of time.

In one wards the source is LIVE.
Don’t worry; we have the solution for that.

Constraint
Before going to the solution we have some constraint that we need to discuss, others we are unable to build the solution.

1.    Source must have Identity type Primary Key columns.
2.    Source must have Date Type columns called the Last Modified Date (LMDT). When the new records Inserted into the source tale it put the Current Date to the LMDT columns. When the records is Updated the LMDT columns is updated by current Date value.

If all the constraint value meets, we can move into the solution.

What the solution is
The solution is quite simple. We just follow certain steps to achieve it.

1.    Find the Maximum (MAX) value of the Primary Key columns of Destination Table.
2.    Find the Maximum (MAX) value of Last Modified Date (LMDT) of Destination Table.
3.    Now Check the Source table Primary key columns value with MAX value of Primary key columns of Destination Table.
Get All the Records from source which has Greater value of MAX value of Destination for Insert as New Records.
4.    Again check the Source Table Primary key columns value which have less value of MAX value of Destination and Last Modified Date (LMDT) of Source is Greeter then the Maximum (MAX) value of Destination for Update.
Here we are putting those records in a staging table to do the operation by SQL statement.

SSIS Solution

Step -1 [ Source and Destination Table ]

--Source Table
CREATE TABLE [dbo].[tbl_EmployeeDetails]
  (
     EmpID   INT         NOT NULL IDENTITY PRIMARY KEY,
     EmpName VARCHAR(50) NOT NULL,
     Grade   CHAR(1)     NOT NULL,
     LMDT    DATE        NOT NULL
  );
 

SELECT * FROM [dbo].[tbl_EmployeeDetails];

EmpID   EmpName                           Grade     LMDT
1              Joydeep Das                        C             2014-10-01
2              Debleena Bhattacharya       C             2016-10-01
3              Priya Baannerjee                 C             2015-10-01
4              Sarmila Das                         C             2015-10-01
5              Deepasree Das                     C             2015-10-01
6              Rajeev Ranjan                     C             2017-01-01
7              Madhu Mita                         C             2017-01-01

--Destination Table

CREATE TABLE [dbo].[tbl_EmployeeDetails_Dest]
  (
     EmpID      INT             NOT NULL PRIMARY KEY,
     EmpName    VARCHAR(50)     NOT NULL,
     Grade      CHAR(1)         NOT NULL,
     LMDT       DATE            NOT NULL
  );

Step-2 [ Create a Staging Table ]

--Staging Table
CREATE TABLE [dbo].[tbl_EmployeeDetails_Dest_Stage]
  (
     EmpID      INT             NOT NULL,
     EmpName    VARCHAR(50)     NOT NULL,
     Grade      CHAR(1)         NOT NULL,
     LMDT       DATE            NOT NULL
  );

Step-3 [ The Control Flow of SSIS ]



Step – 4 [ Execute SQL Task – Pic MAX Records ]

Here we are using variables to get the MAX records

Name
Scope
Data  Type
v_MAX_ID
Package Level
Int32
v_MAX_DT
Package Level
DateTime

SQL Statement:

SELECT ISNULL(MAX(EmpID), 0),
       ISNULL(MAX(LMDT), '1990-01-01')
FROM   [dbo].[tbl_EmployeeDetails_Dest];

Step-5 [ Data Flow Task – Insert NEW Records ]



SQL Statement:
SELECT * FROM [dbo].[tbl_EmployeeDetails] WHERE EmpID >= ?;


Step-6 [ Execute SQL – Truncate Staging ]


SQL Statement:
TRUNCATE TABLE [dbo].[tbl_EmployeeDetails_Dest_Stage];

Step-7 [ Data Flow task – UPDATE to Staging ]





SQL Statement:
SELECT *
FROM [dbo].[tbl_EmployeeDetails]
WHERE EmpID < ? AND LMDT > ?;


Step-8 [ Execute SQL – Update Destination ]

SQL Statement:
UPDATE t
  SET  t.EmpName = s.EmpName,
       t.Grade   = s.Grade,
        t.LMDT    = s.LMDT
  FROM [dbo].[tbl_EmployeeDetails_Dest] AS t
       INNER JOIN [dbo].[tbl_EmployeeDetails_Dest_Stage] AS s
        ON t.EmpID = s.EmpID;


Observation

Step- 1[ First – Run of SSIS Package ]

SELECT * FROM [dbo].[tbl_EmployeeDetails_Dest];

EmpID   EmpName                         Grade     LMDT
1              Joydeep Das                      C             2014-10-01
2              Debleena Bhattacharya     C             2016-10-01
3              Priya Baannerjee               C             2015-10-01
4              Sarmila Das                       C             2015-10-01
5              Deepasree Das                   C             2015-10-01
6              Rajeev Ranjan                   C             2017-01-01
7              Madhu Mita                       C             2017-01-01

Step-2 [ Update Source and Run SSIS package Again ]

UPDATE [dbo].[tbl_EmployeeDetails]
SET EmpName = 'Joydeep Debbarma',
    LMDT = GETDATE()
WHERE EmpID=1;

Now the source record is modified

SELECT * FROM [dbo].[tbl_EmployeeDetails];

EmpID   EmpName                        Grade      LMDT
1              Joydeep Debbarma            C             2017-02-04
2              Debleena Bhattacharya     C             2016-10-01
3              Priya Baannerjee               C             2015-10-01
4              Sarmila Das                       C             2015-10-01
5              Deepasree Das                   C             2015-10-01
6              Rajeev Ranjan                   C             2017-01-01
7              Madhu Mita                       C             2017-01-01

Step-3 [ Run the SSIS Package Again and Observe Destination ]

SELECT * FROM [dbo].[tbl_EmployeeDetails_Dest];

EmpID   EmpName                         Grade     LMDT
1              Joydeep Debbarma            C             2017-02-04
2              Debleena Bhattacharya     C             2016-10-01
3              Priya Baannerjee               C             2015-10-01
4              Sarmila Das                       C             2015-10-01
5              Deepasree Das                   C             2015-10-01
6              Rajeev Ranjan                   C             2017-01-01
7              Madhu Mita                       C             2017-01-01

Comments

Here we are not using any MERGE statement as we don’t like to use it as MERGE have some performance issue.
We can make it as for DELETE operation also.


Hope you like it.






Posted by: MR. JOYDEEP DAS

No comments:

Post a Comment