Wednesday 11 November 2015

Incremental Load without CDC in SSIS

Introduction
In my previous article we are demonstrating CDC as Incremental load in SSIS. But what happens, if CDC is not there. We can do it without CDC also. Here in this article we are going to demonstrate same Incremental load without CDC. Hope it will be informative and you enjoy the session.

What the scenario is
We are taking the same Scenario that we take in the CDC.
We have a table object named tbl_EmployeeDetails like this


EmpID (PK)
EmpName
Designation
DOJ
1
Joydeep Das
A
2015-02-01
2
Deepasree Das
B
2013-05-01
3
Santi Nath Mandal
A
2012-03-01

And we are going to load data into tbl_Employee table in our production DB without using CDC for incremental load.

How to Do That
Here we are going to describe the main steps or logic to implement this scenario and we hope that you have well understanding with SSIS and no need to describe all the steps.

Step – 1 [ Preparing the Base Table in Source ]

Here we are altering our previous table and adding two extra columns in the name of InsertedDT and UpdatedDt with data type DATETIME. As we are not using CDC, these two columns represent the CDC works for us.
Whenever a new records is Inserted the InsertedDT and UpdatedDT is taking the GETDATE() values and whenever the records is Updated the UpdateDT columns is modified not the InsertedDT columns.

IF OBJECT_ID(N'[dbo].[tbl_EmployeeDetails]', N'U')IS NOT NULL
   BEGIN
      DROP TABLE [dbo].[tbl_EmployeeDetails];
   END
GO

CREATE TABLE [dbo].[tbl_EmployeeDetails]
  (
   EmpID          INT         NOT NULL IDENTITY PRIMARY KEY,     
   EmpName        VARCHAR(50) NOT NULL,
   Designation    CHAR(1)     NOT NULL,
   DOJ            DATETIME    NOT NULL,
   InsertDT       DATETIME    NULL,
   UpdateDT       DATETIME    NULL
  );
GO

It is good it the Insert and Update method take care of the Last two columns i.e. InsertDT and UpdateDT.
 If not, we have to make a simple trigger in this table to take care of those columns.

How to prepare the Trigger

IF OBJECT_ID(N'[dbo].[trg_EmpDtlsInsert]', N'TR')IS NOT NULL
   BEGIN
       DROP TRIGGER [dbo].[trg_EmpDtlsInsert];
   END
GO

CREATE TRIGGER [dbo].[trg_EmpDtlsInsert]
ON [dbo].[tbl_EmployeeDetails]
AFTER INSERT
AS
BEGIN
DECLARE @v_Empid  INT;

SELECT * INTO #temp_Inserted FROM inserted;

IF EXISTS(SELECT * FROM #temp_Inserted)
      BEGIN
            SET @v_Empid = (SELECT EmpID FROM #temp_Inserted);
                   
            UPDATE [dbo].[tbl_EmployeeDetails]
                  SET InsertDT = GETDATE(),
                        UpdateDT = GETDATE()
            WHERE EmpID = @v_Empid;
                       
      END
END
GO

IF OBJECT_ID(N'[dbo].[trg_EmpDtlsUpdate]', N'TR')IS NOT NULL
   BEGIN
       DROP TRIGGER [dbo].[trg_EmpDtlsUpdate];
   END
GO

CREATE TRIGGER [dbo].[trg_EmpDtlsImsertUpdate]
ON [dbo].[tbl_EmployeeDetails]
AFTER UPDATE
AS
BEGIN
DECLARE @v_Empid  INT;

SELECT * INTO #temp_Inserted FROM inserted;

IF EXISTS(SELECT * FROM #temp_Inserted)
      BEGIN
            SET @v_Empid = (SELECT EmpID FROM #temp_Inserted);

            IF UPDATE(EmpName) OR UPDATE(Designation) OR UPDATE(DOJ)
                  BEGIN
                        UPDATE [dbo].[tbl_EmployeeDetails]
                              SET UpdateDT = GETDATE()
                        WHERE EmpID = @v_Empid;
                  END
      END
END
GO

Now Insert Some Records

INSERT INTO [dbo].[tbl_EmployeeDetails]
  (EmpName, Designation, DOJ)
VALUES('Joydeep Das', 'A', '2012-09-01');
GO

INSERT INTO [dbo].[tbl_EmployeeDetails]
  (EmpName, Designation, DOJ)
VALUES('Deepasree Das', 'B', '2011-07-01');
GO

SELECT * FROM [dbo].[tbl_EmployeeDetails];

EmpID
EmpName
Designation
DOJ
InsertDT
UpdateDT
1
Joydeep Das
A
2012-09-01
2015-11-11 21:22:41.043
2015-11-11 21:22:41.043
2
Deepasree Das
B
2011-07-01
2015-11-11 21:22:41.070
2015-11-11 21:22:41.070

Destination Table in Production DB

IF OBJECT_ID(N'[dbo].[tbl_Employee]', N'U')IS NOT NULL
   BEGIN
      DROP TABLE [dbo].[tbl_Employee];
   END
GO

CREATE TABLE [dbo].[tbl_Employee]
  (
   EmpID          INT         NOT NULL PRIMARY KEY,  
   EmpName        VARCHAR(50) NOT NULL,
   Designation    CHAR(1)     NOT NULL,
   DOJ            DATETIME    NOT NULL
  );
GO

Now the preparation is over and we are going to make the SSIS Package.       
                 
Step – 2 [ Introducing the Table to Track the Last Execution Date of Package ]

Here we introduce a new table which will track the Last execution date and time of the package so that if any data modification happens after that date time we must sink it with production table.

IF OBJECT_ID(N'[dbo].[tbl_ExecutionTracker]', N'U')IS NOT NULL
   BEGIN
      DROP TABLE [dbo].[tbl_ExecutionTracker];
   END
GO

CREATE TABLE [dbo].[tbl_ExecutionTracker]
  (
   Descr          VARCHAR(50) NOT NULL PRIMARY KEY,  
   LastExecDt     DATETIME    NOT NULL
  );
GO

-- Insert Records one day old to run the package
INSERT [dbo].[tbl_ExecutionTracker]
   (Descr, LastExecDt)
SELECT 'LastExec' As Descr, GETDATE()-1 AS LastExecDt;
GO

SELECT * FROM [dbo].[tbl_ExecutionTracker];

Descr              LastExecDt
LastExec         2015-11-10 21:37:37.333

Step – 3 [ SSIS Package ]

The simple logic is that the Package Extract the Data from source Greeter then or Equals to the Last Exec Dt.
So 

UpdateDt  >=  LastExecDt


Control Flow Task

[A] Execute SQL 1:



Here we are showing you what we do in the Execute SQL task named Retrieving Last Execution Date.



In SQL Statement:
SELECT LastExecDt
FROM   [dbo].[tbl_ExecutionTracker]
WHERE  Descr = 'LastExec';

[B] Execute SQL 2

In SQL Statement:

UPDATE [dbo].[tbl_ExecutionTracker]
   SET LastExecDt = GETDATE()
WHERE Descr = 'LastExec';
GO

Data Flow Task



OLEDB source -1 [ Retrieving Records using Last Execution Date ]



Conditional Split



 Step-4 [ Run the Package ]

First time Run

SELECT * FROM tbl_ExecutionTracker;

Descr                     LastExecDt
LastExec               2015-11-10 21:37:37.333



SELECT * FROM  [dbo].[tbl_Employee];

EmpID   EmpName                        Designation            DOJ
1              Joydeep Das                      A                             2012-09-01 00:00:00.000
2              Deepasree Das                  B                              2011-07-01 00:00:00.000

Now see the Last Execution Date

SELECT * FROM tbl_ExecutionTracker;

Descr                     LastExecDt
LastExec               2015-11-11 22:43:46.743

Now Insert a New Record and Modify Old Records

INSERT INTO [dbo].[tbl_EmployeeDetails]
  (EmpName, Designation, DOJ)
VALUES('Sukamal Jana', 'A', '2009-01-01');
GO


UPDATE [dbo].[tbl_EmployeeDetails]
   SET EmpName = 'Soni Sethi'
WHERE EmpID = 1;
GO

SELECT * FROM [dbo].[tbl_EmployeeDetails];

EmpID
EmpName
Designation
DOJ
InsertDT
UpdateDT
1
Soni Sethi
A
2012-09-01
2015-11-11 21:22:41.043
2015-11-11 21:22:41.043
2
Deepasree Das
B
2011-07-01
2015-11-11 21:22:41.070
2015-11-11 21:22:41.070
3
Sukamal Jana
A
2009-01-01
2015-11-11 22:48:23.653
2015-11-11 22:48:23.653

Now run the Package again



Now see the destination table
SELECT * FROM  [dbo].[tbl_Employee];

EmpID   EmpName           Designation            DOJ
1              Joydeep Das         A                             2012-09-01 00:00:00.000
2              Deepasree Das      B                             2011-07-01 00:00:00.000
3              Sukamal Jana        A                             2009-01-01 00:00:00.000
           


Summary
The main thing in the package is how to retrieve data from source depending on execution date.




Hope you like it.





Posted by: MR. JOYDEEP DAS

2 comments:

  1. Not sure if you are interested in 3rd party product but ZappySys has very easy solution.
    Link here" SSIS Upsert

    ReplyDelete