Introduction
We are always
trying to provide some useful solution for SSIS developer. Here in this article
we are trying to make some incremental process of loading data where source
have huge amount of data. It’s a strategy only not the complete solution. You
can get a new idea how to handle with huge amount records when you’re dealing
with data migration.
Case Scenario
We have a
source and destination table object in different DB and different server.
Environment of source DB may be very. It may be flat file, Oracle, DB2 etc and
the Destination is SQL server in our case (Although the Destination may be very).
Our main job
is to move all data from source table to our destination table.
We have to understand
certain thing over here
1.
The
source data is huge
2.
The
source is ever increasing. That means NEW records is always added in the source
table.
3.
The
data of the source may be modified any point of time.
In one wards
the source is LIVE.
Don’t worry;
we have the solution for that.
Constraint
Before going
to the solution we have some constraint that we need to discuss, others we are
unable to build the solution.
1.
Source
must have Identity type Primary Key columns.
2.
Source
must have Date Type columns called the Last Modified Date (LMDT). When the new
records Inserted into the source tale it put the Current Date to the LMDT
columns. When the records is Updated the LMDT columns is updated by current
Date value.
If all the constraint
value meets, we can move into the solution.
What the solution is
The solution
is quite simple. We just follow certain steps to achieve it.
1.
Find
the Maximum (MAX) value of the Primary Key columns of Destination Table.
2.
Find
the Maximum (MAX) value of Last Modified Date (LMDT) of Destination Table.
3.
Now
Check the Source table Primary key columns value with MAX value of Primary key
columns of Destination Table.
Get All the Records from source which has Greater value of MAX value of Destination for Insert as New Records.
Get All the Records from source which has Greater value of MAX value of Destination for Insert as New Records.
4.
Again
check the Source Table Primary key columns value which have less value of MAX
value of Destination and Last Modified Date (LMDT) of Source is Greeter then the
Maximum (MAX) value of Destination for Update.
Here we are putting those records in a staging table to do the operation by SQL statement.
Here we are putting those records in a staging table to do the operation by SQL statement.
SSIS Solution
Step -1 [ Source and Destination Table ]
--Source
Table
CREATE TABLE [dbo].[tbl_EmployeeDetails]
(
EmpID INT NOT NULL IDENTITY PRIMARY KEY,
EmpName VARCHAR(50) NOT NULL,
Grade CHAR(1) NOT NULL,
LMDT DATE NOT NULL
);
SELECT * FROM [dbo].[tbl_EmployeeDetails];
EmpID EmpName Grade LMDT
1 Joydeep
Das C 2014-10-01
2 Debleena
Bhattacharya C 2016-10-01
3 Priya
Baannerjee C 2015-10-01
4 Sarmila
Das C 2015-10-01
5 Deepasree
Das C 2015-10-01
6 Rajeev
Ranjan C 2017-01-01
7 Madhu
Mita C 2017-01-01
--Destination
Table
CREATE TABLE [dbo].[tbl_EmployeeDetails_Dest]
(
EmpID INT NOT NULL PRIMARY KEY,
EmpName VARCHAR(50) NOT NULL,
Grade CHAR(1) NOT NULL,
LMDT DATE NOT NULL
);
Step-2 [ Create a Staging Table ]
--Staging
Table
CREATE TABLE [dbo].[tbl_EmployeeDetails_Dest_Stage]
(
EmpID INT NOT NULL,
EmpName VARCHAR(50) NOT NULL,
Grade CHAR(1) NOT NULL,
LMDT DATE NOT NULL
);
Step-3 [ The Control Flow of SSIS ]
Step – 4 [ Execute SQL Task – Pic MAX Records ]
Here we are
using variables to get the MAX records
Name
|
Scope
|
Data Type
|
v_MAX_ID
|
Package Level
|
Int32
|
v_MAX_DT
|
Package Level
|
DateTime
|
SQL Statement:
SELECT ISNULL(MAX(EmpID), 0),
ISNULL(MAX(LMDT), '1990-01-01')
FROM [dbo].[tbl_EmployeeDetails_Dest];
Step-5 [ Data Flow Task – Insert NEW Records ]
SQL
Statement:
SELECT * FROM
[dbo].[tbl_EmployeeDetails] WHERE
EmpID >= ?;
Step-6 [ Execute SQL – Truncate Staging ]
SQL
Statement:
TRUNCATE TABLE [dbo].[tbl_EmployeeDetails_Dest_Stage];
Step-7 [ Data Flow task – UPDATE to Staging ]
SQL
Statement:
SELECT *
FROM [dbo].[tbl_EmployeeDetails]
WHERE EmpID < ? AND LMDT > ?;
FROM [dbo].[tbl_EmployeeDetails]
WHERE EmpID < ? AND LMDT > ?;
Step-8 [ Execute SQL – Update Destination ]
SQL Statement:
UPDATE t
SET t.EmpName = s.EmpName,
t.Grade = s.Grade,
t.LMDT = s.LMDT
FROM [dbo].[tbl_EmployeeDetails_Dest] AS
t
INNER JOIN [dbo].[tbl_EmployeeDetails_Dest_Stage]
AS s
ON t.EmpID = s.EmpID;
Observation
Step- 1[ First – Run of SSIS Package ]
SELECT * FROM
[dbo].[tbl_EmployeeDetails_Dest];
EmpID EmpName Grade LMDT
1 Joydeep
Das C 2014-10-01
2 Debleena
Bhattacharya C 2016-10-01
3 Priya
Baannerjee C 2015-10-01
4 Sarmila
Das C 2015-10-01
5 Deepasree
Das C 2015-10-01
6 Rajeev
Ranjan C 2017-01-01
7 Madhu
Mita C 2017-01-01
Step-2 [ Update Source and Run SSIS package Again ]
UPDATE [dbo].[tbl_EmployeeDetails]
SET EmpName =
'Joydeep Debbarma',
LMDT = GETDATE()
WHERE EmpID=1;
Now the
source record is modified
SELECT * FROM
[dbo].[tbl_EmployeeDetails];
EmpID EmpName Grade LMDT
1 Joydeep
Debbarma C 2017-02-04
2 Debleena
Bhattacharya C 2016-10-01
3 Priya
Baannerjee C 2015-10-01
4 Sarmila
Das C 2015-10-01
5 Deepasree
Das C 2015-10-01
6 Rajeev
Ranjan C 2017-01-01
7 Madhu
Mita C 2017-01-01
Step-3 [ Run the SSIS Package Again and Observe
Destination ]
SELECT * FROM [dbo].[tbl_EmployeeDetails_Dest];
EmpID EmpName Grade LMDT
1 Joydeep Debbarma C 2017-02-04
2 Debleena Bhattacharya C 2016-10-01
3 Priya Baannerjee C 2015-10-01
4 Sarmila Das C 2015-10-01
5 Deepasree Das C 2015-10-01
6 Rajeev Ranjan C 2017-01-01
7 Madhu Mita C 2017-01-01
Comments
Here we are
not using any MERGE statement as we don’t like to use it as MERGE have some performance
issue.
We can make
it as for DELETE operation also.
Hope you like
it.
Posted
by: MR. JOYDEEP DAS
No comments:
Post a Comment