Introduction
Here we are
trying to compare two Table object in different Server Database. If we search goggle
we can find a lot of process to compare two tables in Different server
Database. But the Question is both the table have 10M of records.
Comparing two
Tables with huge data is not simple, especially when they are in different
database.
Common solution
we find that to load the Source table into a Temporary table in Destination
Database and then compare Temporary Table with Destination Table.
It is so
simple? We have to load 10M data into a Temporary Table first and then compare.
It may cause a memory related problem and our package stuck.
Another
solution is by using Lookup Trans form. Good solution when we think about two
different server databases. But if we load 10 M data into a lookup transform
... Only God Knows what happens.
Others we can
use Merge Join transform. But Merge join is a Blocking Transform and more it
takes two sorted input of data flow by using Sort transform.... Bad Idea.
So What the Solution
There is no pre-defining
solution for that. It depends how we design our SSIS package. How we load our
data into our SSIS transform.
Here our
strategy is by loading data depends on a specified columns. If the columns is a
Date data type it is good for us. We can use the date range to load the data. But
we must do it by automatic way.
SSIS Solutions
Here we are
taking two tables with few records for just an example purpose only. We must assume
it with 10 M records.
Step – 1 [ Source and Destination
Table Objects ]
Assume that
the Data is already migrated from Source to Destination.
CREATE TABLE [dbo].[Emp_Source]
(
EmpID INT NOT NULL IDENTITY,
EmpGrade CHAR(1) NOT NULL,
DOJ DATE NOT NULL
);
CREATE TABLE [dbo].[Emp_Destination]
(
EmpID INT NOT NULL IDENTITY,
EmpGrade CHAR(1) NOT NULL,
DOJ DATE NOT NULL
);
Source and Destination
have same data.
EmpID
|
EmpGrade
|
DOJ
|
1
|
C
|
2010-01-01
|
2
|
C
|
2010-01-01
|
3
|
C
|
2010-01-01
|
4
|
C
|
2010-01-01
|
5
|
C
|
2011-01-01
|
6
|
C
|
2011-01-01
|
7
|
C
|
2011-01-01
|
8
|
C
|
2011-01-01
|
9
|
B
|
2012-01-01
|
10
|
B
|
2012-01-01
|
11
|
B
|
2012-01-01
|
12
|
A
|
2013-01-01
|
13
|
A
|
2013-01-01
|
14
|
A
|
2013-01-01
|
15
|
C
|
2014-01-01
|
16
|
C
|
2014-01-01
|
17
|
C
|
2014-01-01
|
18
|
C
|
2015-01-01
|
19
|
C
|
2015-01-01
|
20
|
C
|
2015-01-01
|
21
|
C
|
2016-01-01
|
22
|
B
|
2017-01-01
|
23
|
B
|
2017-01-01
|
24
|
B
|
2017-01-01
|
Step – 2 [ Now Create a Segment Table
for Limited Data Load in SSIS Package ]
EmpID
|
EmpGrade
|
DOJ
|
Segment
|
1
|
C
|
2010-01-01
|
Segment-1
|
2
|
C
|
2010-01-01
|
|
3
|
C
|
2010-01-01
|
|
4
|
C
|
2010-01-01
|
Segment-2
|
5
|
C
|
2011-01-01
|
|
6
|
C
|
2011-01-01
|
|
7
|
C
|
2011-01-01
|
|
8
|
C
|
2011-01-01
|
|
9
|
B
|
2012-01-01
|
Segment-3
|
10
|
B
|
2012-01-01
|
|
11
|
B
|
2012-01-01
|
|
12
|
A
|
2013-01-01
|
Segment-4
|
13
|
A
|
2013-01-01
|
|
14
|
A
|
2013-01-01
|
|
15
|
C
|
2014-01-01
|
Segment-5
|
16
|
C
|
2014-01-01
|
|
17
|
C
|
2014-01-01
|
|
18
|
C
|
2015-01-01
|
Segement-6
|
19
|
C
|
2015-01-01
|
|
20
|
C
|
2015-01-01
|
|
21
|
C
|
2016-01-01
|
Segment-7
|
22
|
B
|
2017-01-01
|
Segment-8
|
23
|
B
|
2017-01-01
|
|
24
|
B
|
2017-01-01
|
CREATE TABLE [dbo].[tbl_Segment]
(
SegmentName VARCHAR(50) NOT NULL,
FromDate DATE NOT NULL,
ToDate DATE NOT NULL
);
INSERT INTO [dbo].[tbl_Segment]
(SegmentName, FromDate,
ToDate)
VALUES
('Segment-1', '2010-01-01',
'2010-12-31'),
('Segment-1', '2011-01-01',
'2011-12-31'),
('Segment-1', '2012-01-01',
'2012-12-31'),
('Segment-1', '2013-01-01',
'2013-12-31'),
('Segment-1', '2014-01-01',
'2014-12-31'),
('Segment-1', '2015-01-01',
'2015-12-31'),
('Segment-1', '2016-01-01',
'2016-12-31'),
('Segment-1', '2017-01-01',
'2017-12-31');
SELECT * FROM
[dbo].[tbl_Segment];
SegmentName
|
FromDate
|
ToDate
|
Segment-1
|
2010-01-01
|
2010-12-31
|
Segment-1
|
2011-01-01
|
2011-12-31
|
Segment-1
|
2012-01-01
|
2012-12-31
|
Segment-1
|
2013-01-01
|
2013-12-31
|
Segment-1
|
2014-01-01
|
2014-12-31
|
Segment-1
|
2015-01-01
|
2015-12-31
|
Segment-1
|
2016-01-01
|
2016-12-31
|
Segment-1
|
2017-01-01
|
2017-12-31
|
By using this
segmentation table we can load a limited amount of records in SSIS package.
Step – 3 [ SSIS Package Control Flow ]
Step – 4 [ The Variable ]
Name
|
Data Type
|
SQL
|
String
|
v_FromDT
|
String
|
v_ToDT
|
String
|
vDateFromTo
|
Object
|
Step – 5 [ Execute SQL – Get the Data
Range ]
Step – 6 [ ForEach Container ]
Step – 7 [ Set the variable Expression
– Variable name SQL ]
The
Expression is
"SELECT
* FROM [dbo].[Emp_Source] WHERE DOJ BETWEEN '"+ @[User::v_FromDT] +"' AND
'"+
@[User::v_ToDT]+"'"
Step – 8 [ Data Flow Task ]
Step – 9 [ OLEDB Source Settings ]
Step
- 10 [ The Lookup Transform ]
The main challenge
is to create a dynamic Lookup Transform. To do that just creates a tradition lookup
transform with OLEDB connection manager that we create normally with SQL
Command not directly choosing Table Object.
Now we have
to make the Lookup Dynamic and use the same Expression used before.
"SELECT
* FROM [dbo].[Emp_Source] WHERE DOJ BETWEEN '"+ @[User::v_FromDT] +"' AND
'"+
@[User::v_ToDT]+"'"
For that we just
select the Data Flow Task where the lookup exists. Select the property of Data
Flow Task and choose the Expression. Then set the SQL Command Property by
Expression. That’s all
Hope you like
it.
Posted
by: MR. JOYDEEP DAS
This is excellent one...
ReplyDeleteThanks @Sukamal
Delete