Sunday, 5 February 2017

Comparing Table in Different Database with Dynamic Lookup

Introduction
Here we are trying to compare two Table object in different Server Database. If we search goggle we can find a lot of process to compare two tables in Different server Database. But the Question is both the table have 10M of records.

Comparing two Tables with huge data is not simple, especially when they are in different database.

Common solution we find that to load the Source table into a Temporary table in Destination Database and then compare Temporary Table with Destination Table.

It is so simple? We have to load 10M data into a Temporary Table first and then compare. It may cause a memory related problem and our package stuck.

Another solution is by using Lookup Trans form. Good solution when we think about two different server databases. But if we load 10 M data into a lookup transform ... Only God Knows what happens.

Others we can use Merge Join transform. But Merge join is a Blocking Transform and more it takes two sorted input of data flow by using Sort transform.... Bad Idea.

So What the Solution
There is no pre-defining solution for that. It depends how we design our SSIS package. How we load our data into our SSIS transform.

Here our strategy is by loading data depends on a specified columns. If the columns is a Date data type it is good for us. We can use the date range to load the data. But we must do it by automatic way.

SSIS Solutions
Here we are taking two tables with few records for just an example purpose only. We must assume it with 10 M records.

Step – 1 [ Source and Destination Table Objects ]

Assume that the Data is already migrated from Source to Destination.

CREATE TABLE [dbo].[Emp_Source]
  (
    EmpID       INT        NOT NULL IDENTITY,
EmpGrade   CHAR(1)    NOT NULL,
     DOJ        DATE       NOT NULL
  );

CREATE TABLE [dbo].[Emp_Destination]
  (
    EmpID       INT        NOT NULL IDENTITY,
     EmpGrade   CHAR(1)    NOT NULL,
     DOJ        DATE       NOT NULL
  );

Source and Destination have same data.

EmpID
EmpGrade
DOJ
1
C
2010-01-01
2
C
2010-01-01
3
C
2010-01-01
4
C
2010-01-01
5
C
2011-01-01
6
C
2011-01-01
7
C
2011-01-01
8
C
2011-01-01
9
B
2012-01-01
10
B
2012-01-01
11
B
2012-01-01
12
A
2013-01-01
13
A
2013-01-01
14
A
2013-01-01
15
C
2014-01-01
16
C
2014-01-01
17
C
2014-01-01
18
C
2015-01-01
19
C
2015-01-01
20
C
2015-01-01
21
C
2016-01-01
22
B
2017-01-01
23
B
2017-01-01
24
B
2017-01-01

Step – 2 [ Now Create a Segment Table for Limited Data Load in SSIS Package ]
EmpID
EmpGrade
DOJ
Segment
1
C
2010-01-01
Segment-1
2
C
2010-01-01
3
C
2010-01-01
4
C
2010-01-01
Segment-2
5
C
2011-01-01
6
C
2011-01-01
7
C
2011-01-01
8
C
2011-01-01
9
B
2012-01-01
Segment-3
10
B
2012-01-01
11
B
2012-01-01
12
A
2013-01-01
Segment-4
13
A
2013-01-01
14
A
2013-01-01
15
C
2014-01-01
Segment-5
16
C
2014-01-01
17
C
2014-01-01
18
C
2015-01-01
Segement-6
19
C
2015-01-01
20
C
2015-01-01
21
C
2016-01-01
Segment-7
22
B
2017-01-01
Segment-8
23
B
2017-01-01
24
B
2017-01-01

CREATE TABLE [dbo].[tbl_Segment]
  (
     SegmentName          VARCHAR(50)     NOT NULL,
     FromDate             DATE            NOT NULL,
     ToDate               DATE            NOT NULL
  );

INSERT INTO [dbo].[tbl_Segment]
(SegmentName, FromDate, ToDate)
VALUES
('Segment-1', '2010-01-01', '2010-12-31'),
('Segment-1', '2011-01-01', '2011-12-31'),
('Segment-1', '2012-01-01', '2012-12-31'),
('Segment-1', '2013-01-01', '2013-12-31'),
('Segment-1', '2014-01-01', '2014-12-31'),
('Segment-1', '2015-01-01', '2015-12-31'),
('Segment-1', '2016-01-01', '2016-12-31'),
('Segment-1', '2017-01-01', '2017-12-31');

SELECT * FROM [dbo].[tbl_Segment];

SegmentName
FromDate
ToDate
Segment-1
2010-01-01
2010-12-31
Segment-1
2011-01-01
2011-12-31
Segment-1
2012-01-01
2012-12-31
Segment-1
2013-01-01
2013-12-31
Segment-1
2014-01-01
2014-12-31
Segment-1
2015-01-01
2015-12-31
Segment-1
2016-01-01
2016-12-31
Segment-1
2017-01-01
2017-12-31

By using this segmentation table we can load a limited amount of records in SSIS package.

Step – 3 [ SSIS Package Control Flow ]



Step – 4 [ The Variable ]
Name
Data Type
SQL
String
v_FromDT
String
v_ToDT
String
vDateFromTo
Object


Step – 5 [ Execute SQL – Get the Data Range ]






Step – 6 [ ForEach Container ]




Step – 7 [ Set the variable Expression – Variable name SQL ]

The Expression is
"SELECT * FROM [dbo].[Emp_Source] WHERE DOJ BETWEEN '"+  @[User::v_FromDT] +"'  AND  '"+  @[User::v_ToDT]+"'"

Step – 8 [ Data Flow Task ]




Step – 9 [ OLEDB Source Settings ]




Step  - 10 [ The Lookup Transform ]
The main challenge is to create a dynamic Lookup Transform. To do that just creates a tradition lookup transform with OLEDB connection manager that we create normally with SQL Command not directly choosing Table Object.

Now we have to make the Lookup Dynamic and use the same Expression used before.
"SELECT * FROM [dbo].[Emp_Source] WHERE DOJ BETWEEN '"+  @[User::v_FromDT] +"'  AND  '"+  @[User::v_ToDT]+"'"

For that we just select the Data Flow Task where the lookup exists. Select the property of Data Flow Task and choose the Expression. Then set the SQL Command Property by Expression. That’s all












Hope you like it.




Posted by: MR. JOYDEEP DAS

2 comments: