Introduction
In my
previous article we are discussing about CDC implementation of Microsoft SQL
Server. If you are not aware about it please visit it before continue this
article as it is mandatory to understand my previous article for that.
The link is
given bellow:
In this
article we are going to discuss about the Microsoft SQL Server 2012 SSIS
implementation of CDC. Hope it will be informative and you enjoy the session
Case Scenario
First we must
understand what we want to do in SSIS for CDC. Here the case scenario is
provided.
We have a
table object named tbl_EmployeeDetails like this
EmpID (PK)
|
EmpName
|
Designation
|
DOJ
|
1
|
Joydeep Das
|
A
|
2015-02-01
|
2
|
Deepasree Das
|
B
|
2013-05-01
|
3
|
Santi Nath Mandal
|
A
|
2012-03-01
|
Now we want
to incremental load of data from Table tbl_EmployeeDetails to Table
tbl_Employee with same columns name.
Here the term
Incremental Load is very important. That’s means
1.
First
time it loads all the data from Table tbl_EmployeeDetails to table tbl_Employee.
2.
Second
time it loads only those data that are newly inserted or Modified.
So, CDC play a
very important role incases of Incremental load to understand which data is
newly added and which data is updated. Hope from my previous article you can
understand the scenario well.
How to Implement in SSIS
Here we are
going to make two packages for that.
Step
– 1 [ The CDC Control Task – Mark Initial Load Start ]
Drag and drop
the CDC control task into Control Flow tab and configure it.
Here we find
·
Add
a new ADO.NET connection manager for the Source database
·
Set
CDC Control Operation to Mark initial load start
·
Create
a new package variable (CDC_State) to hold the CDC state information.
·
Set
the connection manager for the Destination database
·
Create
a table for storing the state ([cdc_states]). This table will be used to
track the CDC load information, so that you only pick up new changes each time
the incremental load package is run. It will be created in the Destination
database.
·
Set
the state name (CDC_State).
This value acts as a key for the CDC state information. Packages that are
accessing the same CDC data should be using a common CDC state name.
CDC
Control Operation
Mark
initial load start
This
operation is used when executing an initial load from an active database
without a snapshot. It is invoked at the beginning of an initial-load package
to record the current LSN in the source database before the initial-load
package starts reading the source tables. A walkthrough of how this process
works can be found in my CDC in SSIS
for SQL Server 2012 post.
Mark
CDC start
This
operation is used when then the initial load is made from a snapshot database
database or from a quiescence database. It is invoked at any point within the
initial load package. The operation accepts a parameter that can be a snapshot
LSN, a name of a snapshot database (from which the snapshot LSN will be derived
automatically) or it can be left empty, in which case the current database LSN
is used as the start LSN for the change processing package. This operation is
used as an alternative to the Mark Initial Load Start/End operations.
Get
processing range
This
operation is used in a change processing package before invoking the data flow
that uses the CDC Source data flow. It establishes a range of LSNs that the CDC
Source data flow reads when invoked. The range is stored in an SSIS package
variable (StateVariable property) that is used by the CDC Source during
data flow processing.
Mark
processed range
This
operation is used in a change processing package at the end of a CDC run (after
the CDC data flow is completed successfully) to record the last LSN that was
fully processed in the CDC run. The next time Get processing range is
used, this position determines the start of the next processing range.
Reset
CDC state
This
operation is used to reset the persistent CDC state associated with the current
CDC context. After this operation is run, the current maximum LSN from the
LSN-timestamp sys.fn_cdc_get_max_lsn table
becomes the start of the range for the next processing range. An example of
when this operation is used is when you want to process only the newly created
change records and ignore all old change records.
Step
– 2 [ The CDC Control Task – Mark Initial Load End ]
Same as The CDC
Control Task – Mark Initial Load Start but only one configuration that we need
to change is Set CDC Control Operation to Mark initial load End
Step
-3 [ Package Details Flow ]
In-between two
CDC Control Task, we use the data flow task and configure it.
Control
flow Tab of the package
Data
flow Tab of package
Now check the
CDC_states table object
SELECT * FROM [dbo].[cdc_states];
name
|
state
|
CDC_State
|
ILEND/IR/0x000000CA000001200002…
|
Step
– 4 [ Incremental Load ]
Configuration
Changes in CDC Control Task
The only
configuration that we need to change in both the CDC Control Task is CDC
Control Operation is Get Processing Range and Mark Processed
Range
Data
Flow Tab Details
Configuration
of CDC Source
Configuration
of CDC Splitter
NO need of any
configuration.
We assume
that the other configuration like OLED Command you now that and no need to
describe.
Hope you like
it.
Posted by: MR. JOYDEEP DAS
No comments:
Post a Comment