Introduction
Change Data
Capture (CDC) is introduced by Microsoft from SQL Server 2008. It is most
popular in case of Incremental load. We are going to study a scenario and
understand why the CDC is so important. The CDC is little complicated in MS SQL
Server 2008 but Microsoft improve the functionality and made it easier for us
from MS SQL 2012. In this article we are going to discuss CDC related to MS SQL
2012.
The article
based on two things
1.
How
CDC works on MS SQL Server 2012
2.
Implementation
of MS SQL Server 2012 CDC into SSIS
Hope it will
be informative and you enjoy the session.
Why CDC is needed a Scenario
We have a table
named tbl_EmployeeDetails and it contains EmpID, EmpName, DesigNation
and DOJ. The data is loaded from tbl_EmployeeDetails to Production
server table named tbl_Employee.
Suppose we
have 50 thousand records in the tbl_EmployeeDetails and data of that table is
frequently changed by Insert/Update and Delete. So it is not possible to delete
the destination table records and load it every time due to performance
factors.
So we have to
understand which records are Inserted/Updated/Deleted and make the operation on
this sets only. Here in this source table tbl_EmployeeDetails we are unable to understand
it and hence the CDC comes into the picture.
Before MS SQL
2008 we used alternate approaches that include timestamp columns, triggers, or
complex queries often hurt performance and increase complexity.
How CDC Works
To understand
it properly here we provide a pictorial diagram.
When any
Insert/Update/Delete occurs in the database table it logged into the log file.
The Scheduler reads the log file and stores the data into CDC table. This is
done by LSN (Log Sequence Number). From this CDC table we can track the changes
of the records in the table.
Implementation of CDC in MS SQL Server 2012
Before
Working with CDC the SQL Server Agent Services must be STARTED
Step-1
[ Enabling Change Data Capture (CDC) in
Database ]
USE TEST
GO
EXEC sys.sp_cdc_enable_db;
GO
To check the
CDC is properly activated on the database
SELECT [name], database_id, is_cdc_enabled
FROM sys.databases WHERE name = 'TEST';
GO
name
|
database_id
|
is_cdc_enabled
|
TEST
|
7
|
1
|
Here is_cdc_enabled
= 1 means the CDC is enabled in the database named TEST.
After this
execution of the command we find some table objects is created automatically as
a system table in the database and we have to understand those table clearly.
cdc.captured_columns
This
table returns result for list of captured column.
cdc.change_tables
This
table returns list of all the tables which are enabled for capture.
cdc.ddl_history
This
table contains history of all the DDL changes since capture data enabled.
cdc.index_columns
This
table contains indexes associated with change table.
cdc.lsn_time_mapping
This
table maps LSN number (for which we will learn later) and time.
Step-2
[ Create the Source Table ]
IF OBJECT_ID(N'[dbo].[tbl_EmployeeDetails]',
N'U') IS NOT NULL
BEGIN
DROP TABLE [dbo].[tbl_EmployeeDetails];
END
GO
CREATE TABLE [dbo].[tbl_EmployeeDetails]
(
EmpID INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
EmpName VARCHAR(50) NOT NULL,
DesigNation CHAR(1) NOT NULL,
DOJ DATETIME NOT NULL
);
Step-3
[Enabling Change Data Capture (CDC) in One
or More Table ]
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'tbl_EmployeeDetails',
@role_name = NULL
GO
@source_schema is the table schema where we want to
enable the CDC. In our case it is [dbo].
@source_name is the name of the table objects
where we want to enable CDC.
@role_name
If there is
any restriction of how data should be extracted from database, this option is
used to specify any role which is following restrictions and gating access to
data to this option if there is one. If you do not specify any role and,
instead, pass a NULL value, data access to this changed table will not be
tracked and will be available to access by everybody.
This will
create two jobs in the SQL Server Agent.
Job 'cdc.TEST_capture' started
successfully.
Job 'cdc.TEST_cleanup' started
successfully.
TEST_Capture
These jobs
execute a system stored procedure named sys.sp_MScdc_capture_job. This
procedure cannot be executed explicitly when a change data capture log scan
operation is already active or when the database is enabled for transactional
replication. This system SP enables SQL Server Agent, which in facts enable
Change Data Capture feature.
TEST_Cleanup
When
this job is executed it runs the system stored procedure sys.sp_MScdc_cleanup_job.
This system SP cleans up database changes tables.
After this operation there is another table is
automatically created with name of the table that we created earlier (tbl_EmployeeDeatails).
The name of the mirror table is [cdc].[dbo_tbl_EmployeeDetails_CT]
The [cdc].[dbo_tbl_EmployeeDetails_CT] table will
contains all the changes (Insert/Update/Delete) in the table named [dbo].[ tbl_EmployeeDeatails].
Step
– 4 [ Observing the cdc].[dbo_tbl_EmployeeDetails_CT] Table ]
SELECT * FROM [cdc].[dbo_tbl_EmployeeDetails_CT];
Here we got Five new columns
and others are the columns from original table
Step-
5 [ Now Make some Operation on Source table and See the CDC changes ]
Insert
Records
INSERT INTO [dbo].[tbl_EmployeeDetails]
(EmpName, DesigNation, DOJ)
VALUES('Joydeep
Das', 'A', '2012-09-08'),
('Deepasree
Das', 'B', '2013-05-01');
GO
SELECT * FROM [dbo].[tbl_EmployeeDetails];
SELECT * FROM [cdc].[dbo_tbl_EmployeeDetails_CT];
EmpID
|
EmpName
|
DesigNation
|
DOJ
|
1
|
Joydeep Das
|
A
|
2012-09-08
00:00:00.000
|
2
|
Deepasree Das
|
B
|
2013-05-01
00:00:00.000
|
__$start_lsn
|
__$end_lsn
|
__$seqval
|
__$operation
|
__$update_mask
|
EmpID
|
0x00000086000000A80020
|
NULL
|
0x00000086000000A8001D
|
2
|
0x0F
|
1
|
0x00000086000000A80020
|
NULL
|
0x00000086000000A8001F
|
2
|
0x0F
|
2
|
EmpName
|
DesigNation
|
DOJ
|
Joydeep
Das
|
A
|
2012-09-08
00:00:00.000
|
Deepasree
Das
|
B
|
2013-05-01
00:00:00.000
|
Here _$operation
= 2 means data is Inserted. Look at the _$start_lsn = 0x00000086000000A80020
means both the records in
same transaction.
Update
Records
UPDATE [dbo].[tbl_EmployeeDetails]
SET EmpName = 'Shipra Das'
WHERE EmpID = 1;
SELECT * FROM [dbo].[tbl_EmployeeDetails];
SELECT * FROM [cdc].[dbo_tbl_EmployeeDetails_CT];
EmpID
|
EmpName
|
DesigNation
|
DOJ
|
1
|
Shipra Das
|
A
|
2012-09-08
00:00:00.000
|
2
|
Deepasree Das
|
B
|
2013-05-01
00:00:00.000
|
__$start_lsn
|
__$end_lsn
|
__$seqval
|
__$operation
|
__$update_mask
|
EmpID
|
0x00000086000000A80020
|
NULL
|
0x00000086000000A8001D
|
2
|
0x0F
|
1
|
0x00000086000000A80020
|
NULL
|
0x00000086000000A8001F
|
2
|
0x0F
|
2
|
0x00000088000001980003
|
NULL
|
0x00000088000001980002
|
3
|
0x02
|
1
|
0x00000088000001980003
|
NULL
|
0x00000088000001980002
|
4
|
0x02
|
2
|
EmpName
|
DesigNation
|
DOJ
|
Joydeep
Das
|
A
|
2012-09-08
00:00:00.000
|
Deepasree
Das
|
B
|
2013-05-01
00:00:00.000
|
Joydeep
Das
|
A
|
2012-09-08
00:00:00.000
|
Shipra
Das
|
A
|
2012-09-08
00:00:00.000
|
After Update
operation there 2 new records inserted _$start_lsn = 0x00000088000001980003
which have _$operation = 3 which is Value Before Update And _$operation = 4
which is Value After Update
Delete
Records
DELETE [dbo].[tbl_EmployeeDetails]
WHERE EmpID = 1;
SELECT * FROM [dbo].[tbl_EmployeeDetails];
SELECT * FROM [cdc].[dbo_tbl_EmployeeDetails_CT];
EmpID
|
EmpName
|
DesigNation
|
DOJ
|
2
|
Deepasree Das
|
B
|
2013-05-01
00:00:00.000
|
__$start_lsn
|
__$end_lsn
|
__$seqval
|
__$operation
|
__$update_mask
|
EmpID
|
0x00000086000000A80020
|
NULL
|
0x00000086000000A8001D
|
2
|
0x0F
|
1
|
0x00000086000000A80020
|
NULL
|
0x00000086000000A8001F
|
2
|
0x0F
|
2
|
0x00000088000001980003
|
NULL
|
0x00000088000001980002
|
3
|
0x02
|
1
|
0x00000088000001980003
|
NULL
|
0x00000088000001980002
|
4
|
0x02
|
2
|
0x00000097000001680005
|
NULL
|
0x00000097000001680002
|
1
|
0x0F
|
1
|
EmpName
|
DesigNation
|
DOJ
|
Joydeep
Das
|
A
|
2012-09-08
00:00:00.000
|
Deepasree
Das
|
B
|
2013-05-01
00:00:00.000
|
Joydeep
Das
|
A
|
2012-09-08
00:00:00.000
|
Shipra
Das
|
A
|
2012-09-08
00:00:00.000
|
Shipra
Das
|
A
|
2012-09-08
00:00:00.000
|
For delete
operation a new row is added where _$start_lns = 0x00000097000001680005 and _$operation
= 1 which means Deleted Records.
In my next
article I am going demonstrate the using of CDC in SSIS component.
Hope you like
it.
Posted by: MR. JOYDEEP DAS
Hello Sir grt Post.. I have query regarding the performance related issue.Is it going to effect the performance of the application.?
ReplyDeleteThanks @Vivek
DeleteEverything has it's own cost ...