Introduction
When migrating records to a SQL Server destination using the OLEDB Destination, we generally use the Data Access Mode "Table or view - fast load" for faster execution of the SSIS package.
When the fast load option is used, we must specify the Maximum insert commit size, and that is the reason SSIS rejects the entire batch when it encounters a single error.
To understand it properly, let's take an example.
Case Scenario
The case scenario here is simple: retrieve records from a flat file and insert them into a SQL Server destination table object.
Here is a sample of the flat file.
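A representative version of the file, assuming a simple comma-separated layout (the name on the duplicate 101 row is hypothetical):

EmpID,EmpName
101,Pryo Bhattacharya
101,Another Name
102,Deblina Bhattacharya
103,Sarita Das Gupta
104,Shipra Roy Chowdhury
105,Deepasree Das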
If we look at the flat file, we can see that duplicate records exist for Employee ID 101 with different names.
We have a table object:
CREATE TABLE [dbo].[tbl_EmplyeeDetails]
(
    EmpID   INT PRIMARY KEY,
    EmpName VARCHAR(50) NOT NULL
);
In this table object, the EmpID column is marked as the primary key, so duplicate records cannot be inserted into the table.
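A quick T-SQL illustration of what the constraint does (the second row's name is hypothetical):

INSERT INTO [dbo].[tbl_EmplyeeDetails] (EmpID, EmpName)
VALUES (101, 'Pryo Bhattacharya');   -- succeeds

INSERT INTO [dbo].[tbl_EmplyeeDetails] (EmpID, EmpName)
VALUES (101, 'Another Name');        -- fails with error 2627,
                                     -- violation of PRIMARY KEY constraint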
We build a simple SSIS package with an OLEDB Destination, configuring Redirect Row on the Error Output of the OLEDB Destination.
Data Flow of the SSIS Package:
OLEDB Destination Editor:
If we run the package, we can see that the SSIS package runs successfully.
Now, if we query the destination, we find that no records were inserted:
SELECT * FROM [dbo].[tbl_EmplyeeDetails];

EmpID       EmpName
----------- --------------------------------------------------

(0 row(s) affected)
If we look at the redirect-rows destination flat file for errors, we find that all the records have been moved into the error trapping file.
The reason is that we set the Maximum insert commit size property to 10,000, so SSIS commits the batch only once it is complete. If any error is found in the batch, the entire batch is moved down the redirect rows path.
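The effect is comparable to a single multi-row INSERT in T-SQL, which also stands or falls as one unit (a sketch; the duplicate row's name is hypothetical):

-- One statement acts as one batch: the duplicate 101 makes the
-- whole INSERT fail, so none of the three rows is inserted.
INSERT INTO [dbo].[tbl_EmplyeeDetails] (EmpID, EmpName)
VALUES (101, 'Pryo Bhattacharya'),
       (102, 'Deblina Bhattacharya'),
       (101, 'Another Name');
-- Error 2627 is raised and the table remains empty.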
We want to move only the single faulty record down the redirect row error path, not the entire batch. So how do we solve this problem?
Solutions
The solution is quite simple: alter the data flow design.
To solve this, we connect another OLEDB Destination to the redirect row error path of the first OLEDB Destination (just copy the first OLEDB Destination to make a second one) and set its Maximum insert commit size to 1 (one).
So basically, when a 10,000-record batch cannot be inserted by the first OLEDB Destination because of a single faulty record, the whole batch moves to the second OLEDB Destination, which tries to insert the records one by one. If any record fails to insert, only that record is redirected into another error path.
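In T-SQL terms, the second destination behaves roughly like the sketch below, committing one row at a time; the #Incoming and #ErrorRows tables are hypothetical stand-ins for the redirected batch and the second error path:

-- Hypothetical staging for the redirected batch and the error path.
CREATE TABLE #Incoming  (EmpID INT, EmpName VARCHAR(50));
CREATE TABLE #ErrorRows (EmpID INT, EmpName VARCHAR(50));

DECLARE @EmpID INT, @EmpName VARCHAR(50);

DECLARE row_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT EmpID, EmpName FROM #Incoming;

OPEN row_cursor;
FETCH NEXT FROM row_cursor INTO @EmpID, @EmpName;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        -- Commit size 1: each row stands or falls on its own.
        INSERT INTO [dbo].[tbl_EmplyeeDetails] (EmpID, EmpName)
        VALUES (@EmpID, @EmpName);
    END TRY
    BEGIN CATCH
        -- Only the faulty row goes down the second error path.
        INSERT INTO #ErrorRows (EmpID, EmpName)
        VALUES (@EmpID, @EmpName);
    END CATCH;

    FETCH NEXT FROM row_cursor INTO @EmpID, @EmpName;
END;

CLOSE row_cursor;
DEALLOCATE row_cursor;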
Now, if we query the destination:
SELECT * FROM [dbo].[tbl_EmplyeeDetails];

EmpID       EmpName
----------- --------------------------------------------------
101         Pryo Bhattacharya
102         Deblina Bhattacharya
103         Sarita Das Gupta
104         Shipra Roy Chowdhury
105         Deepasree Das

(5 row(s) affected)
Only the duplicate record is not inserted into the destination table.
Hope you like it.
Posted by: MR. JOYDEEP DAS
Comment: This is informative. Is there any alternate way to do this? As per my perception, committing rows one by one will be slow...

Comment: Actually, in real life when we do data porting, we never use this approach as it lacks data integrity; a validation check is used every time data is inserted into a new table with the same ID (primary key). pyal_p@yahoo.com
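Such a validation check might look like the sketch below, which inserts only the rows whose key is not already present (#Incoming is a hypothetical staging table):

INSERT INTO [dbo].[tbl_EmplyeeDetails] (EmpID, EmpName)
SELECT s.EmpID, s.EmpName
FROM #Incoming AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM [dbo].[tbl_EmplyeeDetails] AS d
    WHERE d.EmpID = s.EmpID
);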
Reply (author): Sorry for the late reply, @Shiv. Thanks for your interest. Here we are talking about billions of rows of data migration. What is your checking approach using the PK? Please share; I really want to know the mechanism. We should remember the constraints of time and resources when migrating data.