Sunday, 12 February 2017

Error Handling With OLEDB Destination – Fast Load

Introduction
When we migrate records to a SQL Server destination using the OLE DB Destination, we commonly choose the data access mode "Table or view – fast load" for faster execution of the SSIS package.

When we use the fast load option we must specify the Maximum insert commit size, and that is why SSIS rejects the entire batch when a single row fails.
To understand it properly, let’s take an example.

Case Scenario
The case scenario is simple: retrieve the records from a flat file and insert them into a SQL Server destination table.

Here is a sample of the flat file:
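(Reconstructed sample; the comma delimiter and the name on the second EmpID 101 row are assumptions, based on the results shown later in the post.)

EmpID,EmpName
101,Pryo Bhattacharya
101,Priyo Bhattacharya
102,Deblina Bhattacharya
103,Sarita Das Gupta
104,Shipra Roy Chowdhury
105,Deepasree Das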



If we look at the flat file we can see that Employee ID 101 has duplicate records with different names.

We have a destination table object:

CREATE TABLE [dbo].[tbl_EmplyeeDetails]
  (
      EmpID     INT             PRIMARY KEY,
      EmpName   VARCHAR(50)     NOT NULL
  );

In the table object the EmpID column is marked as the primary key, so duplicate records cannot enter this table.
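A quick T-SQL check confirms the constraint (the second name here is illustrative):

INSERT INTO [dbo].[tbl_EmplyeeDetails] (EmpID, EmpName) VALUES (101, 'Pryo Bhattacharya');
INSERT INTO [dbo].[tbl_EmplyeeDetails] (EmpID, EmpName) VALUES (101, 'Priyo Bhattacharya');
-- Msg 2627: Violation of PRIMARY KEY constraint; the second row is rejected

DELETE FROM [dbo].[tbl_EmplyeeDetails];  -- clean up before running the package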
We build a simple SSIS package in which the error output of the OLE DB Destination is configured as Redirect row.

Data Flow of the SSIS Package (Flat File Source → OLE DB Destination, with the error output redirected to a Flat File Destination):



OLEDB Destination Editor (Data access mode: Table or view – fast load; Maximum insert commit size: 10000):





If we run the package, it completes successfully.



Now if we query the destination, we find that no records were inserted:

SELECT * FROM [dbo].[tbl_EmplyeeDetails];

EmpID       EmpName
----------- --------------------------------------------------

(0 row(s) affected)

If we look at the redirect-rows error flat file, we find the following.



All the records were moved into the error trapping file.

The reason is that the Maximum insert commit size property is set to 10,000, so SSIS commits the whole batch at once. If any error is found in the batch, the entire batch moves down the redirect-rows path.
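The same all-or-nothing behaviour can be reproduced in plain T-SQL: a single multi-row INSERT is atomic, so one duplicate key rejects every row in the statement (again, the second EmpID 101 name is illustrative):

-- One duplicate EmpID makes the whole statement fail,
-- so none of the five distinct employees is inserted
INSERT INTO [dbo].[tbl_EmplyeeDetails] (EmpID, EmpName)
VALUES (101, 'Pryo Bhattacharya'),
       (101, 'Priyo Bhattacharya'),   -- duplicate key
       (102, 'Deblina Bhattacharya'),
       (103, 'Sarita Das Gupta'),
       (104, 'Shipra Roy Chowdhury');
-- Msg 2627: Violation of PRIMARY KEY constraint

SELECT COUNT(*) FROM [dbo].[tbl_EmplyeeDetails];  -- returns 0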

We want only the faulty records to move down the redirect-row error path, not the entire batch. So how do we solve this problem?

Solutions
The solution is quite simple: we alter the data flow design.



To solve this, we add a second OLE DB Destination on the redirect-row error path of the first one (just copy the first OLE DB Destination) and set its Maximum insert commit size to 1 (one).



So basically, when a 10,000-record batch cannot be inserted into the first OLE DB Destination because of a single faulty record, the whole batch moves to the second OLE DB Destination, which tries to insert the records one by one. Any record that still fails is redirected down the second error path.
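For reference, the same batch-then-row-by-row pattern can be sketched in T-SQL. This is only an illustration of what the two destinations do, not what SSIS executes internally; the staging table [dbo].[stg_EmployeeDetails] and error table [dbo].[tbl_EmployeeErrors] are hypothetical names:

-- Hypothetical staging and error tables for the sketch
CREATE TABLE [dbo].[stg_EmployeeDetails] (EmpID INT, EmpName VARCHAR(50));
CREATE TABLE [dbo].[tbl_EmployeeErrors]  (EmpID INT, EmpName VARCHAR(50));

BEGIN TRY
    -- First attempt: insert the whole batch at once (first destination)
    INSERT INTO [dbo].[tbl_EmplyeeDetails] (EmpID, EmpName)
    SELECT EmpID, EmpName FROM [dbo].[stg_EmployeeDetails];
END TRY
BEGIN CATCH
    -- The batch failed: retry row by row (second destination, commit size 1)
    DECLARE @EmpID INT, @EmpName VARCHAR(50);
    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT EmpID, EmpName FROM [dbo].[stg_EmployeeDetails];
    OPEN cur;
    FETCH NEXT FROM cur INTO @EmpID, @EmpName;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        BEGIN TRY
            INSERT INTO [dbo].[tbl_EmplyeeDetails] (EmpID, EmpName)
            VALUES (@EmpID, @EmpName);
        END TRY
        BEGIN CATCH
            -- Only the faulty row is redirected to the error table
            INSERT INTO [dbo].[tbl_EmployeeErrors] (EmpID, EmpName)
            VALUES (@EmpID, @EmpName);
        END CATCH;
        FETCH NEXT FROM cur INTO @EmpID, @EmpName;
    END;
    CLOSE cur;
    DEALLOCATE cur;
END CATCH;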

Now if we query the destination:

SELECT * FROM [dbo].[tbl_EmplyeeDetails];

EmpID       EmpName
----------- --------------------------------------------------
101         Pryo Bhattacharya
102         Deblina Bhattacharya
103         Sarita Das Gupta
104         Shipra Roy Chowdhury
105         Deepasree Das

(5 row(s) affected)

Only the duplicate record was kept out of the destination table.





Hope you like it.





Posted by: MR. JOYDEEP DAS

3 comments:

  1. This is informative. Is there any alternate way to do this? As per my perception, committing one by one will actually go slow...

  2. Actually, in real life when we do data porting we never use this approach, as it lacks data integrity; a validation check is always used every time data is being inserted into a new table with the same ID (primary key)......... pyal_p@yahoo.com

    1. Sorry for the late reply, @Shiv.
      Thanks for your interest. Here we are talking about billions of rows of data migration. What is your checking approach using the primary key? Please share it; I really want to know the mechanism.
      We should remember the constraints of time and resources when migrating data.
