Sunday 27 December 2015

SSIS – Using BULK INSERT

Introduction

Bulk Insert is a very important tool for every SSIS developer and it is almost used by every developer to load the Flat file into staging table. Here we are not going to describe detail about Bulk Copy, but working with some common factors when used with Bulk Copy.

What Common Factor we faced

Some of the common factor that we faced with BULK COPY is
1.    When the Import Table have the Identity Columns
2.    When we do not want to Insert some records in a specified columns of a Table
3.    When data type mismatch occurs
4.    When Truncation Error occurs.

Sometime in the ETL process the flat file size is so heavy that we are unable to open it.

How to Solve this Problem

This type and many other type of problem with BULK COPY can be solved by using Format File. So here in this article we are going to describe the BULK COPY using format file.
Format file is a Metadata of table where we are importing the data from flat file. There are two type of format file one is in text format and another is xml format. Generally I preferred the xml format but it completely depends on developer and there are no hard and fast rules for that.

So let’s start the Demonstration

Step – 1 [ The Flat file Format ]



Step – 2 [ The Destination Table ]

IF OBJECT_ID(N'[dbo].[tbl_StudentDetails]', N'U')IS NOT NULL
   DROP TABLE [dbo].[tbl_StudentDetails];
GO

CREATE TABLE [dbo].[tbl_StudentDetails]
    (
        StdRoll   INT         NOT NULL PRIMARY KEY,
        Stdname   VARCHAR(50) NULL,
        StdClass  INT         NULL,
        Section   CHAR(1)     NULL
      );
GO

Step – 3 [ Creating the Format File ]

The XML Format File

bcp PracticeDB.dbo.tbl_StudentDetails
format nul -c -f
E:\Preparation\SSIS\BCP\FmtStudentDetails.xml
-x -S JOYDEEP-LAPTOP\JOYDEEPSQL12 -T -q -t;


Here the XML format file name is FmtStudentDetails.xml and the Instance of the SQL Server name is JOYDEEP-LAPTOP\JOYDEEPSQL12


  <?xml version="1.0" ?>
- <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
- <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=";" MAX_LENGTH="12" />
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=";" MAX_LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS" />
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR=";" MAX_LENGTH="12" />
  <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="1" COLLATION="SQL_Latin1_General_CP1_CI_AS" />
  </RECORD>
- <ROW>
  <COLUMN SOURCE="1" NAME="StdRoll" xsi:type="SQLINT" />
  <COLUMN SOURCE="2" NAME="Stdname" xsi:type="SQLVARYCHAR" />
  <COLUMN SOURCE="3" NAME="StdClass" xsi:type="SQLINT" />
  <COLUMN SOURCE="4" NAME="Section" xsi:type="SQLCHAR" />
  </ROW>
  </BCPFORMAT>

The TEXT Format File


bcp PracticeDB.dbo.tbl_StudentDetails
format nul -c -f E:\Preparation\SSIS\BCP\FmtStudentDetails.fmt 
-S JOYDEEP-LAPTOP\JOYDEEPSQL12 -T -q -t;

10.0
4
1       SQLCHAR             0       12      ";"      1     StdRoll              ""
2       SQLCHAR             0       50      ";"      2     Stdname              SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR             0       12      ";"      3     StdClass             ""
4       SQLCHAR             0       10      "\r\n"   4     Section              SQL_Latin1_General_CP1_CI_AS


If we want we can Edit both the file. Suppose the StdRoll is an IDENTITY columns and we do not want to insert records in the table for that, we have to just edit it


  <?xml version="1.0" ?>
- <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
- <RECORD>
  <FIELD ID="0" xsi:type="CharTerm" TERMINATOR=";" MAX_LENGTH="12" />
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=";" MAX_LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS" />
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR=";" MAX_LENGTH="12" />
  <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="1" COLLATION="SQL_Latin1_General_CP1_CI_AS" />
  </RECORD>
- <ROW>
  <COLUMN SOURCE="0" NAME="StdRoll" xsi:type="SQLINT" />
  <COLUMN SOURCE="2" NAME="Stdname" xsi:type="SQLVARYCHAR" />
  <COLUMN SOURCE="3" NAME="StdClass" xsi:type="SQLINT" />
  <COLUMN SOURCE="4" NAME="Section" xsi:type="SQLCHAR" />
  </ROW>
  </BCPFORMAT>

10.0
4
1       SQLCHAR             0       12      ";"      0     StdRoll              ""
2       SQLCHAR             0       50      ";"      2     Stdname              SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR             0       12      ";"      3     StdClass             ""
4       SQLCHAR             0       10      "\r\n"   4     Section              SQL_Latin1_General_CP1_CI_AS

Please look at the highlighted line and you can understand it.

Step – 4 [ USING Format File for BULK INSERT ]

BULK INSERT tbl_StudentDetails
   FROM 'E:\Preparation\SSIS\BCP\test.txt'
   WITH (FORMATFILE = 'E:\Preparation\SSIS\BCP\FmtStudentDetails.xml',
         FIELDTERMINATOR = ';',
             ROWTERMINATOR = '0x0A'
         );

BULK INSERT tbl_StudentDetails
   FROM 'E:\Preparation\SSIS\BCP\test.txt'
   WITH (FORMATFILE = 'E:\Preparation\SSIS\BCP\FmtStudentDetails.fmt,
         FIELDTERMINATOR = ';',
             ROWTERMINATOR = '0x0A'
         );


Please note that the Format file is your Metadata of table so if the format file is generated properly the BULK INSERT is done successfully. We can Edit the format file according to our need and controlling the BULK INSERT.


Hope you like it.



Posted by: MR. JOYDEEP DAS


Friday 18 December 2015

SSIS – Incremental Process by CHECKSUM() Function

Introduction
In the journey of my SSIS here we are going to demonstrate one of the processes of Incremental data Load by using CHECKSUM() Function. Hope it will be interesting.

What the Scenario is
Scenario is simple load data from a staging table to Destination table. The Staging table is populated from Flat File source. Here in this article we are not interested to Load the Staging Table but interested to understand how we use the CHECKSUM() function in SQL Server for Incremental data load.

How we do That

 Step – 1 [ The Data Flow of the Package ]





Step – 2 [ The Staging and Destination Table ]


CREATE TABLE [dbo].[tbl_EmployeeStage]
  (
      EmpId   INT            NOT NULL PRIMARY KEY,
      EmpName  VARCHAR(50)    NOT NULL,
      EmpGrade CHAR(1)
  )
GO

CREATE TABLE [dbo].[tbl_Employee]
  (
      EmpId   INT            NOT NULL PRIMARY KEY,
      EmpName  VARCHAR(50)    NOT NULL,
      EmpGrade CHAR(1)
  )
GO

INSERT INTO [dbo].[tbl_EmployeeStage]
    (EmpId, EmpName, EmpGrade)
VALUES(1, 'Joydeep Das', 'A'),
      (2, 'Rajesh Mondal', 'C'),
      (3, 'Santi Nath', 'B');



Step – 2  [ OLEDB – Source for Retrieving data from Source and Destination Table ]






The SQL Command Text for Staging Table

SELECT EmpId, EmpName, EmpGrade, CHECKSUM(*) AS [CheckSum]
FROM   [dbo].[tbl_EmployeeStage]
ORDER BY 1;

The SQL Command Text for Destination Table

SELECT EmpId, EmpName, EmpGrade, CHECKSUM(*) AS [CheckSum]
FROM   [dbo].[tbl_Employee]
ORDER BY 1;

Step – 3  [ The Sort Transform and The Merge Join Transform ]

 Not going to Describe.

Step – 4 [ The Conditional Split ]





Step – 5 [ The Conditional Split named Record Change ]





Please look at the Condition. Here we use the Columns that is made by CHECKSUM() Function. We are not going no check like
[Source Col-1] <> [Destination Col-1] OR [Source Col-2] <> [Destination Col-2] Approach.
Think if you have 150 columns in your table…. What the Situation you face over here.


Step – 6 [ First Time Execution of Package ]





Step – 7 [ Execute Package When Some Records UPDATED in Staging Table ]


UPDATE [dbo].[tbl_EmployeeStage]
       SET    EmpName = 'Suman Das'
WHERE  EmpId 1





Step – 8  [ Execute Package When Records UPDATED and INSERT in Staging Table ]

UPDATE [dbo].[tbl_Employee] SET EmpName = 'Sree Devi' WHERE EmpId 2;
GO

INSERT INTO [dbo].[tbl_EmployeeStage]
    (EmpId, EmpName, EmpGrade)
VALUES(4, 'Sunny', 'C');
GO







Hope you like it.




Posted by: MR. JOYDEEP DAS