Tuesday, 24 November 2015

SSIS – Flat File Retrieval with Real-time Scenario Based

Introduction
We are again with flat file scenario based solution. Query are frequently came to provide the solution. Here is another interesting story with Flat File. I personally prefer when to design with SSIS take a note book and try to draw it. It gives 90% of the solution and next 10% we have Google…

The Scenario
We have a folder and folder contains some flat files. The name of the flat files is important it contains Filename + Date + Serial Number
For an example: EmployeeRecord20151124-01.txt, EmployeeRecord20151124-02.txt etc.

We have to retrieve only those file which have system date. I mean to say the date of package run must match with the date for File name.
In a single day there are three copy of the file is put into the folder, such as morning, Evening and Night.

The package must run each file in first time and if we re-run the package it not retrieve the data from old file of same day. It looks for new file in same day, if found then retrieve records and else not.

The new flat file of same day may be contains the old modified records also. So package must take care of that also.

Hope you understand the scenario.

The Logic behind the Package
1.    We need three tables object. One is Destination Table where the actual data store. Second table keep track the File name which is already retrieved. Third table takes all the file name and path from the source folder.

2.    First we start works with third table and put the entire file name that match the current date scenario.
3.    Then we compare it with Second table and check whether the file is retrieved before or not. If not then we retrieve the file.
4.    The data flow logic is as before using Merge transform and Conditional split

How we do that

Step – 1 [ Create the Necessary Table Objects ]

IF OBJECT_ID(N'[dbo].[tbl_EmployeeDetails]', N'U')IS NOT NULL
   BEGIN
      DROP TABLE [dbo].[tbl_EmployeeDetails];
   END
GO

CREATE TABLE [dbo].[tbl_EmployeeDetails]
   (
     EmpID    INT         NOT NULL PRIMARY KEY,
     EmpName  VARCHAR(50) NOT NULL,
     EmpDept  VARCHAR(50) NOT NULL,
     EmpGrade CHAR(1)     NOT NULL
   );
GO

IF OBJECT_ID(N'[dbo].[tbl_OLDFlatFileDetails]', N'U')IS NOT NULL
   BEGIN
      DROP TABLE [dbo].[tbl_OLDFlatFileDetails];
   END
GO

CREATE TABLE [dbo].[tbl_OLDFlatFileDetails]
   (
     IdNo           INT           NOT NULL IDENTITY PRIMARY KEY,
     FFileName      VARCHAR(50)   NOT NULL,
     FullPath       VARCHAR(1000) NOT NULL,
   );
GO

IF OBJECT_ID(N'[dbo].[tbl_NEWFlatFileDetails]', N'U')IS NOT NULL
   BEGIN
      DROP TABLE [dbo].[tbl_NEWFlatFileDetails];
   END
GO

CREATE TABLE [dbo].[tbl_NEWFlatFileDetails]
   (
     IdNo           INT           NOT NULL IDENTITY PRIMARY KEY,
     FFileName      VARCHAR(50)   NOT NULL,
     FullPath       VARCHAR(1000) NOT NULL,
   );
GO

Step – 2 [ The Control Flow Task ]



Step – 3 [ Execute SQL Task named TRUNCATE tbl_NEWFlatFileDetails ]

Here we Truncate the tbl_NEWFlatfileDetails Table where we store the Flat file name and path from the source folder.

TRUNCATE TABLE [dbo].[tbl_NEWFlatFileDetails];

Step – 4 [ ForEach Loop Container ]



Step – 5 [ Expression Task named SUBSTRING File Name ]



In the Expression we used
@[User::v_FileNameSubString]= SUBSTRING( @[User::v_FileName], 15, 8 )
Here the SUBSTRING Function retrieves the DATE part from the File name like 20151124.

Step – 6 [ Precedence Constraint ]



In the Expression we used
@[User::v_FileNameSubString]== @[User::v_CurrentDateFile]

Where the variable named @[User::v_CurrentDateFile] has its own Expression

REPLACE((DT_WSTR, 10)(DT_DBDATE)GETDATE(),"-","")

Step – 7 [ Execute SQL task named Insert Data INTO tbl_NEWFlatFileDetails ]



The SQL Statement is

INSERT INTO [dbo].[tbl_NEWFlatFileDetails]
  (FFileName, FullPath)
VALUES(?, ?)

Step – 8 [ Execute SQL Task named Retrive Records Comparing Tables ]



The SQL Statement is

SELECT FullPath, FFileName
FROM   [dbo].[tbl_NEWFlatFileDetails]
WHERE  FFileName NOT IN(SELECT FFileName
                        FROM   [dbo].[tbl_OLDFlatFileDetails]);

Step – 9 [ For EachLoop Container ]



Step – 10 [ Data Flow Task named Working With tbl_EmployeeDetails ]

I am not going to describe it here. Just a pictorial diagram is given



The Flat File Connection manager need to change one property named Connection String by Expression



Step – 11 [ Execute SQL task named Insert Data Into tbl_OLDFlatFileDetails ]



The SQL Statement is

INSERT INTO [dbo].[tbl_OLDFlatFileDetails]
(FFileName, FullPath)
VALUES(?, ?)

One thing I want to mention in my SSIS article is that, I don’t want my reader just copy and paste the steps but they must understand the beauty of the package and do it by their own logic. If the reader has any confusion or query they can comment it or write me mail.



Hope you like it.





Posted by: MR. JOYDEEP DAS

3 comments:

  1. Nice article related to SSIS .Also we can use lookup or Marge Statement instead of Merge join and conditional split tasks.

    ReplyDelete
    Replies
    1. Thanks @Jax to provide your valuable time.
      I personally do not like Lookup as per performance reason in case of Large volume of data.
      Please provide your valuable comments, that we all can share knowledge.

      Delete