Tuesday, 18 April 2017

SSIS - Searching Records in Flat file when loading

In this article we are trying to play with filtration in Flat File. As we all know that to make any filtration with flat file, we have to load the entire flat file into a staging table or just use a Conditional Split Transformation after loading the Flat file by using Flat File Data source and Flat file connection manager.

If the size of our flat file is huge and we need only some records depending on specified criteria, why we are going to load the entire flat file and then filter some records from it?
It is always better, if we are going to filter some records before loading the flat file and then work with those records. Something like Incremental Load with Flat File but No staging table.

Here we are just going to demonstrate the simple filtration in flat file before loading it. We can make it Incremental load by using the same approach.

Look’s interesting? To understand this article properly, we need some prerequisite like the VB scripting language and understanding of parameterized Batch file with SSIS.

Case Scenario
The high-level view of our SSIS package is it just load a flat file form the source and before loading the flat file it filters some records first and then load it.
Please note that we are not going to load the entire flat file. We are just going to load some selected records.

Source and Destination Details
In source we have a flat file with huge volume of data and in destination, we have a table object to store the records.

Sample of Flat File Source:

Filter Criteria:
Before loading the flat file we are just going to filter the records depending on the criteria mentioned below.
We are going to use the last columns of the Flat File and retrieve those records that have greater than or equal (>=) to the specified date in a variable.

Proposed Solution
We are going
·         Call a parameterized batch file from the SSIS Execute Process task
·         Batch file is going to call the VB script file
·         The VB Script file is going to filter the flat file according to criteria and make another flat file with filtered output
·         By using the newly created flat file we are going to store the records into our DB table objects

Pictorial Diagram for Understanding:

[Step - 1 ] Create the VB Script

Here in our example we are creating the VB script in the name of Test.bvs

strFiles = WScript.Arguments.Unnamed(0)
strDate  = WScript.Arguments.Unnamed(1)

Set objFS = CreateObject( "Scripting.FileSystemObject" )
Set objFile = objFS.OpenTextFile(strFiles)

Do Until objFile.AtEndOfStream
   strLine = objFile.ReadLine
   s = Split(strLine,"|")
   if (CDate(s(5)) >= CDate(strDate)) then
      WScript.Echo  strLine
   end if

If we look at the code we can see that it’s open the flat file first. The name of the flat file is supplied with arguments.
strFiles = WScript.Arguments.Unnamed(0)

Then it takes another input. In our case it is a Date.
strDate  = WScript.Arguments.Unnamed(1)

Then it reads the file line by line using a Loop until the end of the file comes. After reading a single line of flat file, it just splits the columns.
s = Split(strLine,"|")
In our case the column separator is (|). In our flat file, there are 5 columns (starting columns are 0 columns). The last columns contain the Date columns that we need to compare.

if (CDate(s(5)) >= CDate(strDate)) then

[ Step – 2 ] Create the Batch file that called the VB Script file
Here we are creating the batch file in the name of MyBatch.bat
The batch file calls the VB script file. To call the VB script file we need two parameters.

SET arg1 = %1

REM cscript //nologo test.vbs stdDtls.txt 05-Jan-2015 > filter.txt
cscript //nologo test.vbs StdDtls.txt "%1%" > filter.txt


Here the filtered output is stored into another flat file named filter.txt. The file name filter.txt is automatically created.

[ Step – 3 ] Call the Batch file from SSIS Process Task
First, we have to create a variable in our SSIS package which contains the date and depending on that date we are going to extract the records.

Now we are going to call the batch file from Execute Process Task of SSIS.

After executing the process task, it generates another flat file. This flat file contains the filtered records. In our case the name of the file is filter.txt

[ Step – 4 ] Retrieve records from Newly created flat file
Now we are extracting the records from newly created flat file. For Meta data mapping we just execute the Execute Process time once in first time to create a new flat file and then use this flat file form Flat File Connection Manager.
If it is needed, we can use Expression of Flat File connection manager Property named Connection string to make it dynamic.

Control Flow Diagram:

Data Flow Diagram:

Output in Data Viewer:

Here we are not going to store it in a DB table object. We are just showing it in the Data Viewer.

Hope you like it.