Wednesday, 20 February 2013

Data Flow tasks of SSIS

In my previous article, I am with Control Flow Tasks and Containers. In this article I am trying to discuss abut one of the control flows tasks named data flow.
If we think about a package of SSIS, it must contains one Control Flow tasks and can have zero, one or more data flow tasks.

How work with Data Flow Tasks
It is simple to work with data flow tasks. From Control Flow Items tools bar just drag the Data Flow tasks to the work space of Control Flow tabs and then double click the Data Flow Tasks.

Data Flow tasks Tool Box Objects
Data flow tasks has 3 types of Objects in tools box mentioned bellow.

1.    Data Flow Source Adapters
2.    Data Flow Transformations
3.    Data Flow Destination Adapter

Data Flow Source Adapter
It uses the package connections which points a server instance or file locations. The Source Adapter extracts data from source and moves it to direct Destination Adaptors or to Data Flow transformations and then from Data flow Transformation to Destination adapter.

Here there is one exceptions, the Raw File Adapters doesn't use any package connections. 

There are some common data flow sources adaptors are mentioned bellow.

1.    Data Reader: Make connection to a table or Query through a .NET providers
2.    Excel: Extracts data from Excel work sheet in the Excel file.
3.    Flat File: Make connection with delimited or fixed-width file.
4.    Raw File: Connect to binary type file.
5.    OLEDB: Connects to Installed OLEDB provider, such as SQL Server, Access, ORACLE etc.
6.    XML: Extracts data from XML file.

Data Flow Transformations
Transformations are used to modify or manipulate data in the data flow. It performs operations at one row at time or several rows of data at once.

There are wide verity of transformation tasks are present in the tools bar, what you choose is depends on what type of data processing is needed to complete your functionality. Depending of different nature of transformation, we can divide the transformation into 4 categories.

Category-1 [ Logical row-level transformations ]    
It's works on each and every rows of data flow one by one. `Some common used of there transformations are data type conversion, replacing NULL values, case conversion(lower case to upper case etc), row counts etc.

Category-2 [ Multi-Input and Multi-Output transformations ]    
It's works with more than one data input or can generate more than one output. It's have the ability to combine or branch data from one or more sources to one or more destination. Common examples are Union All, Merge-Join, Multicast etc.

Category-3 [ Multi-Row transformations ]    
It's performs works based on criteria from multiple inputs rows or generate multiple output rows from a single input rows. This type of transforms has more memory overhead. But from business prospective it is very important. Some common examples are Aggregate, Pivot, Sort, Row Sampling etc.

Category-4 [ Advanced Data-preparation transformations ]    
It's works on rows in the data flow pipeline. Some common examples are OLE DB Command, Fuzzy Grouping, and Scripts Components etc.

Data Flow Destination
It is the end point of the package. It is just like the Data flow source to use the package connections. It defines the destination where the processed or unprocessed data are pushed. If we take an example, suppose we want to store the final output of the package in a excel file than we must select the Excel Destination adapter.

To understand it properly
Sometimes real life examples help us to understand properly. Here I am taking a real life example to understand it properly. Think about postal services.

Step-1 [ Data Flow Source Adapters ]
Postman collects postage from different sources like post box, registered postage, and parcel from customer home.

Step-2 [ Data Flow Transformations ]
Then in the post office, they check the details of each and every postal baggage like the stamps are corrects or not, delivery address etc.. ect.. and process them.

Step-3 [ Data  Flow Destination Adapter ]
Postman goes to the delivery address and delivers the postage.   

Hope you like it.

Posted by: MR. JOYDEEP DAS

Sunday, 17 February 2013

Important Facts SSIS


Before going to deep with SSIS, we have to understand some important facts related to SSIS. I forgot it to mention in my previous articles related to SSIS. Here in this article I am trying to illustrate it. Here I am using Microsoft SQL server 2008 or 2008R2 version for SSIS solutions.

Relation with MS Visual Studio 2010

Microsoft Visual Studio 2010 does not support Business Intelligence Development Studio Integration Services, Report Services and Analysis Services projects for SQL Server 2008 or 2008R2 version. To work around this issue, you can install Visual Studio 2008 alongside Visual Studio 2010 on the same machine and then open the Business Intelligence Development Studio projects in Visual Studio 2008.

Creating Solutions In BIDS

When we are going to create a new solution, Business Intelligence Development Studio adds a Solution folder to Solution Explorer, and creates files that have the extensions are mentioned bellow.

File Extension
The *.sln file contains information about the solution configuration and lists the projects in the solution.
The *.suo file contains information about your preferences for working with the solution.

Files in Integration Services Projects

When we add a new or an existing Integration Services project to a solution, Business Intelligence Development Studio (BIDS) creates project files that have the extensions are mentioned bellow

  • The *.dtproj file contains information about project configurations and items such as data sources and packages.

  • The *.dtproj.user file contains information about your preferences for working with the project.

  • The *.database file contains information that Business Intelligence Development Studio requires to open the Integration Services project.

Related Tropics

Understand the BIDS

SSSIS Control Flow Containers


Hope you like it

Posted by: MR. JOYDEEP DAS

Saturday, 16 February 2013

SSIS Control Flow Containers


To continuing my journey on SSIS, this article contains related to Control Flow Containers. It is important to understand the containers of control flow. As per me very SSIS developer must have a solid concept related to the Containers of Control flow.

Type of Control Flow

There are three primary types of control flow objects.

1.    Control flow Tasks: Workflow objects that perform operational-level jobs.
2.    Control flow Containers: It’s grouping mechanisms for tasks and other containers.
3.    Constraints: Allow to connects tasks and define execution ordering and precedence.       

Control flow containers

When we open the SSIS BIDS for control flow tables and the Control Flow Items in the tools box we can see the three containers placed at the top of the others.

Microsoft places it as the top positions they are frequently used by the SSIS Developers.

Why it’s needed

A SSIS package must contains a single control flow tasks. However most of the time a package contains several control flow tasks and they coordinate each other. So in this condition we must organize the tasks and for that we use the control flow containers. So a control flow contains used to group the tasks together. The control flow containers group the tasks together for parallelization, ordering, logging and transactions.

We can think the control flow containers as a folder and contains the other sub folder and file. Here the control flow containers contain different tasks like file in the folder and others containers like subfolder within folder.

How to add containers

To add a control flow containers just drag it from control flow item tool bars to control flow work space of BIDS.

Type of Control Flow Containers
There are three primary types of control flow containers

1    .    Sequence Containers
2    .    For Loop Containers
3    .    ForEach Loop Containers

      Sequence Containers

      Most commonly used containers by SSIS developer. It’s used to organize subordinate tasks by grouping them together and let us applies transactions or assigns logging to the containers.

       For Loop Container

     Provide the same functionality as sequence container except that they can let us run the tasks within them multiple times, based on any condition. Think it as a loop. For example for loop that runs from 1 to 100 times.

      ForEach Loop Conatiners

      Allow us looping but instead of providing a condition expression we loop over a set of objects such as files in a folder.

     If we takes a simple example think we have 10 clients and they upload there data to an FTP server by 10 different txt file. The SSIS program must check the format of each txt file before read and store the data in Table objects of MS SQL Server.

Common Properties of Containers

There are some common properties of all the containers are mentioned bellow. We can find the details in MSDN

Let’s take a simple example to understand the Control Flow containers.

Example of Sequence Containers

1    .    Drag the Sequence Containers from Control Flow Items tools bar to Control Flow work 
         space. Drag 3 Execute SQL Task to the Sequence Containers.
2    .    Edit the Execute SQL Task by right click each tasks and select Edit from popup menu.
3    .    In SQL StatementàConnection Select the OLEDB Connection and in SQL Statement  
         property just Write SELCT 1.

4   .    In the 2nd Execute SQL Tasks for the property SQL statement write something which in not a 
        sql statement to create Error.
5   .    Now go to the property of the Sequence containers and set the transaction option property to 
6   .    Now run the package.

Related Tropics

1    .    Understand the BIDS

2   .    SSIS Connection Manager

3   .    Introduction to Control Flow and Data Flow

Hope you like it.

Posted by: MR. JOYDEEP DAS

Monday, 11 February 2013

Introduction to Control Flow and Data Flow


When someone learning the SSIS, he/she must face a common question like What is the difference between Control Flow and Data Flow. As the Control flow and the Data flow are two basic elements of SSIS.

We already get a glance about Control Flow and Data Flow in my previous article Understanding the BIDS and in this article now I am trying to differentiate between Control Flow and Data flow. Most of the SSIS developer spent a lot of time to dealing with these two elements.

So let's Start

When we open the Business Intelligence Development Studio or BIDS (in MS SQL Server 2005/2008 2008R2/SQL Server Denali (CTP3)) we find that there are two tabs in the name of Control Flow and Data Flow.

Control Flow

Control flow controls the package flow based on completions, success or tasks failure. The smallest units of the control flow are a task. It does not move data from one task to another. It just maintains the control of data flow. Tasks are run in series if connected with precedence or in parallel. Package control flow is made up of containers and tasks connected with precedence constraints to control package flow.

Data Flow

Data Flow deals with the Actual data movement. Here the multiple components can be process data at the same time. The smallest unit of the data flow is called components. The Data flow is made up of source (From where the data is collected, may be Excel, Flat file, FTP, SQL Server etc), Transform (Modification or manipulation of data, such as data type conversion, convert to smaller to upper etc) and the destination(where the data stores, may be SQL server etc).

So we can tell that the data flow is the child of the control flow. A SSIS package at least contains a control flow. Control Flow may or may not required a data flow.

Real life Example

Don't be afraid the bookish things. To understand it properly, here I am mention a simple scenario or story.
A post man named xyz deliver the postage door to door. Let's assume, he has 3 letter for person –A , B and C. Here he person B address is far from post office. The Person B address is between A and C and Person C address is near the post office. So the postman decide to go to the Person B address first and provide him the letter, then the person B and at the end person C.

Here the decision taken by the postmen how to distribute the postage material is the Control Flow and actual handover the postage material is the Data Flow.

Hope you like it.

Posted by: MR. JOYDEEP DAS 

Wednesday, 6 February 2013

Error 27506


One of my friend want a solution from me related to scripts compilation error. He is going to compile a batch job of some SQL scripts, Procedure etc and found the error mentioned bellow.

Error 27506. Error executing SQL script test.sql. Line 76. During undo of a logged operation in database 'Test_DB', an error occurred at log record ID(38256:32:22). The row was not found. Restore the database from a full backup, or repair the…

The Reason of this Error

The reason from this error is Data Corruption in Database

Possible Solutions

Follow the Tasks given bellow

1.    Backup the Database
2.    Run the following query


Hope it will solve the problem.

Posted by: MR. JOYDEEP DAS

Sunday, 3 February 2013

SSIS Connection Manager


After the proper understanding the BIDS it's time to understand the connection manager of SSIS package.  In this article I am trying to cover the connection manager and the purpose of it.

The Connection Manager

The Connection manager dialog box helps us to select the type of connection to add to a package. The connection contains connection string that point to files or databases.

Available connection manager

SSIS uses connection managers to integrate different data sources into packages. SSIS includes a wide variety of different connection managers that allow us to move data around from place to place.

The details are mentioned bellow.

Connection Manager
Connecting to ADO objects such as a Recordset.
Connecting to data sources through an ADO.NET provider.
Connects to a cache either in memory or in a file
Connecting to an Analysis Services database or cube.
Connecting to an Excel worksheet.
Connecting to a file or folder.
Connecting to delimited or fixed width flat files.
Connecting to an FTP data source.
Connecting to an HTTP data source.
Connecting to a Microsoft Message Queue.
Connecting to a set of files, such as all text files on a particular hard drive.
Connecting to a set of flat files.
Connecting to an ODBC data source.
Connecting to an OLE DB data source.
Connecting to a server via SMO.
Connecting to a Simple Mail Transfer Protocol server.
Connecting to a SQL Server Mobile database.
Connecting to Windows Management Instrumentation data.

How to use Connection Manager

To create a connection manager just right click it and choose one from the shortcut menu.

Here I choose New Connection… The Connection manager dialog box appears.

From depending on types, we can choose the connection.

Where It Used

The connection manager can be used at data flow source and destination.
Here some pictorial diagram showing that the connection manager is using at Data Flow sources.

Related tropics

Understand the BIDS

Hope you like it.

Posted by: MR. JOYDEEP DAS