Friday, 29 December 2017

SSIS Data Flow Error Handling with Description and Column Name

Introduction
Finding the Error Description is a simple one line code but finding Error Column name is not so much simple. Microsoft made it easy from SQL 2016, when modified API comes. But before 2016 it’s really a difficult job to find the Error Column Name by using Script Task and Script Component.
This article is trying to provide a simple step by step solution to find the Error Description and Error column name.

Scenario
We are trying to insert records from Flat file to our SQL Table. The flat file has two columns named “roll” and “name”. Sometimes name contains huge text and don’t support by our Table object (Size of the name column in table object is Varchar(50)) and hence generate Error. We are trying to display Error description and the column name that generate the Error.


Control/Data Flow Details



Control Flow – Get Linage ID Task
This Script Task is used to grade the Linage id at run time and save it in a text file.

What to do:
1.    Create a text file C:\temp\write.txt
2.    Create two package level variable named User::execsObj and User::lineageIds as Object type and pass them as Read Write variable of script task.
3.    In Script Task add assembly named Microsoft.SqlServer.DTSPipelineWrap

Namespace needed
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using System.Windows.Forms;
using System.Collections.Generic;
using System.IO;


Script Task Code in C#

public void Main()
{
 // Grab the executables so we have to something to iterate over, and initialize our lineageIDs list
// Why the executables?  Well, SSIS won't let us store a reference to the Package itself...
Dts.Variables["User::execsObj"].Value =(Package)Dts.Variables["User::execsObj"].Parent).Executables;

Dts.Variables["User::lineageIds"].Value = new Dictionary<int, string>();
lineageIds = (Dictionary<int, string>)Dts.Variables["User::lineageIds"].Value;
Executables execs = (Executables)Dts.Variables["User::execsObj"].Value;

ReadExecutables(execs);

// Just proof of concept to see the results before you dedicate your time to the solution
// Delete this code in your actual implementation
using (StreamWriter writetext = new StreamWriter(@"C:\temp\write.txt", true))
        {
                foreach (var kvp in lineageIds)
                writetext.WriteLine(kvp.Key + " : " + kvp.Value);
        }
            Dts.TaskResult = (int)ScriptResults.Success;
}

        

private void ReadExecutables(Executables executables)
{
  foreach (Executable pkgExecutable in executables)
    {
     if (object.ReferenceEquals(pkgExecutable.GetType(),            
                 typeof(Microsoft.SqlServer.Dts.Runtime.TaskHost)))
                {
                    TaskHost pkgExecTaskHost = (TaskHost)pkgExecutable;
                    if (pkgExecTaskHost.CreationName.StartsWith("SSIS.Pipeline"))
                    {
                        ProcessDataFlowTask(pkgExecTaskHost);
                    }
                }
     else if (object.ReferenceEquals(pkgExecutable.GetType(), 
                 typeof(Microsoft.SqlServer.Dts.Runtime.ForEachLoop)))
                {
                    // Recurse into FELCs
                    ReadExecutables(((ForEachLoop)pkgExecutable).Executables);
                }
      }
 }

private void ProcessDataFlowTask(TaskHost currentDataFlowTask)
{
   MainPipe currentDataFlow = (MainPipe)currentDataFlowTask.InnerObject;

   foreach (IDTSComponentMetaData100 currentComponent in 
                  currentDataFlow.ComponentMetaDataCollection)
            {
              // Get the inputs in the component.
              foreach (IDTSInput100 currentInput in currentComponent.InputCollection)
                  foreach (IDTSInputColumn100 currentInputColumn in 
                          currentInput.InputColumnCollection)
                        lineageIds.Add(currentInputColumn.ID, currentInputColumn.Name);

               // Get the outputs in the component.
               foreach (IDTSOutput100 currentOutput in currentComponent.OutputCollection)
                    foreach (IDTSOutputColumn100 currentoutputColumn in 
                                currentOutput.OutputColumnCollection)
                        lineageIds.Add(currentoutputColumn.ID, currentoutputColumn.Name);
            }
}


Adding Script Component in Data Flow Task
Now add script component in data flow task.


What to do:
1.    Just pass the variable named User::lineageIds as Read variable
2.    Script Component input columns as ErrorCode and ErrorColumn
3.    Script Component output column as ErrorDescription and ErrorColumnDescription with DT_STR data type.

Name space
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Collections.Generic;

Script Component Code in C#
public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        Dictionary<int, string> lineageIds = (Dictionary<int, string>)Variables.lineageIds;

        int? colNum = Row.ErrorColumn;
        if (colNum.HasValue && (lineageIds != null))
        {
            if (lineageIds.ContainsKey(colNum.Value))
                Row.ErrorColumnDescription = lineageIds[colNum.Value];

            else
                Row.ErrorColumnDescription = "Row error";
        }
        Row.ErrorDescription = this.ComponentMetaData.GetErrorDescription(Row.ErrorCode);
     }









Hope you find it informative.

Posted by: MR. JOYDEEP DAS

Saturday, 23 December 2017

What is the best value for the Fill Factor

Introduction
Fill Factor is directly related to Index and Index is directly related to Performance. So for performance point of view Fill Factor play a very important role.
Fill Factor determines the percent of each leaf-level page to be filled with data. By default it is 100 percent. That means all most all the space of 8KB leaf-level page is filled with data.
Here in this article, I am not going to discuss about what fill factor is and how it reduces the index fragmentation. 
If you need the information related to fill factor and how it works, you can search on google. You can find a lot of articles are related to it.
Here in this article I am trying to discuss related to the best value that we can assign to fill factor.
As a SQL developer, we always face this type of scenario. The query in the mind is what the value we choose for fill factor. To resolve this query we just move to DBA and asking them. Sometimes they say 70% or 80%. 
But they don’t provide any explanation for it. Why they have chosen this digit as a value of fill factor and I personally don’t understand how they (DBA) decide them without seeing proper page split. It is really a misty for me. Anyway, they are the Boss and they have full authority to manage Database.

Here in this article I am proving my personal opinion to choosing correct fill factor for index. You can try this or go with the DBA’s definition.
There is no calculative method or mathematical formula to find out the correct fill factor. It’s totally depends on implementation and experience with indexing.
To understand it properly, here I am taking some scenario 

Scenario – 1
I have a table on OLAP environment and there is no DELETE or UPDATE operation is going on. Just new records are inserted.
I suggest that there is no need to implement Fill Factor or go with default Fill Factor 100 Percent.
Sometimes the improper fill factor decreases the performance.
Now let us assume that there is Table 1 which contains the data worth of 1000 pages. All these pages are 100% filled. If I run a query to retrieve all the data from the SQL Server, it should retrieve all the 1000 pages. 
If pages are only 50% failed to accommodate Table 1, it will need 2000 pages, which means SQL Server has to read twice the amount of the data from the disk, leading to higher usage of memory, CPU and IO bandwidth.

Scenario -2
I have a table on OLTP environment and INDERT/UPDATE/DELETE operation is huge.
We can find the Fill Factor of an existing table/Index by

SELECT OBJECT_NAME(OBJECT_ID) Name, type_desc, fill_factor
FROM sys.indexes

First don’t set any fill factor and just create the Index. After one or two week observe the average index fragmentation. 
If the nature table is static and there is no value changed, then there is no need to set any fill factor value.
If the table 
matter is less often Updated table, then set the fill factor value at 95%.
If 
the nature table is frequently Updated table, then set the fill factor value at 70 to 90%

Situation for Fill Factor
Fill Factor %
Static Table – The value of the table never changed
100
Tables Updated Less Often
95
Frequently Updated Table
70 to 90


But all is depends on the ratio of index fragmentation. We must observe the regular index fragmentation ratio and decide the fill factor accordingly. Not to decide any arbitrary digit as a value of fill factor.
We can find the index fragmentation ratio by using this SQL statement

SELECT b.name As [Table Name], c.name As [Index Name], avg_fragmentation_in_percent
FROM   sys.dm_db_index_physical_stats(DB_ID('Practice'), NULL, NULL, NULL , NULL)a
       INNER JOIN sys.tables b WITH (nolock) ON a.OBJECT_ID = b.OBJECT_ID
       INNER JOIN sys.indexes c WITH (nolock) ON a.OBJECT_ID = c.OBJECT_ID
                                                  AND a.index_id = c.index_id
WHERE  b.name = 'ORDER_PRICE'
       AND c.name = 'PK_OrderProduct';


Hope it will be informative.





Posted by: MR. JOYDEEP DAS