Sunday 11 August 2013

SSIS Slowly Changing Dimension (SCD) Part- I

Introduction

Slowly Changing Dimension (SCD) as the name suggest that a dimension which is changing slowly. To understand it properly we take an example of Employee Table object in our database.

Object name: Tbl_Employee_Details

Attributes Name
EMPID
FIRST_NAME
MIDDLE_NAME
LAST_NAME
DOB
NATIONALITY
JOB_TITLE
MARITAL_STATUS
EMPLOYEE_GENDER

If we observe this table objects the records of the employee is not change frequently. But we cannot tell that the records never change. It may change occasionally.
Here we are providing some situation when the records can be changes in the employee table.

1.    If the name of employee is wrongly entered in the table (may be the spelling mistake).
2.    The employee gets married so the marital status is going to change.
3.    The Last name of the employee is going to change.
4.    The job title of the employee may be changed after getting promotion.

Some of the attributes of the Employee objects not change like DOB, Employee Gender, and Nationality if we are not going to enter wrong data mistakenly.
Others attribute changes may occurs but may happened after certain period of time.

Type of Changes supported by SCD
SCD supports four types of changes

1.    Changing Attributes
2.    Historical Attributes
3.    Fixed Attributes
4.    Inferred numbers

Type-1 [ Changing Attributes ]
When we are going to overwrite the column existing records. Here we are taking an example to understand the Type-1.

Suppose we have a situation where the first name of the Employee is misspelled and the wrong spelling is stored in the first name of the employee. For making the first name correct we need to add one more records for same employee so that we can overwrite the first name of the employee. This SCD transformation directs these rows to an output named Changing Attributes Updates Output.

EMPID
FIRST_NAME
FIRST_NAME
MIDDLE_NAME
101
SUBOOOOL
CHANDRA
DAS
101
SUBAL
CHANDRA
DAS

Type-2 [ Historical Attributes ]
Here we need to maintain the history of the records, where some particular value changes. Here we take an example to understand the Type-2.

Here the employee gets promoted, so the job title of the employee changes. In this case we need to maintain the history of the employee. For this type of changes there will be multiple records for the same employee with different job title. Then to indentify the current records, we can either add a column as current flag, which will be 'Y' for the current or latest records, Or else we can add two columns as START_DATE and END_DATE, through which we can maintain history of employee's records. This SCD directs these rows to two outputs: Historical Attribute Inserts Output and New Output. 

EMPID
FIRST_NAME
JOB_TITLE
START_DATE
END_DATE
CURRENT
101
SUBAL
GRADE-C
01-01-2010
01-01-2011
N
101
SUBAL
GRADE-B
01-01-2011
01-01-2012
N
101
SUBAL
GRADE-A
01-01-2012
01-01-2099
Y

Type-3 [ Fixed Attributes ]
Here in this example DOB, Employee Gender, and Nationality should never be changed. If any changes occur these columns value then either it should throw exception (error) or changes can be saved in some other destination. These changes not are applied in columns.

This SCD transformation detects changes and directs the rows with changes to an output named Fixed Attribute Output.

Type-4 [ Inferred Number ]
This are the records of the dimension, which are found missing during fact load. If we take an example, say there is a fact table which contains employee and department information. While generating the fact table from employee table and department table , sometimes happens that employee table contains some departments name which has no records in department table, and during fact table generation those records are found missing from department table, these kind of member of dimension department are called inferred member. It's like 'Fact arriving earlier than dimensions'.

This SCD transformation directs these rows to an output named Inferred Member Updates. When data for the inferred member is loaded, we can update the existing record rather than create a new one.

TYPE-1 Changing Attributes Example of SCD

Step-1 [ Creating Tables ]
Here I am creating two table with same structure. The table one is tbl_Employee_Details wich contains the original employee records and tbl_Employee_Details_Change which contains the records need to be changed to the tbl_Employee_Details table objects.

Here we observe that the First name of the Employee is wrongly entered in the table tbl_Employee_Details. So the correct value of the same records is put to the tbl_Employee_Details_Change table.

-- Base Table
IF OBJECT_ID(N'dbo.tbl_Employee_Details', N'U') IS NOT NULL
   BEGIN
      DROP TABLE dbo.tbl_Employee_Details;
   END
GO  
CREATE TABLE dbo.tbl_Employee_Details
             (EMPID                INT,
              FIRST_NAME           VARCHAR(50),
              MIDDLE_NAME          VARCHAR(50),
              LAST_NAME            VARCHAR(50), 
              DOB                  DATETIME,
              NATIONALITY          VARCHAR(50),
              JOB_TITLE            VARCHAR(20),
              MARITAL_STATUS       CHAR(1),
              EMPLOYEE_GENDER      CHAR(1));
             
GO
-- Table for Changing Records
IF OBJECT_ID(N'dbo.tbl_Employee_Details_Change', N'U') IS NOT NULL
   BEGIN
      DROP TABLE dbo.tbl_Employee_Details_Change;
   END
GO  
CREATE TABLE dbo.tbl_Employee_Details_Change
             (EMPID                INT,
              FIRST_NAME           VARCHAR(50),
              MIDDLE_NAME          VARCHAR(50),
              LAST_NAME            VARCHAR(50), 
              DOB                  DATETIME,
              NATIONALITY          VARCHAR(50),
              JOB_TITLE            VARCHAR(20),
              MARITAL_STATUS       CHAR(1),
              EMPLOYEE_GENDER      CHAR(1));
GO           
             

Step-2 [ Inserting Records into Tables ]
-- Inserting Recors in Base Table
INSERT INTO dbo.tbl_Employee_Details
       (EMPID, FIRST_NAME, MIDDLE_NAME, LAST_NAME,    
        DOB, NATIONALITY, JOB_TITLE, MARITAL_STATUS,
         EMPLOYEE_GENDER)
VALUES (101, 'SUBOOOOL', 'CHANDRA', 'DAS',
        '12-07-1945', 'INDIAN', 'GRADE-A', 'M',
        'M'),
        (102, 'JOYDEEP', NULL, 'DAS',
        '12-17-1974', 'INDIAN', 'GRADE-B', 'M',
        'M');
-- Inserting Recors for Change
INSERT INTO dbo.tbl_Employee_Details_Change
       (EMPID, FIRST_NAME, MIDDLE_NAME, LAST_NAME,    
       DOB, NATIONALITY, JOB_TITLE, MARITAL_STATUS,
       EMPLOYEE_GENDER)
VALUES (101, 'SUBAL', 'CHANDRA', 'DAS',
        '12-07-1945', 'INDIAN', 'GRADE-A', 'M',
        'M');

Step - 3 [ SSIS Data Flow  Tasks ]



Step - 4 [ OLE DB Source Editor ]



Step - 5 [ Slowly Changing Dimension Wizard ]





Step – 6 [ OLE DB Command ]
It configured automatically
In SQL Command

UPDATE [dbo].[tbl_Employee_Details_Change]
SET [FIRST_NAME] = ?,[LAST_NAME] = ?,[MIDDLE_NAME] = ?
WHERE [EMPID] = ?



Step – 7 [ OLE DB Destination ]







Step – 8 [ Run the SSIS Package ]



Step – 9 [ Observation ]

SELECT * FROM tbl_Employee_Details;



In my next article I am showing the example of TYPE-2 (Historical Attributes).

Hope you like it.


Posted by: MR. JOYDEEP DAS

3 comments: