Wednesday, 18 March 2015

Column Store Index in MS SQL 2012 / 2014

Introduction

Column store index is the unique feature of Microsoft SQL Server starting from Microsoft SQL 2012 and improving with Microsoft SQL Server 2014. Here in this article I am going to discuss about the architecture of Column store Index.
Please note that this Column store Index architecture is suitable only for OLAP or BI environment not in OTLP environment, others the performance down.

Architecture of Column Store Index
Before going to the depth of the architecture of column store index in MS SQL 2012, we assume that the we all well known about the row based index and how it load in the memory to fulfill the request of a query.

First of all we look at the figure to understand the column store index properly



Segment Concept:
Each column in the column store index is it own segment. A Segment contains the value from one column only. Each and every column in a table that include in the column store index forms individual segment. Which allows the column data access individually. Each segment can from by multiple pages.

A segment has the limitation of 1 millions of rows. So for a big table it is not be possible to fit single entire columns within a single segment, so single column of a table may be fit with multiple segments.

Data is transferred from the disk to memory by segment, not by page. A segment is a highly compressed Large Object (LOB) that can contain up to one million rows. The data within each column’s segment matches row-by-row so that the rows can always be assembled correctly.

In our example the disk data came from three pages that is Page -1, Page-2 and Page-3. If we crate column store index in the column named “Name” then the segment is created from Page-1, Page-2 and Page-3.

Group Concept:
When a column is not fit in a single segment as the data of columns exceed the one million then columns takes multiple segments to store data. In such cases, multiple segments are created for each column and grouped into multiple row groups, one for each set of segments.

When a column store index is broken into multiple row groups, each row group contains a set of complete rows.  Please look at the bellow figure to understand it properly.



Dictionary Concept:
Besides column segments a column store index consists of another data storage element: dictionaries. Dictionaries are widely used in columnar storage as a means to efficiently encode large data types, like strings. The values stores in the column segments will be just entry numbers in the dictionary, and the actual values are stored in the dictionary. This technique can yield very good compression for repeated values, but yields bad results if the values are all distinct (the required storage actually increases). This is what makes large columns (strings) with distinct values very poor candidates for column store indexes. Column store indexes contain separate dictionaries for each column and string columns contain two types of dictionaries.

Primary Dictionary
This is a global dictionary used by all segments of a column.
Secondary Dictionary
This is an overflow dictionary for entries that did not fit in the primary dictionaries. It can be shared by several segments of a column: the relation between dictionaries and column segments is one-to-many.

Data Compression:
Column store indexes are based on xVelocity (formerly known as VertiPaq), an advanced storage and compression technology that originated with Power Pivot and Analysis Services but has been adapted to SQL Server 2012 databases.
New feature of Column Store in SQL 2014
  • We can create only one non-clustered column store index on a table although we can include all the columns of the table in the single index.

  • SQL Server 2014 enhances it to add support for creating Clustered Column store Index.
  • When we create a column store index and it makes the table read only.

  • With SQL Server 2014, we can create a column store index without having much impact on write-ability on the table. This means we can issue some INSERT, UPDATE, DELETE statements with a table with clustered column store index. No more tedious workaround is required for writing data to a table with column store index in this release like the previous release.




Hope you like it.





Posted by: MR. JOYDEEP DAS

Saturday, 14 March 2015

In Memory OLT P (Memory Optimizer Table) In SQL 2014

Introduction

In SQL 2014 comes with some amazing feature that we must discuss in my blog post. One of the interesting features is In Memory OLTP (Memory Optimizer Table).
In this article we are trying to learn about it. Hope it will be interesting and informative.

What is it?
Microsoft is releasing SQL Server 2014 with a new set of In-Memory OLTP features to significantly improve the OLTP performance and reduce the processing time for servers with a large amount of memory and multi-core processors.

In Memory OLTP is a new technology introduced by Microsoft to reduce the work load of OLTP and provide the high improvement of processing. The performance gains also depends on the hardware we have and how we writing our query. By this technology we are taking the full advantage of modern hardware trends and modern application requirement.
SQL Server is generally designed for saving data in the disk and loading the data into the memory to serve the request made by the Query. This done for resource management as it is quite expensive.

Nowadays the cost of the hardware is quite low and we can manage a server with huge REM and multi core and the new feature of Microsoft called In Memory OLTP works on it.
The In-memory OLTP features are a new database engine component, which is fully integrated into SQL Server and runs side by side with the traditional database engine. It allows us to declare a table to be stored in main memory so that your OLTP workload can access this memory resident data faster.

In the memory optimized table all the data is stored in the memory not in Disk.
Type of Memory Optimize table

There are two types of memory optimize table

1.    SCHEMA_AND_DATA
2.    SCHEMA_ONLY

SCHEMA_AND_DATA :
The SCHEMA_AND_DATA Memory-Optimized table is a table that resides in memory where the data is available after a server crash, a shutdown or a restart of SQL Server

Usages: A point of sales transaction data might be a good fit for a SCHEMA_AND_DATA type table.  We might want our point of sales transactions to run as fast as possible so the memory-optimize type table would provide this, but we wouldn’t want to lose those transactions should our server be restarted.

SCHEMA_ONLY
 SCHEMA_ONLY Memory-Optimized table is a table that does not persist data should SQL Server crash, or the instance is stopped or restarted.  The SCHEMA_ONLY Memory-Optimized tables do retain their table structure should the server crash, or be shutdown.

Usages: SCHEMA_ONLY table would be useful for a staging table in a data warehouse application.   Typically it is fairly easily to reload a data warehouse staging table from its data source. This is why making these type of tables a SCHEMA_ONLY type table is relatively safe.

Maintaining Version in Rows
Rows in memory-optimized tables are versioned. This means that each row in the table potentially has multiple versions. All row versions are maintained in the same table data structure. Row versioning is used to allow concurrent reads and writes on the same row.



.
How we Crate it?
To create Memory-Optimized table we are just following our step by step process to Understand it clearly.

Step-1 [ Create Database to Support Memory Optimize Table ]

First we have to create a database to support Memory-Optimized Table. If needed, we can alter our existing database also.

 IF EXISTS (SELECT *
           FROM   sys.databases
           WHERE name = N'InMemoryExample'
)
  DROP DATABASE InMemoryExample;
GO

CREATE DATABASE InMemoryExample
ON PRIMARY
  (NAME = InMemory_Data,
   FILENAME = N'C:\data\InMemoryExample_Data.mdf',
   SIZE = 100MB,
   FILEGROWTH = 10MB),
FILEGROUP InMemoryExample_InMemory CONTAINS MEMORY_OPTIMIZED_DATA
  ( NAME = InMemory_InMemory,
    FILENAME = N'C:\data\InMemoryExample_InMemory.mdf')

LOG ON
  ( NAME = InMemoryExample_Log,
    FILENAME = N'C:\data\InMemoryExample_Log.ldf',
    SIZE = 100MB,
    FILEGROWTH = 10MB)

GO

The script look likes same. Please review the script carefully, the only differences that we find is

FILEGROUP InMemoryExample_InMemory CONTAINS MEMORY_OPTIMIZED_DATA
  ( NAME = InMemory_InMemory,
    FILENAME = N'C:\data\InMemoryExample_InMemory.mdf')

We created a FILEGROUP named “InMemoryExample_InMemory” that will be used to support our Memory-Optimized tables.  This file group contains a single file.  Without this “MEMORY_OPTIMIZED_DATA” file group I wouldn’t be able to create a Memory-Optimized table in our new database.

Step-2 [ Creating Memory Optimized Table ]

Here we are going to create two types of memory optimized table.

SCHEMA_AND_DATA table:

IF  OBJECT_ID('MemoryOptimized_Schema_And_Data','U') IS NOT NULL
    DROP TABLE MemoryOptimized_Schema_And_Data
GO

CREATE TABLE MemoryOptimized_Schema_And_Data
(
       Id    INT        NOT NULL,
       Col1  CHAR(1000) NOT NULL,
       CONSTRAINT PK_MemoryOptimized_Schema_And_Data
        PRIMARY KEY NONCLUSTERED HASH (Id)
        WITH (BUCKET_COUNT = 1024)
) WITH (MEMORY_OPTIMIZED = ON,
        DURABILITY = SCHEMA_AND_DATA);

SCHEMA_ONLY Table:

IF  OBJECT_ID('MemoryOptimized_Schema_Only','U') IS NOT NULL
    DROP TABLE MemoryOptimized_Schema_Only
GO

CREATE TABLE MemoryOptimized_Schema_Only
(
       Id     INT        NOT NULL,
       Col1   CHAR(1000) NOT NULL,
       CONSTRAINT PK_MemoryOptimized_Schema_Only
        PRIMARY KEY NONCLUSTERED HASH (Id)
        WITH (BUCKET_COUNT = 1024)
) WITH (MEMORY_OPTIMIZED = ON,
        DURABILITY = SCHEMA_ONLY);

Now we have to understand it.
Please look at the part of the script carefully.

CONSTRAINT PK_MemoryOptimized_Schema_Only
        PRIMARY KEY NONCLUSTERED HASH (Id)
        WITH (BUCKET_COUNT = 1024)

Here we are creating NON CLAUSTERED HASH Index on columns ID. The memory optimized table needs the HASH index It cannot be more than the 8. With CTP1 only columns with Windows BIN2 collation types can be used for a HASH index.  Therefore on our table we are just create a HASH index on the INT column and not the char column.
Second thing is BUCKET_COUNT. The BUCKET_COUNT identifies the number of different buckets SQL Server will create in memory to store my Memory-Optimized table records.  Each bucket is identified by the value created when hashing the index column.  Each unique index key value that has the same hash value will be stored in the same bucket.  Therefore it is recommended that we create a bucket value that is equal to or greater than the number of unique key values we expect for your Memory-Optimized table.

Memory-Optimized tables only support the following data types: bit, tinyint, smallint, int, bigint, money, smallmoney, float, real, datetime, smalldatetime, datetime2, date, time, numberic, decimal, char(n),  varchar(n) ,nchar(n),  nvarchar(n), sysname, binary(n), varbinary(n), and Uniqueidentifier. Notice that none of the large Binary Object data types are allowed, even the variable character “max” data types.  Something worth also mentioning is the combined record length must not exceed 8060.  This record length limitation will be enforced at the time we create our table.

Step-3 [ Inserting data into Memory-Optimized Table ]

SET NOCOUNT ON;

USE InMemoryExample;
GO

DELETE FROM MemoryOptimized_Schema_And_Data;
DELETE FROM MemoryOptimized_Schema_Only;

SET STATISTICS IO Off;
SET STATISTICS TIME Off;

DECLARE @s datetime = getdate()

-- Load Normal Table
DECLARE @I int = 0;
WHILE @I < 1000
BEGIN
       SET @I+=1;
       INSERT INTO Normal(Id,C1)
       VALUES (@i,cast(@I as varchar(4)) + 'A');
END;

SELECT DATEDIFF(ms,@s,getdate()) as Normal;

-- Load SchemaAnadData table --
SET @s = getdate();
SET @I = 0;
WHILE @I < 1000
BEGIN
       SET @I+=1;
       INSERT INTO MemoryOptimized_Schema_And_Data
               (Id, Col1)
        VALUES (@i,cast(@I as varchar(4)) + 'A');
END;

SELECT DATEDIFF(ms,@s,getdate()) as SchemaAndData;

-- Load SchemaOnly table
SET @s = getdate();
SET @I = 0;
WHILE @I < 1000
BEGIN
       SET @I+=1;
       INSERT INTO MemoryOptimized_Schema_Only
               (Id, Col1)
        VALUES (@i,cast(@I as varchar(4)) + 'A');
END;
SELECT DATEDIFF(ms,@s,getdate()) As SchemaOnly;
GO

Some limitation
  • NO TRUNCATE TABLE statement against my Memory-Optimized tables.
  • NO ALTER TABLE statement against my Memory-Optimized tables.
  • NO UPDATE of primary key columns of my Memory-Optimized tables.
  • NO FOREIGN KEY or CHECK constraints.
  • NO UNIQUE constraints other than the PRIMARY KEY.




Hope you like it.


Posted by: MR. JOYDEEP DAS

Sunday, 8 March 2015

Recursive Trigger Fire

Introduction
We never design database like that the recursive trigger fire. But what happens if recursive trigger happens? We have to understand it clearly and solve such kind of problem. This article is related to it. Hope it will be informative.

First We understand it?
We have two tables named Test_1 and Test_2. The Test_1 have trigger named trg_Test_1 which insert data to table Test_2 and Test_2 table have trigger named trg_Test_2 which Insert data to Test_1 Table.



Example of Recursive Trigger

Step-1 [ Creating Base Table ]

CREATE TABLE test_1
   (
     ID    INT,
     MARKS INT
   ) 
GO
CREATE TABLE test_2
   (
     ID    INT,
     MARKS INT
   )
GO

Step-2 [ Creating Trigger ]

CREATE TRIGGER trg_test_1 ON test_1
AFTER INSERT
AS
BEGIN
    INSERT INTO test_2
    SELECT * FROM Inserted;
END
GO
CREATE TRIGGER trg_test_2 ON test_2
AFTER INSERT
AS
BEGIN
    INSERT INTO test_1
    SELECT * FROM Inserted;
END

Step-3 [ Insert Value in First Table so that Trigger can Fire ]

INSERT INTO  test_1
VALUES(1, 20);

Msg 217, Level 16, State 1, Procedure trg_test_2, Line 5
Maximum stored procedure, function, trigger, or view nesting level exceeded (limit 32).

If we look at the table we find no records is affected

SELECT * FROM   test_1
SELECT * FROM   test_2

Step-4 [ Now we Rectified the Trigger ]

DROP TRIGGER trg_test_1
GO
CREATE TRIGGER trg_test_1 ON test_1
AFTER INSERT
AS
BEGIN
    IF trigger_nestlevel() < 2
       BEGIN
         INSERT INTO test_2
         SELECT * FROM Inserted;
       END  
END
GO
DROP TRIGGER trg_test_2
GO
CREATE TRIGGER trg_test_2 ON test_2
AFTER INSERT
AS
BEGIN
    IF trigger_nestlevel() < 2
       BEGIN
             INSERT INTO test_1
             SELECT * FROM Inserted;
       END  
END  

Step-5 [Observation ]

INSERT INTO  test_1
VALUES(1, 20);

SELECT * FROM   test_1
SELECT * FROM   test_2

ID          MARKS
----------- -----------
1           20

(1 row(s) affected)

ID          MARKS
----------- -----------
1           20

(1 row(s) affected)

Hope you like it.




Posted by: MR. JOYDEEP DAS

Saturday, 7 March 2015

Index on Computed Columns ?


Introduction

The Question that I always find in the bog post or in community post is
“Can we create the Index on Computed Columns”

The answer is not so easy. To explain it properly let’s try an Example. Hope you find it informative

Simple Test to Understand the Index in Computed Columns

Step-1
Create a Function with SCHEMA Binding
IF OBJECT_ID(N'dbo.func_TOTMARKSWITHPRACTICAL', N'FN')IS NOT NULL
   BEGIN
      DROP FUNCTION  [dbo].[func_TOTMARKSWITHPRACTICAL];
   END
GO
CREATE FUNCTION [dbo].[func_TOTMARKSWITHPRACTICAL]
     (
        @p_MARKS    INT
     ) 
RETURNS INT     
WITH SCHEMABINDING       
AS    
BEGIN
    DECLARE @v_TOTALMARKS INT;
   
    SET @v_TOTALMARKS = @p_MARKS + 50;
   
    RETURN @v_TOTALMARKS;
END
GO

Step-2
Create the Base Table to Use SCHEMA Binding Function and Insert Records
IF OBJECT_ID(N'dbo.tbl_STUDENTDTLS', N'U')IS NOT NULL
   BEGIN
      DROP TABLE [dbo].[tbl_STUDENTDTLS];
   END
GO
CREATE TABLE [dbo].[tbl_STUDENTDTLS]
       (
         STDID                               INT                    NOT NULL PRIMARY KEY,
         STDNAME                       VARCHAR(50) NOT NULL,
         STDMARKS                     INT                    NOT NULL,
         STDTOTALMARKS       AS [dbo].[func_TOTMARKSWITHPRACTICAL](STDMARKS)
       );   
      
GO
INSERT INTO [dbo].[tbl_STUDENTDTLS]           
       (STDID, STDNAME, STDMARKS)
VALUES (101, 'Joydeep Das', 100),
               (102, 'Anirudha Dey', 150);
      
GO

Step-3
Check the IsIndexTable Property of Computed Columns
SELECT  (SELECT CASE COLUMNPROPERTY( OBJECT_ID('dbo.tbl_STUDENTDTLS'),
          'STDTOTALMARKS','IsIndexable')
                WHEN 0 THEN 'No'
                WHEN 1 THEN 'Yes'
         END) AS 'STDTOTALMARKS is Indexable ?'


STDTOTALMARKS is Indexable ?
----------------------------
Yes

Step-4
Check the IsDeterministic Property of Computed Columns
SELECT  (SELECT CASE COLUMNPROPERTY( OBJECT_ID('dbo.tbl_STUDENTDTLS'),
                                               'STDTOTALMARKS','IsDeterministic')
                 WHEN 0 THEN 'No'
                WHEN 1 THEN 'Yes'
         END) AS 'STDTOTALMARKS is IsDeterministic?'

STDTOTALMARKS is IsDeterministic?
---------------------------------
Yes

Step-5
Check the USERDATTACCESS Property of Computed Columns
SELECT  (SELECT CASE COLUMNPROPERTY( OBJECT_ID('dbo.tbl_STUDENTDTLS'),
                                               'STDTOTALMARKS','USERDATAACCESS')
                WHEN 0 THEN 'No'
                WHEN 1 THEN 'Yes'
         END) AS 'STDTOTALMARKS is USERDATAACCESS?'

STDTOTALMARKS is USERDATAACCESS?
--------------------------------
No

Step-6
Check the IsSystemVerified Property of Computed Columns
SELECT  (SELECT CASE COLUMNPROPERTY( OBJECT_ID('dbo.tbl_STUDENTDTLS'),
                                               'STDTOTALMARKS','IsSystemVerified')
                WHEN 0 THEN 'No'
                WHEN 1 THEN 'Yes'
         END) AS 'STDTOTALMARKS is IsSystemVerified?'

STDTOTALMARKS is IsSystemVerified?
----------------------------------
Yes

Step-7
Analyzing All Property output of Computed Columns
Property Name
Output
IsIndexable
Yes
IsDeterministic
Yes
USERDATAACCESS
No
IsSystemVerified
Yes

Step-8
So we can Crete Index on Computed Columns in this Situation
CREATE NONCLUSTERED INDEX IX_NON_tbl_STUDENTDTLS_STDTOTALMARKS
ON [dbo].[tbl_STUDENTDTLS](STDTOTALMARKS);

Step-9
Now Check the Same thing with Function Without Schema Binding
IF OBJECT_ID(N'dbo.func_TOTMARKSWITHPRACTICAL', N'FN')IS NOT NULL
   BEGIN
      DROP FUNCTION  [dbo].[func_TOTMARKSWITHPRACTICAL];
   END
GO
CREATE FUNCTION [dbo].[func_TOTMARKSWITHPRACTICAL]
     (
        @p_MARKS    INT
     ) 
RETURNS INT     
AS    
BEGIN
    DECLARE @v_TOTALMARKS INT;
   
    SET @v_TOTALMARKS = @p_MARKS + 50;
   
    RETURN @v_TOTALMARKS;
END
GO

Step-10
Now analyze the same property again
Property Name
Output
IsIndexable
No
IsDeterministic
No
USERDATAACCESS
Yes
IsSystemVerified
No


Step-11
In this scenario we are unable to Create index on Computed Columns
CREATE NONCLUSTERED INDEX IX_NON_tbl_STUDENTDTLS_STDTOTALMARKS
ON [dbo].[tbl_STUDENTDTLS](STDTOTALMARKS);

Error:

Msg 2729, Level 16, State 1, Line 1
Column 'STDTOTALMARKS' in table 'dbo.tbl_STUDENTDTLS'
cannot be used in an index or statistics or as a partition key
because it is non-deterministic.



Hope you like it.


Posted by: MR. JOYDEEP DAS