Introduction
Column store
index is the unique feature of Microsoft SQL Server starting from Microsoft SQL
2012 and improving with Microsoft SQL Server 2014. Here in this article I am
going to discuss about the architecture of Column store Index.
Please note
that this Column store Index architecture is suitable only for OLAP or BI
environment not in OTLP environment, others the performance down.
Architecture of Column Store
Index
Before going
to the depth of the architecture of column store index in MS SQL 2012, we assume
that the we all well known about the row based index and how it load in the
memory to fulfill the request of a query.
First of all
we look at the figure to understand the column store index properly
Segment Concept:
Each column
in the column store index is it own segment. A Segment contains the value from
one column only. Each and every column in a table that include in the column
store index forms individual segment. Which allows the column data access
individually. Each segment can from by multiple pages.
A segment has
the limitation of 1 millions of rows. So for a big table it is not be possible
to fit single entire columns within a single segment, so single column of a
table may be fit with multiple segments.
Data is
transferred from the disk to memory by segment, not by page. A segment is a
highly compressed Large Object (LOB) that can contain up to one million rows.
The data within each column’s segment matches row-by-row so that the rows can
always be assembled correctly.
In our
example the disk data came from three pages that is Page -1, Page-2 and Page-3.
If we crate column store index in the column named “Name” then the segment is
created from Page-1, Page-2 and Page-3.
Group Concept:
When a column
is not fit in a single segment as the data of columns exceed the one million then
columns takes multiple segments to store data. In such cases, multiple segments
are created for each column and grouped into multiple row groups, one for each
set of segments.
When a column
store index is broken into multiple row groups, each row group contains a set
of complete rows. Please look at
the bellow figure to understand it properly.
Dictionary Concept:
Besides
column segments a column store index consists of another data storage element:
dictionaries. Dictionaries are widely used in columnar storage as a means to
efficiently encode large data types, like strings. The values stores in the
column segments will be just entry numbers in the dictionary, and the actual
values are stored in the dictionary. This technique can yield very good
compression for repeated values, but yields bad results if the values are all
distinct (the required storage actually increases). This is what makes
large columns (strings) with distinct values very poor candidates for column
store indexes. Column store indexes contain separate dictionaries for each
column and string columns contain two types of dictionaries.
Primary
Dictionary
This
is a global dictionary used by all segments of a column.
Secondary
Dictionary
This
is an overflow dictionary for entries that did not fit in the primary
dictionaries. It can be shared by several segments of a column: the
relation between dictionaries and column segments is one-to-many.
Data Compression:
Column
store indexes are based on xVelocity (formerly known as VertiPaq), an advanced
storage and compression technology that originated with Power Pivot and
Analysis Services but has been adapted to SQL Server 2012 databases.
New feature of Column Store
in SQL 2014
- We can create only
one non-clustered column store index on a table although we can include
all the columns of the table in the single index.
- SQL Server 2014
enhances it to add support for creating Clustered Column store Index.
- When we create a
column store index and it makes the table read only.
- With SQL Server
2014, we can create a column store index without having much impact on
write-ability on the table. This means we can issue some INSERT, UPDATE,
DELETE statements with a table with clustered column store index. No more
tedious workaround is required for writing data to a table with column
store index in this release like the previous release.
Hope
you like it.
Posted by: MR. JOYDEEP DAS