Microsoft SQL Server Knowledge Bank: Related to Clustered Index

When we think about the performance factors, the Index plays the main role. In this article I am trying to discuss some points related to CLUSTERD INDEX.

The main tropics, that I am trying to highlight is mentioned bellow.

1. Cluster index Structure

2. How the Cluster index Works

3. Efficient Cluster Index

Clustered Index Structure

In order to understand the design principles of a good clustered index, we need to discuss how SQL Server stores clustered indexes. All table data is stored in 8 KB data pages. When a table contains a clustered index, the clustered index tells SQL Server how to order the table's data pages

In the SQL Server the index are organised as B-Tree structure.

1. Each pages of the index B-Tree are called index node.

2. The top node of the B-Tree is called the root node.

3. The bottom level of nodes in the index is called the leaf node.

4. Any level between the root and the leaf node are collectively known as intermediate level.

In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and intermediate level nodes contain index pages holding index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.

The leaf level of the B-tree is always Level- 0 and the root level is always the highest level. In this figure there is only one intermediate Level, but in a large index will often have more than one intermediate Level, and a small index might not have an intermediate Level.

The Index pages in the root and intermediate levels contain clustering key and a page pointer down into the Next level of the B-Tree. This process will be repeated until the leaf node is reached. The leaf node of the clustered index contains the actual data in an ordered fashion based on the clustering key.

How the Clustered Index Works

To understand it, please look at the bellow figure of B-Tree structure.

It contains the 1 million records of clustered key Student ID. In this figure the Level-1 and 2 is the Index page. In Level – 1 each page contains 50000 records and clustered index values plus a pointer down into the associated page on the next level.

To understand it properly I am going to take an example.

We want to retrieve the details of student ID 500. For this SQL server has to read the three pages. The root page in Level 2, the intermediate page in Level 1, and the appropriate leaf level page in Level 0. The root level tells the SQL Server to which intermediate level to read and intermediate levels tells that which leaf level page to read to retrieve the details.

Effective clustered index

The prime characteristics of a clustered index are mentioned bellow.

Narrow
Unique
Static
Ever-increasing

I am trying to discuss it in little more details.

Narrow

The width of the Index refers to the number of bytes in the index key. Let's take an example to understand it.

CREATE TABLE student_dtl

(

studentID INT IDENTITY(1,1),

studentRefKEY UNIQUEIDENTIFIER,

studentName VARCHAR(50)

)

The table has the 15 melions of rows and only two columns are candidates for use as clustering key.

1. The "studentID" has the INT data type used the 4 bytes.

2. The "studentRefKEY" has the UNIQUEIDENTIFIER data type used the 16 bytes.

INSERT INTO student_dtl

(studentRefKEY, studentName)

VALUES (newid(), 'Joydeep Das')

SELECT datalength(studentID) AS studentID,

datalength(empRefKEY) AS

FROM emp_dtl

-- Result Output

studentID studentRefKEY

----- ---------

4 16

So which one is good for clustered index. If we see the B-Tree structure of both, in case of UNIQUEIDENTIFIER the number of intermediate level is greter then the INT. The facts is the UNIQUEIDENTIFIER takes the much more space than INT data type. So when the clustered index is created few rows can be packed into each index page, and the clustered key requires an additional non-leaf level to store the Key.

Unique

First of all we have to know that the SQL Server douesnot require a cluster index to be UNIQUE.

A clustered index must be unique, If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users.

Static

We want to choose a clustering key that will never be updated. SQL Server must ensure that data exists in a logical order based upon the clustering key. Therefore, when the clustering key value is updated, the data may need to be moved elsewhere in the clustered index so that the clustering order is maintained.

There are other considerations that can affect how many pages are updated, such as whether an update to a variable-length column causes the row to exceed the amount of available space. In such a case, the data would still need to be moved, although only the data page of the clustered index is affected.

So, the updating the clustering key is more expensive. Therefore it is a best practice to avoid clustering on columns that are frequently updated.

Ever-increasing

An integer identity column is an excellent example of an ever-increasing column. The identity property continuously increments by the value defined at creation, which is typically one. This allows SQL Server, as new rows are inserted, to keep writing to the same page until the page is full, then repeating with a newly allocated page.

SQL Server can much more efficiently write data if it knows the row will always be added to the most recently allocated, or last, page

Reduction in clustered index fragmentation, this fragmentation results from data modifications and can take the form of gaps in data pages, so wasting space, and a logical ordering of the data that no longer matches the physical ordering.

Hope you like it.

Posted by: MR. JOYDEEP DAS

Friday, 6 July 2012

Related to Clustered Index

6 comments: