Sunday, 22 April 2012

Database Compression – Data Compression Part-1



Database Compression – Data Compression Part-1

In this article I am trying to illustrate the feature of Data Compression in SQL Server 2008. Here I am trying to explain how the Data compression occurs.
The real facts we faced that a database must grow. When the data is growing so fast it is difficult for a DBA to maintain the database in terms of performance and other thing. Thinking about a situation when a query of a table contains more than 10 million of rows or you are going to take the backup of a database with 5 TB size.
While the facts is that as DBA we can't stop the growth of SQL server database but SQL server 2008 and letter gives us some tools to help us better deal with all this data, and that is "compression".

SQL Server 2008 supports the Compression to reduce the size of the MDF and backup compressions can help us reduce the amount of space our backup takes.   
Please note that for Compression we can also look at "Red Gate SQL Storage Compress" but it is not under the scope of this article.

The Compression not only reduces physical size, it reduce disk I/O, which can enhance the performance of database.   

When we are thinks about compression we must think it in two ways.
1.    The data compressions which includes row-level and Page-level compression, that occurs with MDF files of our database.
2.    The Backup compression, which occurs only when the data is backed up.

Data compressions come into two forms:

Row-Level data Compression

It turning fixed length data type into variable length data type and freeing up empty space. It also has the ability to ignore zero and NULL values and saving additional space. It tries more rows can fit in a single data page.

Page-level Data Compression

It starts with row-level data compressions and then add two additional compression feature named Prefix and Dictionary compression.

The backup compression comes with single form:

Backup Compression

It does not use the row-level or page level data compression. Instead backup compression occurs at the time of backup and it's use its own proprietary compression technique.


The data compression can be used only for those database objects mentioned bellow.

1.    A Table Stored as Heap
2.    A Table stored as a clustered Index
3.    A Non-Clustered index
4.    An Indexed View
5.    Partitioned Table and Index

How the Row level Data compression works

Reducing the Amount of metadata used to store rows.

Storing fixed length numeric data type as if they were variable length data type. For example, if you store the value 1 in a BIGINT data type, storage will only take 1 byte, not 8 bytes, which the BIGINT data types normally takes.

 Storing CHAR data types as variable-length data types. For example, if you have a CHAR (100) data type, and only store 10 characters in it, blank characters are not stored, and thus reducing the space needed to the store data.

Not storing NULL or 0 values.

How the Page level Compression Works

It starts out by using row-level data compression to get as many rows as it can on a single page.

Next, prefix compression is run. Essentially, repeating patterns of data at the beginning of the values of a given column are removed and substituted with an abbreviated reference that is stored in the compression information (CI) structure that immediately follows the page header of a data page.

And last, dictionary compression is used. Dictionary compression searches for repeated values anywhere on a page and stores them in the CI. One of the major differences between prefix and dictionary compression is that prefix compression is restricted to one column, while dictionary compression works anywhere on a data page.


My next article I am going to explore the step by step procedure of data compressions.
Hope you like it.



Posted by: MR. JOYDEEP DAS

No comments:

Post a Comment