Introduction
When we think about
the performance factors, the Index plays the main role. Here we are trying to
discuss some points related to
- Clustered Index
- Index Fragmentation
- Index Defragmentation
- Index and Disk Space
- Index and Statistics
Clustered Index
Structure
In order to understand
the design principles of a good clustered index, we need to discuss how SQL
Server stores clustered indexes. All table data is stored in 8 KB data pages.
When a table contains a clustered index, the clustered index tells SQL Server
how to order the table's data pages
In the SQL Server the
index are organized as B-Tree structure.
o
Each
pages of the index B-Tree are called index node.
o
The
top node of the B-Tree is called the root node.
o
The
bottom level of nodes in the index is called the leaf node.
o
Any
level between the root and the leaf node are collectively known as intermediate
level.
In a clustered index, the leaf nodes contain the
data pages of the underlying table. The root and intermediate
level nodes contain index pages holding index rows. Each index row contains a
key value and a pointer to either an intermediate level page in the B-tree, or
a data row in the leaf level of the index. The pages at each level of the index are linked in a doubly-linked list.
The leaf level of the
B-tree is always Level- 0 and the root level has been always the highest level. In this figure there is only one
intermediate Level, but in a large index will often have more than one
intermediate Level, and a small index might not have an intermediate Level.
The Index pages in the
root and intermediate levels contain a clustering key and a page pointing down into the Next
level of the B-Tree. This process will be repeated until the leaf node is
reached. The leaf node of the clustered index contains the actual data in an
ordered fashion based on the clustering key.
How the Clustered
Index Works
To understand it,
please look at the bellow figure of B-Tree structure.
It contains the 1
million records of clustered key Student ID. In this figure the Level-1 and 2
is the Index page. In Level – 1 each page contains 50000 records and clustered
index values plus a pointed down into the
associated page on the next level.
To understand it
properly I am going to take an example.
We want to retrieve the details of student ID
500. For this SQL server has to read the three pages. The root page in Level 2,
the intermediate page in Level 1, and the appropriate leaf level page in Level
0. The root level tells the SQL Server to which intermediate level to read and
intermediate levels tells that which leaf level page to read to retrieve the
details.
Effective clustered
index
The prime
characteristics of a clustered index are mentioned bellow.
·
Narrow
·
Unique
·
Static
·
Ever-increasing
We are trying to
discuss it in little more details.
Narrow
The width of the Index
refers to the number of bytes in the index key. Let's take an example to
understand it.
CREATE TABLE student_dtl
(
studentID INT IDENTITY(1,1),
studentRefKEY UNIQUEIDENTIFIER,
studentName VARCHAR(50)
)
The table has the 15 millions
of rows and only two columns are candidates for use as clustering key.
o The "studentID" has the INT data type used the 4
bytes.
o The "studentRefKEY" has the UNIQUEIDENTIFIER data type
used the 16 bytes.
INSERT INTO student_dtl
(studentRefKEY, studentName)
VALUES (newid(), 'Joydeep Das')
GO
SELECT datalength(studentID) As studentID,
datalength(empRefKEY) As empRefKEY
FROM emp_dtl
-- Result Output
studentID
studentRefKEY
-----
---------
4
16
Which one is good for a
clustered index?
If we see the B-Tree
structure of both, in case of UNIQUEIDENTIFIER the number of intermediate levels is greeter then the INT. The facts are the
UNIQUEIDENTIFIER takes the much more spacious than INT data type. When the clustered index is created few rows
can be packed into each index page, and the clustered key requires an
additional non-leaf level to store the Key.
Unique
First of all we have
to know that the SQL Server doesn’t require a clustered index to be UNIQUE.
A clustered index must be unique, if the clustered index is not a unique index; SQL Server makes any duplicate keys, unique by adding an internally generated value called an uniqueifier. This four-byte value is not visible to users.
A clustered index must be unique, if the clustered index is not a unique index; SQL Server makes any duplicate keys, unique by adding an internally generated value called an uniqueifier. This four-byte value is not visible to users.
Static
We want to choose a
clustering key that will never be updated. SQL Server must ensure that data
exists in a logical order based upon the clustering key. Therefore, when the
clustering key value is updated, the data may need to be moved elsewhere in the
clustered index so that the clustering order is maintained.
There are other
considerations that can affect how many pages are updated, such as whether an
update to a variable-length column causes the row to exceed the amount of
available space. In such a case, the data would still need to be moved,
although only the data page of the clustered index is affected.
So, the updating the
clustering key is more expensive. Therefore, it is a best practice to avoid clustering on columns that are
frequently updated.
Ever-increasing
An
integer identity column is an excellent example of an ever-increasing
column. The identity property continuously increments by the value
defined at creation, which is typically one. This allows SQL Server, as new
rows are inserted, to keep writing to the same page until the page is full,
then repeating with a newly allocated page.
SQL Server can much more efficiently write
data if it knows the row will always be added to the most recently allocated,
or last, page
Reduction in clustered index fragmentation,
this fragmentation results from data modifications and can take the form of
gaps in data pages, so wasting space, and a logical ordering of the data that
no longer matches the physical ordering.
Index Fragmentation
Index fragmentation is
a phenomenon where index contents are no longer stored continuously in the
storage. When index contents become scattered in the storage, fragmented,
performance on the index will degrade.
There are 2 types of
Index fragmentation
- Extent or External fragmentation.
- Page or Internal fragmentation.
The Index fragmentation
can occur in both Clustered and Non-Clustered Index. To understand the
External and Page fragmentation I am taking an example.
Un-Fragmented Index
In the above figure
all the pages are full and the physical order of the pages is sequential.
As a result of data
modifications, in the cases of extent fragmentation the pages get out of the
physical order. This type of fragmentation produces random IO, which does
not perform as well as sequential IO.
Extent Fragmentation
The bellow figure
demonstrates the extent fragmentation.
In the internal
fragmentation there are gaps in the data pages, which reduce the amount of data
that can be stored on each page. So it increases the overall amount of space
needed to store the data. When the pages are not full and additional pages are
required to store the data, as more IO will be required to retrieve those
additional pages.
Page Fragmentation
During a page split, a
new page is allocated, and half the records are moved from the old page to the
newly-allocated page. Each page has a pointer to the previous and next page in
the index, so those pages will also need to be updated. The bellows figure demonstrates
the process.
Page splitting due to
fragmentation
Analyzing Fragmentation
To analyze SQL Server
indexes, we use the system function. The function named dm_db_index_physical_statsto determine which indexes are fragmented and the extent of that
fragmentation. We used this function to analyzing all the index of the database,
or all the index of the table or a specified index.
The function dm_db_index_physical_statsto takes the following parameters described below.
Database ID
This is a small integer
value that represents the ID number of a database. If null is specified, the
function retrieves index-related data from all databases on a SQL Server
instance.
Object ID
This is an integer
value that represents the ID number of a table or view. If null is specified,
the function retrieves index-related data for all tables and views in a
specific database or a SQL Server instance. If you specify null,
you must also specify null for the index ID and partition number.
Index ID
This is an integer
value that represents the ID number of an index. If null is specified, the
function retrieves index-related data for all indexes defined on the specified
table or view. If you specify null, you must also specify null for the
partition number. Also, if the object ID refers to a heap, use 0 as the index
ID.
Partition number
This is an integer
value that represents the partition number of an index or heap. If null is
specified, the function retrieves index-related information for all partitions
related to a specific object.
Mode
The scan level used to
obtain index-related information. Valid inputs include NULL, DEFAULT, or one of
the following three modes:
LIMITED
Scans the smallest number of pages, which means this is the
fastest mode. The LIMITED mode is equivalent to NULL and DEFAULT.
SAMPLED
Scans 1% of all pages. If an index contains fewer than 10,000
pages, then DETAILED mode is used.
DETAILED
Scans all index pages, which means this is the slowest mode, but
most accurate.
The sys.dm_db_index_physical_stats function
returns a number of values that provide details about the indexes we specify.
SQL Server Books Online provides details about each of these values.
Along with other
information, there are two important columns that for detecting fragmentation,
which are as follows:
avg_fragmentation_in_percent
This is a percentage
value that represents external fragmentation. For a clustered table and leaf
level of index pages, this is Logical fragmentation, while for heap, this is
Extent fragmentation. The lower this value, the better it is. If this value is
higher than 10%, some corrective action should be taken.
avg_page_space_used_in_percent
This is an average
percentage use of pages that represents to internal fragmentation. Higher the
value, the better it is. If this value is lower than 75%, some corrective
action should be taken.
Here is the quick
script to find fragmentation of the database for all the indexes.
SELECT ps.database_id,
ps.OBJECT_ID,
ps.index_id,
b.name,
ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL,NULL) ASps
INNER JOIN sys.indexes AS b ON ps.OBJECT_ID = b.OBJECT_ID
AND ps.index_id = b.index_id
WHERE ps.database_id = DB_ID()
ORDER BY ps.OBJECT_ID
Index Defragmentation
Fragmentation of the
Index can be de-fragmented by two ways; it depends on size of the table and the
level of the defragmentation.
o
Reorganize Index
o
Rebuild Index
Reorganize Index
It does not take much
more system resources. That means it can be done when the user access the table
objects where the index exists. This process reorganizes the leaf nodes of the
index physically to match it with logical order. If the physical order matches
the logical order of the index it increases the performance.
To reorganize one or
more indexes, use the ALTER INDEX statement with the REORGANIZE clause. This
statement replaces the DBCC INDEXDEFRAG statement of the previous version of
SQL Server.
To reorganize a single partition of a partitioned index, use the PARTITION clause of ALTER INDEX.
To reorganize a single partition of a partitioned index, use the PARTITION clause of ALTER INDEX.
Reorganizing an index
defragments the leaf level of clustered and non-clustered indexes on tables and
views by physically reordering the leaf-level pages to match the logical order
(left to right) of the leaf nodes. Having the pages in order improves
index-scanning performance.
The index is
reorganized within the existing pages allocated to it; no new pages are
allocated. If an index spans more than one file, the files are reorganized one
at a time. Pages do not migrate between files.
Reorganizing also
compacts the index pages. Any empty pages created by this compaction are
removed providing additional available disk space. Compaction is based on the
fill factor value in the sys.indexes catalog
view.
The reorganize process
uses minimal system resources. Also, reorganizing is automatically performed
online. The process does not hold long-term blocking locks; therefore, it will
not block running queries or updates.
Reorganize an index
when the index is not heavily fragmented. See the previous table for
fragmentation guidelines. However, if the index is heavily fragmented, you will
achieve better results by rebuilding the index.
Besides reorganizing one or more indexes,
large object data types (LOBs) that are contained in the clustered index or
underlying table are compacted by default when an index is reorganized.
The data types image, text, ntext, varchar(max), nvarchar(max), varbinary(max), and xml are large object data types and compacting this data can cause better disk space use.
Reorganizing a specified clustered index will compact all LOB columns that are contained in the leaf level (data rows) of the clustered index.
Reorganizing a non-clustered index will compact all LOB columns that are non-key (included) columns in the index.
When ALL is specified, all indexes associated with the specified table or views are reorganized and all LOB columns associated with the clustered index, underlying table, or non-clustered index with included columns are compacted.
The LOB_COMPACTION clause is ignored if LOB columns are not present.
The data types image, text, ntext, varchar(max), nvarchar(max), varbinary(max), and xml are large object data types and compacting this data can cause better disk space use.
Reorganizing a specified clustered index will compact all LOB columns that are contained in the leaf level (data rows) of the clustered index.
Reorganizing a non-clustered index will compact all LOB columns that are non-key (included) columns in the index.
When ALL is specified, all indexes associated with the specified table or views are reorganized and all LOB columns associated with the clustered index, underlying table, or non-clustered index with included columns are compacted.
The LOB_COMPACTION clause is ignored if LOB columns are not present.
Rebuild Index
Rebuild index means
the dropping an existing index of the table and create a new index. The older
index is gone with the drop and in the new index the logical order matches the
physical order of the index and improves the performance.
In doing this,
fragmentation is removed, disk space is reclaimed by compacting the pages using
the specified or existing fill factor setting, and the index rows are reordered
in contiguous pages and allocating new pages as needed. This can improve disk
performance by reducing the number of page reads required to obtain the
requested data.
To rebuilds clustered
and non-clustered indexes ALTER INDEX with the REBUILD clause. This statement
replaces the DBCC DBREINDEX statement of previous version of the SQL Server and
the CREATE INDEX with the DROP_EXISTING clause.
If we rebuilding is a
cluster index, then our tables will be unavailable to the user during the
rebuild process and if it is a non-clustered index only a shared lock will be
placed on the table so users will be able to access the table, but would not be
allowed to do modifications on it.
SQL Server 2005 the
online option for index rebuilds is only available in Enterprise edition while
in SQL Server 2008 it is available in Enterprise, Developer, and Evaluation
editions.
When we are creating
an index online, it does not drop the existing index rather it keeps the
original index to be used by the users and creates an index parallel to it.
During this process only the source index will be available for use by the
users. Once the index rebuild operation is complete, the original index will be
dropped and the new one will be updated and available for read and write operations.
Reorganize Index
Example
-- Base Table Definition
CREATE TABLE emp_record
(empID INT IDENTITY(1,1) NOT NULL,
empNAME VARCHAR(50) NOT NULL,
empDEMP CHAR(1),
CONSTRAINT PK_emp_record PRIMARY KEY CLUSTERED(empID)
)
-- Reorganize Index
ALTER INDEX PK_emp_record ON emp_record REORGANIZE
-- Reorganize All Index
ALTER INDEX ALL ON emp_record REORGANIZE
Rebuild Index Example
-- Rebuild Index
ALTER INDEX PK_emp_record ON emp_record REBUILD
-- Rebuild Index with
option specified
ALTER INDEX ALL ON emp_record
REBUILD WITH (FILLFACTOR = 80, SORT_IN_TEMPDB = ON,
STATISTICS_NORECOMPUTE = ON)
-- Rebuild Index online
ALTER INDEX PK_emp_record ON emp_record REBUILD WITH (ONLINE = ON)
ALTER INDEX ALL ON emp_record REBUILD WITH (ONLINE = ON)
Summarize the
Differences
o Index Rebuild drops the existing Index and Recreates the index
from scratch.
o Index Reorganize physically reorganizes the leaf nodes of the
index.
o Rebuild the Index when an index is over 30% fragmented.
o Reorganize the Index when an index is between 10% and 30%
fragmented.
o If fragmentation is below 10%, no action required.
Rebuilding takes more server resources and
uses locks unless you use the ONLINE option available in 2005 Enterprise and
Development editions.
Defrag Index Stored
Procedure
This stored procedure
defrags the index accordingly. This is prototype only. You can modify it …
IF OBJECT_ID('proc_DEFRAG') IS NOT NULL
BEGIN
DROP PROCEDURE proc_DEFRAG
END
GO
CREATE PROCEDURE proc_DEFRAG
(
@p_DBName VARCHAR(50) = NULL,
@p_SchemaNm VARCHAR(50) = NULL,
@p_TBLName VARCHAR(50) = NULL,
@p_IndxName VARCHAR(50) = NULL
)
AS
DECLARE @v_FrgReOrgLmt DECIMAL(10,2),
@v_FrgReBuildLmt DECIMAL(10,2),
@v_CurrentFreg DECIMAL(10,2),
@v_Flag DECIMAL(1),
@v_fillfactor INT,
@v_Str VARCHAR(MAX),
@v_Message VARCHAR(MAX)
BEGIN
SET @v_FrgReOrgLmt = 10.00
SET @v_FrgReBuildLmt = 30.00
SET @v_CurrentFreg = 0.00
SET @v_Flag = 0
SET @v_fillfactor = 80
IF ISNULL(@p_DBName, '')<>''
AND ISNULL(@p_TBLName, '')<>''
AND ISNULL(@p_IndxName, '')<>''
AND ISNULL(@p_SchemaNm, '')<>''
BEGIN
SELECT @v_CurrentFreg = avg_fragmentation_in_percent
FROM
sys.dm_db_index_physical_stats(DB_ID(@p_DBName), NULL, NULL, NULL, NULL)a
INNER JOIN sys.tables b WITH (nolock) ON a.OBJECT_ID = b.OBJECT_ID
INNER JOIN sys.indexes c WITH (nolock) ON a.OBJECT_ID = c.OBJECT_ID
AND a.index_id = c.index_id
WHERE b.name = @p_TBLName
AND c.name = @p_IndxName
--Reorganizing the index
IF ((@v_CurrentFreg >= @v_FrgReOrgLmt)
AND (@v_CurrentFreg <= @v_FrgReBuildLmt))
BEGIN
SET @v_Str = 'ALTER INDEX ' + @p_IndxName +
' ON [' + RTRIM(LTRIM(@p_SchemaNm)) +
'].[' + RTRIM(LTRIM(@p_TBLName)) +
'] REORGANIZE'
SET @v_Flag = 1
SET @v_Message = 'Rebuild Index with REORGANIZE'
GOTO WAYOUT
END
--Rebuilding the index
IF (@v_CurrentFreg>@v_FrgReBuildLmt)
BEGIN
SET @v_Str = 'ALTER INDEX ' + @p_IndxName +
' ON [' + RTRIM(LTRIM(@p_SchemaNm)) +
'].[' + RTRIM(LTRIM(@p_TBLName)) +
'] REBUILD WITH (FILLFACTOR = ' +
CONVERT(VARCHAR(3),@v_fillfactor) +
', STATISTICS_NORECOMPUTE = OFF)'
SET @v_Flag = 1
SET @v_Message = 'Rebuild Index with REBGUILD'
END
WAYOUT:
IF @v_Flag = 1
BEGIN
EXEC (@v_Str)
END
ELSE
BEGIN
SET @v_Message = 'Parameter is NOT properly Supplied'
END
RAISERROR(@v_Message, 0, 1) WITH NOWAIT
RAISERROR('', 0, 1) WITH NOWAIT
END
END
GO
Index and Disk Space
Disk space is a very
important factor to Create, Rebuild or Drop Index. Disk space causes the
performance degradation of an index or even fails. So the understanding about
disk space is very important factors. In this article I am trying to
discuss related to it.
o
Space required in
Index Data Definition Language Operations
o
An Example of Index
Disk Space
o
Transaction Log Disk
Space for Index Operations
Space required in
Index DDL Operations
Here I am mentioning
some Index operations not required any additional disk space is mentioned
bellow.
o ALTER INDEX with REORGANIZE options not required any additional
disk space as the index is reorganized within the existing pages allocated to
it; no new pages are allocated. But the log space is required.
o When we are dropping a non-clustered index by DROP INDEX or when
we are dropping a clustered index offline without specifying the MOVE TO clause
and non-clustered index do not exist.
o When we are CREATE TABLE with PRIMARY KEY or UNIQUE constraints
The index operation
that required additional disk space is mentioned bellow.
All other index DDL
operations require additional temporary disk space to use during the operation,
and the permanent disk space to store the new index structure or structures.
When a new index
structure is created, disk space for both the old (source) and new (target)
structures is required in their appropriate files and file groups. The old
structure is not de-allocated until the index creation transaction commits.
The DDL operations
that needs new index structures and required additional disk space is mentioned
bellow.
o CREATE INDEX
o CREATE INDEX WITH DROP_EXISTING
o ALTER INDEX REBUILD
o ALTER TABLE ADD CONSTRAINT (PRIMARY KEY or UNIQUE)
o ALTER TABLE DROP CONSTRAINT (PRIMARY KEY or UNIQUE) when the
constraint is based on a clustered index
o DROP INDEX MOVE TO (Applies only to clustered indexes.)
An Example of Index Disk Space
I am finding this example from MSDN.
The formula is used to measure, disk space
requirements is mentioned bellow
IndexSize = NoOfRows *
AvgSizeOfIndex * 2 * (1/(100-FILLFACTOR)/100
In this example, disk space requirements to
create a clustered index are determined. Assume the following conditions are
true before creating the clustered index:
o The existing table (heap) contains 1 million rows. Each row is
200 bytes long.
o Non-clustered index A contains 1 million rows. Each row is 50
bytes long.
o Non-clustered index B contains 1 million rows. Each row is 80
bytes long.
o The index create memory option is set to 2 MB.
A fill factor value of
80 is used for all existing and new indexes. This means the pages are 80% full.
Note: As a result of creating a clustered
index, the two non-clustered indexes must be rebuilt to replace the row
indicator with the new clustered index key.
Disk Space
Calculations for an Offline Index Operation
In the following
steps, both temporary disk space to be used during the index operation and
permanent disk space to store the new indexes are calculated. The calculations
shown are approximate: results are rounded up and consider only the size of
index leaf level. The tilde (~) is used to indicate approximate calculations.
1.
Determine the
size of the existing (source) structures
Heap: 1 million * 200 bytes ~ 200 MB
Non-clustered index A: 1 million * 50 bytes / 80% ~ 63 MB
Non-clustered index B: 1 million * 80 bytes / 80% ~ 100 MB
Total size of existing structures: 363 MB
Heap: 1 million * 200 bytes ~ 200 MB
Non-clustered index A: 1 million * 50 bytes / 80% ~ 63 MB
Non-clustered index B: 1 million * 80 bytes / 80% ~ 100 MB
Total size of existing structures: 363 MB
2.
Determine the size of
the new (target) index structures
Assume that the new clustered key is 24 bytes long including a uniqueifier. The row indicator (8 bytes long) in both non-clustered indexes will be replaced by this clustered key.
Clustered index: 1 million * 200 bytes / 80% ~ 250 MB
Non-clustered index A: 1 million * (50 – 8 + 24) bytes / 80% ~ 83 MB
Non-clustered index B: 1 million * (80 – 8 + 24) bytes / 80% ~ 120 MB
Total size of new structures: 453 MB
Total disk space required to support both the source and target structures for the duration of the index operation is 816 MB (363 + 453). The space currently allocated to the source structures will be deallocated after the index operation is committed.
Assume that the new clustered key is 24 bytes long including a uniqueifier. The row indicator (8 bytes long) in both non-clustered indexes will be replaced by this clustered key.
Clustered index: 1 million * 200 bytes / 80% ~ 250 MB
Non-clustered index A: 1 million * (50 – 8 + 24) bytes / 80% ~ 83 MB
Non-clustered index B: 1 million * (80 – 8 + 24) bytes / 80% ~ 120 MB
Total size of new structures: 453 MB
Total disk space required to support both the source and target structures for the duration of the index operation is 816 MB (363 + 453). The space currently allocated to the source structures will be deallocated after the index operation is committed.
3.
Determine additional
temporary disk space for sorting
Space requirements are shown for sorting in tempdb (with SORT_IN_TEMPDB set to ON) and sorting in the target location (with SORT_IN_TEMPDB set to OFF).
Space requirements are shown for sorting in tempdb (with SORT_IN_TEMPDB set to ON) and sorting in the target location (with SORT_IN_TEMPDB set to OFF).
# When SORT_IN_TEMPDB is set to ON, tempdb must have sufficient disk space to hold the largest index (1 million * 200 bytes ~ 200 MB). Fill factor is not considered in the sorting operation. Additional disk space (in the tempdb location) equal to the index create memory Option value = 2 MB. Total size of temporary disk space with SORT_IN_TEMPDB set to ON ~ 202 MB.
# When
SORT_IN_TEMPDB is set to OFF (default), the 250 MB of disk space already
considered for the new index in step 2 is used for sorting. Additional disk
space (in the target location) equal to the index create memory Option value =
2 MB. Total size of temporary disk space with SORT_IN_TEMPDB set to OFF = 2 MB.
Using tempdb, a total
of 1018 MB (816 + 202) would be needed to create the clustered and
non-clustered indexes. Although using tempdb increases the amount of temporary
disk space used to create an index, it may reduce the time that is required to
create an index when tempdb is on a different set of disks than the user
database. For more information about using tempdb, see tempdb and Index
Creation.
Without using tempdb,
a total of 818 MB (816+ 2) would be needed to create the clustered and
non-clustered indexes.
When you create, drop,
or rebuild a clustered index online, additional disk space is required to build
and maintain a temporary mapping index. This temporary mapping index contains
one record for each row in the table, and its contents are the union of the old
and new bookmark columns.
To calculate the disk
space needed for an online clustered index operation, follow the steps shown
for an offline index operation and add those results to the results of the
following step.
Determine space for
the temporary mapping index. In this example, the old bookmark is the row ID
(RID)) of the heap (8 bytes) and the new bookmark is the clustering key (24
bytes including a uniqueifier). There are no overlapping columns between the
old and new bookmarks.
Temporary mapping
index size = 1 million * (8 bytes + 24 bytes) / 80% ~ 40 MB. This disk space
must be added to the required disk space in the target location if
SORT_IN_TEMPDB is set to OFF, or to tempdb if SORT_IN_TEMPDB is set to ON.
Transaction Log Disk
Space for Index Operations
To make sure that the
Index operations can be rollback the transaction log cannot be truncated until
the index operations has completed. Therefore, the transaction log must have
sufficient room to store both the index operation transactions and any
concurrent user transactions for the duration of the index operation. This is
for both offline online index operations.
When we are using
large scale index operation we must remember
o
The Transaction log
should be backed up and truncated before index operation online. The log has
sufficient space to store the project index and user transactions.
o
SORT_IN_TEMPDB options
should be ON. This separates the index transactions from concurrent user
transactions.
o
Use database recovery
model that allows minimal logging of the index operation. This may reduce the
size of the log and prevent the log from filling the log space.
o
Don't run the ONLINE
index operations in explicit transactions. The log will not be truncated until
the explicit transaction ends.
Index and Statistics
Microsoft SQL Server
collects statistical information related to indexes and column data stored in
the database. These statistics are used by the SQL Server query optimizer to
choose the most efficient plan for retrieving or updating data.
If we understand the
index statistics in better way, it helps us to optimize and fine tune our
queries.
As the Statistics play
a very important role in the performance of MS SQL Server query.
What we mean by MS SQL
server Index statistics
Before
going to the actual definition of Index statistics, I am going to take an
example so that we can understand it easily.
"Suppose
you are a football coach and u have 20 players in your team and for paying
final tournament you need 11 players out of 20. Now, you have to know (or
maintain a list) all the statistics of each payer before making your final
team."
Index
statistics contains information related to the distribution of index key
values. It means that the number of rows associated with each key value. To
determine what kind of execution plan to be used when processing a query MS SQL
Server query optimizer uses this information.
After
creating the table Index when we perform the insert, update, or delete
operations in the table, the statistics of the table become out of date.
The MS SQL Server doesn't update the statistics information every time we
made some changes (Insert/Update/Delete Operations) on the
table. Statistics track the distribution of values within an index or
within a particular column.
Note
that, if a column is not indexed but can benefit from an index, SQL Server will
automatically create statistics for that column.
When the Index
Statistics are updated
It is very important
to understand, when the MS SQL Server update the statistics. The database
settings AUTO_UPDATE_STATISTICS controls when the statistics are automatically
updated. By default the AUTO_UPDATE_STATISTICS is true that means that the
statistics are automatically updated.
MS SQL Server
determines that when to update the statistics after creating it. It is based on
how old the statics is. It determines the outdated statistics based on
number of Insert, Update and Delete from the date when the statistics is last
updated and then recreate the statistics based on a threshold.
This threshold is
relative to the number of records in the table. It means that the when the DML
operation performs the index statistics slowly get older, until SQL Server
determines to update the statistics.
Limitations:
When we have very
large table and we are doing bulk Insert/Update/Delete operation the
AUTO_UPDATE_STATISTICS can be overhead on our system. Because the
AUTO_
UPDATE_STATISTICS busy to update the statistics and causes the system overhead.
In such type of situation we must turn off the AUTO_UPDATE_STATISTICS and later
we must manually update the statistics.
Rules used by
AUTO_UPDATE_STATISTICS
"rowmodctr" column of
the sysindexes table is used to
determine the number of changes made since the last update of the statistics.
When the MS SQL Server
updates the statistics it will follow the following rules.
- If our table have 6 or fewer rows, the statistics will be updated after 6 changes.
- If a table has 500 or fewer rows, statistics will be updated after 500 changes
- If a table has more than 500 rows, statistics will be updated after 20% of the total rows plus 500 rows are changed.
Syntax
related to AUTO_UPDATE_STATISTICS
The
syntax is mentioned bellow:
sp_helpdb DBName
GO
ALTER DATABASE DBName SET AUTO_UPDATE_STATISTICS ON
GO
Determine that the
Index statistics are out of date
MS SQL Server uses the
sampling methods to keep track of the last time when the statistics were
updated. This information is used to determine how old your statistics is.
The function
STATS_DATE is used to determine when the statistics was last updated. The
sample script is uses this function to display index statistic date for all
user defined indexes.
SELECT
schema_name(o.schema_id)AS SchemaName,
OBJECT_NAME(si.object_id)AS TableName,
si.nameAS IndexName,
STATS_DATE(i.object_id, i.index_id)AS StatDate
FROM sys.indexes si
INNER JOIN sys.objects o
ON si.object_id= o.object_id
INNER JOIN sys.indexes i
ON i.object_id= si.object_id
AND i.index_id = si.index_id
WHERE o.type<>'S'
AND STATS_DATE(i.object_id, i.index_id)IS
NOT NULL;
Updating the
statistics
As we see that the
problem related to MS SQL Server updating statistics automatically, to get the
optimal output we need to manually update it when needed.
To update an Index
statistics we can drop the Index and then recreate the Index, it will
automatically update the statistics information. It works but it is not the
good way to update statistics information manually.
The system stored
procedure named "sp_updatestats" helps us to update the statistical
information.
The syntax is
mentioned below:
sp_updatestats[ [ @resample = ] 'resample']
Parameters
[ @resample =] 'resample'
Specifies
that sp_updatestats will use the RESAMPLE option of the UPDATE STATISTICS statement.
If 'resample' is not specified, sp_updatestats updates
statistics by using the default
sampling. Resample is varchar(8) with a default value of
NO.
Example:
USE AdventureWorks2012;
GO
EXECsp_updatestats;
The UPDATE STATISTICS
is another option to update the statistical information.
The syntax is
mentioned bellow:
UPDATE STATISTICS
table_or_indexed_view_name
[
{
{ index_or_statistics__name }
| ( { index_or_statistics_name } [ ,...n ] )
}
]
[ WITH
[
FULLSCAN
| SAMPLE number { PERCENT | ROWS }
| RESAMPLE
| <update_stats_stream_option> [ ,...n ]
]
[ [ , ] [ ALL | COLUMNS | INDEX ]
[ [ , ] NORECOMPUTE ]
] ;
<update_stats_stream_option>
::=
[
STATS_STREAM = stats_stream ]
[
ROWCOUNT = numeric_constant ]
[
PAGECOUNT = numeric_contant ]
For more information
about syntax, follow the MSDN
Example:
USE AdventureWorks2012;
GO
UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;
GO
How to see the content
of Index statistics
To see the actual
contents of the statistics we use the DBCC SHOW_STATISTICS.
The syntax is
mentioned bellow.
DBCC SHOW_STATISTICS
(Table_Name, Index_Name)
Parameters:
Table_Name : The name
of the Table.
Index_Name : The Index
name of the Mentioned Table.
For more information
about SHOW_STATISTICS, follow the MSDN
Hope you like
it.
Posted
by: MR. JOYDEEP DAS