Tuesday, 31 July 2012

Index and Statistics


Introductions
Microsoft SQL Server collects statistical information related to indexes and column data stored in the database. These statistics are used by the SQL Server query optimizer to choose the most efficient plan for retrieving or updating data.
If we understand the index statistics in better way, it helps us to optimize and fine tune our queries.
As the Statistics play a very important role in the performance of MS SQL Server query.
In this article I am trying discussing related to it. Before writing this article, I read several article and MSDN to gathering facts related to it.

Point to Focus
In this article I am trying to focus related to the following things of the SQL Server Index statistics.
1.    What we mean by MS SQL server Index statistics
2.    When the Index Statistics are Updated
3.    Rules used by AUTO_UPDATE_STATISTICS
4.    Determine that the Index statistics are out of date.
5.    Updating the statistics
6.    How to see the content of Index statistics
7.    New feature for statistics in MS SQL Server 2008

What we mean by MS SQL server Index statistics
Before going to the actual definition of Index statistics, I am going to take an example so that we can understand it easily.

"Suppose you are a football coach and u have 20 players in your team and for paying final tournament you need 11 players out of 20.  Now, you have to know (or maintain a list) all the statistics of each payer before making your final team." 

Index statistics contains information related to the distribution of index key values. It means that the number of rows associated with each key value. To determine what kind of execution plan to be used when processing a query MS SQL Server query optimizer uses this information.

After creating the table Index when we perform the insert, update, or delete operations in the table, the statistics of the table become out of date.  The MS SQL Server doesn't update the statistics information every time we made some changes (Insert/Update/Delete Operations) on the table. Statistics track the distribution of values within an index or within a particular column.

Note that, if a column is not indexed but can benefit from an index, SQL Server will automatically create statistics for that column.

When the Index Statistics are updated
It is very important to understand, when the MS SQL Server update the statistics. The database settings AUTO_UPDATE_STATISTICS controls when the statistics are automatically updated. By default the AUTO_UPDATE_STATISTICS is true that means that the statistics are automatically updated.
MS SQL Server determines that when to update the statistics after creating it. It is based on how old the statics is.  It determines the outdated statistics based on number of Insert, Update and Delete from the date when the statistics is last updated and then recreate the statistics based on a threshold.
This threshold is relative to the number of records in the table. It means that the when the DML operation performs the index statistics slowly get older, until SQL Server determines to update the statistics.
Limitations:
When we have very large table and we are doing bulk Insert/Update/Delete operation the AUTO_UPDATE_STATISTICS can be overhead on our system. Because the                        AUTO_ UPDATE_STATISTICS busy to update the statistics and causes the system overhead. In such type of situation we must turn off the AUTO_UPDATE_STATISTICS and later we must manually update the statistics.
Rules used by AUTO_UPDATE_STATISTICS
"rowmodctr" column of the sysindexes table is used to determine the number of changes made since the last update of the statistics.
When the MS SQL Server updates the statistics it will follow the following rules.
Ø  If our table have 6 or fewer rows, the statistics will be updated after 6 changes.
Ø  If a table has 500 or fewer rows, statistics will be updated after 500 changes
Ø  If a table has more than 500 rows, statistics will be updated after 20% of the total rows    plus 500 rows are changed.  
Syntax related to AUTO_UPDATE_STATISTICS

The syntax is mentioned bellow:

sp_helpdb DBName
GO

ALTER DATABASE DBName SET AUTO_UPDATE_STATISTICS ON
GO


Determine that the Index statistics are out of date
MS SQL Server uses the sampling methods to keep track of the last time when the statistics were updated. This information is used to determine how old your statistics is.
The function STATS_DATE is used to determine when the statistics was last updated. The sample script is uses this function to display index statistic date for all user defined indexes.

SELECT  schema_name(o.schema_id)AS SchemaName,
        OBJECT_NAME(si.object_id)AS TableName,
        si.nameAS IndexName,
        STATS_DATE(i.object_id, i.index_id)AS StatDate
FROM    sys.indexes si
        INNER JOIN sys.objects o
             ON si.object_id= o.object_id
        INNER JOIN sys.indexes i
             ON i.object_id= si.object_id
                AND i.index_id = si.index_id
WHERE   o.type<>'S'
        AND STATS_DATE(i.object_id, i.index_id)IS NOT NULL;

Updating the statistics
As we see that the problem related to MS SQL Server updating statistics automatically, to get the optimal output we need to manually update it when needed.
To update an Index statistics we can drop the Index and then recreate the Index, it will automatically update the statistics information. It works but it is not the good way to update statistics information manually.
The system stored procedure named "sp_updatestats" helps us to update the statistical information.
The syntax is mentioned below:
sp_updatestats[ [ @resample = ] 'resample']

Parameters
@resample =] 'resample'

Specifies that sp_updatestats will use the RESAMPLE option of the UPDATE STATISTICS statement. If 'resample' is not specified, sp_updatestats updates statistics by using the default sampling. Resample is varchar(8) with a default value of NO.
Example:
USE AdventureWorks2012;
GO
EXECsp_updatestats;
The UPDATE STATISTICS is another option to update the statistical information.
The syntax is mentioned bellow:

UPDATE STATISTICS table_or_indexed_view_name
    [
        {
            { index_or_statistics__name }
          | ( { index_or_statistics_name } [ ,...n ] )
                }
    ]
    [    WITH
        [
            FULLSCAN
            | SAMPLE number { PERCENT | ROWS }
            | RESAMPLE
            | <update_stats_stream_option> [ ,...n ]
        ]
        [ [ , ] [ ALL | COLUMNS | INDEX ]
        [ [ , ] NORECOMPUTE ]
    ] ;

<update_stats_stream_option> ::=
    [ STATS_STREAM = stats_stream ]
    [ ROWCOUNT = numeric_constant ]
    [ PAGECOUNT = numeric_contant ]

For more information about syntax, follow the MSDN
Example:
USE AdventureWorks2012;
GO
UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;
GO

How to see the content of Index statistics
To see the actual contents of the statistics we use the DBCC SHOW_STATISTICS.
The syntax is mentioned bellow.
DBCC SHOW_STATISTICS (Table_Name, Index_Name)

For more information about SHOW_STATISTICS, follow the MSDN
Parameters:
Table_Name : The name of the Table.
Index_Name : The Index name of the Mentioned Table.

New feature for statistics in MS SQL Server 2008
The new feature of statistics in SQL 2008 are mentioned as per MSDN
SQL Server 2008 provides the option to create filtered statistics, which are restricted to a subset of the rows in the table defined by a predicate, also called a filter. Filtered statistics are either created explicitly, or more commonly, through creating a filtered index.
·         Automatic creation: As mentioned, filtered statistics are automatically created as a by-product of filtered indexes. Unlike regular statistics, filtered statistics are never automatically created by the query optimizer as a result of query and update statements.
·         Automatic update: To determine when statistics need to be automatically updated, the query optimizer uses column modification counters. For filtered statistics, the counter is multiplied by the selectivity of the filter, which ranges from 0 to 1, to compensate for the fact that the column counter tracks changes to all rows in the table, not the changes to the rows in the filtered statistics object.
·         Sampling: To build the statistics, SQL Server 2008 reads every nth data page, according to the sampling rate, and then it applies the filter, if any.
·         If you don't specify sampling rate, the default behaviour is to sample based on the number of rows in the table and, for filtered statistics, based on the selectivity of the filter. The more selective the filter, the more rows need to be sampled to gather enough data to build the histogram.
·         Cardinality estimate: As with regular statistics, the optimizer attempts to pick the most relevant statistics when performing cardinality estimate. There is an additional check that the optimizer must make for filtered statistics: whether the statistics predicate contains the query predicate (that is, whether the query predicate must be true for a row if the statistics predicate is true for that row). Out of the possible filtered statistics, the query optimizer favours those that are as close to the query predicate as possible.

For more information about it, follow the MSDN

Related tropics



Referential Sources


Hope you like it.

Posted by: MR. JOYDEEP DAS

7 comments: