Sunday, 19 November 2017

SSIS - Script Task is not working on Prod Server

Introduction
A common error that developers are facing when the SSIS package is deployed in the Production Server.
To understand it properly, let’s take a simple example.

Suppose in our package, we have a script task which is used to open an Excel file and just rename the sheet with a dynamic values came from DB table. It works fine in Development environment. But when it is deployed in production server, the script task is showing error.

Reason Behind it
Our development environment contains Microsoft Office installed. To open the Excel file by script task, he needs some reference of Excel (Excel must be register in GAC). As the Microsoft Office is installed in the Development environment the script task have no problem to work.

But in the Production environment, if you observe carefully there is no Microsoft Office component. So in this situation the script task of our package not get any reference of excel that’s the reason of failing script task.

How to Solve it
Simple solution is just installed Microsoft Office in the Production Server. But it is not so easy task as licensing is related to it. Sometimes the client doesn’t want to install it in production server.

To solve such kind of thing, we can suggest client to install just “2007 Microsoft Office system Primary Interop Assemblies” in the production server.


Web link to download:

The 2007 Microsoft Office Primary Interop Assemblies (PIA) redistributable is a Microsoft Windows Installer package that contains the Primary Interop Assemblies for 2007 Microsoft Office products. Specifically, the redistributable contains the PIAs for the following products:


·         Microsoft Office Access 2007
·         Microsoft Office Excel 2007
·         Microsoft Office InfoPath 2007
·         Microsoft Office Outlook 2007
·         Microsoft Office PowerPoint 2007
·         Microsoft Office Project 2007
·         Microsoft Office Publisher 2007
·         Microsoft Office SharePoint Designer 2007
·         Microsoft Office Visio 2007
·         Microsoft Office Word 2007

  
Hope you like it.


Posted by: MR. JOYDEEP DAS

Monday, 6 November 2017

With(NOLOCK) Vs With(NOWAIT)

Introduction
We all know about dirty reads and it is not good for transaction table specially OLTP environment.
But sometimes customer wants like we are going to fire the query from frontend, if the table is locked by other transaction it just simply return a message. We don’t want any dirty read or not want to wait until the Transaction finish. If necessary, we again fire the query after sometime to get the result.
Before jumping into this type of solution we are going to show something related to dirty read.

Scenario
We have table named customer.

CREATE TABLE tbl_CustomerDetails
       (
              CustID        Int           NOT NULL IDENTITY Primary Key,
              CustName      Varchar(50)   NOT NULL,
              ContactNo     Varchar(12)   NOT NULL
       );
GO

INSERT tbl_CustomerDetails
       (CustName, ContactNo)
VALUES('Joydeep', '123456'),
      ('Deblina', '897654');

We are trying to update this customer table from session-1 and in the same time we are trying to retrieve record from customer table in session-2.

Session -1
BEGIN TRAN
   UPDATE tbl_CustomerDetails
      SET ContactNo = '99999'
   WHERE CustID = 1;

Please note that we are not providing any Commit or Rollback Transaction option in Session -1.
Now try to retrieve the records from session -2

Session -2
SELECT * FROM tbl_CustomerDetails;

Here we are not getting any output, until the Session -1 is committed or roll backed. Here the update statement of Session – 1 is just make lock on table object.
So it is not going to solve our solutions.
Now, to get the result.

Session -2
SELECT * FROM tbl_CustomerDetails With(NOLOCK);

Here we are using With(NOLOCK) keyword to get the dirty data.
CustID      CustName                                           ContactNo
----------- -------------------------------------------------- ------------
1           Joydeep                                            99999
2           Deblina                                            897654

(2 row(s) affected)

Now think about a situation.
In Session -1 we are using Rollback Transaction after getting the result in Session -2

Session – 1
ROLLBACK TRAN

So, what we get in the Session -2 is not correct data. That is called dirty read.
Now, it is not going to solve our problem as customer doesn’t want to get dirty records.
To solve this problem we are using With(NOWAIT) key word.

Session – 2
SELECT * FROM tbl_CustomerDetails With(NOWAIT);

CustID      CustName                                           ContactNo
----------- -------------------------------------------------- ------------
Msg 1222, Level 16, State 51, Line 1
Lock request time out period exceeded.

It is not waiting or not giving any dirty data. It just simply through an error message saying Lock request time out. It means Session-1 provides a lock on this table and for that it is unable to retrieve data.
So, we have to wait and fire the query again until the Session -1 finishes his transaction.
It is definitely solve our problem statement.
Hope you understand it.




Posted by: MR. JOYDEEP DAS


Sunday, 9 July 2017

Extracting Data from Flat file with Different Columns Delimiter

Introduction

Hare in this article, we are going to discuss to retrieve data from flat file. Sounds look easy. No, it’s not. As our flat file columns delimiters are little different.

Sample of Flat file

We have a flat file named StudentRecords.txt



If we look at the flat file we can find that the columns have different type of Delimiter like

Roll  !  StudentName  !  Calss  -  Section

Roll, Student Name and Class are separated by explanation sign (!) and Section is separated by minus sign (-).

Solution Approach

Control Flow Task



Step – 1 [ Retrieve Data from Flat file Using Flat file Connection Manager ]









In the columns Delimiter drop-down we never find the explanatory sign (!). So we need to type it down.

The Columns Delimiter Drop-down Contains

Value
Description
{CR}{LF}
Columns are delimited by a carriage return-line feed combination.
{CR}
Columns are delimited by a carriage return.
{LF}
Columns are delimited by a line feed.
Semicolon {;}
Columns are delimited by a semicolon.
Colon {:}
Columns are delimited by a colon.
Comma {,}
Columns are delimited by a comma.
Tab {t}
Columns are delimited by a tab.
Vertical bar {|}
Columns are delimited by a vertical bar.

Note:  There is an option in this Drop-down that if any column separator is not found, we can directly type it down.


Step – 2 [ Save the records into another Flat file Destination ]



In the destination flat file connection manager, we use column delimiter as minus sign (-).
Now execute the package to generate the destination flat file.

Step – 3 [ Now read the Destination Flat File as Source ]








Step – 4 [ Execute package ]




Hope you like it.




Posted by: MR. JOYDEEP DAS


Thursday, 22 June 2017

Related to Index

Introduction

When we think about the performance factors, the Index plays the main role. Here we are trying to discuss some points related to

  •      Clustered Index
  •       Index Fragmentation
  •      Index Defragmentation
  •      Index and Disk Space
  •      Index and Statistics

  
Clustered Index Structure

In order to understand the design principles of a good clustered index, we need to discuss how SQL Server stores clustered indexes. All table data is stored in 8 KB data pages. When a table contains a clustered index, the clustered index tells SQL Server how to order the table's data pages

In the SQL Server the index are organized as B-Tree structure.

o   Each pages of the index B-Tree are called index node.
o   The top node of the B-Tree is called the root node.
o   The bottom level of nodes in the index is called the leaf node.
o   Any level between the root and the leaf node are collectively known as intermediate level.

In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and intermediate level nodes contain index pages holding index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages at each level of the index are linked in a doubly-linked list.




The leaf level of the B-tree is always Level- 0 and the root level has been always the highest level. In this figure there is only one intermediate Level, but in a large index will often have more than one intermediate Level, and a small index might not have an intermediate Level.




The Index pages in the root and intermediate levels contain a clustering key and a page pointing down into the Next level of the B-Tree. This process will be repeated until the leaf node is reached. The leaf node of the clustered index contains the actual data in an ordered fashion based on the clustering key.



How the Clustered Index Works

To understand it, please look at the bellow figure of B-Tree structure.




It contains the 1 million records of clustered key Student ID. In this figure the Level-1 and 2 is the Index page. In Level – 1 each page contains 50000 records and clustered index values plus a pointed down into the associated page on the next level.
To understand it properly I am going to take an example.

We want to retrieve the details of student ID 500. For this SQL server has to read the three pages. The root page in Level 2, the intermediate page in Level 1, and the appropriate leaf level page in Level 0. The root level tells the SQL Server to which intermediate level to read and intermediate levels tells that which leaf level page to read to retrieve the details.







Effective clustered index

The prime characteristics of a clustered index are mentioned bellow.

·                     Narrow
·                     Unique 
·                     Static
·                     Ever-increasing

We are trying to discuss it in little more details.

Narrow

The width of the Index refers to the number of bytes in the index key. Let's take an example to understand it.

CREATE TABLE student_dtl
       (
                  studentID             INT IDENTITY(1,1),
                  studentRefKEY   UNIQUEIDENTIFIER,
                  studentName       VARCHAR(50)
       ) 

The table has the 15 millions of rows and only two columns are candidates for use as clustering key.

o   The "studentID" has the INT data type used the 4 bytes.
o   The "studentRefKEY" has the UNIQUEIDENTIFIER data type used the 16 bytes.


INSERT INTO student_dtl
           (studentRefKEY,    studentName)
VALUES (newid(),       'Joydeep Das')

GO

SELECT datalength(studentID)  As  studentID,
               datalength(empRefKEY) As empRefKEY
FROM   emp_dtl

-- Result Output

studentID         studentRefKEY
-----                        ---------
4                              16

Which one is good for a clustered index?
If we see the B-Tree structure of both, in case of UNIQUEIDENTIFIER the number of intermediate levels is greeter then the INT. The facts are the UNIQUEIDENTIFIER takes the much more spacious than INT data type. When the clustered index is created few rows can be packed into each index page, and the clustered key requires an additional non-leaf level to store the Key.

Unique 

First of all we have to know that the SQL Server doesn’t require a clustered index to be UNIQUE.
A clustered index must be unique, if the clustered index is not a unique index; SQL Server makes any duplicate
 keys, unique by adding an internally generated value called an uniqueifier. This four-byte value is not visible to users.

Static

We want to choose a clustering key that will never be updated. SQL Server must ensure that data exists in a logical order based upon the clustering key. Therefore, when the clustering key value is updated, the data may need to be moved elsewhere in the clustered index so that the clustering order is maintained.

There are other considerations that can affect how many pages are updated, such as whether an update to a variable-length column causes the row to exceed the amount of available space. In such a case, the data would still need to be moved, although only the data page of the clustered index is affected.

So, the updating the clustering key is more expensive. Therefore, it is a best practice to avoid clustering on columns that are frequently updated.

Ever-increasing 

An integer identity column is an excellent example of an ever-increasing column. The identity property continuously increments by the value defined at creation, which is typically one. This allows SQL Server, as new rows are inserted, to keep writing to the same page until the page is full, then repeating with a newly allocated page.

SQL Server can much more efficiently write data if it knows the row will always be added to the most recently allocated, or last, page

Reduction in clustered index fragmentation, this fragmentation results from data modifications and can take the form of gaps in data pages, so wasting space, and a logical ordering of the data that no longer matches the physical ordering.

Index Fragmentation

Index fragmentation is a phenomenon where index contents are no longer stored continuously in the storage. When index contents become scattered in the storage, fragmented, performance on the index will degrade.


There are 2 types of Index fragmentation

  •      Extent or External fragmentation.
  •           Page or Internal fragmentation.


The Index fragmentation can occur in both Clustered and Non-Clustered Index. To understand the External and Page fragmentation I am taking an example.

Un-Fragmented Index





In the above figure all the pages are full and the physical order of the pages is sequential.
As a result of data modifications, in the cases of extent fragmentation the pages get out of the physical order.  This type of fragmentation produces random IO, which does not perform as well as sequential IO.


Extent Fragmentation

The bellow figure demonstrates the extent fragmentation.





In the internal fragmentation there are gaps in the data pages, which reduce the amount of data that can be stored on each page. So it increases the overall amount of space needed to store the data. When the pages are not full and additional pages are required to store the data, as more IO will be required to retrieve those additional pages.
  


Page Fragmentation





During a page split, a new page is allocated, and half the records are moved from the old page to the newly-allocated page. Each page has a pointer to the previous and next page in the index, so those pages will also need to be updated. The bellows figure demonstrates the process.

Page splitting due to fragmentation



Analyzing Fragmentation

To analyze SQL Server indexes, we use the system function. The function named  dm_db_index_physical_statsto determine which indexes are fragmented and the extent of that fragmentation. We used this function to analyzing all the index of the database, or all the index of the table or a specified index.

The function  dm_db_index_physical_statsto takes the following parameters described below.


Database ID

This is a small integer value that represents the ID number of a database. If null is specified, the function retrieves index-related data from all databases on a SQL Server instance.

Object ID

This is an integer value that represents the ID number of a table or view. If null is specified, the function retrieves index-related data for all tables and views in a specific database or a SQL Server instance. If you specify null, you must also specify null for the index ID and partition number.

Index ID

This is an integer value that represents the ID number of an index. If null is specified, the function retrieves index-related data for all indexes defined on the specified table or view. If you specify null, you must also specify null for the partition number. Also, if the object ID refers to a heap, use 0 as the index ID.

Partition number 

This is an integer value that represents the partition number of an index or heap. If null is specified, the function retrieves index-related information for all partitions related to a specific object.

Mode 

The scan level used to obtain index-related information. Valid inputs include NULL, DEFAULT, or one of the following three modes:

LIMITED

Scans the smallest number of pages, which means this is the fastest mode. The LIMITED mode is equivalent to NULL and DEFAULT.

SAMPLED

Scans 1% of all pages. If an index contains fewer than 10,000 pages, then DETAILED mode is used.

DETAILED

Scans all index pages, which means this is the slowest mode, but most accurate.
The sys.dm_db_index_physical_stats function returns a number of values that provide details about the indexes we specify. SQL Server Books Online provides details about each of these values.




Along with other information, there are two important columns that for detecting fragmentation, which are as follows:

avg_fragmentation_in_percent

This is a percentage value that represents external fragmentation. For a clustered table and leaf level of index pages, this is Logical fragmentation, while for heap, this is Extent fragmentation. The lower this value, the better it is. If this value is higher than 10%, some corrective action should be taken.

avg_page_space_used_in_percent

This is an average percentage use of pages that represents to internal fragmentation. Higher the value, the better it is. If this value is lower than 75%, some corrective action should be taken.

Here is the quick script to find fragmentation of the database for all the indexes.
SELECT         ps.database_id,
                        ps.OBJECT_ID,
                        ps.index_id,
                        b.name,
                        ps.avg_fragmentation_in_percent
FROM     sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL,NULL) ASps
              INNER JOIN sys.indexes AS b ON ps.OBJECT_ID = b.OBJECT_ID
                                        AND ps.index_id = b.index_id
WHERE ps.database_id = DB_ID()
ORDER BY ps.OBJECT_ID

Index Defragmentation

Fragmentation of the Index can be de-fragmented by two ways; it depends on size of the table and the level of the defragmentation.

o   Reorganize Index
o   Rebuild Index



Reorganize Index

It does not take much more system resources. That means it can be done when the user access the table objects where the index exists. This process reorganizes the leaf nodes of the index physically to match it with logical order. If the physical order matches the logical order of the index it increases the performance.  

To reorganize one or more indexes, use the ALTER INDEX statement with the REORGANIZE clause. This statement replaces the DBCC INDEXDEFRAG statement of the previous version of SQL Server.


To reorganize a single partition of a partitioned index, use the PARTITION clause of ALTER INDEX.
Reorganizing an index defragments the leaf level of clustered and non-clustered indexes on tables and views by physically reordering the leaf-level pages to match the logical order (left to right) of the leaf nodes. Having the pages in order improves index-scanning performance.

The index is reorganized within the existing pages allocated to it; no new pages are allocated. If an index spans more than one file, the files are reorganized one at a time. Pages do not migrate between files.

Reorganizing also compacts the index pages. Any empty pages created by this compaction are removed providing additional available disk space. Compaction is based on the fill factor value in the sys.indexes catalog view.

The reorganize process uses minimal system resources. Also, reorganizing is automatically performed online. The process does not hold long-term blocking locks; therefore, it will not block running queries or updates.

Reorganize an index when the index is not heavily fragmented. See the previous table for fragmentation guidelines. However, if the index is heavily fragmented, you will achieve better results by rebuilding the index.

Besides reorganizing one or more indexes, large object data types (LOBs) that are contained in the clustered index or underlying table are compacted by default when an index is reorganized.


The data types image, text, ntext,  varchar(max),  nvarchar(max),  varbinary(max), and xml are large object data types and compacting this data can cause better disk space use.

Reorganizing a specified clustered index will compact all LOB columns that are contained in the leaf level (data rows) of the clustered index.

Reorganizing a non-clustered index will compact all LOB columns that are non-key (included) columns in the index.

When ALL is specified, all indexes associated with the specified table or views are reorganized and all LOB columns associated with the clustered index, underlying table, or non-clustered index with included columns are compacted.


The LOB_COMPACTION clause is ignored if LOB columns are not present.

Rebuild Index

Rebuild index means the dropping an existing index of the table and create a new index. The older index is gone with the drop and in the new index the logical order matches the physical order of the index and improves the performance.

In doing this, fragmentation is removed, disk space is reclaimed by compacting the pages using the specified or existing fill factor setting, and the index rows are reordered in contiguous pages and allocating new pages as needed. This can improve disk performance by reducing the number of page reads required to obtain the requested data.

To rebuilds clustered and non-clustered indexes ALTER INDEX with the REBUILD clause. This statement replaces the DBCC DBREINDEX statement of previous version of the SQL Server and the CREATE INDEX with the DROP_EXISTING clause.

If we rebuilding is a cluster index, then our tables will be unavailable to the user during the rebuild process and if it is a non-clustered index only a shared lock will be placed on the table so users will be able to access the table, but would not be allowed to do modifications on it.
SQL Server 2005 the online option for index rebuilds is only available in Enterprise edition while in SQL Server 2008 it is available in Enterprise, Developer, and Evaluation editions.

When we are creating an index online, it does not drop the existing index rather it keeps the original index to be used by the users and creates an index parallel to it. During this process only the source index will be available for use by the users. Once the index rebuild operation is complete, the original index will be dropped and the new one will be updated and available for read and write operations.

Reorganize Index Example

-- Base Table Definition

CREATE TABLE emp_record
       (empID                  INT          IDENTITY(1,1) NOT NULL,
        empNAME          VARCHAR(50)  NOT NULL,
        empDEMP          CHAR(1),
        CONSTRAINT PK_emp_record PRIMARY KEY CLUSTERED(empID)
       )

-- Reorganize Index

ALTER INDEX PK_emp_record ON emp_record REORGANIZE

-- Reorganize All Index

ALTER INDEX ALL ON emp_record REORGANIZE

Rebuild Index Example

-- Rebuild Index

ALTER INDEX PK_emp_record ON emp_record REBUILD

-- Rebuild Index with option specified

ALTER INDEX ALL ON emp_record
REBUILD WITH (FILLFACTOR = 80, SORT_IN_TEMPDB = ON,
              STATISTICS_NORECOMPUTE = ON)


-- Rebuild Index online

ALTER INDEX PK_emp_record ON emp_record REBUILD WITH (ONLINE = ON)


ALTER INDEX ALL ON emp_record REBUILD WITH (ONLINE = ON)


Summarize the Differences

o   Index Rebuild drops the existing Index and Recreates the index from scratch.
o   Index Reorganize physically reorganizes the leaf nodes of the index.
o   Rebuild the Index when an index is over 30% fragmented.
o   Reorganize the Index when an index is between 10% and 30% fragmented.
o   If fragmentation is below 10%, no action required.

Rebuilding takes more server resources and uses locks unless you use the ONLINE option available in 2005 Enterprise and Development editions.

Defrag Index Stored Procedure

This stored procedure defrags the index accordingly. This is prototype only. You can modify it …

IF OBJECT_ID('proc_DEFRAG') IS NOT NULL
   BEGIN
       DROP PROCEDURE proc_DEFRAG
   END
GO

CREATE PROCEDURE proc_DEFRAG
       (
         @p_DBName   VARCHAR(50) = NULL,
         @p_SchemaNm VARCHAR(50) = NULL,
         @p_TBLName  VARCHAR(50) = NULL,
         @p_IndxName VARCHAR(50) = NULL
       )
AS
  DECLARE @v_FrgReOrgLmt   DECIMAL(10,2),
          @v_FrgReBuildLmt DECIMAL(10,2),
          @v_CurrentFreg   DECIMAL(10,2),
          @v_Flag          DECIMAL(1),
          @v_fillfactor    INT,
          @v_Str           VARCHAR(MAX),
          @v_Message       VARCHAR(MAX)
BEGIN
  SET @v_FrgReOrgLmt   = 10.00
  SET @v_FrgReBuildLmt = 30.00
  SET @v_CurrentFreg   = 0.00
  SET @v_Flag = 0
  SET @v_fillfactor = 80

  IF ISNULL(@p_DBName, '')<>''
         AND ISNULL(@p_TBLName, '')<>''
               AND ISNULL(@p_IndxName, '')<>''
                     AND ISNULL(@p_SchemaNm, '')<>''
     BEGIN
         SELECT @v_CurrentFreg = avg_fragmentation_in_percent
         FROM   sys.dm_db_index_physical_stats(DB_ID(@p_DBName), NULL, NULL, NULL, NULL)a
                INNER JOIN sys.tables b WITH (nolock) ON a.OBJECT_ID = b.OBJECT_ID
                INNER JOIN sys.indexes c WITH (nolock) ON a.OBJECT_ID = c.OBJECT_ID
                                                          AND a.index_id = c.index_id
                WHERE  b.name = @p_TBLName
                       AND c.name = @p_IndxName

          --Reorganizing the index
          IF ((@v_CurrentFreg >= @v_FrgReOrgLmt) 
                   AND (@v_CurrentFreg <= @v_FrgReBuildLmt))
             BEGIN
                SET @v_Str = 'ALTER INDEX ' + @p_IndxName +
                           ' ON [' + RTRIM(LTRIM(@p_SchemaNm)) +
                           '].[' + RTRIM(LTRIM(@p_TBLName)) +
                           '] REORGANIZE' 
                SET @v_Flag = 1
                SET @v_Message = 'Rebuild Index with REORGANIZE'
                GOTO WAYOUT
             END
          --Rebuilding the index
          IF (@v_CurrentFreg>@v_FrgReBuildLmt)
             BEGIN
               SET @v_Str = 'ALTER INDEX ' + @p_IndxName +
                   ' ON [' + RTRIM(LTRIM(@p_SchemaNm)) +
                   '].[' + RTRIM(LTRIM(@p_TBLName)) +
                   '] REBUILD WITH (FILLFACTOR = ' +
                   CONVERT(VARCHAR(3),@v_fillfactor) +
                   ', STATISTICS_NORECOMPUTE = OFF)'
               SET @v_Flag = 1
               SET @v_Message = 'Rebuild Index with REBGUILD'  
             END
WAYOUT:
          IF @v_Flag = 1   
             BEGIN
                   EXEC (@v_Str)
             END
         ELSE
            BEGIN
                   SET @v_Message = 'Parameter is NOT properly Supplied'  
            END

         RAISERROR(@v_Message, 0, 1) WITH NOWAIT
         RAISERROR('', 0, 1) WITH NOWAIT  
     END
END
GO

Index and Disk Space


Disk space is a very important factor to Create, Rebuild or Drop Index. Disk space causes the performance degradation of an index or even fails. So the understanding about disk space is very important factors.  In this article I am trying to discuss related to it.

o   Space required in Index Data Definition Language Operations
o   An Example of Index Disk Space
o   Transaction Log Disk Space for Index Operations

Space required in Index DDL Operations

Here I am mentioning some Index operations not required any additional disk space is mentioned bellow.

o   ALTER INDEX with REORGANIZE options not required any additional disk space as the index is reorganized within the existing pages allocated to it; no new pages are allocated. But the log space is required.
o   When we are dropping a non-clustered index by DROP INDEX or when we are dropping a clustered index offline without specifying the MOVE TO clause and non-clustered index do not exist.
o   When we are CREATE TABLE with PRIMARY KEY or UNIQUE constraints

The index operation that required additional disk space is mentioned bellow.
All other index DDL operations require additional temporary disk space to use during the operation, and the permanent disk space to store the new index structure or structures.


When a new index structure is created, disk space for both the old (source) and new (target) structures is required in their appropriate files and file groups. The old structure is not de-allocated until the index creation transaction commits.

The DDL operations that needs new index structures and required additional disk space is mentioned bellow.

o   CREATE INDEX
o   CREATE INDEX WITH DROP_EXISTING
o   ALTER INDEX REBUILD
o   ALTER TABLE ADD CONSTRAINT (PRIMARY KEY or UNIQUE)
o   ALTER TABLE DROP CONSTRAINT (PRIMARY KEY or UNIQUE) when the constraint is based on a clustered index
o   DROP INDEX MOVE TO (Applies only to clustered indexes.)
An Example of Index Disk Space
I am finding this example from MSDN.
The formula is used to measure, disk space requirements is mentioned bellow
IndexSize = NoOfRows *  AvgSizeOfIndex * 2 * (1/(100-FILLFACTOR)/100

In this example, disk space requirements to create a clustered index are determined. Assume the following conditions are true before creating the clustered index:
o   The existing table (heap) contains 1 million rows. Each row is 200 bytes long.
o   Non-clustered index A contains 1 million rows. Each row is 50 bytes long.
o   Non-clustered index B contains 1 million rows. Each row is 80 bytes long.
o   The index create memory option is set to 2 MB.

A fill factor value of 80 is used for all existing and new indexes. This means the pages are 80% full.

Note: As a result of creating a clustered index, the two non-clustered indexes must be rebuilt to replace the row indicator with the new clustered index key.
Disk Space Calculations for an Offline Index Operation

In the following steps, both temporary disk space to be used during the index operation and permanent disk space to store the new indexes are calculated. The calculations shown are approximate: results are rounded up and consider only the size of index leaf level. The tilde (~) is used to indicate approximate calculations.


1.             Determine the size of the existing (source) structures

Heap: 1 million * 200 bytes ~ 200 MB
Non-clustered index A: 1 million * 50 bytes / 80% ~ 63 MB
Non-clustered index B: 1 million * 80 bytes / 80% ~ 100 MB
Total size of existing structures: 363 MB
2.            Determine the size of the new (target) index structures

Assume that the new clustered key is 24 bytes long including a uniqueifier. The row indicator (8 bytes long) in both non-clustered indexes will be replaced by this clustered key.

Clustered index: 1 million * 200 bytes / 80% ~ 250 MB
Non-clustered index A: 1 million * (50 – 8 + 24) bytes / 80% ~ 83 MB
Non-clustered index B: 1 million * (80 – 8 + 24) bytes / 80% ~ 120 MB
Total size of new structures: 453 MB

Total disk space required to support both the source and target structures for the duration of the index operation is 816 MB (363 + 453). The space currently allocated to the source structures will be deallocated after the index operation is committed.
3.            Determine additional temporary disk space for sorting

Space requirements are shown for sorting in tempdb (with SORT_IN_TEMPDB set to ON) and sorting in the target location (with SORT_IN_TEMPDB set to OFF).

When SORT_IN_TEMPDB is set to ON, tempdb must have sufficient disk space to hold the largest index (1 million * 200 bytes ~ 200 MB). Fill factor is not considered in the sorting operation. Additional disk space (in the tempdb location) equal to the index create memory Option value = 2 MB. Total size of temporary disk space with SORT_IN_TEMPDB set to ON ~ 202 MB.
When SORT_IN_TEMPDB is set to OFF (default), the 250 MB of disk space already considered for the new index in step 2 is used for sorting. Additional disk space (in the target location) equal to the index create memory Option value = 2 MB. Total size of temporary disk space with SORT_IN_TEMPDB set to OFF = 2 MB.

Using tempdb, a total of 1018 MB (816 + 202) would be needed to create the clustered and non-clustered indexes. Although using tempdb increases the amount of temporary disk space used to create an index, it may reduce the time that is required to create an index when tempdb is on a different set of disks than the user database. For more information about using tempdb, see tempdb and Index Creation.

Without using tempdb, a total of 818 MB (816+ 2) would be needed to create the clustered and non-clustered indexes.

Disk Space Calculations for an Online Clustered Index Operation

When you create, drop, or rebuild a clustered index online, additional disk space is required to build and maintain a temporary mapping index. This temporary mapping index contains one record for each row in the table, and its contents are the union of the old and new bookmark columns.

To calculate the disk space needed for an online clustered index operation, follow the steps shown for an offline index operation and add those results to the results of the following step.

Determine space for the temporary mapping index. In this example, the old bookmark is the row ID (RID)) of the heap (8 bytes) and the new bookmark is the clustering key (24 bytes including a uniqueifier). There are no overlapping columns between the old and new bookmarks.

Temporary mapping index size = 1 million * (8 bytes + 24 bytes) / 80% ~ 40 MB. This disk space must be added to the required disk space in the target location if SORT_IN_TEMPDB is set to OFF, or to tempdb if SORT_IN_TEMPDB is set to ON.

Transaction Log Disk Space for Index Operations

To make sure that the Index operations can be rollback the transaction log cannot be truncated until the index operations has completed. Therefore, the transaction log must have sufficient room to store both the index operation transactions and any concurrent user transactions for the duration of the index operation. This is for both offline online index operations.

When we are using large scale index operation we must remember

o   The Transaction log should be backed up and truncated before index operation online. The log has sufficient space to store the project index and user transactions.
o   SORT_IN_TEMPDB options should be ON. This separates the index transactions from concurrent user transactions.
o   Use database recovery model that allows minimal logging of the index operation. This may reduce the size of the log and prevent the log from filling the log space.  
o   Don't run the ONLINE index operations in explicit transactions. The log will not be truncated until the explicit transaction ends.

Index and Statistics

Microsoft SQL Server collects statistical information related to indexes and column data stored in the database. These statistics are used by the SQL Server query optimizer to choose the most efficient plan for retrieving or updating data.
If we understand the index statistics in better way, it helps us to optimize and fine tune our queries.
As the Statistics play a very important role in the performance of MS SQL Server query.

What we mean by MS SQL server Index statistics
Before going to the actual definition of Index statistics, I am going to take an example so that we can understand it easily.

"Suppose you are a football coach and u have 20 players in your team and for paying final tournament you need 11 players out of 20.  Now, you have to know (or maintain a list) all the statistics of each payer before making your final team." 

Index statistics contains information related to the distribution of index key values. It means that the number of rows associated with each key value. To determine what kind of execution plan to be used when processing a query MS SQL Server query optimizer uses this information.

After creating the table Index when we perform the insert, update, or delete operations in the table, the statistics of the table become out of date.  The MS SQL Server doesn't update the statistics information every time we made some changes (Insert/Update/Delete Operations) on the table. Statistics track the distribution of values within an index or within a particular column.

Note that, if a column is not indexed but can benefit from an index, SQL Server will automatically create statistics for that column.

When the Index Statistics are updated
It is very important to understand, when the MS SQL Server update the statistics. The database settings AUTO_UPDATE_STATISTICS controls when the statistics are automatically updated. By default the AUTO_UPDATE_STATISTICS is true that means that the statistics are automatically updated.
MS SQL Server determines that when to update the statistics after creating it. It is based on how old the statics is.  It determines the outdated statistics based on number of Insert, Update and Delete from the date when the statistics is last updated and then recreate the statistics based on a threshold.
This threshold is relative to the number of records in the table. It means that the when the DML operation performs the index statistics slowly get older, until SQL Server determines to update the statistics.
Limitations:
When we have very large table and we are doing bulk Insert/Update/Delete operation the AUTO_UPDATE_STATISTICS can be overhead on our system. Because the                        AUTO_ UPDATE_STATISTICS busy to update the statistics and causes the system overhead. In such type of situation we must turn off the AUTO_UPDATE_STATISTICS and later we must manually update the statistics.
Rules used by AUTO_UPDATE_STATISTICS
"rowmodctr" column of the sysindexes table is used to determine the number of changes made since the last update of the statistics.
When the MS SQL Server updates the statistics it will follow the following rules.


  •       If our table have 6 or fewer rows, the statistics will be updated after 6 changes.
  •       If a table has 500 or fewer rows, statistics will be updated after 500 changes
  •       If a table has more than 500 rows, statistics will be updated after 20% of the total rows    plus 500 rows are changed.  



Syntax related to AUTO_UPDATE_STATISTICS

The syntax is mentioned bellow:

sp_helpdb DBName
GO

ALTER DATABASE DBName SET AUTO_UPDATE_STATISTICS ON
GO


Determine that the Index statistics are out of date
MS SQL Server uses the sampling methods to keep track of the last time when the statistics were updated. This information is used to determine how old your statistics is.
The function STATS_DATE is used to determine when the statistics was last updated. The sample script is uses this function to display index statistic date for all user defined indexes.

SELECT  schema_name(o.schema_id)AS SchemaName,
        OBJECT_NAME(si.object_id)AS TableName,
        si.nameAS IndexName,
        STATS_DATE(i.object_id, i.index_id)AS StatDate
FROM    sys.indexes si
        INNER JOIN sys.objects o
             ON si.object_id= o.object_id
        INNER JOIN sys.indexes i
             ON i.object_id= si.object_id
                AND i.index_id = si.index_id
WHERE   o.type<>'S'
        AND STATS_DATE(i.object_id, i.index_id)IS NOT NULL;

Updating the statistics
As we see that the problem related to MS SQL Server updating statistics automatically, to get the optimal output we need to manually update it when needed.
To update an Index statistics we can drop the Index and then recreate the Index, it will automatically update the statistics information. It works but it is not the good way to update statistics information manually.
The system stored procedure named "sp_updatestats" helps us to update the statistical information.

The syntax is mentioned below:

sp_updatestats[ [ @resample = ] 'resample']

Parameters
@resample =] 'resample'

Specifies that sp_updatestats will use the RESAMPLE option of the UPDATE STATISTICS statement. If 'resample' is not specified, sp_updatestats updates statistics by using the default sampling. Resample is varchar(8) with a default value of NO.


Example:

USE AdventureWorks2012;
GO
EXECsp_updatestats;


The UPDATE STATISTICS is another option to update the statistical information.
The syntax is mentioned bellow:

UPDATE STATISTICS table_or_indexed_view_name
    [
        {
            { index_or_statistics__name }
          | ( { index_or_statistics_name } [ ,...n ] )
                }
    ]
    [    WITH
        [
            FULLSCAN
            | SAMPLE number { PERCENT | ROWS }
            | RESAMPLE
            | <update_stats_stream_option> [ ,...n ]
        ]
        [ [ , ] [ ALL | COLUMNS | INDEX ]
        [ [ , ] NORECOMPUTE ]
    ] ;

<update_stats_stream_option> ::=
    [ STATS_STREAM = stats_stream ]
    [ ROWCOUNT = numeric_constant ]
    [ PAGECOUNT = numeric_contant ]

For more information about syntax, follow the MSDN
Example:
USE AdventureWorks2012;
GO
UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;
GO

How to see the content of Index statistics
To see the actual contents of the statistics we use the DBCC SHOW_STATISTICS.
The syntax is mentioned bellow.


DBCC SHOW_STATISTICS (Table_Name, Index_Name)

Parameters:
Table_Name : The name of the Table.
Index_Name : The Index name of the Mentioned Table.

For more information about SHOW_STATISTICS, follow the MSDN



Hope you like it.



Posted by: MR. JOYDEEP DAS