Monday, April 2, 2012

Execution Plans – Table, Clustered Index and Non-Clustered Index Scan

An important operator to understand in execution plans is a scan. A Scan can be good and bad, so understanding the difference between them is a basic skill needed by a DBA or developer that is going to do performance tuning.

The Table Scan means that a table does not have a clustered index and the optimal query plan decided to scan all leaf level pages in the table. The table is stored as a Heap because there is not clustered index. A table can have only one clustered index.

We are going to drop the primary key/clustered index on the DatabaseLog table of the AdventureWorks database.

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[DatabaseLog]') AND name = N'PK_DatabaseLog')
ALTER TABLE [dbo].[DatabaseLog] DROP CONSTRAINT [PK_DatabaseLog]
GO

Then, we are going to execute the following code with Include Actual Execution Plan query option activated. In SQL Server Management Studio open a query window, either right-click the query window and choose Include Actual Execution Plan or select Include Actual Execution Plan from the Query menu.


image


Execute this code:



SELECT [DatabaseLogID]      ,[PostTime]      ,[DatabaseUser]      ,[Event]
      ,[Schema]      ,[Object]      ,[TSQL]      ,[XmlEvent]
  FROM [dbo].[DatabaseLog]

You will get the following execution plan table scan


image


The optimizer says a full Table Scan is the best plan. The operator Table Scan indicates that the table does not have a clustered index. By default, creating a primary key will make the key a clustered index, but that is not always the case. I am working on a Fact table at work where a Unique Constraint is the clustered index, but the primary key is a 7 column compound key of Surrogate Keys.


By adding a Clustered Index to table DatabaseLog (Primary Key), we can change this Table scan to a clustered index scan. A clustered index changes the logical/physical structure of the table to follow an index tree.



ALTER TABLE [dbo].[DatabaseLog] 
    ADD  CONSTRAINT [PK_DatabaseLog] 
        PRIMARY KEY CLUSTERED 
        ([DatabaseLogID] ASC) 

We get the following if we execute the SELECT query again like above:


image


So, what is the difference in Cost? None.


image     image


So what is the big deal, they both costs the same. Well, when you start using a WHERE clause and the no clustered index table still does a full scan, but the clustered index might do an Clustered Index Seek if the density/cardinality gives the optimizer a better choice, you will see the performance difference. More on this is a later blog.


My advice, ALWAYS have a primary AND/OR a clustered index on all tables. The decision on making the primary key a clustered index is dependent on your experience in managing and planning the correct implementation.


All right, lastly we will see a Index Scan. First, we will add a Non-clustered index to the table DatabaseLog.



CREATE NONCLUSTERED INDEX [idxDatabaseLog_DatabaseUserScehmaObject] 
    ON [dbo].[DatabaseLog]
(    [DatabaseUser], [Schema], [Object] )

image


Now, the query optimizer creates a plan to scan the new non-clustered index to retrieve the data needed in the SELECT statement. If we add a WHERE clause to the SELECT, it will change to a Index Seek.



SELECT [DatabaseUser], [Schema], [Object]
  FROM [dbo].[DatabaseLog]
  WHERE DatabaseUser = 'dbo'
image

Reading execution plans can be a lot easier than understanding them to help performance tuning. I know I spent many hours thinking I finally got it, before some article on SQL Server Central website or SQL Server Magazine changed the understanding I have about T-SQL statements and indexing for performance. But, you have to start somewhere, then keep improving your understanding.

5 comments:

  1. thanks. made my concepts a little bit clear.

    ReplyDelete
  2. not problem, love to share what has been taught to me from others, as well as experience...

    Thomas

    ReplyDelete
  3. who decide which index should be used , i have one database on two different environment. Its same database dump - indexes , columns are same , then one execution plan has with non clustered index & other with clustered index.

    ReplyDelete
  4. You might want to Reindex and/or Update statistics on both databases then look at execution plans again.

    Thanks,
    Thomas

    ReplyDelete