Dismal Blog

The tales of an under paid, over worked Employee in IT

SQL Server and Duplicate Indexes

Please note that I'm not a DBA, nor do I profess to be all knowing when it comes to SQL Server, these are observations I've made over the years that I think people may find benefit from.

Introduction

Everyone should know that indexes on a database are both useful and to be feared!

One of the most common things I've come across in my years of working on database backed products is that people grossly underestimate the impact of adding indexes.  Although people are acutely aware that adding indexes can decrease the performance, this is often overlooked (probably as they believe the performance gains outweigh the losses).  The issue is that, over time, the database can get littered with indexes that are not used for lookup, but have to be maintained for inserts/updates.

The other big issue with indexes that are not needed is that they take up space, in both memory and disk space.  For databases with very large tables (millions of rows), additional indexes can take Gigabytes of additional space.

Why do people add indexes if they're not needed?

This is a good question, the answer is usually laziness (in my opinion).

The most common thing I've found is that people, once adding new indexes, don't actually check to make sure that they're worth adding in real world scenarios.  By this I mean that, you may look at some queries and speed them up by adding indexes for them, however, in real world usage the actual index that is used is different to the one you've added.  Alternatively, the query that you've optimised is used very infrequently, and not nearly as often as you think.

The next most common is people adding indexes as the application matures, without analysing the old indexes to see if there are superfluous ones due to the new ones that have been added.

But... But... SQL Server Database Engine Tuning Advisor told me to and it's smarter than me!

There are a few reasons why this is a problem.

Firstly, unless you're using a real data (you ran the trace during a typical period on you're live production systems and it included ALL the usage that will happen), the suggestions won't be exact.  The suggestions are only as good as the data it's given (incidently, if you don't understand the importance of this then please tell me what software you create so I can stay clear).

Secondly, you will not be told that you should "replace" an index with another (i.e. combine 2 indexes).  Now, I'm not sure this part is true, but I've never come across a situation where this has been suggested.

Finally, unless you do this atleast as often as you change you're code, things will get out of date fast.

Can you give me some examples?

I have tons, but here's some of the things that I've seen recently.

1. Creating a Non-Clustered covering index on the Primary Key.

I've seen this work, but most of the time, it doesn't get used and has to be maintained for every insert/update/delete.

2. Two indexes with the same fields but different included columns

This one just made me laugh, a prime example of "the query was running slow, and the tuning advisor says it needs this index, it must be right".

3. A second index with additional columns

This is a little more subtle, so you create an index on columns 1 and 2, then another on columns 1, 2 and 3.  This is the most common I've found.

So what can I do to fix it?

Now I've told you what you're likely doing wrong, it wouldn't be a very useful article if I didn't tell you how you can fix it.

As with most application tuning, the best thing to do first is to attack the easy, less risky stuff (in business terms, the "low hanging fruit").  To me, this is removing those indexes that aren't used. 

So how do we find these easily? luckily, SQL Server has a DMV (Dynamic Management View) that is updated with the usage stats of the indexes.

This query will give you a list of all the indexes, what tables there on, and how many times they're been searched, how many times they've been updated, and the size in memory.

USE eproc2_nep
SELECT 
	sch.name + '.' + t.name AS [Table Name], 
	i.name AS [Index Name], 
	i.type_desc,  
	ISNULL(user_updates,0) AS [Total Writes], 
	ISNULL(user_seeks + user_scans + user_lookups,0) AS [Total Reads],
	s.last_user_seek, 
	s.last_user_scan ,
	s.last_user_lookup,
	ISNULL(user_updates,0) - ISNULL((user_seeks + user_scans + user_lookups),0) AS [Difference],  
	p.reserved_page_count * 8.0 / 1024 as SpaceInMB
FROM sys.indexes AS i WITH (NOLOCK)  
	LEFT OUTER JOIN sys.dm_db_index_usage_stats AS s	WITH (NOLOCK) ON s.object_id = i.object_id  AND i.index_id = s.index_id  AND s.database_id=db_id()  AND objectproperty(s.object_id,'IsUserTable') = 1  
	INNER JOIN		sys.tables					AS t	WITH (NOLOCK) ON i.object_id = t.object_id  
	INNER JOIN		sys.schemas					AS sch	WITH (NOLOCK) ON t.schema_id = sch.schema_id  
	LEFT OUTER JOIN sys.dm_db_partition_stats	AS p	WITH (NOLOCK) ON i.index_id = p.index_id and i.object_id = p.object_id
WHERE (1=1)
	--AND ISNULL(user_updates,0) >= ISNULL((user_seeks + user_scans + user_lookups),0) --shows all indexes including those that have not been used  
	--AND ISNULL(user_updates,0) - ISNULL((user_seeks + user_scans + user_lookups),0)>0 --only shows those indexes which have been used  
	--AND i.index_id > 1			-- Only non-first indexes (I.E. non-primary key)
	--AND i.is_primary_key<>1		-- Only those that are not defined as a Primary Key)
	--AND i.is_unique_constraint<>1 -- Only those that are not classed as "UniqueConstraints".  
ORDER BY [Table Name], [index name]

I've left in the additional where clauses that will allow you to filter this list by things like whether it's a primary key or not, whether the index has been used etc.

The second script I found on the internet a while ago.  It does a pretty good job of finding duplicate indexes based on the columns.

/* This script will generate 3 reports that give an overall or high level
view of the indexes in a particular database. The sections are as follows:
1.  Lists ALL indexes and constraints along with the key details of each
2.  Lists any tables with potential Redundant indexes
3.  Lists any tables with potential Reverse indexes
*/
--  Create a table variable to hold the core index info
DECLARE @AllIndexes TABLE (
 [Table ID] [int] NOT NULL,
 [Schema] [sysname] NOT NULL,
 [Table Name] [sysname] NOT NULL,
 [Index ID] [int] NULL,
 [Index Name] [nvarchar](128) NULL,
 [Index Type] [varchar](12) NOT NULL,
 [Constraint Type] [varchar](11) NOT NULL,
 [Object Type] [varchar](10) NOT NULL,
 [AllColName] [nvarchar](2078) NULL,
 [ColName1] [nvarchar](128) NULL,
 [ColName2] [nvarchar](128) NULL,
 [ColName3] [nvarchar](128) NULL,
 [ColName4] [nvarchar](128) NULL,
 [ColName5] [nvarchar](128) NULL,
 [ColName6] [nvarchar](128) NULL,
 [ColName7] [nvarchar](128) NULL,
 [ColName8] [nvarchar](128) NULL,
 [ColName9] [nvarchar](128) NULL,
 [ColName10] [nvarchar](128) NULL
)

--  Load up the table variable with the index information to be used in follow on queries
INSERT INTO @AllIndexes
 ([Table ID],[Schema],[Table Name],[Index ID],[Index Name],[Index Type],[Constraint Type],[Object Type]
 ,[AllColName],[ColName1],[ColName2],[ColName3],[ColName4],[ColName5],[ColName6],[ColName7],[ColName8],
 [ColName9],[ColName10])
SELECT o.[object_id] AS [Table ID] ,u.[name] AS [Schema],o.[name] AS [Table Name],
 i.[index_id] AS [Index ID]
 , CASE i.[name]
 WHEN o.[name] THEN '** Same as Table Name **'
 ELSE i.[name] END AS [Index Name],
 CASE i.[type]
 WHEN 1 THEN 'CLUSTERED'
 WHEN 0 THEN 'HEAP'
 WHEN 2 THEN 'NONCLUSTERED'
 WHEN 3 THEN 'XML'
 ELSE 'UNKNOWN' END AS [Index Type],
 CASE
 WHEN (i.[is_primary_key]) = 1 THEN 'PRIMARY KEY'
 WHEN (i.[is_unique]) = 1 THEN 'UNIQUE'
 ELSE '' END AS [Constraint Type],
 CASE
 WHEN (i.[is_unique_constraint]) = 1
 OR (i.[is_primary_key]) = 1
 THEN 'CONSTRAINT'
 WHEN i.[type] = 0 THEN 'HEAP'
 WHEN i.[type] = 3 THEN 'XML INDEX'
 ELSE 'INDEX' END AS [Object Type],
 (SELECT COALESCE(c1.[name],'') FROM [sys].[columns] AS c1 INNER JOIN [sys].[index_columns] AS ic1
 ON c1.[object_id] = ic1.[object_id] AND c1.[column_id] = ic1.[column_id] AND ic1.[key_ordinal] = 1
 WHERE ic1.[object_id] = i.[object_id] AND ic1.[index_id] = i.[index_id]) +
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].['+ o.[name] + ']', i.[index_id], 2) IS NULL THEN ''
 ELSE ', '+INDEX_COL('[' + u.[name] + '].['+ o.[name] + ']', i.[index_id],2) END +
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id], 3) IS NULL THEN ''
 ELSE ', '+INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],3) END +
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id], 4) IS NULL THEN ''
 ELSE ', '+INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],4) END +
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].['+ o.[name] + ']', i.[index_id], 5) IS NULL THEN ''
 ELSE ', '+INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],5) END  +
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id], 6) IS NULL THEN ''
 ELSE ', '+INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],6) END +
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id], 7) IS NULL THEN ''
 ELSE ', '+INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id], 7) END +
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],8) IS NULL THEN ''
 ELSE ', '+INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],8) END +
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].['+ o.[name] + ']', i.[index_id], 9) IS NULL THEN ''
 ELSE ', '+INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],9) END +
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].['+ o.[name] + ']', i.[index_id], 10) IS NULL THEN ''
 ELSE ', '+INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],10) END  AS [AllColName],
 (SELECT COALESCE(c1.[name],'') FROM [sys].[columns] AS c1 INNER JOIN [sys].[index_columns] AS ic1
 ON c1.[object_id] = ic1.[object_id] AND c1.[column_id] = ic1.[column_id] AND ic1.[key_ordinal] = 1
 WHERE ic1.[object_id] = i.[object_id] AND ic1.[index_id] = i.[index_id])   AS [ColName1],
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].['+ o.[name] + ']', i.[index_id], 2) IS NULL THEN ''
 ELSE INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],2) END AS [ColName2],
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id], 3) IS NULL THEN ''
 ELSE INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],3) END AS [ColName3],
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].['+ o.[name] + ']', i.[index_id], 4) IS NULL THEN ''
 ELSE INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],4) END AS [ColName4],
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].['+ o.[name] + ']', i.[index_id], 5) IS NULL THEN ''
 ELSE INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],5) END AS [ColName5],
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].['+ o.[name] + ']', i.[index_id], 6) IS NULL THEN ''
 ELSE INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],6) END AS [ColName6],
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id], 7) IS NULL THEN ''
 ELSE INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],7) END AS [ColName7],
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].['+ o.[name] + ']', i.[index_id],8) IS NULL THEN ''
 ELSE INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],8) END AS [ColName8],
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].['+ o.[name] + ']', i.[index_id], 9) IS NULL THEN ''
 ELSE INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],9) END AS [ColName9],
 CASE
 WHEN INDEX_COL('[' + u.[name] + '].['+ o.[name] + ']', i.[index_id], 10) IS NULL THEN ''
 ELSE INDEX_COL('[' + u.[name] + '].[' + o.[name] + ']', i.[index_id],10) END AS [ColName10]
FROM [sys].[objects] AS o WITH (NOLOCK)
 LEFT OUTER JOIN [sys].[indexes] AS i WITH (NOLOCK)
 ON o.[object_id] = i.[object_id]
 JOIN [sys].[schemas] AS u WITH (NOLOCK)
 ON o.[schema_id] = u.[schema_id]
WHERE o.[type] = 'U' --AND i.[index_id] < 255
 AND o.[name] NOT IN ('dtproperties')
 AND i.[name] NOT LIKE '_WA_Sys_%'

-----------
SELECT 'Listing All Indexes' AS [Comments]

SELECT I.*
 FROM @AllIndexes AS I
 ORDER BY [Table Name]

-----------
SELECT 'Listing Possible Redundant Index keys' AS [Comments]

SELECT DISTINCT I.[Table Name], I.[Index Name] ,I.[Index Type],  I.[Constraint Type], I.[AllColName]
 FROM @AllIndexes AS I
 JOIN @AllIndexes AS I2
 ON I.[Table ID] = I2.[Table ID]
 AND I.[ColName1] = I2.[ColName1]
 AND I.[Index Name] <> I2.[Index Name]
 AND I.[Index Type] <> 'XML'
 ORDER BY I.[Table Name], I.[AllColName]

----------
SELECT 'Listing Possible Reverse Index keys' AS [Comments]

SELECT DISTINCT I.[Table Name], I.[Index Name], I.[Index Type],  I.[Constraint Type], I.[AllColName]
 FROM @AllIndexes AS I
 JOIN @AllIndexes AS I2
 ON I.[Table ID] = I2.[Table ID]
 AND I.[ColName1] = I2.[ColName2]
 AND I.[ColName2] = I2.[ColName1]
 AND I.[Index Name] <> I2.[Index Name]
 AND I.[Index Type] <> 'XML'

Conclusion

Finding and removing indexes in SQL server is an excellent way to remove bottlenecks, especially if you've "inherited" the application or (like me) you've spent so much time on application code that databases have been a bit of voodoo.  It's an easy thing to do, and can have dramatic effects.

Add comment