Filtered indexes are a great way to improve performance when your application frequently requests only the most recent data from a large table. Unlike a standard index, a filtered index covers only a subset of the rows in the table, giving you a much smaller index that is faster to query.
Today I’m using the AdventureWorksDW2012 database to run all my tests on a SQL 2012 box (I’ll be updating my test rig soon).
Here’s my setup code:
This creates a test table named FilteredIndexTest by copying the OrderDate column from the FactInternetSales table, then creates a filtered index that only includes dates on or after 2008-07-01.
USE [AdventureWorksDW2012];
SET NOCOUNT ON;

IF EXISTS (SELECT * FROM sys.tables t WHERE t.name = 'FilteredIndexTest')
    DROP TABLE [FilteredIndexTest];

SELECT OrderDate
INTO [FilteredIndexTest]
FROM FactInternetSales;

CREATE NONCLUSTERED INDEX [IX_OrderDate_from_20080701]
ON [FilteredIndexTest] (OrderDate)
WHERE OrderDate >= '2008-07-01 00:00:00.000';
A quick sp_spaceused shows that the table has 60,398 rows and is about 2MB, while the filtered index is just 80KB. This confirms that the filtered index contains only a subset of the data in the table. If you were to SELECT from the FilteredIndexTest table using the same WHERE clause as the filtered index, you would see that this subset is just 2,209 rows.
EXEC sp_spaceused FilteredIndexTest;
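If you want to verify that row count yourself, a quick way is to count the rows matching the index's filter predicate (this check isn't part of the original setup script):

SELECT COUNT(*) AS RowsInFilteredIndex
FROM [FilteredIndexTest]
WHERE OrderDate >= '2008-07-01 00:00:00.000';

This should return 2,209 — the same subset the filtered index covers.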
The following SELECT statement uses the filtered index to perform a seek. This is because it is only interested in recent data that happens to be included in the filtered index.
SELECT *
FROM [FilteredIndexTest]
WHERE OrderDate > '2008-07-03 00:00:00.000';
So this is good. We can leave all the old data in the table in case we need it later and just concentrate on the most recent data with the lightweight filtered index. The problem arises when we want to use a variable instead of hardcoding values.
DECLARE @OrderDate DATETIME2;
SET @OrderDate = '2008-07-03 00:00:00.000';

SELECT *
FROM [FilteredIndexTest]
WHERE OrderDate > @OrderDate;
With the variable we get a table scan.
SQL Server gets one chance, when the query is first executed, to build a plan that can process the query no matter what value the variable holds. The query optimiser doesn't know what the value of the variable will be at run time, because each statement's execution plan is generated separately and information cannot be passed from one statement's plan to another at run time. It therefore can't build a plan that uses the filtered index, because the variable may hold a value outside the filtered index's range. In our case that leaves it no option but to scan the entire table.
I hope you also noticed the yellow warning on the SELECT operator.
The warning is that our index is “Unmatched”. It indicates that a filtered index could not be used due to parameterisation. It is possible to shred the cached plan XML to check whether this warning is occurring on your server, but that is left as an extra exercise for the reader.
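As a starting point for that exercise, here is a rough sketch of a plan-cache query. It assumes the warning surfaces as the UnmatchedIndexes attribute of the Warnings element in the showplan XML — check the showplan schema on your version before relying on it, and beware that scanning the whole plan cache can be expensive on a busy server:

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT st.text, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE qp.query_plan.exist('//Warnings[@UnmatchedIndexes="true"]') = 1;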
Also note that the estimate of 18,119 rows is a long way from the actual 1,970 rows. This is because the query optimiser couldn't use the statistics created with the filtered index.
A workaround for this is to add the query hint OPTION (RECOMPILE). This allows the query optimiser to build a plan based on the value of the variable at run time. It can do this because the plan is rebuilt for each execution and only ever used once, so it doesn't have to cater for differing values. Each execution gets its own bespoke plan.
OPTION (RECOMPILE) shouldn't be used on every query, as the recompilations burn extra CPU, but this is usually acceptable if your CPU usage is not already high. Applying the hint to a stored procedure that is executed many times a second would probably be a bad idea.
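To make that concrete, this is the pattern where the hint would typically live (the procedure name is made up for illustration). If this procedure were called many times a second, every call would pay the recompile cost:

CREATE PROCEDURE dbo.GetOrdersSince  -- hypothetical procedure name
    @OrderDate DATETIME2
AS
BEGIN
    SELECT *
    FROM [FilteredIndexTest]
    WHERE OrderDate > @OrderDate
    OPTION (RECOMPILE);  -- recompiled on every execution: fine for occasional reporting, costly at high call rates
END;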
DECLARE @OrderDate DATETIME2;
SET @OrderDate = '2008-07-03 00:00:00.000';

SELECT *
FROM [FilteredIndexTest]
WHERE OrderDate > @OrderDate
OPTION (RECOMPILE);
The plan looks a little more complex now due to the Compute Scalar and Constant Scan operators introduced by OPTION (RECOMPILE), but we are again hitting the filtered index with an index seek and an accurate estimate. The query no longer needs to scan every record in the table, which, on a very large table, would save a lot of time and IO.
Filtered indexes can really ease the pain of querying recent data in large tables but never assume your query is using the filtered index. This is not the only limitation of filtered indexes. For more information try some of these excellent links: