There are times on Azure SQL Database when tables get to a certain size and that trying to create indexes results in timeouts.
A few months ago when I started to get these famous timeouts, I had reached 10 million records and I felt like I was looking for a needle in a hay stack!
This blog post is all about creating NONCLUSTERED INDEXES, I will try to go over best practices and reasons to keep in mind when you use them in Azure SQL Database.
Interesting Facts
- If a CLUSTERED INDEX is present on the table, then NONCLUSTERED INDEXES will use its key instead of the table ROW ID.
- To reduce the size consumed by the NONCLUSTERED INDEXES it’s imperative that the CLUSTERED INDEX KEY is kept as narrow as possible.
- Physical reorganization of the CLUSTERED INDEX does not physically reorder NONCLUSTERED INDEXES.
- SQL Database can JOIN and INTERSECT INDEXES in order to satisfy a query without having to read data directly from the table.
- Favor many narrow NONCLUSTERED INDEXES that can be combined or used independently over wide INDEXES that can be hard to maintain.
- Create Filtered INDEXES to create highly selective sets of keys for columns that may not have a good selectivity otherwise.
- Use Covering INDEXEs to reduce the number of bookmark lookups required to gather data that is not present in the other INDEXES.
- Covering INDEXES can be used to physically maintain the data in the same order as is required by the queries’ result sets reducing the need for SORT operations.
- Covering INDEXES have an increased maintenance cost, therefore you must see if performance gain justifies the extra maintenance cost.
- NONCLUSTERED INDEXES can reduce blocking by having SQL Database read from NONCLUSTERED INDEX data pages instead of the actual tables.
Creating a NONCLUSTERED INDEX
In order to look at a few examples lets start with the following table.
CREATE TABLE [dbo].[TestTable] ( [Id] UNIQUEIDENTIFIER DEFAULT (newid()) NOT NULL, [FirstName] NVARCHAR (10) NOT NULL, [LastName] NVARCHAR (10) NOT NULL, [Type] INT DEFAULT ((0)) NOT NULL, [City] NVARCHAR (10) NOT NULL, [Country] NVARCHAR (10) NOT NULL, [Created] DATETIME2 (7) DEFAULT (getdate()) NOT NULL, [Timestamp] ROWVERSION NOT NULL, PRIMARY KEY NONCLUSTERED ([Id] ASC) ); GO CREATE CLUSTERED INDEX [IX_TestTable] ON [dbo].[TestTable]([Created] ASC);
The table has a CLUSTERED INDEX on the Created column. The reasons why this might be an interesting choice for the CLUSTERED INDEX were discussed in my previous blog post about building Clustered Indexes on non-primary key columns in Azure SQL Database.
For the sake of this example, lets imagine that this table contains roughly 30 millions records. Creating a NONCLUSTERED INDEX on this table might result in a timeout or an aborted transaction due to restrictions imposed by Azure SQL Database. One of these restrictions, is that a transaction log cannot exceed 1GB in size.
On Azure SQL Database, you must use the ONLINE=ON option in order to reduce locking and the transaction log size. Furthermore, this option will greatly reduce your chances of getting a timeout.
CREATE INDEX [IX_TestTable_Country_City_LastName_FirstName] ON [dbo].[TestTable] ([Country] ASC,[City] ASC,[LastName] ASC,[FirstName] ASC) WITH(ONLINE=ON);
Creating a Filtered NONCLUSTERED INDEX
A where clause is added to the index. This results in a Filtered index and greatly helps to create smaller and more effective INDEXES. I recommend taking a closer look at this type of index because bytes are precious on Azure SQL Database.
CREATE INDEX [IX_TestTable_Type_LastName_FirstName_where_Type_greater_than_1] ON [dbo].[TestTable] ([Type] ASC,[LastName] ASC,[FirstName] ASC) WHERE [Type] > 1 WITH(ONLINE=ON);
Creating a Covering NONCLUSTERED INDEX
The include clause can be used to include columns that are selected by the queries but that are not filtered upon through where clauses. The include clause can also be used to include columns that cannot be added to the INDEX key.
CREATE INDEX [IX_TestTable_Covering_Index] ON [dbo].[TestTable] ([Type] ASC,[LastName] ASC,[FirstName] ASC) INCLUDE([City], [Country], [Id]) WITH(ONLINE=ON);
Very nice blog. Impressive.
LikeLike
Can i use ‘Online=ON’ option in entity framework migrations?
LikeLike
I have never done it. I personally use Microsoft SQL Server Data Tools and teh SQL Database Project to manage that complexity. https://alexandrebrisebois.wordpress.com/2013/07/20/take-control-windows-azure-sql-database-application-life-cycle-management-made-easy/
LikeLike
One thing to note, I believe that you have to have a clustered index for tables in SQL Azure.
LikeLike
You are correct. I wrote post about creating Clustered Indexes that can be updated on Azure SQL Database.
LikeLike
If you have a table with 10’s of millions of rows and you can prevent it from blowing up when creating an index, my question is should you?
Will building the index take a long time? Will it slow down insertions or deletions?
LikeLike
Agreed on both accounts. It all depends on your consumption needs. Some times I have a write table that I merge into a heavily indexed denormalized read table.
LikeLike
Do you copy everything into a new table, insert into the new table, drop the old table and rename the new table?
LikeLike
No I Merge from the write table into the read table in batches. I use this for massive data imports. https://alexandrebrisebois.wordpress.com/2013/01/16/optimizing-merge-performance-in-windows-azure-sql-database/
LikeLike
This post explains how I do it https://alexandrebrisebois.wordpress.com/2013/02/18/ingesting-massive-amounts-of-relational-data-with-windows-azure-sql-database-70-million-recordsday/
LikeLike
Hi,
I have a query over the clustered and non-clustered index usage.Currently I am facing the performance issue on the azure sql database.
Basic Select query taking more than 10 minutes to retrieve the data.
select * from table_name
Total Number of Rows : More than 10 Lakhs
One clustered and 8 non-clustered indexes are available and there is no relationship on the table as well but still when I query this table with the basic select it is taking too much of time.
Shall I make the WITH ONLINE=ON option for all non-clustered and clustered index.
May I know the actual problem and what exactly to be done over here to over come the performance issue?
Note : I could see the 1001 lock escalation on the table and more number of scans on the table.
LikeLike
Hi,
I have a query over the clustered and non-clustered index usage.Currently I am facing the performance issue on the azure sql database.
Basic Select query taking more than 10 minutes to retrieve the data.
select * from table_name
Total Number of Rows : More than 10 Lakhs
One clustered and 8 non-clustered indexes are available and there is no relationship on the table as well but still when I query this table with the basic select it is taking too much of time.
Shall I make the WITH ONLINE=ON option for all non-clustered and clustered index.Will it have an any impact of the production database.Currently we were utilizing the views to get the report from Power BI tool.Due to this it could happen?
May I know the actual problem and what exactly to be done over here to over come the performance issue? Kindly provide your valuable comments on the same.
Note : I could see the 1001 lock escalation on primary column in the table and it undergone more number of scans.
LikeLike
Hi, a select * does not use the indexes. Have you tried adding a where clause?
LikeLike
Hi,
Thanks for your reply.
Yes i have added where clause too even though it is going for table scan and not calling the individual non-clustered index.
If we mention the index column in the where clause it would call the non-clustered index right? but that couldn’t happen here.
Correct me if i am wrong.
But select itself is taking long time.approximately 10 minutes.What could be the problem?Based on your above post i could understand that we could create ONLINE=ON option for indexes would ignore from the locking and timeout.Is it advisable?
We were using Power BI report tool which could cause any problem?Since couldn’t find solution for performance issue.This table does not have foreign key relationship too. It is independent table. However table would receive an insert on hourly basis.
Also you were mention in your another blog stating that “Azure Doesn’t Support Heap which is created from NON-CLUSTERED INDEX. Might be My Non clustered index would have impact on the table.
Kindly suggest how to check and resolve the same?
LikeLike
I was able to eradicate the timeout issue by using ONLINE = ON. Thank you for the great tip!
LikeLiked by 1 person