Creating NONCLUSTERED INDEXES on Massive Tables in Azure SQL Database

September 29, 2013 — 19 Comments

200163425-001.tif There are times on Azure SQL Database when tables get to a certain size and that trying to create indexes results in timeouts.

A few months ago when I started to get these famous timeouts, I had reached 10 million records and I felt like I was looking for a needle in a hay stack!

This blog post is all about creating NONCLUSTERED INDEXES, I will try to go over best practices and reasons to keep in mind when you use them in Azure SQL Database.

Interesting Facts

  • If a CLUSTERED INDEX is present on the table, then NONCLUSTERED INDEXES will use its key instead of the table ROW ID.
  • To reduce the size consumed by the NONCLUSTERED INDEXES it’s imperative that the CLUSTERED INDEX KEY is kept as narrow as possible.
  • Physical reorganization of the CLUSTERED INDEX does not physically reorder NONCLUSTERED INDEXES.
  • SQL Database can JOIN and INTERSECT INDEXES in order to satisfy a query without having to read data directly from the table.
  • Favor many narrow NONCLUSTERED INDEXES that can be combined or used independently over wide INDEXES that can be hard to maintain.
  • Create Filtered INDEXES to create highly selective sets of keys for columns that may not have a good selectivity otherwise.
  • Use Covering INDEXEs to reduce the number of bookmark lookups required to gather data that is not present in the other INDEXES.
  • Covering INDEXES can be used to physically maintain the data in the same order as is required by the queries’ result sets reducing the need for SORT operations.
  • Covering INDEXES have an increased maintenance cost, therefore you must see if performance gain justifies the extra maintenance cost.
  • NONCLUSTERED INDEXES can reduce blocking by having SQL Database read from NONCLUSTERED INDEX data pages instead of the actual tables.

Creating a NONCLUSTERED INDEX

In order to look at a few examples lets start with the following table.

CREATE TABLE [dbo].[TestTable] (
    [Id]        UNIQUEIDENTIFIER DEFAULT (newid()) NOT NULL,
    [FirstName] NVARCHAR (10)    NOT NULL,
    [LastName]  NVARCHAR (10)    NOT NULL,
    [Type]      INT              DEFAULT ((0)) NOT NULL,
    [City]      NVARCHAR (10)    NOT NULL,
    [Country]   NVARCHAR (10)    NOT NULL,
    [Created]   DATETIME2 (7)    DEFAULT (getdate()) NOT NULL,
    [Timestamp] ROWVERSION       NOT NULL,
    PRIMARY KEY NONCLUSTERED ([Id] ASC)
);

GO
CREATE CLUSTERED INDEX [IX_TestTable]
    ON [dbo].[TestTable]([Created] ASC);

The table has a CLUSTERED INDEX on the Created column. The reasons why this might be an interesting choice for the CLUSTERED INDEX were discussed in my previous blog post about building Clustered Indexes on non-primary key columns in Azure SQL Database.

For the sake of this example, lets imagine that this table contains roughly 30 millions records. Creating a NONCLUSTERED INDEX on this table might result in a timeout or an aborted transaction due to restrictions imposed by Azure SQL Database. One of these restrictions, is that a transaction log cannot exceed 1GB in size.

On Azure SQL Database, you must use the ONLINE=ON option in order to reduce locking and the transaction log size. Furthermore, this option will greatly reduce your chances of getting a timeout.

CREATE INDEX [IX_TestTable_Country_City_LastName_FirstName] 
    ON [dbo].[TestTable] 
    ([Country] ASC,[City] ASC,[LastName] ASC,[FirstName] ASC)
    WITH(ONLINE=ON);

Creating a Filtered NONCLUSTERED INDEX

A where clause is added to the index. This results in a Filtered index and greatly helps to create smaller and more effective INDEXES. I recommend taking a closer look at this type of index because bytes are precious on Azure SQL Database.

CREATE INDEX [IX_TestTable_Type_LastName_FirstName_where_Type_greater_than_1] 
    ON [dbo].[TestTable] 
    ([Type] ASC,[LastName] ASC,[FirstName] ASC)
    WHERE [Type] > 1
    WITH(ONLINE=ON);

Creating a Covering NONCLUSTERED INDEX

The include clause can be used to include columns that are selected by the queries but that are not filtered upon through where clauses. The include clause can also be used to include columns that cannot be added to the INDEX key.

CREATE INDEX [IX_TestTable_Covering_Index] 
    ON [dbo].[TestTable] 
    ([Type] ASC,[LastName] ASC,[FirstName] ASC)
    INCLUDE([City], [Country], [Id])
    WITH(ONLINE=ON);

19 responses to Creating NONCLUSTERED INDEXES on Massive Tables in Azure SQL Database

  1. 

    Very nice blog. Impressive.

    Like

  2. 

    Can i use ‘Online=ON’ option in entity framework migrations?

    Like

  3. 

    One thing to note, I believe that you have to have a clustered index for tables in SQL Azure.

    Like

  4. 

    If you have a table with 10’s of millions of rows and you can prevent it from blowing up when creating an index, my question is should you?

    Will building the index take a long time? Will it slow down insertions or deletions?

    Like

    • 

      Agreed on both accounts. It all depends on your consumption needs. Some times I have a write table that I merge into a heavily indexed denormalized read table.

      Like

      • 

        Do you copy everything into a new table, insert into the new table, drop the old table and rename the new table?

        Like

        • 

          No I Merge from the write table into the read table in batches. I use this for massive data imports. https://alexandrebrisebois.wordpress.com/2013/01/16/optimizing-merge-performance-in-windows-azure-sql-database/

          Like

          • 

            Hi,

            I have a query over the clustered and non-clustered index usage.Currently I am facing the performance issue on the azure sql database.

            Basic Select query taking more than 10 minutes to retrieve the data.

            select * from table_name

            Total Number of Rows : More than 10 Lakhs

            One clustered and 8 non-clustered indexes are available and there is no relationship on the table as well but still when I query this table with the basic select it is taking too much of time.

            Shall I make the WITH ONLINE=ON option for all non-clustered and clustered index.

            May I know the actual problem and what exactly to be done over here to over come the performance issue?

            Note : I could see the 1001 lock escalation on the table and more number of scans on the table.

            Like

          • 

            Hi,

            I have a query over the clustered and non-clustered index usage.Currently I am facing the performance issue on the azure sql database.

            Basic Select query taking more than 10 minutes to retrieve the data.

            select * from table_name

            Total Number of Rows : More than 10 Lakhs

            One clustered and 8 non-clustered indexes are available and there is no relationship on the table as well but still when I query this table with the basic select it is taking too much of time.

            Shall I make the WITH ONLINE=ON option for all non-clustered and clustered index.Will it have an any impact of the production database.Currently we were utilizing the views to get the report from Power BI tool.Due to this it could happen?

            May I know the actual problem and what exactly to be done over here to over come the performance issue? Kindly provide your valuable comments on the same.

            Note : I could see the 1001 lock escalation on primary column in the table and it undergone more number of scans.

            Like

            • 

              Hi, a select * does not use the indexes. Have you tried adding a where clause?

              Like

              • 

                Hi,

                Thanks for your reply.

                Yes i have added where clause too even though it is going for table scan and not calling the individual non-clustered index.

                If we mention the index column in the where clause it would call the non-clustered index right? but that couldn’t happen here.
                Correct me if i am wrong.

                But select itself is taking long time.approximately 10 minutes.What could be the problem?Based on your above post i could understand that we could create ONLINE=ON option for indexes would ignore from the locking and timeout.Is it advisable?

                We were using Power BI report tool which could cause any problem?Since couldn’t find solution for performance issue.This table does not have foreign key relationship too. It is independent table. However table would receive an insert on hourly basis.

                Also you were mention in your another blog stating that “Azure Doesn’t Support Heap which is created from NON-CLUSTERED INDEX. Might be My Non clustered index would have impact on the table.

                Kindly suggest how to check and resolve the same?

                Like

  5. 

    I was able to eradicate the timeout issue by using ONLINE = ON. Thank you for the great tip!

    Liked by 1 person

Trackbacks and Pingbacks:

  1. Reading Notes 2013-10-14 | Matricis - October 15, 2013

    […] Creating NONCLUSTERED INDEXES on Massive Tables in Windows Azure SQL Database – Nice post that demystify the indexes and explains why we should use them. […]

    Like

  2. Steps Towards Optimized Windows Azure SQL Databases | Alexandre Brisebois - November 30, 2013

    […] NonClustered Indexes should be created for foreign key columns and for any column used in a where, a…. I personally start by creating them for foreign keys. Then I observe how my application extracts data from the database schema and create indexes based on these observations. […]

    Like

  3. The Top 10 Most-Read #WindowsAzure Posts of 2013 | Alexandre Brisebois - December 29, 2013

    […] Creating NONCLUSTERED INDEXES on Massive Tables in Windows Azure SQL Database – 1,355 reads […]

    Like

  4. Indexes are Crucial on Windows Azure SQL Database! | Alexandre Brisebois - January 26, 2014

    […] Create Non-Clustered Indexes on your Primary Keys […]

    Like

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.