Creating NONCLUSTERED INDEXES on Massive Tables in Azure SQL Database

September 29, 2013 — 19 Comments

200163425-001.tif There are times on Azure SQL Database when tables get to a certain size and that trying to create indexes results in timeouts.

A few months ago when I started to get these famous timeouts, I had reached 10 million records and I felt like I was looking for a needle in a hay stack!

This blog post is all about creating NONCLUSTERED INDEXES, I will try to go over best practices and reasons to keep in mind when you use them in Azure SQL Database.

Interesting Facts

If a CLUSTERED INDEX is present on the table, then NONCLUSTERED INDEXES will use its key instead of the table ROW ID.
To reduce the size consumed by the NONCLUSTERED INDEXES it’s imperative that the CLUSTERED INDEX KEY is kept as narrow as possible.
Physical reorganization of the CLUSTERED INDEX does not physically reorder NONCLUSTERED INDEXES.
SQL Database can JOIN and INTERSECT INDEXES in order to satisfy a query without having to read data directly from the table.
Favor many narrow NONCLUSTERED INDEXES that can be combined or used independently over wide INDEXES that can be hard to maintain.
Create Filtered INDEXES to create highly selective sets of keys for columns that may not have a good selectivity otherwise.
Use Covering INDEXEs to reduce the number of bookmark lookups required to gather data that is not present in the other INDEXES.
Covering INDEXES can be used to physically maintain the data in the same order as is required by the queries’ result sets reducing the need for SORT operations.
Covering INDEXES have an increased maintenance cost, therefore you must see if performance gain justifies the extra maintenance cost.
NONCLUSTERED INDEXES can reduce blocking by having SQL Database read from NONCLUSTERED INDEX data pages instead of the actual tables.

Creating a NONCLUSTERED INDEX

In order to look at a few examples lets start with the following table.

CREATE TABLE [dbo].[TestTable] (
    [Id]        UNIQUEIDENTIFIER DEFAULT (newid()) NOT NULL,
    [FirstName] NVARCHAR (10)    NOT NULL,
    [LastName]  NVARCHAR (10)    NOT NULL,
    [Type]      INT              DEFAULT ((0)) NOT NULL,
    [City]      NVARCHAR (10)    NOT NULL,
    [Country]   NVARCHAR (10)    NOT NULL,
    [Created]   DATETIME2 (7)    DEFAULT (getdate()) NOT NULL,
    [Timestamp] ROWVERSION       NOT NULL,
    PRIMARY KEY NONCLUSTERED ([Id] ASC)
);

GO
CREATE CLUSTERED INDEX [IX_TestTable]
    ON [dbo].[TestTable]([Created] ASC);

The table has a CLUSTERED INDEX on the Created column. The reasons why this might be an interesting choice for the CLUSTERED INDEX were discussed in my previous blog post about building Clustered Indexes on non-primary key columns in Azure SQL Database.

For the sake of this example, lets imagine that this table contains roughly 30 millions records. Creating a NONCLUSTERED INDEX on this table might result in a timeout or an aborted transaction due to restrictions imposed by Azure SQL Database. One of these restrictions, is that a transaction log cannot exceed 1GB in size.

On Azure SQL Database, you must use the ONLINE=ON option in order to reduce locking and the transaction log size. Furthermore, this option will greatly reduce your chances of getting a timeout.

CREATE INDEX [IX_TestTable_Country_City_LastName_FirstName] 
    ON [dbo].[TestTable] 
    ([Country] ASC,[City] ASC,[LastName] ASC,[FirstName] ASC)
    WITH(ONLINE=ON);

Creating a Filtered NONCLUSTERED INDEX

A where clause is added to the index. This results in a Filtered index and greatly helps to create smaller and more effective INDEXES. I recommend taking a closer look at this type of index because bytes are precious on Azure SQL Database.

CREATE INDEX [IX_TestTable_Type_LastName_FirstName_where_Type_greater_than_1] 
    ON [dbo].[TestTable] 
    ([Type] ASC,[LastName] ASC,[FirstName] ASC)
    WHERE [Type] > 1
    WITH(ONLINE=ON);

Creating a Covering NONCLUSTERED INDEX

The include clause can be used to include columns that are selected by the queries but that are not filtered upon through where clauses. The include clause can also be used to include columns that cannot be added to the INDEX key.

CREATE INDEX [IX_TestTable_Covering_Index] 
    ON [dbo].[TestTable] 
    ([Type] ASC,[LastName] ASC,[FirstName] ASC)
    INCLUDE([City], [Country], [Id])
    WITH(ONLINE=ON);

19 responses to Creating NONCLUSTERED INDEXES on Massive Tables in Azure SQL Database

Karthik [KKB] November 10, 2013 at 8:56 PM

Very nice blog. Impressive.

LikeLike

Reply
Saket May 2, 2014 at 6:21 AM

Can i use ‘Online=ON’ option in entity framework migrations?

LikeLike

Reply
- Alexandre Brisebois May 2, 2014 at 10:46 AM
  
  I have never done it. I personally use Microsoft SQL Server Data Tools and teh SQL Database Project to manage that complexity. https://alexandrebrisebois.wordpress.com/2013/07/20/take-control-windows-azure-sql-database-application-life-cycle-management-made-easy/
  
  LikeLike
  
  Reply
Chris June 5, 2014 at 11:36 AM

One thing to note, I believe that you have to have a clustered index for tables in SQL Azure.

LikeLike

Reply
- Alexandre Brisebois June 5, 2014 at 11:53 AM
  
  You are correct. I wrote post about creating Clustered Indexes that can be updated on Azure SQL Database.
  
  LikeLike
  
  Reply
Chris June 5, 2014 at 11:40 AM

If you have a table with 10’s of millions of rows and you can prevent it from blowing up when creating an index, my question is should you?

Will building the index take a long time? Will it slow down insertions or deletions?

LikeLike

Reply
- Alexandre Brisebois June 5, 2014 at 11:55 AM
  
  Agreed on both accounts. It all depends on your consumption needs. Some times I have a write table that I merge into a heavily indexed denormalized read table.
  
  LikeLike
  
  Reply
  - Chris June 5, 2014 at 12:03 PM
    
    Do you copy everything into a new table, insert into the new table, drop the old table and rename the new table?
    
    LikeLike
    
    Reply
    - Alexandre Brisebois June 5, 2014 at 12:05 PM
      
      No I Merge from the write table into the read table in batches. I use this for massive data imports. https://alexandrebrisebois.wordpress.com/2013/01/16/optimizing-merge-performance-in-windows-azure-sql-database/
      
      LikeLike
      
      Reply
    - Alexandre Brisebois June 5, 2014 at 12:07 PM
      
      This post explains how I do it https://alexandrebrisebois.wordpress.com/2013/02/18/ingesting-massive-amounts-of-relational-data-with-windows-azure-sql-database-70-million-recordsday/
      
      LikeLike
      
      Reply
      - baskarv July 1, 2016 at 11:06 AM
        
        Hi,
        
        I have a query over the clustered and non-clustered index usage.Currently I am facing the performance issue on the azure sql database.
        
        Basic Select query taking more than 10 minutes to retrieve the data.
        
        select * from table_name
        
        Total Number of Rows : More than 10 Lakhs
        
        One clustered and 8 non-clustered indexes are available and there is no relationship on the table as well but still when I query this table with the basic select it is taking too much of time.
        
        Shall I make the WITH ONLINE=ON option for all non-clustered and clustered index.
        
        May I know the actual problem and what exactly to be done over here to over come the performance issue?
        
        Note : I could see the 1001 lock escalation on the table and more number of scans on the table.
        
        LikeLike
        
        Reply
      - baskarv July 4, 2016 at 8:45 AM
        
        Hi,
        
        I have a query over the clustered and non-clustered index usage.Currently I am facing the performance issue on the azure sql database.
        
        Basic Select query taking more than 10 minutes to retrieve the data.
        
        select * from table_name
        
        Total Number of Rows : More than 10 Lakhs
        
        One clustered and 8 non-clustered indexes are available and there is no relationship on the table as well but still when I query this table with the basic select it is taking too much of time.
        
        Shall I make the WITH ONLINE=ON option for all non-clustered and clustered index.Will it have an any impact of the production database.Currently we were utilizing the views to get the report from Power BI tool.Due to this it could happen?
        
        May I know the actual problem and what exactly to be done over here to over come the performance issue? Kindly provide your valuable comments on the same.
        
        Note : I could see the 1001 lock escalation on primary column in the table and it undergone more number of scans.
        
        LikeLike
        
        Reply
        
        Alexandre Brisebois July 4, 2016 at 8:50 AM
        
        Hi, a select * does not use the indexes. Have you tried adding a where clause?
        
        LikeLike
        
        Reply
        
        baskarv July 4, 2016 at 9:56 AM
        
        Hi,
        
        Thanks for your reply.
        
        Yes i have added where clause too even though it is going for table scan and not calling the individual non-clustered index.
        
        If we mention the index column in the where clause it would call the non-clustered index right? but that couldn’t happen here.
        Correct me if i am wrong.
        
        But select itself is taking long time.approximately 10 minutes.What could be the problem?Based on your above post i could understand that we could create ONLINE=ON option for indexes would ignore from the locking and timeout.Is it advisable?
        
        We were using Power BI report tool which could cause any problem?Since couldn’t find solution for performance issue.This table does not have foreign key relationship too. It is independent table. However table would receive an insert on hourly basis.
        
        Also you were mention in your another blog stating that “Azure Doesn’t Support Heap which is created from NON-CLUSTERED INDEX. Might be My Non clustered index would have impact on the table.
        
        Kindly suggest how to check and resolve the same?
        
        LikeLike
        
        Reply
Krishna October 1, 2014 at 8:16 PM

I was able to eradicate the timeout issue by using ONLINE = ON. Thank you for the great tip!

LikeLiked by 1 person

Reply

Trackbacks and Pingbacks:

Reading Notes 2013-10-14 | Matricis - October 15, 2013
[…] Creating NONCLUSTERED INDEXES on Massive Tables in Windows Azure SQL Database – Nice post that demystify the indexes and explains why we should use them. […]

LikeLike
Steps Towards Optimized Windows Azure SQL Databases | Alexandre Brisebois - November 30, 2013
[…] NonClustered Indexes should be created for foreign key columns and for any column used in a where, a…. I personally start by creating them for foreign keys. Then I observe how my application extracts data from the database schema and create indexes based on these observations. […]

LikeLike
The Top 10 Most-Read #WindowsAzure Posts of 2013 | Alexandre Brisebois - December 29, 2013
[…] Creating NONCLUSTERED INDEXES on Massive Tables in Windows Azure SQL Database – 1,355 reads […]

LikeLike
Indexes are Crucial on Windows Azure SQL Database! | Alexandre Brisebois - January 26, 2014
[…] Create Non-Clustered Indexes on your Primary Keys […]

LikeLike

Creating NONCLUSTERED INDEXES on Massive Tables in Azure SQL Database

Interesting Facts

Creating a NONCLUSTERED INDEX

Creating a Filtered NONCLUSTERED INDEX

Creating a Covering NONCLUSTERED INDEX

19 responses to Creating NONCLUSTERED INDEXES on Massive Tables in Azure SQL Database

Trackbacks and Pingbacks:

Leave a comment Cancel reply

Follow this Blog

Twitter Updates

Disclaimer

Upcoming Events

Tags

Statistics

Creating NONCLUSTERED INDEXES on Massive Tables in Azure SQL Database

Interesting Facts

Creating a NONCLUSTERED INDEX

Creating a Filtered NONCLUSTERED INDEX

Creating a Covering NONCLUSTERED INDEX

Share this:

Related

19 responses to Creating NONCLUSTERED INDEXES on Massive Tables in Azure SQL Database

Trackbacks and Pingbacks:

Leave a comment Cancel reply

Follow this Blog

Twitter Updates

Disclaimer

Upcoming Events

Tags

Statistics