Building Clustered Indexes on Non-Primary Key Columns in Azure SQL Database

September 29, 2013 — 6 Comments

space-star-cluster-1920x1200

Azure SQL Database doesn’t support Heap Tables. Thus creating tables in Azure SQL Database requires Clustered Indexes. By default Azure SQL Database creates the Clustered Index on the table’s Primary Key.

CREATE TABLE [dbo].[TestTable] (
    [Id]        UNIQUEIDENTIFIER NOT NULL,
    [FirstName] NVARCHAR (10)    NOT NULL,
    [LastName]  NVARCHAR (10)    NOT NULL,
    [Type]      INT              NOT NULL,
    [Timestamp] ROWVERSION       NOT NULL,
    PRIMARY KEY (Id));

Primary Keys often make good Clustered Indexes because data is regularly accessed through them. But there are times when the Primary Key just doesn’t cut it. Fortunately, you can define a Clustered Index on other columns.

CREATE TABLE [dbo].[TestTable] (
    [Id]        UNIQUEIDENTIFIER NOT NULL,
    [FirstName] NVARCHAR (10)    NOT NULL,
    [LastName]  NVARCHAR (10)    NOT NULL,
    [Type]      INT              NOT NULL,
    [Timestamp] ROWVERSION       NOT NULL,
    PRIMARY KEY NONCLUSTERED (Id));
CREATE CLUSTERED INDEX IX_TestTable on TestTable (FirstName, LastName);

Changing a clustered index in Azure SQL Database is possible but it comes with some constraints. Failing to satisfy the existing  constraints enforce by the index will result in “The new index definition does not match the constraint being enforced by the existing index”.

CREATE CLUSTERED INDEX IX_TestTable
ON TestTable ([Type],[FirstName],[LastName])
WITH (DROP_EXISTING=ON)
ON [Primary]

Things to keep in mind when creating Indexes

  • Focus on building narrow clustered indexes by choosing columns and their data types wisely. For the best performance, keep the overall byte size of the index as small as possible.
  • Non Clustered Indexes use the clustered index key as a row key
  • Columns who get updated frequently shouldn’t be part of the Index.
  • The physical table data is sorted based on the clustered index columns.
  • Support the most commonly used access path to the data because there can only be one Clustered Index per table.

Tips and Tricks for Azure SQL Database

First of all, don’t forget about index maintenance on Azure SQL Database! Then think long and hard about your Clustered Indexes. Many people suggest that GUIDs (uniqueidentifiers) are bad candidates for clustering keys given they will not be ordered and cause page splits, causing higher latencies and fragmentation? This is not true Azure SQL Database! Well at least not to the degree that you experience in an on-premise SQL Server. Azure SQL Database dbs give you 3 replicas and that means the characteristics of writes are very different compared to a single SQL DB without High Availability (HA).

In Azure SQL Database the writes have to be confirmed by 2 out of the 3 copies thus are always network level writes… A network write today is much slower in latency compared to what a page split can cause. But you shouldn’t forget that there are impacts of doing more work with GUIDs (uniqueidentifiers) [More…] and that they probably don’t make the best Clustered Index keys either!

In a blog post “Uniqueidentifier and Clustered Indexes” the author points out that the easiest and best approach to creating a Clustered Index is to use a datetime2 column. This insures that the data for the new row is inserted at the end of the table data. Consequently, there is no need for rearranging of the cluster. Adding the data to the end ensures the best performance for inserts. So to sum things up, if you’re using Guids, as I often do for so many reasons, your Primary Key probably shouldn’t be your Clustered Index.

datetime2 has a resolution of 1/1000000 of a second

Find Out More About Indexes

6 responses to Building Clustered Indexes on Non-Primary Key Columns in Azure SQL Database

  1. 

    In this piece of code you have mentioned:
    CREATE CLUSTERED INDEX IX_TestTable
    ON TestTable ([Type],[FirstName],[LastName])
    WITH (DROP_EXISTING=ON)
    ON [Primary]

    You have used “On [PRIMARY]”, but I believe this is not supported in SQL Azure. It’s not working for me. Is this not the case for you?

    Like

Trackbacks and Pingbacks:

  1. Steps Towards Optimized Windows Azure SQL Databases | Alexandre Brisebois - November 30, 2013

    […] the primary key. On Windows Azure SQL Databases there are times when it can be more advantageous to define Clustered Indexes on non-Primary Key columns. For example, you may want to store your data in a particular order so that you don’t need to […]

    Like

  2. GUIDs VS. IDs on Windows Azure SQL Database | Alexandre Brisebois - January 13, 2014

    […] if you are using GUIDs as primary keys, I strongly recommend creating clustered indexes on non-primary key columns. This will allow you to further optimize data […]

    Like

  3. Indexes are Crucial on Windows Azure SQL Database! | Alexandre Brisebois - January 26, 2014

    […] Create Clustered Indexes on the tables’ Natural Keys. Natural Keys, are the fields that best identify the row’s data. For example, the Primary Key used for a table might be a Unique Identifier or a Big Int, but I might access data through a combination of columns. For a car these columns could be Year, Make, Model and VIN. When you create the Clustered Index, order the columns by their selectivity and use this same order when you query for the information. […]

    Like

Leave a reply to Sameer Abdul Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.