Optimizing MERGE Performance in Azure SQL Database

January 16, 2013 — Leave a comment

When merging large datasets in Azure SQL Database its imperative to optimize our queries. Failure to do so will most likely result in 40552 : The session has been terminated because of excessive transaction log space usage.

Optimizing a MERGE statement can be achieved through various individual optimizations which depend on the shape and size of the datasets that need to be merged. In my case the following optimizations did the trick.

  • Create an index on the join columns in the source table that is unique and covering.
  • Create a unique clustered index on the join columns in the target table.
  • Parameterize all literal values in the ON clause and in the the WHEN clauses.
  • Merge subsets of data from the source to the target table by using OFFSET and ROWS FETCH NEXT or by defining views on the source or target that return the filtered rows and reference the view as the source or target table. Furthermore the use of the WITH <common table expression> clause of the TOP clause to filter out rows from the source or target tables is not recommended because they can generate incorrect results.
DECLARE @offset as bigint
DECLARE @offsetIndex as int = 0
DELCARE @batchSize as int = 10000

SELECT @offset = @offsetIndex *  @batchSize

MERGE Customers AS Target
USING (SELECT [Name], [LastName], [Email] from CustomerInsertTable
       ORDER BY [Email]
       OFFSET  @offset ROWS 
       FETCH NEXT @batchSize ROWS ONLY 
) AS Source
ON (Target.[Email] = Source.[Email])
WHEN MATCHED THEN
    UPDATE SET Target.[Name] = Source.[Name]
             , Target.[LastName] = Source.[LastName]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Name], [LastName], [Email])
    VALUES (
     Source.[Name] 
     , Source.[LastName]
     , Source.[Email]
    );
  • When the source is smaller than the target table, a nested loops operator is preferable.
MERGE Customers AS Target
USING (SELECT [Name],[LastName],[Email] from CustomerInsertTable
) AS Source
ON (Target.[Email] = Source.[Email])
WHEN MATCHED THEN
    UPDATE SET Target.[Name] = Source.[Name]
             , Target.[LastName] = Source.[LastName]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Name], [LastName], [Email])
    VALUES (
     Source.[Name] 
     , Source.[LastName]
     , Source.[Email]
    )
    OPTION (LOOP JOIN);
  • When the source and target are of similar size and the index are applied to the source and target tables, a merge join operator is the most efficient.
MERGE Customers AS Target
USING (SELECT [Name],[LastName],[Email] from CustomerInsertTable
) AS Source
ON (Target.[Email] = Source.[Email])
WHEN MATCHED THEN
    UPDATE SET Target.[Name] = Source.[Name]
             , Target.[LastName] = Source.[LastName]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Name], [LastName], [Email])
    VALUES (
     Source.[Name] 
     , Source.[LastName]
     , Source.[Email]
    )
    OPTION (MERGE JOIN);

After putting all these optimizations in place a MERGE query on a 1M record dataset took 2 minutes of execution time instead of 20 minutes. To further optimize the query try different batch sizes.

Possible optimizations for MERGE queries are not limited to the optimizations mentioned in this blog post. If you find other interesting optimizations feel free to mention them in the comments below.

More information

No Comments

Be the first to start the conversation!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s