Windows Azure SQL Data Sync – Optimizing The Initial Sync Process For Large Data Sets

May 6, 2013 — 2 Comments

database-swapToday I setup SQL Data Sync to synchronize
30 million records between 3 database instances.
 
A master database (HUB) needed to be replicated to a staging database hosted in Windows Azure SQL Database and to a second instance hosted on-premises in SQL Server 2008 R2.

Needless to say that I rapidly hit a bottleneck… The initial replication took forever, so I started looking for ways to get it to complete in a timely manner.

The first alternative I came up with, only works when you’re replicating databases within the Windows Azure SQL Database Service. Using the Create Database as Copy of statement as described in "Creating a Staging Database Instance From a Production Instance on Windows Azure SQL Database". I created a copy of my Hub database and proceeded to creating a sync relationship between the Hub database and the newly created copy.

The second alternative also relies on creating a copy of the Hub database. The main difference is that the copy can be created on-premises or in a other datacenters. Using the export functionality, create a backpac of your Hub database and restore (import) it in the environment of your choice. Then create a sync relationship between both database instances.

The prescribed best practice is to start with data in only one database. If you have data in multiple databases,
SQL Data Sync treats each row as a conflict that needs resolution. This causes the initial synchronization to go very slow – taking several days to several months, depending on the database size.

Following this advice, it took close to 5 hours to complete the first synchronization and it took close to 20 hours to complete the synchronization with the on-premise databases. Originally, this is why I looked into making a copy of my database before creating the sync group. 

Creating a copy of my database took close to 4 hours and the initial synchronization took close to 10 hours. The process of synchronizing two exact copies of my database took me about 14 hours instead of the 5 hours it took me when I started off with an empty database. 

References

2 responses to Windows Azure SQL Data Sync – Optimizing The Initial Sync Process For Large Data Sets

  1. 

    It is interesting that you chose to replicate the data across all three databases prior to the initial synchronization. How did that work for you? Did it shorten the initial sync?

    According to MS documentation, you should have data in only one db prior to the initial sync. Otherwise SQL Data Sync treats each row as a data conflict that must be resolved – thus slowing down the initial sync. See http://msdn.microsoft.com/en-us/library/hh667328.aspx#InitialSync.

    Like

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.