I recently tried to use Windows Azure SQL Data Sync to replicate the data from my production database to a staging SQL Database (located in the same datacenter as the production SQL Database) instance and also to an
on-premise SQL Server Database instance.
Every 24 hours, the application inserts 10 to 40 million rows and removes all stale data from the tables. As you can imagine, this generates quite a large change-set for Windows Azure SQL Data Sync. In summary, the volume of data is quite important and it is short-lived.
The resulting Windows Azure SQL Data Sync log entries revealed that the daily data synchronization could take up to 35 hours to complete (24 hours worth of changes).
Windows Azure to On-Premise Data Sync
Sync completed successfully in 128931.16 seconds. (35.81 hours)
Upload: 0 changes applied
Download: 22 248 369 changes applied
For more information, provide tracing ID ‘xxxx-xxxx-xxxx-xxxx-xxxx’ to customer support.Sync completed successfully in 84766.87 seconds. (23.54 hours)
Upload: 0 changes applied
Download: 19 396 868 changes applied
For more information, provide tracing ID ‘xxxx-xxxx-xxxx-xxxx-xxxx’ to customer support.
Within Windows Azure (Same Datacenter) Data Sync
Sync completed successfully in 37076.88 seconds. (10.29 hours)
Upload: 0 changes applied
Download: 5 569 615 changes applied
For more information, provide tracing ID ‘xxxx-xxxx-xxxx-xxxx-xxxx’ to customer support.
Windows Azure SQL Data Sync stores its tracking data within your SQL Database, this means that if you start off with a 10 GB database, chances are that it will rapidly grow to 20 GB. This can be quite problematic since the size of
SQL Database instances does not automatically grow when it reaches it max size.
Furthermore, doubling the size of your SQL Database also means that your operational costs will increase.
The following is a screenshot depicting the amount of space that Windows Azure SQL Data Sync used to track my changes. As you can observe, the tracking table is almost as big as the actual data. The color codes in the screenshot below show related tables.
Since the data changes drastically from day to day, the database grew and the sync times didn’t improve.
This caused issues on the on-premise SQL Server because the CPU was entirely used for the Data Sync operations. Other applications that relied on this server were rendered useless…
Lessons Learned
- When you estimate the size and costs for Windows Azure SQL Database which uses Windows Azure SQL Data Sync, it’s important to factor in the size of the tacking data.
- It’s imperative that you monitor the size of the database. Data can grow rapidly and can eventually cause your database to lockup because you have reached the database max size.
- Windows Azure SQL Data Sync is an amazing offering for database tables that deal with smaller volumes of changes per sync cycle. For example Windows Azure SQL Data Sync is great for synchronizing data like daily purchase orders, daily logs, daily events and much more.
- Keep a close eye on the Windows Azure SQL Data Sync logs, they will reveal on going issues. The earlier you catch them, the less they will cost you to fix.
Conclusion
It was well worth the try, but Windows Azure SQL Data Sync was not the solution for our requirements. I reverted to having the full system run in staging and on-premise as well as in production.
Creating a database for staging can be done in the cloud by creating a copy of the production database. Creating a baseline database for the on-premise SQL Server database can be done by exporting the production database and restoring it on-premise.
Don’t get me wrong though, Windows Azure SQL Data Sync is an amazing service with huge benefits. You really need to try it out for yourself to see if it meets your requirements.
Tables with data that doesn’t change as much as mine, are great candidates and should not be overlooked. Using Windows Azure SQL Data Sync creates many opportunities for your organization.
How do you plan to use Windows Azure SQL Data Sync in your organization?
Related Posts
- What Does Your Cloud Disaster Recovery Plan Look Like?
- Important Lessons Usually Don’t Come Easy… Replication is Important!
- Windows Azure SQL Data Sync – Optimizing The Initial Sync Process For Large Data Sets
- Windows Azure SQL Data Sync – The Maximum Limit For Index or Statistics Key Column List is 16
Alexandre,
is there something one can do to clean up the tracking data from time to time?
Regards,
Thomas
LikeLike
Hi Thomas,
I haven’t done any work around this, but my guess is that removing historical data might trigger conflicts.
LikeLike
OK,
so the ‘useless’ logs accumulate and finally congest the server…
LikeLike
Let me look into this and find out what more about this process.
LikeLike
Hi Thomas,
So I looked into it, and the service has built-in maintenance around this. As observed in the screenshot above, the amount of data used for tracking changes will reflect the amount of data stored in the tables that you are syncing via the SQL Data Sync.
It will not grow uncontrollably. Therefore, you must plan for the extra data, but you probably wont get any nasty surprises because of logs and tracking data.
LikeLike
Thanks Alexandre,
that is good news.
LikeLike