Have you ever been asked to modify large amounts of important data? Have you ever made a mistake that required an embarrassing number of man-hours to fix?
Well I have! Especially with the Windows Azure Blob Storage Service, where editing blobs couldn’t be easier.
Windows Azure SQL Database has a transaction mechanism that allows us to roll back when something goes wrong. Windows Azure Blob Storage Service, on the other hand, does not provide transactions. When you overwrite a blob, the previous version is gone… But don’t worry, there are a few ways to secure a backup.
Before attempting any dangerous data manipulations on the content of your blobs:
- Make a copy on your local machine (This works great when you don’t have much data)
- Copy them to a different Windows Azure Blob Storage Service container (This is a bit drastic, but it works)
- Create a Snapshot for each blob (This is your best bet)
The remainder of this post focuses on using Blob Snapshots to give us a way to roll back if something goes horribly wrong with our batch processes.
Before anything happens to our data, the first thing we absolutely need is a guarantee that we have a backup in case something goes horribly wrong.
To satisfy this requirement I created a tool that is available on GitHub. Feel free to grab the code and use this tool.
The following code is an example of how I create a backup by creating Blob Snapshots for all the Blobs in the container.
    public void CreateBackup(string backupName, Action<int> reportCompleted, Action<StorageAccount> onComplete)
    {
        Task.Run(() =>
        {
            var dateString = DateTime.UtcNow.ToString(CultureInfo.InvariantCulture);
            var dictionary = new Dictionary<string, string>
            {
                {"Name", backupName},
                {"DateTime", dateString}
            };

            var list = selectedContainer.ListBlobs(useFlatBlobListing: true).ToList();
            var count = list.Count;

            for (var index = 0; index < count; index++)
            {
                var backupBlob = list[index];
                var blockBlob = backupBlob as CloudBlockBlob;
                if (blockBlob == null) continue;

                blockBlob.CreateSnapshot(dictionary);
                reportCompleted(Convert.ToInt32(((1d + index)/count)*100));
            }

            onComplete(this);
        });
    }
The following code is an example of how I find the backups stored in the container.
    private void FindBackups(Action<StorageAccount> onCompleted)
    {
        Task.Run(() =>
        {
            // Include snapshots and their metadata in the listing.
            var details = BlobListingDetails.Snapshots | BlobListingDetails.Metadata;
            var list = selectedContainer.ListBlobs(useFlatBlobListing: true, blobListingDetails: details)
                                        .OfType<CloudBlockBlob>()
                                        .Where(b => b.SnapshotTime.HasValue)
                                        .ToList();

            Snapshots.Clear();
            Snapshots.AddRange(list);
            IdentifyBackups();
        }).ContinueWith(task =>
        {
            // A continuation only runs once the task has completed, so checking
            // IsCompleted never filters anything. Skip the callback if the
            // listing failed or was canceled instead.
            if (task.Status != TaskStatus.RanToCompletion) return;
            onCompleted(this);
        });
    }

    private void IdentifyBackups()
    {
        Backups.Clear();
        Backups.AddRange(
            Snapshots.Select(b => b.Metadata["Name"])
                     .Distinct());
    }
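One thing to keep in mind: `b.Metadata["Name"]` throws a `KeyNotFoundException` for any snapshot that has no "Name" entry, which can happen if the container holds snapshots that were not created by this tool. Here is a minimal sketch of a more tolerant lookup. The `BackupNames` class and its `Identify` method are hypothetical names of my own and are not part of the tool; the sketch works on plain metadata dictionaries so it can run without a storage account.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the tool on GitHub).
public static class BackupNames
{
    // Returns the distinct backup names found in a set of snapshot metadata
    // dictionaries, skipping snapshots with a missing or empty "Name" entry.
    public static List<string> Identify(IEnumerable<IDictionary<string, string>> snapshotMetadata)
    {
        var names = new List<string>();
        foreach (var metadata in snapshotMetadata)
        {
            string name;
            if (metadata.TryGetValue("Name", out name) && !string.IsNullOrEmpty(name))
                names.Add(name);
        }
        return names.Distinct().ToList();
    }
}
```

Assuming `Metadata` is exposed as an `IDictionary<string, string>`, as it is in the storage client library I used, this could be called as `BackupNames.Identify(Snapshots.Select(b => b.Metadata))`.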
The following code is an example of how I use Blob Snapshots to restore all the Blobs in the container.
    public void RestoreBackup(string backup, Action<int> reportCompleted)
    {
        Task.Run(() =>
        {
            var snapshotsToRestore = snapshots.Where(s => s.Metadata["Name"] == backup)
                                              .ToList();
            var count = snapshotsToRestore.Count;

            for (var index = 0; index < count; index++)
            {
                var backupBlob = snapshotsToRestore[index];
                var blob = selectedContainer.GetBlockBlobReference(backupBlob.Name);

                // StartCopyFromBlob only schedules a server-side copy and returns
                // before the copy finishes, so the progress reported here counts
                // copies started, not copies completed.
                blob.StartCopyFromBlob(backupBlob);
                reportCompleted(Convert.ToInt32(((1d + index) / count) * 100));
            }
        });
    }
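Because `StartCopyFromBlob` returns as soon as the copy has been scheduled, a restore is not necessarily finished when the loop ends. If you need to know when a blob has actually been restored, one option is to poll its copy status. Below is a minimal sketch of a generic polling helper. `CopyWaiter` and `WaitUntilNotPending` are hypothetical names of my own and are not part of the tool; the helper takes a delegate so that it does not depend on the storage library.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of the tool on GitHub).
public static class CopyWaiter
{
    // Calls isPending repeatedly until it returns false or the timeout elapses.
    // Returns true if the operation left the pending state in time.
    public static bool WaitUntilNotPending(Func<bool> isPending, TimeSpan pollInterval, TimeSpan timeout)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (isPending())
        {
            if (DateTime.UtcNow >= deadline) return false;
            Thread.Sleep(pollInterval);
        }
        return true;
    }
}
```

With the storage client library, I would expect the delegate to look something like `() => { blob.FetchAttributes(); return blob.CopyState.Status == CopyStatus.Pending; }`, but treat that wiring as an assumption to verify against the library version you use. Leaving the pending state does not by itself mean success, so the final copy status should still be checked for a failed or aborted copy.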
The following code is an example of how I delete the Blob Snapshots associated with a backup from the container.
    public void DeleteBackup(string backup, Action<int> reportCompleted, Action<StorageAccount> onCompleted)
    {
        Task.Run(() =>
        {
            var snapshotsToDelete = snapshots.Where(s => s.Metadata["Name"] == backup)
                                             .ToList();
            var count = snapshotsToDelete.Count;

            for (var index = 0; index < count; index++)
            {
                snapshotsToDelete[index].DeleteIfExists();
                reportCompleted(Convert.ToInt32(((1d + index) / count) * 100));
            }

            FindBackups(onCompleted);
        });
    }
Using the Tool
Enter the storage account connection string in the top text box, then press “Load”. The first list box will list the storage account’s containers. Select a container, and the application will search for backups and list them in the second list box from the left.
Create a backup by entering a name for it in the text box labeled “Name”, then clicking “Create Backup”.
Restore a backup by selecting it in the list box, then clicking “Restore”.
Delete a backup by selecting it in the list box, then clicking “Delete”.
Lessons Learned
The replication of data in Windows Azure Storage will not protect against application errors, since these are problems at the application layer and get committed to the replicas that Windows Azure Storage maintains. Consequently, it’s imperative that we manage backups ourselves.
Depending on the number of blobs per container, creating a backup or restoring from one can take a considerable amount of time. I strongly recommend making time for backups. They will save you a lot of time if something goes horribly wrong. You’ll also be able to sleep at night =)