Windows Azure Table Storage Service – Migrating Tables Between Storage Accounts

June 20, 2013

Building on the code from my post “Windows Azure Blob Storage Service – Migrating Blobs Between Accounts”, I added logic so that the Windows Azure Storage Account migration process recreates all the Tables from the source account in the target account. It then downloads the entities from the source Tables using segmented table queries and inserts (or replaces) them into the target Tables.

The process is surprisingly fast compared to the migration of blob containers. When we copy a blob from one container to another, a copy command is queued and it can take some time to complete. Migrating Tables, on the other hand, requires significantly more bandwidth, because we need to download the data from the source Tables and upload it into the target Tables in the target Windows Azure Storage Account.

Entities are downloaded 1000 at a time. Then they are fed into my TableStorageWriter, which regroups the entities by Partition Key and inserts them in batches of 100.
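As a rough illustration of the download side, the sketch below walks a data source one segment at a time until no continuation token comes back. It does not call the storage client: FakeTable, Segment, and SegmentedReader are hypothetical stand-ins, and the token is a plain integer offset rather than a real TableContinuationToken.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one query segment: some rows plus an
// optional token that says where the next segment starts.
public class Segment
{
    public List<int> Results = new List<int>();
    public int? ContinuationToken;
}

// Hypothetical in-memory "table" that serves at most 1000 rows per call.
public class FakeTable
{
    private const int SegmentSize = 1000;
    private readonly List<int> rows;

    public FakeTable(int rowCount)
    {
        rows = Enumerable.Range(0, rowCount).ToList();
    }

    public Segment QuerySegment(int? token)
    {
        var start = token ?? 0;
        var page = rows.Skip(start).Take(SegmentSize).ToList();
        var next = start + page.Count;
        return new Segment
        {
            Results = page,
            // A null token means there is nothing left to read.
            ContinuationToken = next < rows.Count ? (int?)next : null
        };
    }
}

public static class SegmentedReader
{
    // Keeps requesting segments until the token comes back null,
    // handing every non-empty segment to the callback.
    public static int ReadAll(FakeTable table, Action<List<int>> onSegment)
    {
        var total = 0;
        int? token = null;
        do
        {
            var segment = table.QuerySegment(token);
            token = segment.ContinuationToken;

            // Process the rows even when this is the last segment.
            if (segment.Results.Count > 0)
                onSegment(segment.Results);

            total += segment.Results.Count;
        } while (token != null);

        return total;
    }
}
```

With 2500 rows, ReadAll hands the callback three segments of 1000, 1000, and 500 rows. The point to notice is that the last segment arrives with a null token and still has to be processed.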

Reading the Tables from top to bottom pays off, because entities are returned sorted by Partition Key and then by Row Key, which helps produce full insert batches. A batch can only contain entities that share a Partition Key, so when the incoming entities span many Partition Keys, the TableStorageWriter is forced to execute at least one batch per distinct Partition Key.
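The TableStorageWriter itself is not shown in this post, so the following is only a minimal sketch of the grouping idea, assuming an entity can be reduced to its Partition Key and Row Key. Entity, Batcher, and ToBatches are hypothetical names; the limit of 100 is the batch size mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified entity: just the two keys.
public class Entity
{
    public string PartitionKey;
    public string RowKey;
}

public static class Batcher
{
    public const int MaxBatchSize = 100;

    // Groups entities by Partition Key, then cuts each group into
    // chunks of at most MaxBatchSize, so that every batch holds
    // entities from a single partition only.
    public static List<List<Entity>> ToBatches(IEnumerable<Entity> entities)
    {
        return entities
            .GroupBy(e => e.PartitionKey)
            .SelectMany(group => group
                .Select((entity, index) => new { entity, index })
                .GroupBy(x => x.index / MaxBatchSize, x => x.entity)
                .Select(chunk => chunk.ToList()))
            .ToList();
    }
}
```

For example, 250 entities in partition A followed by 3 entities in partition B produce four batches of 100, 100, 50, and 3. If those 253 entities were spread over 253 partitions instead, the same call would produce 253 batches of one entity each, which is why sorted input matters.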

The code below requires the following configuration settings:

<appSettings>
  <add key="source" value="SOURCE ACCOUNT"/>
  <add key="target" value="TARGET ACCOUNT"/>

  <add key="MigrateBlobs" value="false"/>
  <add key="MigrateTables" value="true"/>

  <add key="TablesToCreateButNotMigrate" value="TableName1|TableName2" />
</appSettings>

These settings have helped me throughout my migration process. I usually start by migrating blobs, then I migrate tables. You may need to run the blob migration more than once to catch all the blobs.

  • Set the source key to the connection string of the source storage account.
  • Set the target key to the connection string of the target storage account.
  • Set the MigrateBlobs key to true (lowercase, because the code compares against that exact string) if you want to migrate the blob containers from your source storage account.
  • Set the MigrateTables key to true if you want to migrate the tables from your source storage account.
  • If you need to omit tables such as performance counters, add their names to the value of the TablesToCreateButNotMigrate key. Use a ‘|’ character to delimit the table names. These tables are still created in the target account, but their data is not copied.
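The pipe-delimited value can be parsed in a slightly more forgiving way than the listing below does. This is a sketch only: OmitList and Parse are hypothetical names, and the tolerance for a missing setting, stray spaces, and differences in casing is my assumption about what is convenient, not something the migrator requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OmitList
{
    // Turns "TableName1|TableName2" into a set of table names.
    // A null value (setting not present) yields an empty set.
    public static HashSet<string> Parse(string settingValue)
    {
        var names = (settingValue ?? string.Empty)
            .Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(n => n.Trim())
            .Where(n => n.Length > 0);

        // Table names are case-insensitive, so compare them that way.
        return new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);
    }
}
```

A call such as OmitList.Parse("TableName1|TableName2").Contains(table.Name) could then stand in for the array lookup in CopyTables.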

To read the code specific to migrating tables, start at the MigrateTableStorage method.

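To execute a migration, create an instance of StorageAccountMigrator in a console project and call its Start method. The table queries complete on asynchronous callbacks, so Start can return before the last entities have been written; keep the console process alive until the output stops.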

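using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure;               // CloudConfigurationManager
using Microsoft.WindowsAzure.Storage;       // CloudStorageAccount, OperationContext
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Table;

public class StorageAccountMigrator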
{
    private readonly CloudStorageAccount sourceAccount;
    private readonly CloudStorageAccount targetAccount;

    public StorageAccountMigrator()
    {
        var sourceCs = CloudConfigurationManager.GetSetting("source");
        sourceAccount = CloudStorageAccount.Parse(sourceCs);

        var targetCs = CloudConfigurationManager.GetSetting("target");
        targetAccount = CloudStorageAccount.Parse(targetCs);
    }

    public async Task<string> Start()
    {
        return await Task.Run(() => ExecuteMigration());
    }

    private string ExecuteMigration()
    {
        var migrateBlobs = CloudConfigurationManager
                                .GetSetting("MigrateBlobs") == "true";

        var migrateTables = CloudConfigurationManager
                                .GetSetting("MigrateTables") == "true";
        var tasks = new[]
                {
                    migrateBlobs 
                        ? MigrateBlobContainers() 
                        : Task.Run(() => { }),
                    migrateTables 
                        ? MigrateTableStorage() 
                        : Task.Run(() => { }),
                };

        Task.WaitAll(tasks);
        return "done";
    }

    private Task MigrateTableStorage()
    {
        return Task.Run(() =>
        {
            CopyTableStorageFromSource();
            return "done";
        });
    }

    private void CopyTableStorageFromSource()
    {
        var source = sourceAccount.CreateCloudTableClient();

        var cloudTables = source.ListTables()
            .OrderBy(c => c.Name)
            .ToList();

        foreach (var table in cloudTables)
            CopyTables(table);
    }

    private void CopyTables(CloudTable table)
    {
        var target = targetAccount.CreateCloudTableClient();

        var targetTable = target.GetTableReference(table.Name);

        targetTable.CreateIfNotExists();

        targetTable.SetPermissions(table.GetPermissions());

        Console.WriteLine("Created Table Storage :" + table.Name);

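        // A missing setting comes back as null, so fall back to an empty list.
        var omit = (CloudConfigurationManager
            .GetSetting("TablesToCreateButNotMigrate") ?? string.Empty)
            .Split(new[] { "|" }, StringSplitOptions.RemoveEmptyEntries);

        // Tables in the omit list are created, but their data is not copied.
        if (!omit.Contains(table.Name))
            CopyData(table);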
    }

    readonly List<ICancellableAsyncResult> queries
        = new List<ICancellableAsyncResult>();

    readonly Dictionary<string, long> retrieved
        = new Dictionary<string, long>();

    readonly TableQuery<DynamicTableEntity> query
        = new TableQuery<DynamicTableEntity>();

    private void CopyData(CloudTable table)
    {
        ExecuteQuerySegment(table, null);
    }

    private void ExecuteQuerySegment(CloudTable table,
                                        TableContinuationToken token)
    {
        var reqOptions = new TableRequestOptions();

        var ctx = new OperationContext { ClientRequestID = "StorageMigrator" };

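        var pending = table.BeginExecuteQuerySegmented(query,
                                                        token,
                                                        reqOptions,
                                                        ctx,
                                                        HandleCompletedQuery(),
                                                        table);

        // Segments are requested from several threads and List<T>
        // is not thread-safe, so serialize access to the list.
        lock (queries)
            queries.Add(pending);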
    }

    private AsyncCallback HandleCompletedQuery()
    {
        return ar =>
        {
            var cloudTable = ar.AsyncState as CloudTable;
            if (cloudTable == null) return;

            var response = cloudTable
                            .EndExecuteQuerySegmented<DynamicTableEntity>(ar);
            var token = response.ContinuationToken;

            if (token != null)
                Task.Run(() => ExecuteQuerySegment(cloudTable, token));

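            var recordsRetrieved = response.Count();

            // Write the entities even when this is the last segment
            // (that is, when the continuation token is null).
            if (recordsRetrieved > 0)
                Task.Run(() => WriteToTarget(cloudTable, response));

            var totalRecords = UpdateCount(cloudTable, recordsRetrieved);

            Console.WriteLine("Table " +
                                cloudTable.Name +
                                " |> Records = " +
                                recordsRetrieved +
                                " | Total Records = " +
                                totalRecords);
        };
    }

    // Callbacks for different segments can run concurrently, so access to
    // the shared dictionary is serialized. Returns the running total.
    private long UpdateCount(CloudTable cloudTable, int recordsRetrieved)
    {
        lock (retrieved)
        {
            if (!retrieved.ContainsKey(cloudTable.Name))
                retrieved.Add(cloudTable.Name, recordsRetrieved);
            else
                retrieved[cloudTable.Name] += recordsRetrieved;

            return retrieved[cloudTable.Name];
        }
    }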

    private static void WriteToTarget(CloudTable cloudTable,
                                        IEnumerable<DynamicTableEntity> response)
    {
        var writer = new TableStorageWriter(cloudTable.Name, "target");
        foreach (var entity in response)
        {
            writer.InsertOrReplace(entity);
        }
        writer.Execute();
    }

    public Task<string> MigrateBlobContainers()
    {
        return Task.Run(() =>
        {
            CopyBlobContainersFromSource();
            return "done";
        });
    }

    private void CopyBlobContainersFromSource()
    {
        var source = sourceAccount.CreateCloudBlobClient();

        var cloudBlobContainers = source.ListContainers()
            .OrderBy(c => c.Name)
            .ToList();

        foreach (var cloudBlobContainer in cloudBlobContainers)
            CopyBlobContainer(cloudBlobContainer);
    }

    private void CopyBlobContainer(CloudBlobContainer sourceContainer)
    {
        var targetContainer = MakeContainer(sourceContainer);

        var targetBlobs = targetContainer.ListBlobs(null,
                                                    true,
                                                    BlobListingDetails.All)
                                            .Select(b => (ICloudBlob)b)
                                            .ToList();

        Trace.WriteLine(sourceContainer.Name + " Created");

        Trace.WriteLine(sourceContainer.Name + " List all blobs");

        var sourceBlobs = sourceContainer
                            .ListBlobs(null,
                                        true,
                                        BlobListingDetails.All)
                            .Select(b => (ICloudBlob)b)
                            .ToList();

        var missingBlobTask = Task.Run(() =>
        {
            AddMissingBlobs(sourceContainer,
                            sourceBlobs,
                            targetBlobs,
                            targetContainer);
        });

        var updateBlobs = Task.Run(() => UpdateBlobs(sourceContainer,
                                                        sourceBlobs,
                                                        targetBlobs,
                                                        targetContainer));

        Task.WaitAll(new[] { missingBlobTask, updateBlobs });

    }

    private void UpdateBlobs(CloudBlobContainer sourceContainer,
                                IEnumerable<ICloudBlob> sourceBlobs,
                                IEnumerable<ICloudBlob> targetBlobs,
                                CloudBlobContainer targetContainer)
    {
        var updatedBlobs = sourceBlobs
            .AsParallel()
            .Select(sb =>
            {
                var tb = targetBlobs.FirstOrDefault(b => b.Name == sb.Name);
                if (tb == null)
                    return new
                    {
                        Source = sb,
                        Target = sb,
                    };

                if (tb.Properties.LastModified < sb.Properties.LastModified)
                    return new
                    {
                        Source = sb,
                        Target = tb,
                    };

                return new
                {
                    Source = sb,
                    Target = sb,
                };
            })
            .Where(b => b.Source != b.Target)
            .ToList();

        Console.WriteLine(targetContainer.Name + " |> " +
                            "Updating :" +
                            updatedBlobs.Count +
                            " blobs");

        Trace.WriteLine(sourceContainer.Name + " Start update all blobs");

        Parallel.ForEach(updatedBlobs, blob =>
        {
            TryCopyBlobToTargetContainer(blob.Source,
                                        targetContainer,
                                        sourceContainer);
        });

        Trace.WriteLine(sourceContainer.Name + " End update all blobs");
    }

    private void AddMissingBlobs(CloudBlobContainer sourceContainer,
                                    IEnumerable<ICloudBlob> sourceBlobs,
                                    IEnumerable<ICloudBlob> targetBlobs,
                                    CloudBlobContainer targetContainer)
    {
        var missingBlobs = sourceBlobs.AsParallel()
                                        .Where(b => NotExists(targetBlobs, b))
                                        .ToList();

        Console.WriteLine(targetContainer.Name +
                            " |> " +
                            "Adding missing :" +
                            missingBlobs.Count +
                            " blobs");

        Trace.WriteLine(sourceContainer.Name + " Start copy missing blobs");

        Parallel.ForEach(missingBlobs, blob =>
        {
            TryCopyBlobToTargetContainer(blob,
                                        targetContainer,
                                        sourceContainer);
        });

        Trace.WriteLine(sourceContainer.Name + " End copy missing blobs");
    }

    private static bool NotExists(IEnumerable<ICloudBlob> targetBlobs,
                                    ICloudBlob b)
    {
        return targetBlobs.All(tb => tb.Name != b.Name);
    }

    private CloudBlobContainer MakeContainer(CloudBlobContainer sourceContainer)
    {
        var target = targetAccount.CreateCloudBlobClient();
        var targetContainer = target.GetContainerReference(sourceContainer.Name);

        Trace.WriteLine(sourceContainer.Name + " Started");

        targetContainer.CreateIfNotExists();

        var blobContainerPermissions = sourceContainer.GetPermissions();

        if (blobContainerPermissions != null)
            targetContainer.SetPermissions(blobContainerPermissions);

        Trace.WriteLine(sourceContainer.Name + " Set Permissions");

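        // The default container listing does not include metadata,
        // so load it explicitly before copying it over.
        sourceContainer.FetchAttributes();

        foreach (var meta in sourceContainer.Metadata)
            targetContainer.Metadata[meta.Key] = meta.Value;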

        targetContainer.SetMetadata();

        Trace.WriteLine(sourceContainer.Name + " Set Metadata");

        return targetContainer;
    }

    private void TryCopyBlobToTargetContainer(ICloudBlob item,
                                                CloudBlobContainer targetContainer,
                                                CloudBlobContainer sourceContainer)
    {
        try
        {
            var blob = (CloudBlockBlob)item;
            var blobRef = targetContainer.GetBlockBlobReference(blob.Name);

            var source = new Uri(GetShareAccessUri(blob.Name,
                                                    360,
                                                    sourceContainer));
            var result = blobRef.StartCopyFromBlob(source);
            Trace.WriteLine(blob.Properties.LastModified.ToString() +
                            " |>" +
                            blob.Name +
                            " :" +
                            result);
        }
        catch (StorageException ex)
        {
            Trace.WriteLine(ex.Message);
        }
    }

    private string GetShareAccessUri(string blobname,
                                    int validityPeriodInMinutes,
                                    CloudBlobContainer container)
    {
        var toDateTime = DateTime.Now.AddMinutes(validityPeriodInMinutes);

        var policy = new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read,
            SharedAccessStartTime = null,
            SharedAccessExpiryTime = new DateTimeOffset(toDateTime)
        };

        var blob = container.GetBlockBlobReference(blobname);
        var sas = blob.GetSharedAccessSignature(policy);
        return blob.Uri.AbsoluteUri + sas;
    }
}

6 responses to Windows Azure Table Storage Service – Migrating Tables Between Storage Accounts

  1. 

    You reference the TableStorageWriter class and point to https://alexandrebrisebois.wordpress.com/2013/05/22/windows-azure-blob-storage-service-migrating-blobs-between-accounts/
    I checked that link and don’t see TableStorageWriter anywhere.

  2. 

    I’m running your code and found that the tables are created, but the data is not copied to the new storage account. Something seems to be wrong in the method
    private AsyncCallback HandleCompletedQuery()
    The line below reads the continuation token, but the token is always null, so the code that follows never performs the copy.
    var token = response.ContinuationToken;

    • 

      You’re right, I had forgotten to write out the response entities when the continuation tokens were null.

      You need to execute Task.Run(() => WriteToTarget(cloudTable, response)); if you have results. Splitting the if statement in two would do the trick.
