Inserting & Modifying Large Amounts of Data in Windows Azure Table Storage Service

March 6, 2013


Modifying data in Windows Azure Table Storage Service can be done operation by operation or in batches. The golden rules below describe the limitations and constraints imposed by the service. Even though these rules may seem restrictive, they exist to ensure acceptable performance; sending too much or too little data per request can create bottlenecks.

The Golden Rules

  1. You can perform updates, deletes, and inserts in the same single batch operation.
  2. A single batch operation can include up to 100 entities.
  3. All entities in a single batch operation must have the same partition key (a minimal sketch of such a batch follows this list).
  4. While it is possible to perform a query as a batch operation, it must be the only operation in the batch.
  5. Tables don’t enforce a schema on entities, which means a single table can contain entities that have different sets of properties. An account can contain many tables, the size of which is only limited by the 100TB storage account limit.
  6. An entity is a set of properties, similar to a database row. An entity can be up to 1MB in size.
  7. A property is a name-value pair. Each entity can include up to 252 properties to store data. Each entity also has 3 system properties that specify a partition key, a row key, and a timestamp. Entities with the same partition key can be queried more quickly, and inserted/updated in atomic operations. An entity’s row key is its unique identifier within a partition.
  8. A property value may be up to 64 KB in size.
  9. By default, a property is created as type String unless you specify a different type.
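
As a point of reference, this is roughly what a single batch looks like when issued directly against the storage client library. It's a minimal sketch that respects the rules above, one partition key per batch and at most 100 entities; the keys and property values are illustrative.

var account = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("StorageConnectionString"));
var table = account.CreateCloudTableClient().GetTableReference("birthdays");
table.CreateIfNotExists();

// A batch may contain up to 100 operations, all targeting the same partition.
var batch = new TableBatchOperation();
for (var i = 0; i < 100; i++)
{
    var entity = new DynamicTableEntity("brisebois", "row-" + i);
    entity["city"] = new EntityProperty("montreal");
    batch.InsertOrReplace(entity);
}

table.ExecuteBatch(batch);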

I created the TableStorageWriter to handle the batching of table operations. It uses a ConcurrentQueue to store queued operations. When the Execute method is called, it dequeues the operations, groups them by PartitionKey, and sends them to the Windows Azure Table Storage Service in batches of 100.

The following example starts with an empty storage account that does not contain any tables. When the TableStorageWriter is instantiated, it checks whether the target table exists and creates it if it doesn't. When the Execute method is called, the queued table operations are sent to the Windows Azure Table Storage Service and removed from the internal queue, so that no operation is sent twice.

// Creates the "birthdays" table if it does not already exist.
var writer = new TableStorageWriter("birthdays");

var entity = new DynamicTableEntity("alexandre", "brisebois");
entity["city"] = new EntityProperty("montreal");
entity["born"] = new EntityProperty(new DateTime(1900, 01, 01));

// Queues an Insert operation.
writer.Insert(entity);

var newEntity = new DynamicTableEntity("alex", "brisebois");
newEntity["city"] = new EntityProperty("montreal");
newEntity["born"] = new EntityProperty(new DateTime(1900, 01, 01));

// Queues an InsertOrReplace operation.
writer.InsertOrReplace(newEntity);

// Sends the queued operations to the Table Storage Service in batches of 100.
writer.Execute();

The code from this post is part of the Brisebois.WindowsAzure NuGet package.

To install Brisebois.WindowsAzure, run the following command in the Package Manager Console:

PM> Install-Package Brisebois.WindowsAzure

Get more details about the NuGet package.


TableStorageWriter

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.RetryPolicies;
using Microsoft.WindowsAzure.Storage.Table;

public class TableStorageWriter
{
 private const int BatchSize = 100;
 private readonly ConcurrentQueue<Tuple<ITableEntity, TableOperation>> operations;
 private readonly CloudStorageAccount storageAccount;
 private readonly string tableName;

 public TableStorageWriter(string tableName)
 {
     this.tableName = tableName;

     var cs = CloudConfigurationManager.GetSetting("StorageConnectionString");

     storageAccount = CloudStorageAccount.Parse(cs);

     var tableReference = MakeTableReference();

     tableReference.CreateIfNotExists();

     operations = new ConcurrentQueue<Tuple<ITableEntity, TableOperation>>();
 }

 private CloudTable MakeTableReference()
 {
   var tableClient = storageAccount.CreateCloudTableClient();
   var tableReference = tableClient.GetTableReference(tableName);
   return tableReference;
 }

 public int OutstandingOperations
 {
     get { return operations.Count; }
 }

 public void Insert<TEntity>(TEntity entity) 
     where TEntity : ITableEntity
 {
     var e = new Tuple<ITableEntity, TableOperation>
         (entity,
             TableOperation.Insert(entity));
     operations.Enqueue(e);
 }

 public void Delete<TEntity>(TEntity entity) 
     where TEntity : ITableEntity
 {
     var e = new Tuple<ITableEntity, TableOperation>
         (entity,
             TableOperation.Delete(entity));
     operations.Enqueue(e);
 }

 public void InsertOrMerge<TEntity>(TEntity entity)
     where TEntity : ITableEntity
 {
     var e = new Tuple<ITableEntity, TableOperation>
         (entity,
             TableOperation.InsertOrMerge(entity));
     operations.Enqueue(e);
 }

 public void InsertOrReplace<TEntity>(TEntity entity) 
     where TEntity : ITableEntity
 {
     var e = new Tuple<ITableEntity, TableOperation>
         (entity,
             TableOperation.InsertOrReplace(entity));
     operations.Enqueue(e);
 }

 public void Merge<TEntity>(TEntity entity)
     where TEntity : ITableEntity
 {
     var e = new Tuple<ITableEntity, TableOperation>
         (entity,
             TableOperation.Merge(entity));
     operations.Enqueue(e);
 }

 public void Replace<TEntity>(TEntity entity)
     where TEntity : ITableEntity
 {
     var e = new Tuple<ITableEntity, TableOperation>
         (entity,
             TableOperation.Replace(entity));
     operations.Enqueue(e);
 }

 public void Execute()
 {
  // Drain the queue into a local list so operations are not sent twice.
  var count = operations.Count;
  var toExecute = new List<Tuple<ITableEntity, TableOperation>>();
  for (var index = 0; index < count; index++)
  {
      Tuple<ITableEntity, TableOperation> operation;
      if (operations.TryDequeue(out operation))
          toExecute.Add(operation);
  }

  // Group by partition key (a batch requirement) and send chunks of 100.
  toExecute
     .GroupBy(tuple => tuple.Item1.PartitionKey)
     .ToList()
     .ForEach(g =>
     {
         var partitionOperations = g.ToList();

         var batch = 0;
         var operationBatch = GetOperations(partitionOperations, batch);

         while (operationBatch.Any())
         {
             var tableBatchOperation = MakeBatchOperation(operationBatch);

             ExecuteBatchWithRetries(tableBatchOperation);

             batch++;
             operationBatch = GetOperations(partitionOperations, batch);
         }
     });
 }

 private void ExecuteBatchWithRetries(TableBatchOperation tableBatchOperation)
 {
     var tableRequestOptions = MakeTableRequestOptions();

     var tableReference = MakeTableReference();

     tableReference.ExecuteBatch(tableBatchOperation, tableRequestOptions);
 }

 private static TableRequestOptions MakeTableRequestOptions()
 {
     // Retries with an exponential back off starting at a 2 ms delta,
     // for up to 100 attempts.
     return new TableRequestOptions
         {
             RetryPolicy = new ExponentialRetry(TimeSpan.FromMilliseconds(2),
                                                100)
         };
 }

 private static TableBatchOperation MakeBatchOperation(
     List<Tuple<ITableEntity, TableOperation>> operationsToExecute)
 {
     var tableBatchOperation = new TableBatchOperation();
     operationsToExecute.ForEach(tuple => tableBatchOperation.Add(tuple.Item2));
     return tableBatchOperation;
 }

 private static List<Tuple<ITableEntity, TableOperation>> GetOperations(
     IEnumerable<Tuple<ITableEntity, TableOperation>> operations,
     int batch)
 {
     return operations
         .Skip(batch * BatchSize)
         .Take(BatchSize)
         .ToList();
 }
}
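
To illustrate the grouping, the following sketch queues 250 inserts across two partition keys; the keys are made up for the example. Calling Execute sends two batches for the first partition (100 + 50) and a single batch of 100 for the second.

var writer = new TableStorageWriter("birthdays");

// 150 entities in one partition, 100 in another.
for (var i = 0; i < 150; i++)
    writer.Insert(new DynamicTableEntity("partition-a", "row-" + i));

for (var i = 0; i < 100; i++)
    writer.Insert(new DynamicTableEntity("partition-b", "row-" + i));

// Two batches go out for "partition-a" and one for "partition-b".
writer.Execute();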


Be sure to include the following setting in your app.config or cloud configuration:

<appSettings>
    <add key="StorageConnectionString" value="[Storage Connection String]" />
</appSettings>
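
If you are developing locally, the same setting can point at the Windows Azure storage emulator instead; assuming the emulator is running, the connection string becomes:

<appSettings>
    <add key="StorageConnectionString" value="UseDevelopmentStorage=true" />
</appSettings>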

