Querying Over Windows Azure Table Storage Service

March 12, 2013 — 9 Comments

download

Windows Azure Table Storage Service can be queried using various approaches. I use the Windows Azure Storage NuGet package and reusable queries.

A Query on Windows Azure Table Storage Service usually completes in 200 milliseconds. On busy systems this is an eternity! To help my services perform better, and to reduce costs of operation, I built the TableStorageReader to executes queries with cache, in turn reducing the number of transactions and reducing the latencies created by communications over the network.

The reader executes reusable queries. Reusable queries greatly simplify the application’s design by encapsulating query logic in named concepts. They are like Stored Procedures in SQL Server, where team members can discover functionality by reading file names from the query folders. More details about reusable queries in the reference section at the bottom of this post.

The TableStorageReader is a fluent api that allows you to create tables, query tables and apply cache over queries.

The code from this Post is part of the Brisebois.WindowsAzure NuGet Package

To install Brisebois.WindowsAzure, run the following command in the Package Manager Console

PM> Install-Package Brisebois.WindowsAzure

Get more details about the Nuget Package.

Querying Table Storage For The Top 4 Entities From a Partition

var q = new GetTopEntriesForPartition<DynamicTableEntity>("partitionKey", 4);

var entities =  await TableStorageReader
                        .Table("birthdays")
                        .Execute(q);

 

Querying Table Storage Service With Query Caching

Caching can be extremely beneficial in situation where stale data is acceptable. If your application or system is able to work with eventual consistency, meaning that it is able to work with data that is more than a second old, then caching is for you. It will speedup your application’s response time by reducing the number of storage transactions. In busy systems using a 2 seconds cache can yield surprising results!

Query With a Cache Miss Executed in 1.7452408 seconds

Query With a Cache Hit Executed in 0.0004883 seconds

Using a 1 minute cache

var q = new GetTopEntriesForPartition<DynamicTableEntity>("partitionKey", 4);

var entities =  await TableStorageReader
                        .Table("birthdays")
                        .WithCache()
                        .Execute(q);

Using a 30 minute cache

var absoluteExpiration = new DateTimeOffset(DateTime.UtcNow, 
                                            TimeSpan.FromMinutes(30));
var policy = new CacheItemPolicy
    {
        AbsoluteExpiration = absoluteExpiration
    };

var q = new GetTopEntriesForPartition<DynamicTableEntity>("partitionKey", 4);

var entities =  await TableStorageReader
                        .Table("birthdays")
                        .WithCache(policy)
                        .Execute(q);

Using a 30 minute cache with an additional cache hint

var absoluteExpiration = new DateTimeOffset(DateTime.UtcNow, 
                                            TimeSpan.FromMinutes(30));
var policy = new CacheItemPolicy
    {
        AbsoluteExpiration = absoluteExpiration
    };

var q = new GetTopEntriesForPartition<DynamicTableEntity>("partitionKey", 4);

var entities =  await TableStorageReader
                        .Table("birthdays")
                        .WithCache(policy,"special cache")
                        .Execute(q);

 

TableStorageReader

public class TableStorageReader
{
  private readonly CloudStorageAccount storageAccount;
  private readonly CloudTableClient tableClient;
  private readonly CloudTable tableReference;
  private CacheItemPolicy cachePolicy;
  private string cacheHint = string.Empty;

  private TableStorageReader(string tableName)
  {
      var cs = CloudConfigurationManager.GetSetting("StorageConnectionString");
      storageAccount = CloudStorageAccount.Parse(cs);
      tableClient = storageAccount.CreateCloudTableClient();
      tableReference = tableClient.GetTableReference(tableName);

      var tableServicePoint = ServicePointManager
                               .FindServicePoint(storageAccount.TableEndpoint);
      tableServicePoint.UseNagleAlgorithm = false;
  }

  public TableStorageReader CreateIfNotExist()
  {
      tableReference.CreateIfNotExists();
      return this;
  }

  public async Task<ICollection<TEntity>> 
      Execute<TEntity>(CloudTableQuery<TEntity> query)
      where TEntity : ITableEntity
  {
      if (query == null)
          throw new ArgumentNullException("query");

      return await Task.Run(() =>
          {
              if (cachePolicy == null)
                  return query.Execute(tableReference);

              var cached = new CloudTableQueryCache<TEntity>(query, 
                                                             cachePolicy, 
                                                             cacheHint);
              return cached.Execute(tableReference);
          });
  }

  public static TableStorageReader Table(string tableName)
  {
      return new TableStorageReader(tableName);
  }

  //Cache for 1 minute
  public TableStorageReader WithCache()
  {
      var policy = new CacheItemPolicy
          {
              AbsoluteExpiration = DateTime.UtcNow.AddMinutes(1d)
          };
      return WithCache(policy);
  }

  public TableStorageReader WithCache(CacheItemPolicy policy)
  {
      return WithCache(policy, string.Empty);
  }

  public TableStorageReader WithCache(CacheItemPolicy policy,
                                      string cacheKey)
  {
      if (policy == null) throw new ArgumentNullException("policy");
      if (cacheKey == null) throw new ArgumentNullException("cacheKey");

      cachePolicy = policy;
      cacheHint = cacheKey;

      return this;
  }
}

 

Reusable Query Example

The reusable query implements a CloudTableQuery, which contains logic for computing the hash for the cache key. Implement the Execute method and the CacheKey property. The CacheKey property should return a unique value for each permutation of the query object. The Execute method will be executed by the TableStorageReader, if WithCache is used, the query result will be cached. Furthermore, queries are immutable and parameters are passed through the queries’ constructor. This means that you can execute the same query on many TableStorageReader instances in parallel. Once all the parallel queries have executed, you can aggregate the results for the consumer.

public class GetTopEntriesForPartition<TEntity> :
    CloudTableQuery<TEntity>
    where TEntity : ITableEntity, new()
{
  private readonly string tablePartition;
  private readonly int take;
  private readonly string cacheKey;

  public GetTopEntriesForPartition(string partition)
      : this(partition, 100)
  {
  }

  public GetTopEntriesForPartition(string partition, int take)
  {
      if (partition == null)
          throw new ArgumentNullException("partition");

      tablePartition = partition;
      this.take = take;

      var queryCacheHint = "GetTopEntriesForPartition"
                           + tablePartition 
                           + take;

      cacheKey = queryCacheHint;
  }

  public override Task<ICollection<TEntity>> Execute(CloudTable model)
  {
      if (model == null)
          throw new ArgumentNullException("model");

      return Task.Run(() =>
          {
              var condition = MakePartitionKeyCondition();

              var tableQuery = new TableQuery<TEntity>();

              tableQuery = tableQuery.Where(condition).Take(take);

              return (ICollection<TEntity>)model.ExecuteQuery(tableQuery)
                                                .ToList();
          });
  }

  public override string GenerateCacheKey(CloudTable model)
  {
     return cacheKey;
  }

  private string MakePartitionKeyCondition()
  {
      var value = tablePartition.ToUpperInvariant();
      return TableQuery.GenerateFilterCondition("PartitionKey",
                                           QueryComparisons.Equal,
                                           value);
  }
}

 

CloudTableQuery

public abstract class CloudTableQuery<TEntity>
    : IModelQuery<Task<ICollection<TEntity>>, CloudTable>
{
    public abstract Task<ICollection<TEntity>> Execute(CloudTable model);

    public abstract string GenerateCacheKey(CloudTable model);

    protected string MakeCacheKeyHash(string queryCacheHint)
    {
        if(string.IsNullOrWhiteSpace(queryCacheHint))
            throw new ArgumentNullException("queryCacheHint");

        using (var unmanaged = new SHA1CryptoServiceProvider())
        {
            var bytes = Encoding.UTF8.GetBytes(queryCacheHint);

            var computeHash = unmanaged.ComputeHash(bytes);
                
            if (computeHash.Length == 0)
                return string.Empty;
                
            return Convert.ToBase64String(computeHash);
        }
    }
}

 

References

9 responses to Querying Over Windows Azure Table Storage Service

  1. 

    Good post this. I’m going to implement something similar soon. BTW I think your “reusable queries” pattern is actually just the GoF “Specification” pattern ;)

    Like

  2. 

    Does the nuget version include support for segments?

    Like

  3. 

    Is there anyway to get the records when a set(an array) of partitionKeys are provided?

    Like

    • 

      I don’t recall that feature being present. You will need to build out a list of OR statements. If you try to match on too many values, you may find that your query is slow. Try to target queries to subsets of partitions. If you find that your query is too slow. Try splitting it up in many queries that you execute in parallel, then join and reduce the results in order to satisfy your application’s contract.

      Like

Trackbacks and Pingbacks:

  1. The Top 10 Most-Read #WindowsAzure Posts of 2013 | Alexandre Brisebois - December 29, 2013

    […] Querying Over Windows Azure Table Storage Service – 2,242 reads […]

    Like

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.