Create Predictable GUIDs for your Windows Azure Table Storage Entities

November 14, 2013

This week I was faced with an odd scenario. I needed to track URIs that are stored in Windows Azure Table Storage. Since I didn’t want to use the actual URIs as row keys, I tried to find a way to create a consistent hash compatible with Windows Azure Table Storage Partition and Row Keys. This is when I came across an answer on Stack Overflow about converting URIs into GUIDs.

The "correct" way (according to RFC 4122 §4.3) is to create a name-based UUID. The advantage of doing this (over just using a MD5 hash) is that these are guaranteed not to collide with non-named-based UUIDs, and have a very (very) small possibility of collision with other name-based UUIDs. [source]

Using the code referenced in this answer, I was able to put together an IdentityProvider whose job is to generate GUIDs based on strings. In my case, I use it to create GUIDs based on URIs.

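using System;
using Logos.Utility;

public class IdentityProvider
{
    // Derives a deterministic version 5 (SHA-1) GUID from the URI's absolute form,
    // using the RFC 4122 URL namespace, so the same URI always maps to the same GUID.
    public Guid MakeGuid(Uri uri)
    {
        if (uri == null)
            throw new ArgumentNullException("uri");

        return GuidUtility.Create(GuidUtility.UrlNamespace, uri.AbsoluteUri);
    }
}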
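
One reason a raw URI makes an awkward row key is that the Table service rejects certain characters in PartitionKey and RowKey values: forward slash, backslash, number sign, question mark, and control characters. The sketch below only illustrates that constraint. `TableKeyCheck` and `IsSafeKey` are hypothetical names, not part of any SDK, the example URI is a placeholder, and the check ignores the service's key size limit.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of any Azure SDK): checks a candidate key
// against the characters the Table service disallows in key values.
public static class TableKeyCheck
{
    private static readonly char[] Forbidden = { '/', '\\', '#', '?' };

    public static bool IsSafeKey(string key)
    {
        if (key == null)
            throw new ArgumentNullException("key");

        // char.IsControl covers U+0000-U+001F and U+007F-U+009F.
        return !key.Any(c => Forbidden.Contains(c) || char.IsControl(c));
    }
}

public static class TableKeyCheckDemo
{
    public static void Main()
    {
        // Placeholder URI, used only for illustration.
        var uri = new Uri("http://example.com/a?b=1#c");

        // The raw URI contains '/', '?' and '#', so it is rejected.
        Console.WriteLine(TableKeyCheck.IsSafeKey(uri.AbsoluteUri)); // False

        // A GUID formatted as 32 hex digits contains none of those characters.
        Console.WriteLine(TableKeyCheck.IsSafeKey(Guid.NewGuid().ToString("N"))); // True
    }
}
```

Formatting the GUID with "N" is an assumption made for this sketch; the default hyphenated format would pass the same check.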

Creating predictable GUIDs can come in quite handy with Windows Azure Table Storage, because it allows you to create a GUID based on an entity’s content. By doing so, you can recompute an entity’s key from its content and retrieve it directly, without performing a full table scan. Using the following code, found on GitHub, I was able to build an efficient URI matching system. Be sure to give credit if you use it.

using System;
using System.Security.Cryptography;
using System.Text;

namespace Logos.Utility
{
    /// <summary>
    /// Helper methods for working with <see cref="Guid"/>.
    /// </summary>
    public static class GuidUtility
    {
        /// <summary>
        /// Creates a name-based UUID using the algorithm from RFC 4122 4.3.
        /// </summary>
        /// <param name="namespaceId">The ID of the namespace.</param>
        /// <param name="name">The name (within that namespace).</param>
        /// <returns>A UUID derived from the namespace and name.</returns>
        /// <remarks>
        /// See <a href="http://code.logos.com/blog/2011/04/generating_a_deterministic_guid.html">
        /// Generating a deterministic GUID</a>.
        /// </remarks>
        public static Guid Create(Guid namespaceId, string name)
        {
            return Create(namespaceId, name, 5);
        }

        /// <summary>
        /// Creates a name-based UUID using the algorithm from RFC 4122 4.3.
        /// </summary>
        /// <param name="namespaceId">The ID of the namespace.</param>
        /// <param name="name">The name (within that namespace).</param>
        /// <param name="version">The version number of the UUID to create; this value must be either
        /// 3 (for MD5 hashing) or 5 (for SHA-1 hashing).</param>
        /// <returns>A UUID derived from the namespace and name.</returns>
        /// <remarks>
        /// See <a href="http://code.logos.com/blog/2011/04/generating_a_deterministic_guid.html">
        /// Generating a deterministic GUID</a>.
        /// </remarks>
        public static Guid Create(Guid namespaceId, string name, int version)
        {
            if (name == null)
                throw new ArgumentNullException("name");
            if (version != 3 && version != 5)
                throw new ArgumentOutOfRangeException("version", "version must be either 3 or 5.");

            // convert the name to a sequence of octets
            // (as defined by the standard or conventions of its namespace) (step 3)
            // ASSUME: UTF-8 encoding is always appropriate
            byte[] nameBytes = Encoding.UTF8.GetBytes(name);

            // convert the namespace UUID to network order (step 3)
            byte[] namespaceBytes = namespaceId.ToByteArray();
            SwapByteOrder(namespaceBytes);

            // compute the hash of the namespace ID concatenated with the name (step 4)
            byte[] hash;
            using (HashAlgorithm algorithm = version == 3 ? (HashAlgorithm)MD5.Create() : SHA1.Create())
            {
                algorithm.TransformBlock(namespaceBytes, 0, namespaceBytes.Length, null, 0);
                algorithm.TransformFinalBlock(nameBytes, 0, nameBytes.Length);
                hash = algorithm.Hash;
            }

            // most bytes from the hash are copied straight to the bytes of
            // the new GUID (steps 5-7, 9, 11-12)
            byte[] newGuid = new byte[16];
            Array.Copy(hash, 0, newGuid, 0, 16);

            // set the four most significant bits (bits 12 through 15) of the time_hi_and_version field
            // to the appropriate 4-bit version number from Section 4.1.3 (step 8)
            newGuid[6] = (byte)((newGuid[6] & 0x0F) | (version << 4));

            // set the two most significant bits (bits 6 and 7) of the clock_seq_hi_and_reserved
            // to zero and one, respectively (step 10)
            newGuid[8] = (byte)((newGuid[8] & 0x3F) | 0x80);

            // convert the resulting UUID to local byte order (step 13)
            SwapByteOrder(newGuid);
            return new Guid(newGuid);
        }

        /// <summary>
        /// The namespace for fully-qualified domain names (from RFC 4122, Appendix C).
        /// </summary>
        public static readonly Guid DnsNamespace = new Guid("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

        /// <summary>
        /// The namespace for URLs (from RFC 4122, Appendix C).
        /// </summary>
        public static readonly Guid UrlNamespace = new Guid("6ba7b811-9dad-11d1-80b4-00c04fd430c8");

        /// <summary>
        /// The namespace for ISO OIDs (from RFC 4122, Appendix C).
        /// </summary>
        public static readonly Guid IsoOidNamespace = new Guid("6ba7b812-9dad-11d1-80b4-00c04fd430c8");

        // Converts a GUID (expressed as a byte array) to/from network order (MSB-first).
        internal static void SwapByteOrder(byte[] guid)
        {
            SwapBytes(guid, 0, 3);
            SwapBytes(guid, 1, 2);
            SwapBytes(guid, 4, 5);
            SwapBytes(guid, 6, 7);
        }

        private static void SwapBytes(byte[] guid, int left, int right)
        {
            byte temp = guid[left];
            guid[left] = guid[right];
            guid[right] = temp;
        }
    }
}
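
A quick way to gain some confidence in a name-based UUID implementation is to compare it against published values. The sketch below assumes the `GuidUtility` class above is compiled into the same project. `GuidUtilityCheck` is a hypothetical name, the expected values are the ones listed in the Python `uuid` module documentation for the DNS namespace and the name "python.org", and the URI is a placeholder.

```csharp
using System;
using Logos.Utility;

// Hypothetical sanity check; relies on the GuidUtility class shown above.
public static class GuidUtilityCheck
{
    public static void Main()
    {
        // Reference values from the Python uuid documentation
        // (uuid3 / uuid5 of NAMESPACE_DNS and "python.org").
        Guid expectedV3 = new Guid("6fa459ea-ee8a-3ca4-894e-db77e160355e");
        Guid expectedV5 = new Guid("886313e1-3b8a-5372-9b90-0c9aee199e5d");

        Guid v3 = GuidUtility.Create(GuidUtility.DnsNamespace, "python.org", 3);
        Guid v5 = GuidUtility.Create(GuidUtility.DnsNamespace, "python.org", 5);

        Console.WriteLine("v3 matches reference: " + (v3 == expectedV3));
        Console.WriteLine("v5 matches reference: " + (v5 == expectedV5));

        // The same URI yields the same GUID on every call, which is what
        // makes it usable as a lookup key.
        Uri uri = new Uri("http://example.com/some/page"); // placeholder URI
        Guid first = GuidUtility.Create(GuidUtility.UrlNamespace, uri.AbsoluteUri);
        Guid second = GuidUtility.Create(GuidUtility.UrlNamespace, uri.AbsoluteUri);
        Console.WriteLine("deterministic: " + (first == second));
    }
}
```

Note that the name is hashed exactly as given, so two spellings of the same address (for example, with and without a trailing slash on a path) produce different GUIDs unless they are normalized first.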

5 responses to Create Predictable GUIDs for your Windows Azure Table Storage Entities

  1. 

    sha256 also pretty much guarantee it won’t collide
