These days I’m all about automation. While most of us are focused on Python, C#, JavaScript and Node, I’m taking a different approach to Azure DocumentDB. This experiment’s goal is to make it easy to create and seed DocumentDB databases from JSON documents stored in an Azure Blob Storage container.

Meet DocumentDB

Azure DocumentDB is a NoSQL document database service designed from the ground up to natively support JSON and JavaScript directly inside the database engine. It’s the right solution for web and mobile applications when predictable throughput, low latency, and flexible queries are key. Microsoft consumer applications like OneNote already use DocumentDB in production to support millions of users.

Getting Started

We need a DocumentDB account, and the best way to create one is to navigate to portal.azure.com. Click + NEW, then Everything, then Data, storage, cache, + backup, and finally DocumentDB. Fill in the required information and find something to do for the next 5 minutes while Azure provisions your DocumentDB instance.

Creating the Database & Collection

This adventure initially started with me trying to use the REST API directly from PowerShell. I used the available documentation and Fiddler to produce the following test script. Unfortunately, I was unable to generate a valid authorization token from the Get-Key function.

$DocumentDbUrl = 'https://{name}.documents.azure.com'

$DocumentDbKey = '{your key}'

$DocumentDbApiVersion = '2014-08-21'

function Get-Key {
  param
  (
    [System.String]
    $Verb,

    [System.String]
    $ResourceId = '',

    [System.String]
    $ResourceType,

    [HashTable]
    $Headers
  )

  $message = $($Verb + '\n' + $ResourceType + '\n' + $ResourceId +'\n' + $Headers.'x-ms-date' + '\n\n')

  $key = [System.Convert]::FromBase64String($DocumentDbKey)

  $hmacsha = new-object -TypeName System.Security.Cryptography.HMACSHA256 -ArgumentList (,$key) 

  $messageBytes = [Text.Encoding]::UTF8.GetBytes($message.ToLowerInvariant())
  $hash = $hmacsha.ComputeHash($messageBytes)
  $signature = [System.Convert]::ToBase64String($hash)

  return [System.Web.HttpUtility]::UrlEncode($('type=master&ver=1.0&sig=' + $signature))
}

function Add-Database{
  param
  (
    [System.String]
    $DbName
  )  

  $ResourceUrl = '/dbs'

  $date = Get-Date
  $utcDate = $date.ToUniversalTime()
  $dateString = $utcDate.ToString('r',[System.Globalization.CultureInfo]::InvariantCulture)

  $headers = @{
             'x-ms-date'= $dateString
              }

  $verb = 'POST'
  $resourceType ='dbs'
  $resourceId = [String]::Empty

  $authorization = Get-Key -Verb $verb -ResourceType $resourceType -ResourceId $resourceId -Headers $headers

  $headers.Add('authorization', $authorization)

  $body = '{"id":"'+$DbName+'" }'

  $url = $($DocumentDbUrl+$ResourceUrl) 

  $response = Invoke-RestMethod -Method $verb -Uri $url -Headers $headers -Body $body

  $response | Format-List
}

Add-Database -DbName 'DocumentDatabaseName'

Executing this consistently returns

Invoke-RestMethod : The remote server returned an error: (401) Unauthorized.
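In hindsight, one likely culprit is PowerShell’s string quoting: escape sequences are only expanded inside double-quoted strings, so the single-quoted concatenation in Get-Key signs a string containing literal backslash-n characters instead of real newlines. A quick sketch of the difference:

```powershell
# Single quotes keep \n as two literal characters;
# double quotes expand the backtick escape `n into a real newline.
'get' + '\n' + 'dbs'   # one line containing a literal backslash-n: get\ndbs
"get" + "`n" + "dbs"   # two lines: get, then dbs
```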

Not accepting this failure, and wanting to set up my environments through PowerShell, I decided to create my own Cmdlets.

Using Cmdlets to Wrap the .NET DocumentDB Client

After a couple of hours of (401) Unauthorized, I decided to write a series of Cmdlets that wrap the .NET DocumentDB client library. This turned out to be faster than trying to figure out the API’s authorization token.

The Cmdlets

  • New-Context – builds a PowerShell object that contains the DocumentDB URL and Key
  • Add-Database – creates the Database in DocumentDB and returns the Database object from which we extract the Self Link
  • Add-DocumentCollection – creates a new Document Collection and returns the Document Collection object from which we extract the Self Link
  • Import-FromBlobStorageContainer – reads all blobs from the target container and adds each blob as a new Document to the target Document Collection

Using the Cmdlets

Import-Module 'DocumentDB.Cmdlet' -Force

$ctx = New-Context -Url 'https://{name}.documents.azure.com' -Key '{key}'

$database = Add-Database -Context $ctx -Name '{name}'

$collection = Add-DocumentCollection -Context $ctx -DatabaseLink $database.SelfLink -Name '{name}'

Import-FromBlobStorageContainer -Context $ctx -DocumentCollectionLink $collection.SelfLink -ContainerName '{name}' -StorageConnectionString 'DefaultEndpointsProtocol=https;AccountName={name};AccountKey={key}'

Getting Started With Development

Cmdlets are created by inheriting from one of these classes:

  • Cmdlet: A simple cmdlet using a .NET class derived from the Cmdlet base
    class. This type of cmdlet does not depend on the Windows PowerShell runtime
    and can be called directly from a .NET language.
  • PSCmdlet: A more complex cmdlet based on a .NET class that derives from the
    PSCmdlet base class. This type of cmdlet depends on the Windows PowerShell
    runtime, and therefore executes within a runspace.

I decided to inherit from PSCmdlet because I will not be using these Cmdlets outside of Windows PowerShell. Once you’ve chosen a base class, the next challenge is finding the DLL that contains the System.Management.Automation namespace.

There are two ways to get this DLL. The first is to install the Windows SDK which will install the DLL in Program Files (x86)\Reference Assemblies\Microsoft\WindowsPowerShell\3.0. The second option is to run the following PowerShell command.

Copy ([PSObject].Assembly.Location) C:\

DocumentDB Connection Context

This Cmdlet creates a Context PSObject used to pass DocumentDB connection information for the other Cmdlets.

[Cmdlet(VerbsCommon.New, "Context")]
public class NewContext : PSCmdlet
{
    [Parameter(Position = 0,
        Mandatory = true,
        ValueFromPipeline = true,
        ValueFromPipelineByPropertyName = true,
        HelpMessage = "DocumentDB Url")]
    public string Url { get; set; }

    [Parameter(Position = 1,
        Mandatory = true,
        ValueFromPipeline = true,
        ValueFromPipelineByPropertyName = true,
        HelpMessage = "DocumentDB Key")]
    public string Key { get; set; }

    protected override void ProcessRecord()
    {
        var obj = new PSObject();
        obj.Properties.Add(new PSVariableProperty(new PSVariable("Url",Url)));
        obj.Properties.Add(new PSVariableProperty(new PSVariable("Key",Key)));
        WriteObject(obj);
    }
}

DocumentDB Cmdlet Base Class

This is the base class for the subsequent Cmdlets. It adds the Context object parameter at position 0.

public abstract class DocumentDbCmdlet : PSCmdlet
{
    [Parameter(Position = 0,
        Mandatory = true,
        ValueFromPipeline = true,
        ValueFromPipelineByPropertyName = true,
        HelpMessage = "DocumentDB Connection Information")]
    public PSObject Context { get; set; }
}
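Because the Context parameter accepts pipeline input, any cmdlet deriving from this base class can be fed a context straight from New-Context. A minimal sketch (placeholder names, assuming the module is already imported):

```powershell
# Pipe the context object directly into a derived cmdlet
# instead of passing it with -Context.
New-Context -Url 'https://{name}.documents.azure.com' -Key '{key}' |
    Add-Database -Name 'MyDatabase'
```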

Add-Database

Using a Context and a Name, this Cmdlet uses the .NET DocumentDB client to create a new Database and returns the Database resource from which we can get a SelfLink.

[Cmdlet(VerbsCommon.Add, "Database")]
public class AddDatabase : DocumentDbCmdlet
{
    [Parameter(Position = 1,
        Mandatory = true,
        ValueFromPipeline = true,
        ValueFromPipelineByPropertyName = true,
        HelpMessage = "Database Name")]
    public string Name { get; set; }

    protected override void ProcessRecord()
    {
        var url = base.Context.Properties["Url"].Value.ToString();
        var key = base.Context.Properties["Key"].Value.ToString();

        var client = new DocumentClient(new Uri(url), key);
        var task = client.CreateDatabaseAsync(new Database { Id = Name });

        task.Wait();

        WriteObject(task.Result.Resource);
    }
}

Add-DocumentCollection

Using a Context, a Name, and a Database Link, this Cmdlet uses the .NET DocumentDB client to create a new Document Collection and returns the Document Collection resource from which we can get a SelfLink.

[Cmdlet(VerbsCommon.Add, "DocumentCollection")]
public class AddDocumentCollection : DocumentDbCmdlet
{
    [Parameter(Position = 1,
        Mandatory = true,
        ValueFromPipeline = true,
        ValueFromPipelineByPropertyName = true,
        HelpMessage = "Database Link")]
    public string DatabaseLink { get; set; }

    [Parameter(Position = 2,
        Mandatory = true,
        ValueFromPipeline = true,
        ValueFromPipelineByPropertyName = true,
        HelpMessage = "Collection Name")]
    public string Name { get; set; }

    protected override void ProcessRecord()
    {
        var url = base.Context.Properties["Url"].Value.ToString();
        var key = base.Context.Properties["Key"].Value.ToString();

        var client = new DocumentClient(new Uri(url), key);

        var task = client.CreateDocumentCollectionAsync(DatabaseLink, new DocumentCollection
        {
            Id = Name
        });

        task.Wait();

        WriteObject(task.Result.Resource);
    }
}

Import-FromBlobStorageContainer

Using a Context, a Document Collection Link, a Storage Connection String and a Container Name, this Cmdlet uses the .NET DocumentDB client to create a new Document for each JSON file it finds in the Azure Blob Storage Container.

[Cmdlet(VerbsData.Import, "FromBlobStorageContainer")]
public class ImportFromBlobStorageContainer : DocumentDbCmdlet
{
    [Parameter(Position = 1,
        Mandatory = true,
        ValueFromPipeline = true,
        ValueFromPipelineByPropertyName = true,
        HelpMessage = "Document Collection Link")]
    public string DocumentCollectionLink { get; set; }

    [Parameter(Position = 2,
        Mandatory = true,
        ValueFromPipeline = true,
        ValueFromPipelineByPropertyName = true,
        HelpMessage = "Azure Storage Connection String")]
    public string StorageConnectionString { get; set; }

    [Parameter(Position = 3,
        Mandatory = true,
        ValueFromPipeline = true,
        ValueFromPipelineByPropertyName = true,
        HelpMessage = "Blob Container Name")]
    public string ContainerName { get; set; }

    protected override void ProcessRecord()
    {
        WriteProgress(new ProgressRecord(0, "Downloading List of Blobs", "Reading from " + ContainerName));

        var account = CloudStorageAccount.Parse(StorageConnectionString);
        var blobClient = account.CreateCloudBlobClient();
        var container = blobClient.GetContainerReference(ContainerName);

        var blobs = container.ListBlobs(useFlatBlobListing: true).ToList();

        var count = blobs.Count;

        string url = base.Context.Properties["Url"].Value.ToString();
        string key = base.Context.Properties["Key"].Value.ToString();

        var client = new DocumentClient(new Uri(url), key);

        var progress = new ProgressRecord(0, "Importing Blobs from " + ContainerName, "Importing Blobs from " + ContainerName);
        WriteProgress(progress);

        for (var i = 0; i < count; i++)
        {
            var blockBlob = blobs[i] as CloudBlockBlob;

            if (blockBlob == null) continue;

            progress.StatusDescription = blockBlob.Name;
            WriteProgress(progress);

            // Unwrap so Wait() blocks until the inner CreateDocumentAsync
            // task completes, not merely until it has been started.
            blockBlob.DownloadTextAsync()
                .ContinueWith(t => client.CreateDocumentAsync(DocumentCollectionLink, t.Result))
                .Unwrap()
                .Wait();
        }

        var collection = client.ReadDocumentCollectionAsync(DocumentCollectionLink);
        collection.Wait();

        WriteObject(collection.Result.Resource);
    }
}

SnapIn

In order to test the Cmdlets in Windows PowerShell ISE, I implemented a custom PowerShell SnapIn and placed the Cmdlet DLLs in Documents\WindowsPowerShell\Modules. This folder is read by PowerShell to find Modules. In my case, it found a folder named DocumentDB.Cmdlet, which I used to import the Module.
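The module folder name must match the name used with Import-Module. A hypothetical layout (the dependency file names are assumptions) looks like this:

```powershell
# Hypothetical layout under Documents\WindowsPowerShell\Modules:
#   DocumentDB.Cmdlet\
#     DocumentDB.Cmdlet.dll                  # the compiled Cmdlets
#     Microsoft.Azure.Documents.Client.dll   # .NET DocumentDB client dependency
Import-Module 'DocumentDB.Cmdlet' -Force
Get-Command -Module 'DocumentDB.Cmdlet'     # lists the imported Cmdlets
```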

[RunInstaller(true)]
public class SnapIn : CustomPSSnapIn
{
    private Collection<CmdletConfigurationEntry> cmdlets = new Collection<CmdletConfigurationEntry>();
    private Collection<ProviderConfigurationEntry> providers = new Collection<ProviderConfigurationEntry>();
    private Collection<TypeConfigurationEntry> types = new Collection<TypeConfigurationEntry>();
    private Collection<FormatConfigurationEntry> formats = new Collection<FormatConfigurationEntry>();

    public SnapIn()
        : base()
    {
        cmdlets.Add(new CmdletConfigurationEntry("New-Context", typeof(NewContext), null));
        cmdlets.Add(new CmdletConfigurationEntry("Add-Database", typeof(AddDatabase), null));
        cmdlets.Add(new CmdletConfigurationEntry("Add-DocumentCollection", typeof(AddDocumentCollection), null));
        cmdlets.Add(new CmdletConfigurationEntry("Import-FromBlobStorageContainer", typeof(ImportFromBlobStorageContainer), null));
    }
    public override string Name
    {
        get { return "DocumentDB Cmdlets"; }
    }
    public override string Vendor
    {
        get { return "Alexandre Brisebois"; }
    }
    public override string VendorResource
    {
        get { return "Alexandre Brisebois"; }
    }
    public override string Description
    {
        get { return "DocumentDB Cmdlet example"; }
    }
    public override string DescriptionResource
    {
        get { return "DocumentDB Cmdlet example"; }
    }
    public override Collection<CmdletConfigurationEntry> Cmdlets
    {
        get { return cmdlets; }
    }

    public override Collection<ProviderConfigurationEntry> Providers
    {
        get { return providers; }
    }

    public override Collection<TypeConfigurationEntry> Types
    {
        get { return types; }
    }

    public override Collection<FormatConfigurationEntry> Formats
    {
        get { return formats; }
    }
}
