I never would have imagined that the laws of physics would be so important in a world where virtualization is the new normal.

Data Locality is Important

Data Locality refers to the ability to move the computation close to the data. This is important because when performance is key, IO quickly becomes our number one bottleneck. Data access times vary from milliseconds to seconds, depending on factors like hardware specifications and network capabilities.

Let’s explore Data Locality through the following scenario. I have eight files containing data about multiple trucks, and I need to identify trips. A trip consists of many segments, including short stops. So if the driver stops for coffee and starts again, this is still considered the same trip. The strategy is to read each file and group data points by truck. This can be referred to as mapping the data. Then we can compute the trips for each group in parallel over multiple threads. This can be referred to as reducing the data. Finally, we merge the results into a single CSV file so that we can easily import it into other systems like SQL Server and Power BI.
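The map, reduce and merge steps can be sketched roughly as follows. This is only an illustration in Python, not the code behind the results discussed here. The record layout (a truck id and a timestamp per data point), the 30-minute MAX_STOP threshold and every function name are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta

# Hypothetical threshold: a gap longer than this between two points ends a trip.
MAX_STOP = timedelta(minutes=30)


def group_by_truck(files):
    """Map step: read every file and bucket its data points by truck id."""
    groups = defaultdict(list)
    for rows in files:
        for truck_id, timestamp in rows:
            groups[truck_id].append(timestamp)
    return groups


def find_trips(truck_id, timestamps):
    """Reduce step: split one truck's sorted points into trips at long gaps."""
    points = sorted(timestamps)
    trips = []
    start = previous = points[0]
    for current in points[1:]:
        if current - previous > MAX_STOP:
            # The stop was too long to be a coffee break: close the trip.
            trips.append((truck_id, start, previous))
            start = current
        previous = current
    trips.append((truck_id, start, previous))
    return trips


def process(files):
    """Group the points, then compute each truck's trips on a thread pool."""
    groups = group_by_truck(files)
    with ThreadPoolExecutor() as pool:
        per_truck = pool.map(lambda item: find_trips(*item), sorted(groups.items()))
        return [trip for trips in per_truck for trip in trips]


def write_csv(trips, stream):
    """Merge step: write all trips to a single CSV stream."""
    writer = csv.writer(stream)
    writer.writerow(["truck_id", "trip_start", "trip_end"])
    for truck_id, start, end in trips:
        writer.writerow([truck_id, start.isoformat(), end.isoformat()])
```

Here each "file" is simply a list of (truck id, timestamp) pairs. Passing the list of files to process and the result to write_csv, with an open file or an io.StringIO as the stream, yields one CSV row per trip.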

Single Machine

The single machine configuration results were promising. So I decided to break it apart and distribute the process across many Task Virtual Machines (TVMs). Azure Batch is the perfect service to schedule jobs.


Big Compute or Big Data?

This question comes up on a fairly regular basis. So I thought it would be interesting to share my understanding in the hope of helping you make the right decision.

Both are enablers, and they create opportunities through various approaches. When the problem is understood, and the algorithms vary by parameter, then Big Compute is definitely an approach to consider. When we know our input data, and are experimenting with various algorithms, Big Data is a clear winner.

This being said, let’s try to materialize this into something more concrete.

Big Compute shines at large scales. Easily parallelizable workloads are the best use cases, because they allow us to break the workload into independent tasks. This is where we can gain the most from large numbers of compute cores. Big Compute is all about executing any software package, written in any language, by passing in variables. This creates an amazing opportunity for developers to optimize their code to be extremely efficient. Optimizations range from concurrency management, memory management and limiting IOPS to other aspects like network communication optimization. Possible scenarios are well-known algorithms like Monte Carlo simulations, rendering and workflows.
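To make the idea of independent, parameterized tasks a little more tangible, here is a small Monte Carlo sketch in Python. It assumes a local thread pool as a stand-in for a fleet of machines; on a real Big Compute service each task would typically run as its own process on its own node. The function names and the choice of estimating pi are purely illustrative.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def estimate_pi(seed, samples):
    """One independent task: its only inputs are the parameters passed in."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def run_tasks(seeds, samples_per_task):
    """Fan the tasks out, then average the independent estimates."""
    with ThreadPoolExecutor() as pool:
        estimates = list(
            pool.map(lambda seed: estimate_pi(seed, samples_per_task), seeds)
        )
    return sum(estimates) / len(estimates)
```

Because every task depends only on its seed and sample count, the same work could be split across any number of cores or machines without the tasks needing to talk to each other.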

Big Data is all about empowering us to experiment with our data by providing us with tools, query languages and scripting capabilities that are geared toward giving us a lot of agility. Tinkering with algorithms is the perfect use case. We know our data, and want to extract insights from it. This means that we’re going to clean it, shape it and question it. Big Data is built for this; it makes it possible to iterate through multiple versions of our algorithms in a way that’s difficult with Big Compute.

So now that we’ve nailed this down, which is right for your workload?

Share your thoughts in the comments below


As sessions make their way to Channel 9, we can download them using this PowerShell script.

Download All Sessions in SD Quality

$feedUrl = 'http://s.ch9.ms/Events/Build/2016/RSS'

# Make .NET file APIs resolve relative paths against the current PowerShell location.
[Environment]::CurrentDirectory = (Get-Location -PSProvider FileSystem).ProviderPath

# Downloads one media file and names it after the session title.
function Get-Media {
    param($url, $title)

    $u = New-Object System.Uri($url)
    $extension = [System.IO.Path]::GetExtension($u.Segments[-1])

    # Strip the same characters as before (’ ? : / , ") from the file name.
    $fileName = ($title + $extension) -replace '[’?:/,"]', ''

    if (Test-Path $fileName) {
        Write-Host 'Skipping file, already downloaded' -ForegroundColor Yellow
        return
    }

    Invoke-WebRequest $url -OutFile $fileName
}

$feed = [xml](New-Object System.Net.WebClient).DownloadString($feedUrl)

foreach ($i in $feed.rss.channel.item) {
    foreach ($m in $i.group) {
        foreach ($u in $m.content |
                Where-Object { $_.url -like '*mid.mp4' } |
                Select-Object -Property @{Name = 'url'; Expression = { $_.url }},
                                        @{Name = 'title'; Expression = { $i.title }}) {
            Get-Media -url $u.url -title $u.title
        }
    }
}

# Find and Download Keynotes
foreach ($i in $feed.rss.channel.item) {
    foreach ($m in $i.group) {
        foreach ($u in $m.content |
                Where-Object { $_.url -like '*KEY0*' -and
                               $_.type -eq 'video/mp4' } |
                Select-Object -Unique |
                Select-Object -Property @{Name = 'url'; Expression = { $_.url }},
                                        @{Name = 'title'; Expression = { $i.title }}) {
            Get-Media -url $u.url -title $u.title
        }
    }
}



Getting to Know Azure Mobile App Cont.

Microsoft Azure Mobile App has recently gone GA (General Availability) and has definitely captured my attention. Mobile App is a tremendous accelerator that enables us to go from an idea to a functional prototype quickly. Then, we can continue to build on that initial investment to create a robust, production-ready app. Finally, this post is all about using Visual Studio Team Services (VSTS) to build and publish apps to HockeyApp, so that we can test and assess quality before our apps make it to our favorite app stores.

Refreshing Authentication Tokens

Authentication Tokens are short-lived, and having users log in to the App frequently can cause friction. This is definitely undesirable and can be dealt with by identifying when a Token is no longer valid. When this condition is met, we can attempt to refresh the Authentication Token by calling the Azure App Service Token Store APIs.

Keeping ARM CMDLETs Fresh

Open a PowerShell Console as an Administrator and use the following command. It usually takes about 15 minutes to complete, so don’t do this if you’re in a hurry =)

Install-Module AzureRM

Installing AzureRM modules.
AzureRM.Profile 1.0.5 updated [1/29]...
Azure.Storage 1.0.5 updated [2/29]...
AzureRM.Backup 1.0.5 updated [3/29]...
AzureRM.RedisCache 1.1.3 updated [4/29]...
AzureRM.Tags 1.0.5 updated [5/29]...
AzureRM.SiteRecovery 1.1.4 updated [6/29]...
AzureRM.Insights 1.0.5 updated [7/29]...
AzureRM.OperationalInsights 1.0.5 updated [8/29]...
AzureRM.DataLakeAnalytics 1.0.5 updated [9/29]...
AzureRM.Dns 1.0.5 updated [10/29]...
AzureRM.Storage 1.0.5 updated [11/29]...
AzureRM.UsageAggregates 1.0.5 updated [12/29]...
AzureRM.HDInsight 1.0.6 updated [13/29]...
AzureRM.RecoveryServices 1.0.6 updated [14/29]...
AzureRM.Network 1.0.5 updated [15/29]...
AzureRM.Compute 1.2.4 updated [16/29]...
AzureRM.TrafficManager 1.0.5 updated [17/29]...
AzureRM.Websites 1.0.5 updated [18/29]...
AzureRM.LogicApp 1.0.1 updated [19/29]...
AzureRM.DataFactories 1.0.5 updated [20/29]...
AzureRM.DataLakeStore 1.0.5 updated [21/29]...
AzureRM.Sql 1.0.5 updated [22/29]...
AzureRM.Automation 1.0.5 updated [23/29]...
AzureRM.ApiManagement 1.0.5 updated [24/29]...
AzureRM.StreamAnalytics 1.0.5 updated [25/29]...
AzureRM.Batch 1.0.5 updated [26/29]...
AzureRM.Resources 1.0.5 updated [27/29]...
AzureRM.NotificationHubs 1.0.5 updated [28/29]...
AzureRM.KeyVault 1.1.4 updated [29/29]...

Making a Self-Signed Certificate

A lot of services on Azure and on-premises require us to create or buy certificates. Now there are a couple of ways to create certificates. I used to do it using makecert:

makecert -sky exchange -r -n "CN=<Domain Name>" -pe -a sha1 -len 2048 -ss My -sv <Domain Name>.pvk <Domain Name>.cer 
pvk2pfx -pvk <Domain Name>.pvk -pi <Password> -spc <Domain Name>.cer -pfx <Domain Name>.pfx

Recently, I started making my certificates using PowerShell.


Copying Files Over a PSSession

I recently bought a Raspberry Pi 3, and now that it’s running Windows IoT Core, I wanted to make it do something. So I wrote a basic UWP App and I was looking for a way to deploy it to the device. Luckily, WinRM is enabled on Windows IoT Core. This allowed me to use PowerShell to remote into the device and copy my appx package.

net start WinRm

$ip = ""

Set-Item WSMan:\localhost\Client\TrustedHosts -Value $ip -Force

$PWord = ConvertTo-SecureString -String "p@ssw0rd" -AsPlainText -Force
$Credential = New-Object -TypeName System.Management.Automation.PSCredential `
                         -ArgumentList "$ip\Administrator", $PWord

$session = New-PSSession -ComputerName $ip -Credential $Credential

Copy-Item -ToSession $session `
          -Path "C:\Users\brise\Downloads\worker.appx" `
          -Destination "C:\Data\Users\Administrator\Documents"

Azure Resource Manager enables you to work with the resources in your solution as a group. You can deploy, update or delete all of the resources for your solution in a single, coordinated operation. You use a template for deployment and that template can work for different environments such as testing, staging and production. Resource Manager provides security, auditing, and tagging features to help you manage your resources after deployment.

ARM Tools for VS Code

What tools are available to help me work with Azure Resource Manager (ARM) Templates? This is a question that’s come up a few times over the last few months. And I completely understand the root of this question. Working with large JSON files can be scary, but it can also be enjoyable when the right tools are available to us.