Archives For HDInsight


Don’t Frown on CSVs

In Microsoft Azure, CSV and Avro can both help you deal with unpredictable amounts of data.

CSV files are surprisingly compact. They compress really well, and because they can be read one row at a time, they let us work with datasets that do not fit in RAM. This low-tech solution is often overlooked, and frowned upon by developers who don’t get the opportunity to work with very large datasets.
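To make the compression and streaming points concrete, here is a minimal sketch in Python. It assumes a gzip-compressed CSV file and uses only the standard-library `gzip` and `csv` modules. The helper names `write_compressed_csv` and `count_rows_streaming`, the file name, and the sample columns are hypothetical, chosen for illustration. Rows are written from a generator and read back lazily, so memory use stays roughly constant no matter how large the file grows.

```python
import csv
import gzip
import os
import tempfile


def write_compressed_csv(path, rows, header):
    """Write an iterable of rows to a gzip-compressed CSV file."""
    # "wt" opens the gzip stream in text mode; newline="" is what the csv module expects.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)  # accepts any iterable, including a generator


def count_rows_streaming(path, predicate):
    """Count rows matching predicate without loading the whole file into memory."""
    count = 0
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # yields one dict per row, lazily
            if predicate(row):
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical log-like sample data, produced by a generator so it is never
    # held in memory all at once. Repetitive rows like these compress very well.
    header = ["timestamp", "level", "message"]
    rows = (
        (f"2013-09-05T02:{i % 60:02d}:00", "ERROR" if i % 10 == 0 else "INFO", "request handled")
        for i in range(100_000)
    )
    path = os.path.join(tempfile.mkdtemp(), "logs.csv.gz")
    write_compressed_csv(path, rows, header)
    print("compressed bytes:", os.path.getsize(path))
    print("error rows:", count_rows_streaming(path, lambda r: r["level"] == "ERROR"))
```

The same streaming loop works on a file of any size; only the counter lives in memory.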

Root cause analysis scenarios have often led me to comb through several days’ worth of logs. More often than not, this represents gigabytes of data. By exporting application logs to CSV files, I was able to parse and analyze them with minimal resources.
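As an illustration of that kind of analysis, here is a sketch of a single streaming pass over an exported log. It assumes the CSV has hypothetical `timestamp` (ISO 8601), `level`, `source`, and `message` columns; your own export will likely differ. It tallies `ERROR` rows per hour and per source with `collections.Counter`, so only the counters are kept in memory.

```python
import csv
import io
from collections import Counter
from typing import TextIO, Tuple


def summarize_errors(f: TextIO) -> Tuple[Counter, Counter]:
    """Tally ERROR rows per hour and per source in one streaming pass.

    Assumes hypothetical columns: timestamp (ISO 8601), level, source, message.
    """
    per_hour: Counter = Counter()
    per_source: Counter = Counter()
    for row in csv.DictReader(f):
        if row["level"] != "ERROR":
            continue
        # "2013-09-05T02:36:06" -> "2013-09-05T02" (hour bucket)
        per_hour[row["timestamp"][:13]] += 1
        per_source[row["source"]] += 1
    return per_hour, per_source


if __name__ == "__main__":
    # Small in-memory sample standing in for a real exported log file.
    sample = io.StringIO(
        "timestamp,level,source,message\n"
        "2013-09-05T02:36:06,ERROR,web01,timeout\n"
        "2013-09-05T02:40:12,INFO,web02,ok\n"
        "2013-09-05T03:01:55,ERROR,web01,timeout\n"
        "2013-09-05T03:05:00,ERROR,db01,deadlock\n"
    )
    hours, sources = summarize_errors(sample)
    print(hours.most_common())    # busiest hours first
    print(sources.most_common(1))  # noisiest source
```

For a real export, pass an open file instead, e.g. `open("app-logs.csv", newline="", encoding="utf-8")` (the file name is hypothetical). Because `csv.DictReader` reads lazily, a multi-gigabyte log is processed with roughly the same memory as the sample.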

With these two options available to us, why should we consider using CSVs? Well, Avro is still fairly new and unsupported by most systems. CSVs can be imported into databases, Azure Table Storage, Hadoop (HDInsight), ERPs, and a slew of other systems with minimal effort. Heck, you can even open CSV files in Microsoft Excel!


Windows Azure HDInsight has been available for a little while now, but I haven’t had a chance to work with it. Tonight, as I was browsing the Patterns & Practices website, I noticed that they were working on a new book in the Cloud Series. It’s an ongoing project about developing Big Data solutions using Windows Azure HDInsight and related technologies.

The book can be downloaded from the Patterns & Practices Windows Azure Guidance site.
