Archives For TableBatchOperation

untitled

Never would have imagined that the laws of physics would be so important in a world where virtualization is the new normal.

Data Locality is Important

Data Locality, refers to the ability to move the computation close to the data. This is important because when performance is key, IO quickly becomes our number one bottleneck. Data access times vary from milliseconds to seconds because of many factors like hardware specifications and network capabilities.

Let’s explore Data Locality through the following Scenario. I have eight files containing data about multiple trucks, and I need to Identify trips. A trip consists of many segments, including short stops. So if the driver stops for coffee and starts again, this is still considered the same trip. The strategy depicted below is to read each file and to group data points by truck. This can be referred to as mapping the data. Then we can compute the trips for each group in parallel over multiple threads. This can be referred to as reducing the data. And finally, we merge the results in a single CSV file so that we can easily import it to other systems like SQL Server and Power BI.

Single Machine

The single machine configuration results were promising. So I decided to break it apart and distribute the process across many task Virtual Machines (TVM). Azure Batch is the perfect service to schedule jobs. Continue Reading…


2013-06-03_18h15_18Building on top of the code from my post “Windows Azure Blob Storage Service – Migrating Blobs Between Accounts” I added logic so that the Windows Azure Storage Account migration process recreate all the Tables from the source account in the target account. Then It downloads the entities from the source Tables using segmented table queries and inserts(or replaces) them into the target Tables.

The process is surprisingly fast compared to the migration of blob containers. When we copy a blob from one container to another, a command is queued and it can take some time to complete. Migrating Tables on the other hand, requires significantly more bandwidth, because we need to download the data from the source Tables and upload it into the target Tables located in the target Windows Azure Storage Account.

Entities are downloaded 1000 at a time. Then they are fed into my TableStorageWriter, which regroups the entities by Partition Key and inserts them in batches of 100.

Continue Reading…


table-storage

Modifying data in Windows Azure Table Storage Service can be done operation by operation or it can be done in batches. The golden rules below describe the limitations and constraints imposed by the service. Even though these rules may seem restrictive, they exist in order to ensure acceptable performances. Sending too much or too little data can create bottle necks.

Continue Reading…