While working on a system that ingests a sizable amount of data, I started out by building a service that used a Windows Azure Storage Queue to sequentially pick blobs from a Windows Azure Blob Container and import their contents into Windows Azure SQL Database.
This is a great solution in theory, but it has a few drawbacks.
Cost of Using a Windows Azure Storage Queue vs. Not Using One
Cost of Importing 100 000 Blobs Every 12 Hours Using a Queue
First, it consumes a fair number of storage transactions. To put things in perspective, let’s take a closer look.
We count 1 transaction to upload each blob to the container and 1 transaction to add the matching message to the Windows Azure Storage Queue.
We then count 1 transaction to fetch 32 messages from the Windows Azure Storage Queue.
While processing each blob, we count 1 transaction to read the blob from the container, then 1 transaction to delete the message from the Windows Azure Storage Queue and, finally, 1 last transaction to delete the blob from the container.
Let’s do the math
Per blob:
+ 1 (upload blob)
+ 1 (queue message)
Per batch of 32 messages:
+ 1 (read 32 messages from queue)
+ 32 (read blob)
+ 32 (delete message)
+ 32 (delete blob)
= 97 (operations per batch)
Find the number of batches:
100 000 (blobs) / 32 = 3125 batches
97 (operations per batch) * 3125 (batches) = 303 125 transactions
+ 100 000 (upload blob) + 100 000 (queued messages)
That’s 503 125 transactions! Imagine doing that every 12 hours.
503 125 * 365 * 2 = 367 281 250 transactions / year
Using the queue, it costs us 36.8$ / year
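The tally above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name `queue_transactions` and its parameters are mine, and it assumes that every fetch returns a full batch of 32 messages (the last batch is rounded up otherwise).

```python
import math

def queue_transactions(blobs: int, batch_size: int = 32) -> int:
    """Count storage transactions for one import run that uses a queue."""
    batches = math.ceil(blobs / batch_size)  # 1 fetch per batch of messages
    uploads = blobs           # 1 transaction to upload each blob
    enqueues = blobs          # 1 transaction to queue each message
    blob_reads = blobs        # 1 transaction to read each blob
    message_deletes = blobs   # 1 transaction to delete each message
    blob_deletes = blobs      # 1 transaction to delete each blob
    return (uploads + enqueues + batches
            + blob_reads + message_deletes + blob_deletes)

RUNS_PER_YEAR = 365 * 2  # one import every 12 hours

per_run = queue_transactions(100_000)  # 503 125
per_year = per_run * RUNS_PER_YEAR
```

Changing `blobs` or `batch_size` shows how quickly the total moves; the five per-blob transactions dominate, so the batched fetch barely registers.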
Cost of Importing 100 000 Blobs Every 12 Hours Without a Queue
Without the Windows Azure Storage Queue, we end up making far fewer transactions. In summary, there would be 1 transaction to insert the blob into the container, 1 transaction to read the blob from the container and 1 additional transaction to delete the blob from the container. Furthermore, it would take 100 transactions to list all available blobs.
Let’s do the math
100 000 (blobs) * 3 (upload, read, delete) = 300 000 transactions
100 (list operations) + 300 000 (transactions) = 300 100 transactions
300 100 * 365 * 2 = 219 073 000 transactions / year
Without the queue, it costs us 21.9$ / year
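The same count without a queue, as a minimal Python sketch. The function name `no_queue_transactions` is illustrative, and the default of 100 list operations is simply the figure used in this scenario, not something the storage service guarantees.

```python
def no_queue_transactions(blobs: int, list_operations: int = 100) -> int:
    """Count storage transactions for one import run without a queue."""
    per_blob = 3  # upload, read, delete
    return blobs * per_blob + list_operations

RUNS_PER_YEAR = 365 * 2  # one import every 12 hours

per_run = no_queue_transactions(100_000)  # 300 100
per_year = per_run * RUNS_PER_YEAR        # 219 073 000
```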
Using the queue, it costs us 36.8$ / year and it takes 367 281 250 transactions / year
Without the queue, it costs us 21.9$ / year and it takes 219 073 000 transactions / year
That’s 148 208 250 fewer transactions per year!
How much are we saving by not using the Windows Azure Storage Queue in this specific scenario?
149 (million transactions, rounded up) * 0.1$ (per million transactions) = 14.9$
The difference doesn’t seem that significant on its own. But as you build your application, these seemingly insignificant costs of operation add up quickly. It’s important to keep in mind that whenever you lower the cost of operation for a specific process, you are contributing to lowering the overall cost of operation of your application.
Let’s push this a bit further. Imagine for a second that each transaction takes a minimum of 1 millisecond to execute. This would mean that by removing queues from this specific scenario, we would be saving 41.17 hours of compute time per year. To illustrate this in terms of cost, let’s imagine that the worker role is a medium-sized virtual machine that costs 0.16$ per hour. This means that we would be saving an additional 6.59$ for that single process.
Consequently, not using a queue in this scenario has the potential of saving 14.9 + 6.59 = 21.49$ per year.
Building on the previous idea, imagine that we don’t have just 1 process, but that we have 10 (which isn’t too far-fetched on the cloud).
In this new scenario, we could potentially save roughly 214.9$ per year.
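The savings estimate above can be sketched in Python as well. The prices (0.1$ per million transactions, 0.16$ per hour for the virtual machine) and the 1 millisecond per transaction are the assumptions from this post; the function and constant names are mine. The sketch does not round the transaction difference up to 149 million, so its dollar figures land slightly below the rounded ones quoted in the text.

```python
RUNS_PER_YEAR = 365 * 2        # one import every 12 hours
PRICE_PER_MILLION_TX = 0.10    # dollars, assumed storage transaction price
VM_PRICE_PER_HOUR = 0.16       # dollars, assumed medium worker role price
MS_PER_TX = 1                  # assumed minimum duration of a transaction

def yearly_savings(queue_tx_per_run: int, no_queue_tx_per_run: int,
                   processes: int = 1) -> dict:
    """Estimate what dropping the queue saves per year."""
    saved_tx = (queue_tx_per_run - no_queue_tx_per_run) * RUNS_PER_YEAR * processes
    tx_dollars = saved_tx / 1_000_000 * PRICE_PER_MILLION_TX
    compute_hours = saved_tx * MS_PER_TX / 1000 / 3600  # ms -> s -> h
    compute_dollars = compute_hours * VM_PRICE_PER_HOUR
    return {
        "saved_tx": saved_tx,
        "tx_dollars": tx_dollars,
        "compute_hours": compute_hours,
        "compute_dollars": compute_dollars,
        "total_dollars": tx_dollars + compute_dollars,
    }

one_process = yearly_savings(503_125, 300_100)
ten_processes = yearly_savings(503_125, 300_100, processes=10)
```

Because everything scales linearly, `ten_processes` is simply ten times `one_process`; the point of keeping the prices as constants is that you can plug in your own numbers.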
I’m sharing these insights with you today because, as long as you have a small project and you have control over how resources are consumed, these kinds of cost reductions seem silly. But once you go viral (and you might), thousands or even millions of people could start using your services. At that point, your cost of operation will probably grow very quickly!
When we talk about the Cloud we talk about economies of scale. Gaining an edge on your competitors starts by coding smart, making the right choices and knowing how to get the biggest bang for your buck!
Pros and Cons of Using Queues in This Scenario
If that’s not enough, there are more benefits to using a blob container instead of a queue for a service like this.
Cons of Using the Windows Azure Storage Queue
- You can only read the oldest 32 messages in a queue
- You cannot prioritize your queue, you can only hide messages for a certain amount of time
- Messages in a queue can live for a maximum of 7 days
Pros of Using the Windows Azure Storage Container
- Blobs are ordered by name. Using naming conventions, you can easily bump blobs to the top of the list
- You can browse and read all blobs at any time
- You can add and delete any blob at any time without having to purge the container
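To illustrate the naming-convention point, here is a minimal Python sketch. The `blob_name` helper and its `priority-timestamp-label` scheme are hypothetical, and `sorted()` only stands in for a listing that returns blobs ordered by name; no storage calls are made.

```python
from datetime import datetime

def blob_name(priority: int, created: datetime, label: str) -> str:
    """Build a name that sorts by priority first, then by creation time."""
    # Zero-padding keeps the lexicographic order equal to the numeric order.
    return f"{priority:03d}-{created:%Y%m%dT%H%M%S}-{label}"

names = [
    blob_name(100, datetime(2013, 1, 1, 8, 0, 0), "regular-a"),
    blob_name(1, datetime(2013, 1, 1, 9, 0, 0), "urgent"),
    blob_name(100, datetime(2013, 1, 1, 7, 0, 0), "regular-b"),
]

# A name-ordered listing puts the low-priority-number blob first,
# even though it was created last.
listing = sorted(names)
```

With this convention, bumping a blob to the top is just a matter of writing it with a smaller priority prefix.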
As I’ve mentioned in previous posts, Windows Azure SQL Database is extremely good at protecting itself against abuse. This is why I usually have a single worker role taking blobs and pushing their contents into the database (which is not using Federations). This design usually prevents my services from getting throttled by the Windows Azure SQL Database service.
Having a single role process the blobs also means that the chances of a blob being processed by two instances at the same time are very slim. Furthermore, whether your design uses queues or not, every modification that your service makes to the data should be idempotent. So ingesting the same blob twice should not be an issue.
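Here is one way idempotent ingestion could look, as a minimal sketch. It uses Python’s built-in SQLite as a stand-in for Windows Azure SQL Database, and the `readings` table, its columns and the `ingest_blob` helper are all hypothetical. The idea is that each row is keyed by the blob name and its line number, so importing the same blob a second time changes nothing; against SQL Database you would need an equivalent upsert or existence check.

```python
import sqlite3

def ingest_blob(conn: sqlite3.Connection, blob_name: str, lines: list) -> None:
    """Insert every line of a blob; rows that already exist are skipped."""
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO readings (blob_name, line_no, payload) "
            "VALUES (?, ?, ?)",
            [(blob_name, i, line) for i, line in enumerate(lines)],
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    " blob_name TEXT NOT NULL,"
    " line_no INTEGER NOT NULL,"
    " payload TEXT NOT NULL,"
    " PRIMARY KEY (blob_name, line_no))"
)

# Ingesting the same blob twice leaves the table unchanged.
ingest_blob(conn, "blob-0001", ["a", "b", "c"])
ingest_blob(conn, "blob-0001", ["a", "b", "c"])
row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```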
Think about your services and how they consume Windows Azure Services. Then ask yourself, "Could this service be optimized to take better advantage of my available resources?" If the answer is yes, then you should take the time to investigate.
Details are crucial when it comes to the cloud! Don’t get caught by flash mobs =)