Performance isn’t about having a huge number of concurrent threads running in your Worker Role. It’s about finding the right balance! So start by pulling out all the stops. Push your Worker Roles to their limits. Then take it back a notch and observe how the Worker Role’s performance is affected. Chances are that you’re going to notice an increase in overall performance!
What? How can my processes speed up if I reduce the number of threads?
To answer this question, let’s first look at the Worker Role configurations available as of February 2013.

Take a look at these configurations. The Extra Small shared instance has a 1 GHz CPU and is limited to 5 Mbps of bandwidth. Furthermore, it doesn’t have much RAM. Imagine dequeuing 32 messages from a queue on the Windows Azure Queue Storage Service. If the Worker Role starts working on all 32 messages at the same time and must download a blob for each message, it will try to download all the blobs in parallel. The cumulative size of all the blobs will most likely create a bottleneck, resulting in an overall loss in performance.

Let’s shine some light on this: 5 Mbps gives you roughly 625 kilobytes per second. Now imagine that your blobs are 3 megabytes in size. This means you are downloading 3 * 32 = 96 megabytes, which at 625 kilobytes per second takes over two and a half minutes just to move over the wire. In this specific situation there are a couple of things that can help.
- Place delays between each download to spread them out
- Download the files one by one
- Compress the blobs to reduce the amount of data that has to travel over the network
- Reduce the number of concurrent messages being processed
You can also mix and match these approaches to find the right balance between the number of messages being processed and the volume of data transferred over the wire.
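As a rough illustration of the last two points, here is a minimal sketch that caps the number of simultaneous downloads with a small thread pool and assumes the blobs were stored gzip-compressed. The `download_blob` function is a hypothetical placeholder for whatever storage client you use; the pattern is what matters, not the API.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_DOWNLOADS = 4  # tune this against your instance's bandwidth

def download_blob(blob_name):
    """Hypothetical placeholder: fetch the gzip-compressed blob bytes from storage."""
    raise NotImplementedError

def fetch(blob_name):
    compressed = download_blob(blob_name)
    return gzip.decompress(compressed)  # blobs stored compressed to save bandwidth

def fetch_all(blob_names):
    # Only MAX_CONCURRENT_DOWNLOADS transfers run at a time; the rest wait their
    # turn instead of splitting the 5 Mbps link across 32 simultaneous streams.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_DOWNLOADS) as pool:
        return list(pool.map(fetch, blob_names))
```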
Let’s alter the scenario a bit and say that the blobs aren’t 3 megabytes in size but 40 megabytes. This means we could be loading close to 40 * 32 = 1.25 gigabytes at once. Keep in mind that an Extra Small instance only has 768 megabytes of RAM and that some of it is used by Windows. This scenario would cause Windows to start paging. Paging is a process where Windows swaps data between RAM and disk so that processes can continue functioning.
To decrease excessive paging, and thus possibly resolve the thrashing problem, you can do either of the following:
- Increase the amount of RAM available to the instance (generally the best long-term solution).
- Decrease the number of tasks the instance is running concurrently.
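Another way to keep memory flat in a scenario like this is to stream each blob in fixed-size chunks instead of materializing the full 40 megabytes per message. This is only a sketch under that assumption; `open_blob_stream` and `handle_chunk` are hypothetical placeholders for your own storage client and processing code.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB per read

def open_blob_stream(blob_name):
    """Hypothetical placeholder: return a file-like object streaming the blob."""
    raise NotImplementedError

def handle_chunk(chunk):
    """Hypothetical placeholder: per-chunk processing for the message."""
    raise NotImplementedError

def process_large_blob(blob_name):
    # Peak RAM per message stays near CHUNK_SIZE instead of the full blob size,
    # so even 32 concurrent messages stay well under the 768 MB available.
    with open_blob_stream(blob_name) as stream:
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            handle_chunk(chunk)
```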
Having looked over the previous scenarios, your first thought might be something along the lines of “I’m just not going to use an Extra Small instance because it’s too limited.” So let’s put things in perspective: the Extra Large instance offers an 800 Mbps network connection, which is roughly 100 megabytes per second. This is great news for Worker Roles that work with small to reasonably sized files. But think about larger files, like movies, or about audio processing. These kinds of workloads will run into the same limitations discussed earlier for the Extra Small instance.
Another possible bottleneck emerges when we look at the CPU capacity and the number of threads being spawned by our processes. A CPU can only handle a certain number of threads efficiently. Once this number is exceeded, the CPU spends more time switching between threads than doing actual work. To overcome this bottleneck, try making some parts of your code synchronous within the parent parallel process. This reduces the number of threads used to execute your code and also reduces context switching on the CPU. You can still process two or more tasks in parallel, but the tasks themselves should be synchronous.
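Here is a minimal sketch of that idea: the pool is sized to the number of cores, and each message handler runs its steps sequentially inside a single thread. `process_message` is a hypothetical placeholder for your handler.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process_message(message):
    # Hypothetical placeholder: download, transform and upload synchronously,
    # one step after the other, with no nested threads or tasks.
    ...

def run(messages):
    workers = os.cpu_count() or 1  # keep the thread count close to the core count
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(process_message, messages))  # consume results to surface exceptions
```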
I/O bottlenecks also appear if you are using the local temporary storage available in Worker Roles. If your Role is trying to execute too many disk operations at once, the result will probably be similar to the network problems discussed earlier.
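One simple way to keep local disk I/O under control is to funnel all writes through a single writer thread, so dozens of handlers aren’t hitting the temporary storage at once. This is just a sketch using the standard library; the file path in the usage comment is illustrative only.

```python
import queue
import threading

write_queue = queue.Queue()

def disk_writer():
    # The only thread touching local storage; everyone else just enqueues work.
    while True:
        item = write_queue.get()
        if item is None:  # sentinel tells the writer to stop
            break
        path, data = item
        with open(path, "wb") as f:
            f.write(data)
        write_queue.task_done()

threading.Thread(target=disk_writer, daemon=True).start()

# Handlers enqueue instead of writing directly, for example:
# write_queue.put(("output.dat", processed_bytes))
```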
There are many creative ways to deal with these physical constraints. Finding the right balance between the number of queue messages being processed, the available CPU capacity, the available network capacity, the local storage capacity and the available RAM can be tricky. The best way to find that balance is to test different configurations and observe how your Worker Role uses the available resources.
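To make that experimentation cheap, it helps to expose the knobs as configuration rather than constants. The environment variable names below are made up for illustration; in a Worker Role you would more likely read them from the role’s configuration settings.

```python
import os

# Hypothetical setting names; adjust to however your role exposes configuration.
MESSAGE_BATCH_SIZE = int(os.environ.get("MESSAGE_BATCH_SIZE", "8"))
MAX_PARALLEL_TASKS = int(os.environ.get("MAX_PARALLEL_TASKS", "4"))
MAX_CONCURRENT_DOWNLOADS = int(os.environ.get("MAX_CONCURRENT_DOWNLOADS", "2"))
```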
As I previously mentioned in my post about choosing the right compute Virtual Machine size, try to optimize your code as much as possible before trying to scale up. Prefer scaling out over scaling up. In most cases, scaling up doesn’t resolve the problem; it only pushes it back. Scaling up also has its costs. Take some time to compare the cost of spending a few days improving performance through code against the cost of operating a bigger Virtual Machine.
Remember, every application consumes resources differently. To find the sweet spot, push your application to its limits. The results may surprise you; your Worker Role may perform better than you anticipated. Then tune it down slowly and observe how it behaves. Windows Azure Diagnostics can be very useful during this phase of your development lifecycle.
Have you come up with creative ways to overcome resource bottlenecks?