In addition to including a readable copy of the original queue message and the stack trace in your application’s diagnostics, it’s imperative that you implement a poison queue.
Poison queues, also called dead letter queues, are essential in pull-based systems because they relieve the system from having to process the same failing message over and over again.
A typical pull-based system uses queues to absorb peak loads and protect services from them, allowing those services to run at their own pace. Queues also let us distribute the processing load over many compute nodes. Compute nodes can be added and removed using the Auto Scaling features found in the Windows Azure Management Portal.
Polling Windows Azure Solution Without a Poison Queue
As previously mentioned, the issue with this solution is that once messages fail to be processed, they become poison messages. These messages will accumulate over time and can eventually paralyze the entire system by blocking new messages from ever being processed.
Polling Windows Azure Solution With a Poison Queue
Implementing a poison queue can be achieved by adding a new queue to the existing system. Most queue services, like the Windows Azure Storage Queue Service, keep track of the number of times that a message has been dequeued.
A good practice on Windows Azure is to attempt to process a message more than once. We do this because transient errors are normal, and a second attempt at processing a message will usually succeed.
Occasionally messages don’t deserialize properly or contain instructions that cause the process to fail repeatedly. We can identify these messages because their dequeue count exceeds what we consider to be normal. In many cases, it’s safe to consider a message as being a poison message when it has been dequeued more than 5 times.
Placing these messages in a poison queue has two benefits. First, it allows the system to keep moving forward. Second, it gathers the failed messages in a centralized location so that DevOps can diagnose and fix issues that might otherwise go unnoticed.
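The flow described above can be sketched in a few lines. This is a minimal in-memory illustration, not code for the Windows Azure Storage Queue Service: the Message class, the handle function, and the drain function are hypothetical names, and the threshold of 5 comes from the rule of thumb mentioned earlier. With a real queue service, the dequeue count is tracked for you and a failed message reappears on its own once its visibility timeout expires.

```python
from collections import deque
from dataclasses import dataclass

MAX_DEQUEUE_COUNT = 5  # rule of thumb from the text; tune for your workload


@dataclass
class Message:
    body: str
    dequeue_count: int = 0  # a real queue service maintains this value


def handle(body: str) -> None:
    """Hypothetical handler: fails for any body that is not an integer."""
    int(body)


def drain(work_queue: deque, poison_queue: deque) -> list:
    """Process every message, moving repeat offenders to the poison queue."""
    processed = []
    while work_queue:
        message = work_queue.popleft()
        message.dequeue_count += 1
        if message.dequeue_count > MAX_DEQUEUE_COUNT:
            poison_queue.append(message)  # park it for later diagnosis
            continue
        try:
            handle(message.body)
            processed.append(message.body)
        except Exception:
            # Simulate the message becoming visible again for another attempt.
            work_queue.append(message)
    return processed


work = deque([Message("1"), Message("not-a-number"), Message("2")])
poison = deque()
done = drain(work, poison)
```

In this sketch the two valid messages are processed normally, while the malformed one is retried five times and then moved to the poison queue on its sixth dequeue, so it no longer consumes processing time.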
NOTE: If you’re storing poison messages in Windows Azure Storage Queue Service, be sure to check the queue regularly because, by default, messages are deleted after 7 days.
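One way to act on this note is to flag poison messages that are close to being deleted. The following is a rough sketch under stated assumptions: the expiring_soon function and the one-day warning window are hypothetical, and the insertion time is passed in directly rather than read from a real queue message.

```python
from datetime import datetime, timedelta, timezone

TIME_TO_LIVE = timedelta(days=7)    # retention period mentioned in the note
WARNING_WINDOW = timedelta(days=1)  # hypothetical: how early to raise the alarm


def expiring_soon(inserted_at: datetime, now: datetime) -> bool:
    """Return True when a message will be deleted within WARNING_WINDOW."""
    time_left = inserted_at + TIME_TO_LIVE - now
    return time_left <= WARNING_WINDOW


now = datetime(2014, 1, 10, tzinfo=timezone.utc)
fresh = datetime(2014, 1, 9, tzinfo=timezone.utc)  # six days of retention left
old = datetime(2014, 1, 4, tzinfo=timezone.utc)    # one day of retention left
```

A scheduled job could run a check like this against the poison queue and alert the team, or copy the flagged messages to longer-term storage, before they disappear.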
If you’re interested in reading more about poison queues, Pascal Laurin, a fellow MVP, has written about Windows Azure Storage Queue with error queues and about Handling Azure Storage Queue poison messages.