Poison Queues Are A Must!

August 14, 2013

In addition to including a readable copy of the original queue message and the stack trace in your application’s diagnostics, it’s absolutely imperative that you implement a poison queue.

Poison queues, also known as dead-letter queues, are essential in pull-based systems because they spare the system from processing the same failing message over and over again.

A typical pull-based system uses queues to absorb peak loads and protect services from them, allowing the services to run at their own pace. Queues also let us distribute the processing load over many compute nodes. Compute nodes can be added and removed by using the Auto Scaling features found in the Windows Azure Management Portal.

Polling Windows Azure Solution Without a Poison Queue

As previously mentioned, the issue with this solution is that once messages fail to be processed, they become poison messages. These messages will accumulate over time and can eventually paralyze the entire system by blocking new messages from ever being processed.

Polling Windows Azure Solution With a Poison Queue

Implementing a poison queue can be achieved by adding a new queue to the existing system. Most queue services, such as the Windows Azure Storage Queue Service, keep track of the number of times that a message has been dequeued.

On Windows Azure, it is good practice to attempt to process a message more than once. We do this because transient errors are normal, and a second or third attempt will usually succeed.

Occasionally messages don’t deserialize properly or contain instructions that cause the process to fail repeatedly. We can identify these messages because their dequeue count exceeds what we consider to be normal. In many cases, it’s safe to consider a message as being a poison message when it has been dequeued more than 5 times.
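The routing rule described above can be sketched in a few lines of Python. This is a minimal in-memory simulation, not code against the Windows Azure Storage Queue Service: the `Message` class, the `process_next` function, and the `handle` handler are hypothetical names introduced for illustration, and the dequeue count is incremented by hand, whereas a real queue service tracks it for you.

```python
from collections import deque
from dataclasses import dataclass

MAX_DEQUEUE_COUNT = 5  # threshold suggested in the text; tune it for your workload


@dataclass
class Message:
    body: str
    dequeue_count: int = 0  # a real queue service maintains this counter


def process_next(work_queue, poison_queue, handler):
    """Handle one message; return False once the work queue is empty."""
    if not work_queue:
        return False
    message = work_queue.popleft()
    message.dequeue_count += 1

    if message.dequeue_count > MAX_DEQUEUE_COUNT:
        # Dequeued too many times: set it aside instead of retrying again.
        poison_queue.append(message)
        return True

    try:
        handler(message.body)
    except Exception:
        # Simulates the message becoming visible again after a failed attempt.
        work_queue.append(message)
    return True


# Hypothetical workload: one message that can never be processed.
work = deque([Message("good-1"), Message("bad"), Message("good-2")])
poison = deque()
processed = []


def handle(body):
    if body == "bad":
        raise ValueError("cannot process message")
    processed.append(body)


while process_next(work, poison, handle):
    pass

print(processed)                  # bodies that were handled successfully
print([m.body for m in poison])   # bodies that were moved to the poison queue
```

In this sketch the failing message is attempted five times and is moved to the poison queue on its sixth dequeue, while the other messages are processed normally. The check happens before the handler runs, so a message that crashes the process outright still ends up in the poison queue.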

Placing these messages in a poison queue has two interesting benefits. First, it allows the system to keep moving forward. Second, it gathers the failed messages in a centralized location so that DevOps can diagnose and fix issues that might otherwise go unnoticed.

NOTE: If you’re storing poison messages in the Windows Azure Storage Queue Service, be sure to check the queue regularly, because messages are deleted after 7 days.
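One way to act on this note is to flag poison messages that are close to the end of their retention period. The sketch below is only an illustration: `expiring_soon` and the one-day `WARNING_WINDOW` are hypothetical, and the insertion times are supplied by hand rather than read from the queue service.

```python
from datetime import datetime, timedelta, timezone

MESSAGE_TTL = timedelta(days=7)     # retention period stated in the note above
WARNING_WINDOW = timedelta(days=1)  # hypothetical alerting margin


def expiring_soon(inserted_times, now):
    """Return the insertion times of messages due for deletion within the warning window."""
    return [t for t in inserted_times if (t + MESSAGE_TTL) - now <= WARNING_WINDOW]


now = datetime(2013, 8, 14, tzinfo=timezone.utc)
inserted = [
    now - timedelta(days=6, hours=12),  # half a day of retention left
    now - timedelta(days=2),            # five days of retention left
]
at_risk = expiring_soon(inserted, now)
print(at_risk)
```

Only the first message is reported here, because it has less than a day left before the service would delete it.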

If you’re interested in reading more about poison queues, Pascal Laurin, a fellow MVP, has written about Windows Azure Storage Queue with error queues and about Handling Azure Storage Queue poison messages.
