flying blind

Lessons Learned – On #Azure, Log Everything!


Log everything, I mean it! If it wasn’t logged, it never happened.

So you’re probably thinking “doesn’t logging tax an application’s performance?” Absolutely, some logging frameworks can even bring down your application under load. So you really need to be careful about what you log. Therefore, we should log everything that can help us figure out what went horribly wrong. I used the past tense, because on the cloud it’s normal to fail. Never build an application without thinking of Mean Time to Recovery. In other words, how will you recover and how long will it take. Sometimes, when you start on a fresh project, you can only plan for the obvious cases. That means that we have a lot to learn from a brand new application. So in order to facilitate this learning process, I urge you to log anything that can help the DevOps that will get a phone call late at night. Who knows… you might be the one supporting this application.

Take a moment and think about your system. What would you need to know, to be able to identify what went wrong?

Running applications on the cloud without meaningful logs is like flying an airplane without windows or instruments. We know we’re going somewhere… well we think we’re going somewhere. But really where are we going? Is the engine on? Are we climbing or descending? How high are we?

Are these questions making you uneasy?

Let’s think about our applications, do we really know what’s going on? Sure we have performance counters around CPU, Network, Memory and Disk utilization. But what kind of information does that really provide about our application? Knowing that a Virtual Machine (VM) is running, is like being on an airplane without knowing where we’re going. Having meaningful logs provides us with the insights required to know where we’re going.

In the past, I’ve had a few interesting learning experiences in production environments. Some were harder to overcome than others. After a while, you start having a 6th sense about what might have gone wrong. But going on a hunch isn’t enough. You need numbers!

This is where logging comes in. On Microsoft Azure, storage is dirty cheap so don’t worry about volume. When I build Web APIs I make sure to log all incoming requests including parameters, identities and execution times. Then, I log all inbound and outbound communications between components. I start by adding logging to an application’s business layer and then I move on to other components.

Keep in mind that some logging frameworks buckle under load. For example, writing logs to files usually results in performance issues caused by I/O bottlenecks. So I usually use ETW and the Semantic Logging Application Block – Out-of-process Service in order to dump events into an Azure Storage Table. Then I build DevOps solutions to analyse, interpret and archive the events.

The term Semantic Logging refers specifically to the use of strongly typed events and the consistent structure of the log messages in the Semantic Logging Application Block (SLAB).

Event Tracing for Windows (ETW)

The Patterns & Practices Team (P&P Team) at Microsoft is working hard to make Event Tracing for Windows (ETW) accessible for everyone. ETW is an efficient kernel-level tracing facility that enables profiler log kernel or application-defined events. This technology has been around since Windows 2000. It’s been used by Microsoft for troubleshooting and performance monitoring components of Windows. And by the way without SLAB, ETW is wicked hard to use.

The P&P Team has put a lot of effort in making an Application Block (Semantic Logging Application Block (SLAB)) for the Enterprise Library that helps us take advantage of this high speed logging mechanism. The following packages can be found on NuGet and are worth your time:

  • Semantic Logging Application Blockprovides a set of destinations (sinks) to persist application events published using a subclass of the EventSource class from the System.Diagnostics.Tracing namespace. 
  • Semantic Logging Application Block – Azure Sink provides a set of destinations to persist events published by a subclass of EventSource, This add-on packages adds support to persisting log entries in Azure table storage.
  • Semantic Logging Application Block – SQL Server Sink (On Microsoft Azure, I recommend using the Azure Sink) – provides a set of destinations to persist events published by a subclass of EventSource, This add-on packages adds support to persisting log entries in SQL Server and Azure SQL Database.
  • Semantic Logging Application Block – Elasticsearch Sinkprovides a set of destinations to persist events published by a subclass of EventSource, This add-on packages adds support to persisting log entries into Elasticsearch.
  • Semantic Logging Application Block – Out-of-process Servicecontains an out-of-proc Windows Service for SLAB. I highly recommend using this package to handle the listening and persistence process of Events. By having this out-of-process service, you have more chances of catching and persisting the Events that were written just before the application crashed. The out-of-process service probably won’t restart when your Microsoft Azure Role restarts because the Run method terminated.

How Does ETW Work?

Source: Logging What You Mean: Using the Semantic Logging Application Block

Taking a Look Inside ETW

Source: Getting Started With Performance Tracing – Part 1 – Event Tracing For Windows Demystified

  • ETW Controller
    • An ETW controller can be any application or process that can initiate, control and use ETW.
  • ETW Event Trace Provider
    • An event trace provider has the functionality and logic for capturing log data for a certain aspect of the operating system or specific part of an application.
  • ETW Session
    • When an ETW controller enables one or more ETW event trace providers a session is automatically created. Within this session you can control how much data is stored in memory, when the data is flushed to disk etc.
  • ETW Consumer
    • Once a session is created a consumer can use that session to display the generated data in real time or the data is stored in a file for later viewing.

Find Out More about Logging and Telemetry

Trackbacks and Pingbacks:

  1. Using Time-based Partition Keys in #Azure Table Storage « Alexandre Brisebois - June 16, 2014

    […] method, can be used to calculate the Partition Keys required to query against logs persisted by the Semantic Logging Application Block (SLAB). The following is a query against Azure Table Storage, that I would use to extract events […]

    Like

  2. Reading Notes 2014-06-23 | Matricis - June 25, 2014

    […] Lessons Learned – On #Azure, Log Everything!s (Alexandre Briseboi) – Great post about how we should use the Event Trace Windows (ETW) to do our logs. […]

    Like

  3. From my reading list #13 – June 30th, 2014 | Pascal Laurin - June 30, 2014

    […] Lessons Learned – On #Azure, Log Everything! by Alexandre Brisebois […]

    Like

  4. Andrei Dzimchuk's | Insuring proper payload when using Semantic Logging Application Block - July 15, 2014

    […] where we create cloud services running on Azure. It’s important to for publicly faced services to log everything so it is possible when we analyze that data to have a decent idea of what happened and how our […]

    Like

  5. Andrei Dzimchuk's | Insuring proper payload when using Semantic Logging Application Block - July 29, 2014

    […] where we create cloud services running on Azure. It’s important for publicly faced services to log everything so it is possible when we analyze that data to have a decent idea of what happened and how our […]

    Like

  6. HTTP/1.1 503 Service Unavailable – Lessons Learned on #Azure « Alexandre Brisebois - July 30, 2014

    […] Lessons Learned – On #Azure, Log Everything! […]

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s