Lessons Learned – On #Azure, Log Everything!
Log everything, I mean it! If it wasn’t logged, it never happened.
So you’re probably thinking “doesn’t logging tax an application’s performance?” Absolutely, some logging frameworks can even bring down your application under load. So you really need to be careful about what you log. Therefore, we should log everything that can help us figure out what went horribly wrong. I used the past tense, because on the cloud it’s normal to fail. Never build an application without thinking of Mean Time to Recovery. In other words, how will you recover and how long will it take. Sometimes, when you start on a fresh project, you can only plan for the obvious cases. That means that we have a lot to learn from a brand new application. So in order to facilitate this learning process, I urge you to log anything that can help the DevOps that will get a phone call late at night. Who knows… you might be the one supporting this application.
Take a moment and think about your system. What would you need to know, to be able to identify what went wrong?
Running applications on the cloud without meaningful logs is like flying an airplane without windows or instruments. We know we’re going somewhere… well we think we’re going somewhere. But really where are we going? Is the engine on? Are we climbing or descending? How high are we?
Are these questions making you uneasy?
Let’s think about our applications, do we really know what’s going on? Sure we have performance counters around CPU, Network, Memory and Disk utilization. But what kind of information does that really provide about our application? Knowing that a Virtual Machine (VM) is running, is like being on an airplane without knowing where we’re going. Having meaningful logs provides us with the insights required to know where we’re going.
In the past, I’ve had a few interesting learning experiences in production environments. Some were harder to overcome than others. After a while, you start having a 6th sense about what might have gone wrong. But going on a hunch isn’t enough. You need numbers!
This is where logging comes in. On Microsoft Azure, storage is dirty cheap so don’t worry about volume. When I build Web APIs I make sure to log all incoming requests including parameters, identities and execution times. Then, I log all inbound and outbound communications between components. I start by adding logging to an application’s business layer and then I move on to other components.
Keep in mind that some logging frameworks buckle under load. For example, writing logs to files usually results in performance issues caused by I/O bottlenecks. So I usually use ETW and the Semantic Logging Application Block – Out-of-process Service in order to dump events into an Azure Storage Table. Then I build DevOps solutions to analyse, interpret and archive the events.
The term Semantic Logging refers specifically to the use of strongly typed events and the consistent structure of the log messages in the Semantic Logging Application Block (SLAB).
Event Tracing for Windows (ETW)
The Patterns & Practices Team (P&P Team) at Microsoft is working hard to make Event Tracing for Windows (ETW) accessible for everyone. ETW is an efficient kernel-level tracing facility that enables profiler log kernel or application-defined events. This technology has been around since Windows 2000. It’s been used by Microsoft for troubleshooting and performance monitoring components of Windows. And by the way without SLAB, ETW is wicked hard to use.
The P&P Team has put a lot of effort in making an Application Block (Semantic Logging Application Block (SLAB)) for the Enterprise Library that helps us take advantage of this high speed logging mechanism. The following packages can be found on NuGet and are worth your time:
- Semantic Logging Application Block – provides a set of destinations (sinks) to persist application events published using a subclass of the EventSource class from the System.Diagnostics.Tracing namespace.
- Semantic Logging Application Block – Azure Sink – provides a set of destinations to persist events published by a subclass of EventSource, This add-on packages adds support to persisting log entries in Azure table storage.
- Semantic Logging Application Block – SQL Server Sink (On Microsoft Azure, I recommend using the Azure Sink) – provides a set of destinations to persist events published by a subclass of EventSource, This add-on packages adds support to persisting log entries in SQL Server and Azure SQL Database.
- Semantic Logging Application Block – Elasticsearch Sink – provides a set of destinations to persist events published by a subclass of EventSource, This add-on packages adds support to persisting log entries into Elasticsearch.
- Semantic Logging Application Block – Out-of-process Service – contains an out-of-proc Windows Service for SLAB. I highly recommend using this package to handle the listening and persistence process of Events. By having this out-of-process service, you have more chances of catching and persisting the Events that were written just before the application crashed. The out-of-process service probably won’t restart when your Microsoft Azure Role restarts because the Run method terminated.
How Does ETW Work?
Source: Logging What You Mean: Using the Semantic Logging Application Block
Taking a Look Inside ETW
Source: Getting Started With Performance Tracing – Part 1 – Event Tracing For Windows Demystified
An ETW controller can be any application or process that can initiate, control and use ETW.
ETW Event Trace Provider
An event trace provider has the functionality and logic for capturing log data for a certain aspect of the operating system or specific part of an application.
When an ETW controller enables one or more ETW event trace providers a session is automatically created. Within this session you can control how much data is stored in memory, when the data is flushed to disk etc.
Once a session is created a consumer can use that session to display the generated data in real time or the data is stored in a file for later viewing.
Find Out More about Logging and Telemetry
- Logging What You Mean: Using the Semantic Logging Application Block – The official guidance around SLAB provided by the Patterns & Practices Team.
- Event Tracing for Windows (ETW) in .NET – Learn about a tracing system that is dead simple to code, already contains much of what you want to trace and is so blazingly fast you can leave it in place while your code runs in production. This course shows you how to harness ETW (Event Tracing for Windows), .NET’s EventSource and Semantic Tracing design.
- Cloud Service Fundamentals – This sample demonstrates how to create database-backed Azure services from real-world customer learnings of the Azure Customer Advisory Team (CAT).
- Building Blocks of Great Cloud Applications – Learn the fundamental building blocks for scale-out Azure applications.
- Improve Debugging And Performance Tuning With ETW (2007)