Reaching Performance Targets on Azure by Preparing and Pre-Calculating Data

August 8, 2013

Azure is an amazing platform to work with! It creates interesting possibilities by offering a wide range of services that can be leveraged to produce exceptional end-user experiences.

Azure is built on commodity hardware and allows you to build elastic services that scale out in order to maintain a constant end-user experience. This is especially interesting when you want to start small and grow according to the demand for your services.

Enchanting as this may be, this is where many projects go wrong. Everything on the cloud has a price. On the cloud details matter! So choose the right architecture and be very cautious about the resources your application consumes.

To identify the right architecture for your project, there are a few questions you need to ask the project owner:

  • Does your data really belong in a database?
  • On average, how much CPU is used to satisfy the application’s needs?
  • How fresh does the data have to be? Can the data be a few minutes old?
  • What does real-time mean for your organization: to the hour, to the minute or to the second?
  • How much bandwidth is required to adequately serve your end-users?
  • On average, how many concurrent users does your application need to support?
  • How fast does your application need to respond to end-users? Does it absolutely need to respond in under 30 milliseconds, or is anything under 1 second acceptable?

Prepare And Pre-Calculate Data

Preparing data for the end-user’s consumption can be as simple as creating a view in the SQL Database in order to reduce the number of joins necessary to execute a query. It can also be achieved by generating files or cached responses for common queries. By preparing data ahead of time, you are effectively relieving the CPU from having to query, transform and package the same data for every request created by the end-user.
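As a minimal sketch of that idea, here's an in-memory SQLite database standing in for Azure SQL Database (the table and view names are hypothetical). The view encapsulates the join once, so readers query a single relation instead of assembling it on every request:

```python
import sqlite3

# In-memory SQLite database as a stand-in for Azure SQL Database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total REAL);

    -- The view hides the join: readers see one prepared relation.
    CREATE VIEW customer_orders AS
        SELECT c.name, o.id AS order_id, o.total
        FROM customers c JOIN orders o ON o.customer_id = c.id;
""")
conn.execute("INSERT INTO customers VALUES (1, 'Contoso')")
conn.execute("INSERT INTO orders VALUES (10, 1, 99.50)")

# The application issues a trivial SELECT; the join happened in the view.
rows = conn.execute("SELECT name, total FROM customer_orders").fetchall()
```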

Pre-calculating data is also quite effective at relieving CPUs. Imagine a system where end-users need to query for statistics about a production line. By pre-calculating all the statistic values and by storing them in a database, the end-users are capable of querying for specific items based on statistical values. Calculating and comparing statistical values for each query isn’t very efficient and should be avoided at all costs!
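A minimal Python sketch of that production-line scenario, with hypothetical item names and the stdlib `statistics` module standing in for the real calculation. The statistics are computed once, up front, and every query afterwards only filters on the stored values:

```python
import statistics

# Hypothetical raw measurements per production-line item.
measurements = {
    "line-a": [10.0, 12.0, 11.0],
    "line-b": [30.0, 29.0, 31.0],
}

# Pre-calculate once, on write or on a schedule -- not on every query.
precomputed = {
    item: {"mean": statistics.mean(vals), "stdev": statistics.stdev(vals)}
    for item, vals in measurements.items()
}

def items_with_mean_above(threshold):
    # Queries filter on stored values instead of recomputing them.
    return sorted(i for i, s in precomputed.items() if s["mean"] > threshold)
```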

The important thing to remember here is that we can fake performance by preparing and pre-calculating data for end-users, in turn reducing the amount of resources required to run the solution.

Reaching Performance Targets

From the flavor of SQL Database to your database schema, performance targets give you a baseline that will direct all sorts of decisions. Defining them early in the planning phase is extremely important!

Reaching performance targets can sometimes require a bit of creativity. For example, in some circumstances it's favorable to store metadata in a SQL Database table and include a binary column that contains a serialized object graph that can be retrieved when needed. This kind of optimization is aligned with the best practices that surround SQL.
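As a sketch of that pattern, assuming SQLite as a stand-in for SQL Database and `pickle` as the serializer (the table and column names are hypothetical): the queryable metadata stays relational, while the full object graph lives in one binary column.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, graph BLOB)"
)

# Queryable metadata (name) stays relational; the full object graph
# is serialized into a single binary column.
object_graph = {"name": "widget", "parts": [{"sku": "p-1"}, {"sku": "p-2"}]}
conn.execute("INSERT INTO products VALUES (?, ?, ?)",
             (1, "widget", pickle.dumps(object_graph)))

# One row fetch returns both the metadata and the whole graph.
name, blob = conn.execute(
    "SELECT name, graph FROM products WHERE id = 1"
).fetchone()
restored = pickle.loads(blob)
```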

A database that is used to write lots of data should be quite normalized; this will generally provide better performance.

AND

A database that is used to read lots of data should be denormalized as much as possible in order to limit the number of joins required to extract meaningful data.

To accomplish this within the same database, I started using views to abstract my write schema from my read schema. I create views that are adapted to the application's needs. By greatly reducing the complexity of the SQL queries generated by Entity Framework, I am able to quickly retrieve data from the database.

Relying on Views also brings extra flexibility. You can create optimized views for all your queries, consequently reducing the average time an end-user waits for data.

Another option is to move an object graph from a binary column in SQL Database to the Azure Blob Storage Service. Doing so will require your application to make extra calls to fetch the object graph from the Azure Blob Storage Service, but it will reduce the amount of data being returned by SQL Database. Reducing the amount of data coming from SQL Database will reduce its IO.

Since Azure Blob Storage is built to scale, requesting multiple blobs in parallel will not impact queries from other users running on SQL Database. Furthermore, preparing and storing data in the Azure Blob Storage Service has some benefits that we rarely exploit. For example, it allows us to version data. It also allows us to distribute documents through the Azure CDN, bringing data closer to our end-users and reducing network latency.
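A sketch of the write and read paths for that split, using a plain dictionary as a stand-in for an Azure Blob Storage container (all names here are hypothetical). The database row keeps only a small blob reference; the heavy object graph is fetched separately:

```python
import json
import sqlite3
import uuid

blob_container = {}  # stand-in for an Azure Blob Storage container

def upload_blob(data: bytes) -> str:
    """Store the payload and return a reference to it."""
    blob_name = str(uuid.uuid4())
    blob_container[blob_name] = data
    return blob_name

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, blob_ref TEXT)"
)

# Write path: the object graph goes to blob storage,
# only a small reference lands in the database row.
graph = {"name": "widget", "parts": ["p-1", "p-2"]}
ref = upload_blob(json.dumps(graph).encode())
conn.execute("INSERT INTO products VALUES (1, 'widget', ?)", (ref,))

# Read path: one cheap SQL lookup, then a separate blob fetch.
(blob_ref,) = conn.execute(
    "SELECT blob_ref FROM products WHERE id = 1"
).fetchone()
restored = json.loads(blob_container[blob_ref])
```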

Reaching performance targets isn’t about paying for the largest Virtual Machine instance and hoping for the best. It’s about choosing the right services and distributing the load. Because much of the data we consume usually doesn’t change that often, it’s an excellent opportunity to pre-crunch the data: to transform and prepare it for the end-user as it changes. Command and Query Responsibility Segregation (CQRS) and Event Sourcing (ES) lend themselves really well to these types of scenarios. They also open up many possibilities like further analysis through Big Data, a history of all changes that happen within a system and even a simple way to rebuild your databases without losing any data.

Command-Query Responsibility Segregation (CQRS) is a way of designing and developing scalable and robust enterprise solutions with rich business value.

In an oversimplified manner, CQRS separates commands (that change the data) from the queries (that read the data). This simple decision brings along a few changes to the classical architecture with service layers along with some positive side effects and opportunities. Instead of the RPC and Request-Reply communications, messaging and Publish-Subscribe are used. [More]
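In a heavily simplified, in-memory Python sketch (all names hypothetical), the command side appends events to an append-only store while queries are served from a separate projection:

```python
events = []       # append-only event store (Event Sourcing side)
read_model = {}   # denormalized projection serving queries

def apply_event(event):
    # Project the event into the read model.
    read_model[event["id"]] = event["name"]

def handle_rename_command(product_id, new_name):
    # Write side: record what happened, then update the read side.
    event = {"type": "renamed", "id": product_id, "name": new_name}
    events.append(event)
    apply_event(event)

def query_name(product_id):
    # Read side: no joins, no recomputation, just a lookup.
    return read_model.get(product_id)

handle_rename_command(1, "widget-v2")
```

Because the events are never discarded, the read model can be rebuilt at any time by replaying them, which is the "rebuild your databases without losing any data" property mentioned above.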

Getting Started with CQRS

Caching results from SQL Database queries can effectively increase your application’s throughput and possibly reduce its operational costs. The Azure Caching Services are able to handle a greater number of concurrent requests than the Azure SQL Database can. Caching is also an essential part of any solution whose goal is to offer a constant end-user experience.

By reducing the number of queries that actually make their way to the SQL Database, you are reducing the load the application places on the SQL Database. Consequently, you can probably use a non-premium SQL Database to satisfy your end-users’ needs. As can be observed from the scenarios in my previous post “Azure SQL Database VS SQL Server in Azure Virtual Machines VS Alternatives”, a solution that requires a premium SQL Database is usually more expensive than one that doesn’t.

Have you reached your application’s performance targets by preparing and pre-calculating data?
