Does Your Data Really Belong in Your SQL Database?
February 15, 2013 6 Comments
When we move applications to the cloud, there are a few questions that absolutely need our attention. One of these questions comes down to database design. Do you really need all that data in your SQL Database?
The cloud provides many storage alternatives to SQL Database. Surprisingly, many of them can offer a noticeable performance boost. These alternatives offer low cost solutions that are scalable, available, durable, secure and if used properly they can drastically improve the global end user experience!
Consider storing your data in blob storage. Blob Storage Service offers a highly scalable solution that resembles a file system without having to deal with files being locked by other processes. The service allows you to easily control access to independent blobs. It also allows you to distribute your content through the Windows Azure Content Delivery Network (CDN) effectively reducing network latency by placing copies of your content closer to your users.
In many cases data is isolated and doesn’t require querying capabilities. When you are dealing with these silos of information, can easily process your data in parallel. Unfortunately, SQL Database is extremely good at protecting itself from abuse. Often applications will require Federations in Windows Azure SQL Database to satisfy the demand.
- Continuously querying for the same data
- Complex queries whose results don’t change often
- Complex report queries that return large datasets
- Queries on reference data that return a few hundred records
In most of these situations, what you want to do is relieve pressure on the SQL Database by pre-executing these queries and storing the results in Blob Storage Service. Blobs can take many forms. I regularly use Json or the ADO.Net DataTable. I recently rediscovered the benefits of using the DataTable, you can read all about it in my previous blog post. Furthermore, if you don’t have to worry about concurrent modification operations on your data, you can use a DataTable stored in Blob Storage Service to store your data. For example, I have a field employees who record data on a mobile devices. The devices regularly push modifications to Windows Azure through Blobs. Then they place tasks on Queue Storage Service. A Worker Role then processes the message and updates each DataTable in Blob Storage Service. By not using SQL Database, the update process can be scaled out and I can have many Worker Role instances updating individual Blobs. If I hire more employees, I can add more instances and my services will not slow down. Just to be clear, adding more instances of a Worker Role will not speed up processing! It will maintain the expected performance.
The previous scenario is great when you can control concurrent updates to data. If your Blobs are compressed, then you can expect them to be transferred quite fast over the internal network in Windows Azure.
Other benefits to using Blobs
- You Blobs can be versioned through Snap Shots
- They can be distributed over Windows Azure CDN
- They can be Cached using standard HTTP Caching Strategies.
- You can control access
- They are replicated at least 3 times at all times
- If you have a higher volume of traffic for a specific Blob, it will scale automatically
- Its dirty cheap!
- Storing and querying over log data
- Storing TBs of structured data
- Storing a denormalized version of your data for blazing fast access
- Quickly querying data using a clustered index
- Accessing data using the OData protocol
When dealing with structured, non-relational data, it can be extremely cost effective to use the Windows Azure
Table Storage Service. This is Microsoft’s NoSQL datastore, which accepts authenticated calls from inside and outside the Windows Azure.
As mentioned above, Table Storage Service are great for logging or any type of data that can be accessed by a Partition Key and by Primary key. On many occasions I use Table Storage Service to create indexes where I can keep track of Blobs. This solution is very scalable and very fast!
Table Storage Service uses Partitions and require your entities to have Primary Keys which need to be unique within its Partition. The service will shard and scale automatically using your Partitions to redistribute the load over many service instances. This also means that your partitioning strategy is very important. You will give up a lot of potential performance if your data isn’t partitioned properly. Take time to thinking about your Partition Keys and Primary Keys because they make up a Clustered Indexed Key. Querying over these two columns is extremely efficient. Including other columns in your query will force a complete table scan which comes at a cost.
Don’t take this the wrong way, you can query and filter by other columns than the Partition Key and the Primary Key, but it will be slower. If you dataset is small, they you may not even notice it. But as your data grows, your queries will slow down by more than 40%.
Putting it all together
When it comes to storing data on Windows Azure, it’s quite important to take some time and think about all the services that are offered. The best way to create a data strategy is to analyze how data is consumed and by orchestrating the data strategy around the conclusions pulled from the initial analysis. The first iteration will probably not yield the ultimate data strategy, but it will get you closer to your goals. Future analysis will yield new strategies based on your actual needs. Be sure to iterate through this process as your application grows.
In most systems I’ve built so far, the best way to get performance up to par, was to pre-assemble and pre-calculate data. This not only relieved our databases, it made them more responsive. The system as a whole felt better. Users seemed to be happier because they didn’t have to wait for data to be found or calculated. They clicked and data was just there ready for them. Reducing delays is your best friend when it comes to convincing end users that YOU have the best solution for them. In some cases I was able to take queries that took well over 50 seconds to complete and reduce the delays le less than 200ms.
Good data strategies benefit by exploiting the Windows Azure Data Services accordingly. Each service has strengths and weaknesses. Make sure you take time to become familiar with them. Doing so will help you make the right choices. The services are continuously being updated with loads of new features.
Best practices say that Windows Azure Data Services all have specific roles and responsibilities
- SQL Database stores relational data. Data that will be queried to answer questions,
- Blob Storage Service stores object graphs of pre-assembled / pre-calculated / static reference data.
- Table Storage Service stores non-relational structured data
Taking all this into consideration, does your data really belong in your SQL Database?