Putting your database on a diet doesn’t mean getting rid of your data. It means taking a closer look at how you store and consume your data. I recently asked if your data belonged in a database. In this post I will summarize conclusions pulled from my recent experiences.
The size of my database went from 40 GB to 10 GB. Consequently reducing the yearly
SQL Database operational costs by $719.28!
Working on a project that requires more than 15 years of historical data, I was faced with interesting challenges. The first was getting the data into SQL Database. To overcome this hurdle I used SQL Bulk Copy and ended up with
a 40 GB SQL Database.
A 40 GB SQL Database costs about $105.90 a month which comes to approximately $1270.80 a year! This operational cost is for the database alone without counting bandwidth and compute costs.
At this point, I started to question whether the data belonged in the database. To my surprise most of the data wasn’t used. Furthermore, the queries didn’t require the 15 years of data to be stored in the database. At best, the application only required a year’s worth of data. Looking over the other tables in the database, I also noticed that most of them were redundant or simply didn’t belong in the database.
Following these observations, I started to remove tables from the database and storing the data in blob storage and in table storage. Most of the data ended up in blob storage, where my worker roles could read the data into memory and use it as reference data.
The complete 15 years worth of data was moved to blob storage. This allows my worker roles to access a client’s complete 15 year history and generate statistics that can be inserted into SQL Database. Making this information available through SQL Database enables clerks to search for client profiles based on statistics without having to pay for a bloated database.
The SQL Database originally contained close to 15 tables. Once I finished moving data out of the database, I was left with 4 tables and smaller data set to work with. Reducing the amount of data in my tables also brought interesting consequences. It meant that my indexes required less disk space. It also meant that my queries ran faster when a full table scan was required.
The size of the database went from 40 GB to 10 GB. Consequently reducing the yearly operational costs by $719.28. The new yearly costs for the SQL Database is now $551.52.
The yearly storage cost for the data that was moved out of SQL Database was approximately $48.00. Since a lot of the space used up in the SQL Database was for indexes, the size of the data stored in the storage account was approximately 14 GB. The cost of operation mentioned for the storage account also includes transaction costs.
The yearly operational costs related to data went down from $1270.80 to $599.52 (The new yearly operational cost includes the SQL Database and the Storage Account).
Conclusions Pulled from this Experience
Data doesn’t need to be in SQL Database to be available. Often moving the data out of SQL Database will dramatically improve the overall performance of your worker roles. It will also allow for a greater level of parallelism by removing locks. Data in blob storage can be access by many worker roles in parallel without side effects. Furthermore, reading from blob storage can be done over HTTP, effectively removing the encryption overhead imposed by HTTPS for the SQL Database connection.
I use SQL Database for what it’s best at, I use it to query over relational data. If the data isn’t relational, then I opt for table storage, which allows me to query over my data without having to rely on SQL Database.
In most circumstances I try to store data in blobs. Its convenient, fast and reliable! Consequently, my applications scale with ease and I am not required to resort to Federations in Windows Azure SQL Database to satisfy the application’s demand.