Inserting Large Amounts of Data Into Windows Azure SQL Database

October 22, 2012

As some of you may have noticed, Windows Azure SQL Database is quite good at protecting itself. If you try to insert too many rows in a short period of time, it will simply close your connection and cause your transaction to roll back.

I was trying to insert close to 700,000 rows into a table and was systematically getting cut off.

General Guidelines and Limitations (Windows Azure SQL Database)

Connection Constraints

Windows Azure SQL Database provides a large-scale multi-tenant database service on shared resources. In order to provide a good experience to all Windows Azure SQL Database customers, your connection to the service may be closed due to the following conditions:

  • Excessive resource usage
  • Connections that have been idle for 30 minutes or longer
  • Failover because of server failures

In my case, I was probably running afoul of the first of the three conditions listed above. I tried to figure out how to work around the issue, and after toying with the problem for some time I settled on passing a table-valued parameter to a Stored Procedure. I quickly found an example and came up with the following solution.

More information can be found in the Windows Azure SQL Database Performance and Elasticity Guide:

The degree to which a subscriber’s connectivity is blocked ranges from blocking inserts and updates only, to blocking all writes, to blocking all reads and writes. The time span for which throttling occurs is referred to as the Throttling Cycle and the duration of a Throttling Cycle is referred to as the Throttling Sleep Interval which is 10 seconds by default.

Throttling severity falls into one of two categories, Soft Throttling for “mildly exceeded” types and Hard Throttling for “significantly exceeded” types. Because significantly exceeded types pose a greater risk to overall system health, they are handled more aggressively than mildly exceeded types. Engine Throttling follows these steps to reduce load and protect system health:

  1. Determines the load reduction required to return the system to a healthy state.
  2. Marks subscriber databases that are consuming excessive resources as throttling candidates. If Engine Throttling is occurring due to a mildly exceeded type then certain databases may be exempt from consideration as throttling candidates. If Engine Throttling is due to a significantly exceeded type then all subscriber databases can be candidates for throttling with the exception of subscriber databases that have not received any load in the Throttling Cycle immediately preceding the current Throttling Cycle.
  3. Calculates how many candidate databases must be throttled to return the system to a healthy state by evaluating the historical resource usage patterns of the candidate databases.
  4. Throttles the calculated number of candidate databases until system load is returned to the desired level. Depending on whether throttling is Hard Throttling or Soft Throttling, the degree of throttling applied, or the throttling mode as described in Understanding Windows Azure SQL Database Reason Codes, can vary. Any databases that are throttled remain throttled for at least the duration of one Throttling Cycle, but throttling may often persist for multiple Throttling Cycles to return the system to a healthy state.
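The practical consequence of that 10-second Throttling Sleep Interval is that retrying any faster is wasted effort: a throttled database stays throttled for at least one full cycle. Here is a minimal sketch of mine (not code from the guide) of a back-off that starts at one full cycle:

using System;
using System.Data.SqlClient;
using System.Threading;

static class ThrottleAwareRetry
{
    // Retries an action, never waiting less than one full Throttling
    // Cycle (10 seconds) between attempts, and doubling the delay
    // after each failure. The attempt cap of 5 is an arbitrary choice.
    public static void Execute(Action action, int maxAttempts = 5)
    {
        var delay = TimeSpan.FromSeconds(10); // one Throttling Sleep Interval

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                action();
                return;
            }
            catch (SqlException)
            {
                // Error 40501 ("the service is currently busy") is the
                // classic throttling code; a real policy would inspect it.
                if (attempt >= maxAttempts) throw;

                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}

The retry policies used later in this post are a configurable version of the same idea.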

Keep in mind that you are sharing the following Database Server configuration.

As of this writing, each SQL Database computer is equipped with 32 GB RAM, 8 CPU cores and 12 hard drives. To ensure flexibility, each SQL Database computer can host multiple subscribers at a time.

Reusing the database from my previous post about using the exponential back-off strategy with Windows Azure SQL Database, I created a User Defined Type within my SQL Database instance that holds an index entry's Name and AssetId:

CREATE TYPE [dbo].[AssetIndexTable] AS TABLE
(
    Name VARCHAR(50),
    AssetId INT
)

Then I created a Stored Procedure that takes an AssetIndexTable as a parameter and inserts each row into the AssetIndex table.

CREATE PROCEDURE [dbo].[InsertAssetIndexes]
    @table AS [AssetIndexTable] READONLY
AS

INSERT INTO AssetIndex (Name, [AssetId], Created)
SELECT Name, [AssetId], GETDATE() AS Created FROM @table

Alright, now we're ready to switch to C# and use the InsertAssetIndexes Stored Procedure. This test uses the ReliableModel that I built for a previous post to bundle the retry policy logic for use with Entity Framework. It also uses the ReliableSqlConnection from the Transient Fault Handling Application Block; this connection is used to apply a retry policy to the Stored Procedure call.
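To give a feel for what that retry plumbing does, the block's policies can wrap any delegate. A minimal sketch using the same RetryPolicy.DefaultExponential that appears in the test below; the lambda body is a placeholder:

RetryPolicy.DefaultExponential.ExecuteAction(() =>
{
    // Anything in here that fails with a transient SQL Database error
    // is retried with an exponentially growing delay before the
    // exception is allowed to surface.
    // ... open a connection and execute a command ...
});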

The DataTable is built to be an exact replica of the table type previously created in the database. Once it has been filled with the desired data, I call the Stored Procedure through an SqlCommand.

Note: If you are sending a large DataTable, the Stored Procedure can take some time to execute. With 700,000 rows, the following test took about 5 minutes to complete.

[TestMethod]
public void InsertAssetIndexesStoredProcedureTest()
{
    ReliableModel.Do<FaultTestModel>(model =>
    {
        var asset = new Asset
                        {
                            Name = "Indexed",
                            Created = DateTime.Now,
                        };

        model.Assets.Add(asset);
        model.SaveChanges();
    });

    Console.WriteLine("Created");

    var a = ReliableModel.Query<FaultTestModel, Asset>
        (model => model.Assets.First());

    Console.WriteLine("found");

    var dt = new DataTable();
    dt.Columns.Add("Name", typeof(string));
    dt.Columns.Add("AssetId", typeof(int));
    // Build 700,000 index rows that all point at the same Asset.
    Enumerable.Range(0, 700000)
        .ToList()
        .ForEach(i =>
        {
            dt.Rows.Add("IndexEntry" + i, a.Id);
        });

    Console.WriteLine("Built results");

    using(var connection = new ReliableSqlConnection(
        "Server=tcp:********.database.windows.net," +
        "1433;Database=sql-azure;" +
        "User ID=****@*****;" +
        "Password=password;" +
        "Trusted_Connection=False;" +
        "Encrypt=True;" +
        "Connection Timeout=120;", 
        RetryPolicy.DefaultExponential,
        RetryPolicy.DefaultExponential))
    {
        connection.Open();
        using (var cmd = connection.CreateCommand())
        {
            cmd.CommandText = "InsertAssetIndexes";
            cmd.CommandType = CommandType.StoredProcedure;

            // Pass the DataTable as the table-valued parameter; SqlClient
            // infers SqlDbType.Structured from the DataTable value.
            cmd.Parameters.AddWithValue("@table", dt);

            // Execute through the connection so its command retry policy
            // applies; calling cmd.ExecuteScalar() directly would bypass it.
            connection.ExecuteCommand(cmd);
        }
    }

    Console.WriteLine("Inserted");

    var count = ReliableModel.Query<FaultTestModel, int>
        (model => model.AssetIndexes.Count());

    Console.WriteLine(count);
}
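Five minutes is a long time to keep one command running, and a long-running command is itself a throttling candidate. If that worries you, a hypothetical variation (not what I ran above) is to cut the DataTable into batches and call the Stored Procedure once per batch, reusing the connection and dt objects from the test:

// Hypothetical variation: send the rows in smaller batches so that
// each Stored Procedure call stays short. 50,000 per batch is a
// guess; tune it against your own throttling behaviour.
const int batchSize = 50000;

for (var offset = 0; offset < dt.Rows.Count; offset += batchSize)
{
    var batch = dt.Clone(); // same schema, no rows
    for (var i = offset; i < Math.Min(offset + batchSize, dt.Rows.Count); i++)
    {
        batch.ImportRow(dt.Rows[i]);
    }

    using (var cmd = connection.CreateCommand())
    {
        cmd.CommandText = "InsertAssetIndexes";
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@table", batch);
        connection.ExecuteCommand(cmd);
    }
}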

The following TestInitialize method makes sure the test always runs against a clean database.

[TestInitialize]
public void PrepareDatabase()
{
    //Clear tables
    ReliableModel.Do<FaultTestModel>(model =>
    {
        model.Database.ExecuteSqlCommand("DELETE FROM AssetIndex");
        model.Database.ExecuteSqlCommand("DELETE FROM Asset");
    });
}
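One last thought: building a 700,000-row DataTable means materializing every row in memory before anything is sent. A table-valued parameter can also be fed from a stream of SqlDataRecord objects, which SqlClient enumerates lazily. A sketch of that alternative, assuming the same AssetIndexTable layout as above:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

static IEnumerable<SqlDataRecord> BuildIndexRows(int count, int assetId)
{
    // Column layout must match the AssetIndexTable type exactly.
    var metaData = new[]
    {
        new SqlMetaData("Name", SqlDbType.VarChar, 50),
        new SqlMetaData("AssetId", SqlDbType.Int)
    };

    for (var i = 0; i < count; i++)
    {
        var record = new SqlDataRecord(metaData);
        record.SetString(0, "IndexEntry" + i);
        record.SetInt32(1, assetId);
        yield return record; // rows are produced as SqlClient asks for them
    }
}

The parameter would then be passed as new SqlParameter("@table", SqlDbType.Structured) { Value = BuildIndexRows(700000, a.Id) }. Whether that actually shortens the five-minute run is something to measure rather than assume.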
