Using SqlBulkCopy to Insert Massive Amounts of Data Into Windows Azure SQL Database

February 16, 2013 — 12 Comments

Recently I wrote about Inserting Large Amounts of Data Into Windows Azure SQL Database, an approach that works well for reasonable amounts of data (roughly 5 to 1,000 inserts). Faced with a much bigger challenge that required a different approach, I started looking for alternatives and found SqlBulkCopy.

The code from this Post is part of the Brisebois.WindowsAzure NuGet Package

To install Brisebois.WindowsAzure, run the following command in the Package Manager Console

PM> Install-Package Brisebois.WindowsAzure

Get more details about the NuGet package.

SqlBulkCopy requires us to build a DataTable object, which lets us stream records from the DataTable to SQL Database without being throttled. I used the Transient Fault Handling Application Block to provide resiliency against transient faults. Furthermore, locking the table during the operation provides greater performance.

Note that performance will be affected by triggers and by indexes on the destination table.

The following code shows how to use the BulkWriter. The Dictionary is used to map columns from the DataTable to the actual table in SQL Database. WriteWithRetries is not an async method because I usually execute this code within a Task that is already executing in parallel.

Before using the BulkWriter, be sure to add a configuration setting to the cloud configuration for your Role, or to your Web.config or App.config. The setting to add is "ConnectionString", containing a valid connection string for your SQL Database instance.
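
For example, in a Web.config or App.config this can be an appSettings entry (in a Cloud Service role it would go in the ServiceConfiguration as a Setting with the same name); the server, database and credentials below are placeholders:

<appSettings>
  <!-- the key must match the name passed to CloudConfigurationManager.GetSetting -->
  <add key="ConnectionString"
       value="Server=tcp:yourserver.database.windows.net,1433;Database=yourdatabase;User ID=youruser@yourserver;Password=...;Encrypt=True;Connection Timeout=30;" />
</appSettings>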

var bulk = new BulkWriter("EventLog", new Dictionary<string, string>
{
    {"Date", "Date"},
    {"EventId", "EventId"},
    {"Name", "Name"},
    {"Details", "Details"}
});

bulk.WriteWithRetries(table);

The table variable is a DataTable that contains the mapped columns defined in the above Dictionary. Be sure that the column types of your DataTable match the column types of the SQL Database table.
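
As a rough sketch, building such a DataTable could look like this (the column types here are assumptions and must mirror your actual EventLog table):

var table = new DataTable("EventLog");

// assumed column types; adjust them to the real schema of the EventLog table
table.Columns.Add("Date", typeof(DateTime));
table.Columns.Add("EventId", typeof(Guid));
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Details", typeof(string));

// add one row per record to insert
table.Rows.Add(DateTime.UtcNow, Guid.NewGuid(), "RoleStarted", "Worker role instance started");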

BulkWriter

public class BulkWriter
{
    const int MaxRetry = 5;
    const int DelayMs = 100;

    private readonly string tableName;
    private readonly Dictionary<string, string> tableMap;
    private readonly string connString;

    public BulkWriter(string tableName,
                      Dictionary<string, string> tableMap)
    {
        this.tableName = tableName;
        this.tableMap = tableMap;

        // get your connection string
        connString = CloudConfigurationManager
            .GetSetting("ConnectionString");
    }

    public void WriteWithRetries(DataTable datatable)
    {
        TryWrite(datatable);
    }

    private void TryWrite(DataTable datatable)
    {
        var policy = MakeRetryPolicy();
        try
        {
            policy.ExecuteAction(() => Write(datatable));
        }
        catch (Exception ex)
        {
            // TODO: add logging logic
            Trace.TraceError(ex.ToString());
            throw;
        }
    }

    private void Write(DataTable datatable)
    {
        // connect to the SQL Database instance
        using (var connection = new SqlConnection(connString))
        {
            var bulkCopy = MakeSqlBulkCopy(connection);

            connection.Open();

            // stream the rows of the DataTable to the destination table
            using (var dataTableReader = new DataTableReader(datatable))
            {
                bulkCopy.WriteToServer(dataTableReader);
            }

            bulkCopy.Close();
        }
    }

    private RetryPolicy<SqlAzureTransientErrorDetectionStrategy> MakeRetryPolicy()
    {
        // retry up to MaxRetry times, waiting DelayMs between attempts,
        // whenever SQL Database reports a transient error
        var delay = TimeSpan.FromMilliseconds(DelayMs);
        return new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(MaxRetry, delay);
    }

    private SqlBulkCopy MakeSqlBulkCopy(SqlConnection connection)
    {
        // TableLock takes a bulk update lock for better throughput,
        // FireTriggers asks the server to fire insert triggers,
        // and UseInternalTransaction wraps each batch in its own transaction
        var bulkCopy =
            new SqlBulkCopy(
                connection,
                SqlBulkCopyOptions.TableLock |
                SqlBulkCopyOptions.FireTriggers |
                SqlBulkCopyOptions.UseInternalTransaction,
                null)
            {
                DestinationTableName = tableName,
                EnableStreaming = true
            };

        // map source columns from the DataTable to destination columns in SQL Database
        tableMap
            .ToList()
            .ForEach(kp => bulkCopy.ColumnMappings.Add(kp.Key, kp.Value));
        return bulkCopy;
    }
}
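
Since WriteWithRetries is synchronous, a minimal sketch of calling it from parallel code, using the bulk and table variables from the earlier snippet (Task.Run here is just one way to do it):

// sketch only: run the bulk insert on a background task so the caller is not blocked
var insertTask = Task.Run(() => bulk.WriteWithRetries(table));

// do other work here...

insertTask.Wait();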

12 responses to Using SqlBulkCopy to Insert Massive Amounts of Data Into Windows Azure SQL Database

  1. 

    Very useful post. Keep going Alex!


  2. 

    Alex,
    thank you for your insights and code.

    “SqlBulkCopy requires us to build a DataTable object, which lets us stream records from the DataTable to SQL Database without being throttled.”

    How come SqlBulkCopy is not being throttled?


    • 

      If you try to push too many rows in at once, you will time out. For the best results I recommend playing around with batch sizes. Remember, this only uses one connection and you stream the rows to the service.
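
      (For reference, one knob to experiment with is SqlBulkCopy.BatchSize, which could be set in MakeSqlBulkCopy; the value below is only an example.)

      // example only: send the rows in batches of 10,000 instead of one single batch
      bulkCopy.BatchSize = 10000;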


  3. 

    I have 1,000,000 records in a DataTable; how can I import the data in parallel with SqlBulkCopy? The record count is not fixed, it may vary.


    • 

      Hi, by streaming the records you should be OK. If you hit a wall, I recommend breaking that single DataTable into smaller DataTables. I had much success with DataTables of 200,000 to 500,000 records. Be sure to lock the table when you write.


  4. 

    Thanks for your quick response :-) OK, how can I split the DataTable and push the parts into the database in parallel? Can you provide some useful links?


    • 

      What I usually do is create a new DataTable instance, configure it to my required schema, and copy a subset of rows to it. I then use this subset with the bulk copy. This allows me to be flexible on the dataset size, and I can tweak batching via configuration.
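
      For example, a rough sketch of that splitting approach (the helper name and chunk size are illustrative, not part of the BulkWriter):

      // illustrative helper: copy rows from a large DataTable into smaller DataTables of at most chunkSize rows
      private static IEnumerable<DataTable> Split(DataTable source, int chunkSize)
      {
          for (var offset = 0; offset < source.Rows.Count; offset += chunkSize)
          {
              var chunk = source.Clone(); // same schema, no rows

              foreach (var row in source.Rows.Cast<DataRow>().Skip(offset).Take(chunkSize))
                  chunk.ImportRow(row);

              yield return chunk;
          }
      }

      // usage: write each chunk with its own bulk copy operation
      foreach (var chunk in Split(table, 250000))
          bulk.WriteWithRetries(chunk);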


      • 

        OK, here is what I have done:

        System.Data.DataTable dt;
        List<System.Data.DataTable> tables = new List<System.Data.DataTable>();
        tables.Add(GetDataFromExcelwithvalidation());

        Parallel.ForEach(tables, table =>
        {
            BulkLoadData(table);
        });

        public void BulkLoadData(System.Data.DataTable dt)
        {
            using (SqlConnection conn = new SqlConnection(this.Connection))
            using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn, SqlBulkCopyOptions.TableLock, null))
            {
                bulkCopy.DestinationTableName = destinatonTable;
                bulkCopy.BulkCopyTimeout = 60;
                bulkCopy.BatchSize = dt.Rows.Count;
                foreach (ColumnMapping colMap in ImportColumnMapping)
                {
                    bulkCopy.ColumnMappings.Add(colMap.SourceColumn, colMap.DestinationColumn);
                }
                conn.Open();
                bulkCopy.WriteToServer(dt);
                bulkCopy.Close();
            }
        }

        I have done it like this, but performance is the issue; it takes 50 seconds to import 500,000 records. How can I reduce the time? That is why I am looking to split the DataTable according to the record count and import the parts.


        • 

          Hi,

          Yes, I’ve seen it take up to several minutes depending on the dataset size.
          Remember that it will take much more time if you have many indexes on the table.


          • 

            Hi,

            I am trying to insert data into temp tables created in an Azure SQL database through the SqlBulkCopy WriteToServer method.

            I created the temp tables and am inserting data into them with this SqlBulkCopy mechanism,
            but I am getting the error “Cannot access destination table ‘#Numbers’.
            Reference to database and/or server name in ‘tempdb..sp_tablecollations_100’ is not supported in this version of SQL Server.”
            This “tempdb..sp_tablecollations_100” procedure internally uses global temp tables.

            When I try the same thing with the regular approach, i.e. a plain INSERT query, it works fine and finds the temp table I created,
            and I implemented the same code given above.

            I read that Azure does not support global temp tables, but the code given above is for Azure SQL Database only. So how come it is not working for me? Can you please tell me what I am missing here?


  5. 

    I’m trying to use this solution, but my file has a row of extraneous nonsense as its first row, followed by a row of field names and then the data. How can I ignore that first row of nonsense?

    I’ve tried variations on:
    excelReader.IsFirstRowAsColumnNames = true;
    DataSet result = excelReader.AsDataSet();
    if (result.Tables[0].Rows[0].ItemArray[0].ToString() == "SomeStringICanFind") { result.Tables[0].Rows[0].Delete(); result.AcceptChanges(); }

    It never seems to delete the row, even though I watch it execute the Delete and AcceptChanges. I can also watch the row count decrease by one, but when the bulk copy executes I get an error about missing columns (the file imports fine without that first row).

