Recently I wrote about Inserting Large Amounts of Data Into Windows Azure SQL Database, and that approach works well for reasonable amounts of data (5 to ~1,000 inserts). Faced with a much bigger challenge which required a different approach, I started looking for alternatives and found SqlBulkCopy.
The code from this Post is part of the Brisebois.WindowsAzure NuGet Package
To install Brisebois.WindowsAzure, run the following command in the Package Manager Console
PM> Install-Package Brisebois.WindowsAzure
SqlBulkCopy requires us to build a DataTable object, which allows us to stream records from a DataTable to SQL Database without being throttled. I used the Transient Fault Handling Application Block to provide resiliency against transient faults. Furthermore, locking the table during the operation provides greater performance.
Keep in mind that performance will be affected by triggers and by indexes on the destination table.
The following code shows how to use the BulkWriter. The Dictionary maps columns from the DataTable to the columns of the target table in SQL Database. WriteWithRetries is not an async method because I usually execute this code within a Task that is already executing in parallel.
Before using the BulkWriter, be sure to add a configuration setting to your Cloud Configuration for your Role, or to your Web.config or App.config. The setting to add is "ConnectionString", containing a valid connection string for your SQL Database instance.
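As a sketch, the App.config entry could look like this (the server name, database name, and credentials below are placeholders, not real values):

```xml
<configuration>
  <appSettings>
    <!-- CloudConfigurationManager falls back to appSettings
         when no cloud service configuration is present -->
    <add key="ConnectionString"
         value="Server=tcp:myserver.database.windows.net,1433;Database=mydb;User ID=user@myserver;Password=...;Encrypt=True;" />
  </appSettings>
</configuration>
```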
var bulk = new BulkWriter("EventLog",
    new Dictionary<string, string>
    {
        {"Date", "Date"},
        {"EventId", "EventId"},
        {"Name", "Name"},
        {"Details", "Details"}
    });

bulk.WriteWithRetries(table);
The table variable is a DataTable that contains the mapped columns defined in the above Dictionary. Be sure that the column types from your DataTable match up with the column types in the SQL Database table.
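A minimal sketch of building that table variable follows. The column types here (DateTime, int, string) and the sample rows are assumptions for illustration; they must match the actual schema of your EventLog table.

```csharp
using System;
using System.Data;

// Build a DataTable whose columns match the Dictionary mapping above.
// Column types are assumed; they must match the destination table.
var table = new DataTable("EventLog");
table.Columns.Add("Date", typeof(DateTime));
table.Columns.Add("EventId", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Details", typeof(string));

// Sample rows, purely illustrative.
table.Rows.Add(DateTime.UtcNow, 1, "Startup", "Role instance started");
table.Rows.Add(DateTime.UtcNow, 2, "Heartbeat", "All systems nominal");
```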
BulkWriter
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;
using System.Linq;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;
using Microsoft.WindowsAzure;

public class BulkWriter
{
    const int MaxRetry = 5;
    const int DelayMs = 100;

    private readonly string tableName;
    private readonly Dictionary<string, string> tableMap;
    private readonly string connString;

    public BulkWriter(string tableName,
                      Dictionary<string, string> tableMap)
    {
        this.tableName = tableName;
        this.tableMap = tableMap;

        // get your connection string
        connString = CloudConfigurationManager
                        .GetSetting("ConnectionString");
    }

    public void WriteWithRetries(DataTable datatable)
    {
        TryWrite(datatable);
    }

    private void TryWrite(DataTable datatable)
    {
        var policy = MakeRetryPolicy();
        try
        {
            policy.ExecuteAction(() => Write(datatable));
        }
        catch (Exception ex)
        {
            //TODO: Add logging logic
            Trace.TraceError(ex.ToString());
            throw;
        }
    }

    private void Write(DataTable datatable)
    {
        // connect to SQL Database and stream the rows
        using (var connection = new SqlConnection(connString))
        {
            var bulkCopy = MakeSqlBulkCopy(connection);

            connection.Open();

            using (var dataTableReader = new DataTableReader(datatable))
            {
                bulkCopy.WriteToServer(dataTableReader);
            }

            connection.Close();
        }
    }

    private RetryPolicy<SqlAzureTransientErrorDetectionStrategy> MakeRetryPolicy()
    {
        var fromMilliseconds = TimeSpan.FromMilliseconds(DelayMs);
        var policy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>
                            (MaxRetry, fromMilliseconds);
        return policy;
    }

    private SqlBulkCopy MakeSqlBulkCopy(SqlConnection connection)
    {
        // lock the table for the duration of the copy
        var bulkCopy =
            new SqlBulkCopy(connection,
                            SqlBulkCopyOptions.TableLock
                            | SqlBulkCopyOptions.FireTriggers
                            | SqlBulkCopyOptions.UseInternalTransaction,
                            null)
            {
                // set the destination table name
                DestinationTableName = tableName,
                EnableStreaming = true
            };

        tableMap
            .ToList()
            .ForEach(kp => bulkCopy.ColumnMappings.Add(kp.Key, kp.Value));

        return bulkCopy;
    }
}
Very useful post. Keep going, Alex!
Alex,
thank you for your insights and code.
“SqlBulkCopy requires us to build a DataTable object which allows us to stream records from a DataTable to SQL Database without being throttled.”
How come SqlBulkCopy is not being throttled?
If you try to push too many rows in at once, you will time out. For the best results I recommend experimenting with batch sizes. Remember, this only uses one connection and you stream the rows to the service.
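Batch size can be tuned directly on SqlBulkCopy. This is a hedged sketch (not from the post): the connection and table are assumed to exist, and the numbers are starting points to experiment with, not measured recommendations.

```csharp
using System.Data;
using System.Data.SqlClient;

static void WriteInBatches(SqlConnection connection, DataTable table)
{
    using (var bulkCopy = new SqlBulkCopy(
        connection, SqlBulkCopyOptions.TableLock, null))
    {
        bulkCopy.DestinationTableName = table.TableName;
        bulkCopy.BatchSize = 10000;      // rows sent per round trip
        bulkCopy.BulkCopyTimeout = 300;  // seconds before timing out
        bulkCopy.EnableStreaming = true;

        bulkCopy.WriteToServer(table);
    }
}
```

Smaller batches keep each round trip short enough to stay under the service's timeout, at the cost of more round trips.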
I have 1,000,000 records in a DataTable. How can I import the data in parallel with SqlBulkCopy? The record count is not fixed; it can vary.
Hi, by streaming the records you should be OK. If you hit a wall, I recommend breaking that single DataTable into smaller DataTables. I had much success with DataTables of 200,000 to 500,000 records. Be sure to lock the table when you write.
Thanks for your quick response :-) OK, how can I split the DataTable and push the pieces into the database in parallel? Can you provide some useful links?
What I usually do is create a new DataTable instance, configure it with my required schema, and copy a subset of rows to it. I then use this subset with the bulk copy. This allows me to be flexible on the dataset size, and I can tweak batching via configuration.
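That splitting step can be sketched like this (a minimal illustration, not the author's actual code; the chunk size is arbitrary):

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Split a large DataTable into smaller DataTables that share its schema.
static List<DataTable> Split(DataTable source, int chunkSize)
{
    var chunks = new List<DataTable>();
    for (var i = 0; i < source.Rows.Count; i += chunkSize)
    {
        // Clone copies the schema (columns, types) but no rows.
        var chunk = source.Clone();
        for (var j = i; j < Math.Min(i + chunkSize, source.Rows.Count); j++)
            chunk.ImportRow(source.Rows[j]);
        chunks.Add(chunk);
    }
    return chunks;
}
```

Each resulting chunk can then be handed to the bulk copy, possibly from separate Tasks.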
OK, here is what I have done:

System.Data.DataTable dt;
List<System.Data.DataTable> tables = new List<System.Data.DataTable>();
tables.Add(GetDataFromExcelwithvalidation());

public void BulkLoadData(System.Data.DataTable dt)
{
Performance is the problem: it takes 50 seconds to import 500,000 records. How do I reduce that time? That's why I'm looking to split the DataTable according to the record count before doing the import.
Hi,
yes I’ve seen it take up to minutes based on dataset size.
Remember that it will take much more time if you have many indexes on the table.
Hi,
I am trying to insert data into temp tables created in an Azure DB through the SqlBulkCopy WriteToServer method.
I created temp tables and am inserting data into them with this SqlBulkCopy mechanism.
But i am getting the error “Cannot access destination table ‘#Numbers’.
Reference to database and/or server name in ‘tempdb..sp_tablecollations_100’ is not supported in this version of SQL Server.”.
Now this “tempdb..sp_tablecollations_100” internally uses global temp tables.
But when I try the same thing using the regular approach, i.e. an insert query, it works fine and it finds the temp table I created.
I implemented the same code given above. I read that Azure does not support global temp tables, but the code given above is for Azure DB only. So how come it is not working for me? Can you please tell me what I am missing here?
Hi, you may want to write to a real table instead. Temp tables may not perform as you expect. Depending on the SKU you are using for your SQL Database, TempDB may not be dedicated to your workload.
I’m trying to use this solution but I have a row of extraneous nonsense in the first row, followed by a row of field names then data. How can I ignore that first row of nonsense?
I’ve tried variations on:
excelReader.IsFirstRowAsColumnNames = true;
DataSet result = excelReader.AsDataSet();

if (result.Tables[0].Rows[0].ItemArray[0].ToString() == "SomeStringICanFind")
{
    result.Tables[0].Rows[0].Delete();
    result.AcceptChanges();
}
It never seems to delete the row, even though I watch it execute the Delete and AcceptChanges. I can also watch the row count decrease by one, but when the bulk copy executes I get an error about missing columns (the file imports fine without that first row).
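One hedged way to approach this (not from the post) is to load the sheet without treating any row as headers, drop the nonsense row, and promote the next row to column names manually:

```csharp
using System.Data;

// Sketch: given a DataTable whose first row is junk and whose second
// row holds the real column names, remove the junk row, rename the
// columns from the (new) first row, then remove that header row too.
static DataTable PromoteSecondRowToHeader(DataTable raw)
{
    raw.Rows[0].Delete();        // drop the nonsense row
    raw.AcceptChanges();

    var header = raw.Rows[0];    // the real header row
    for (var i = 0; i < raw.Columns.Count; i++)
        raw.Columns[i].ColumnName = header[i].ToString();

    raw.Rows[0].Delete();        // drop the header row itself
    raw.AcceptChanges();
    return raw;
}
```

With the column names set this way, the bulk copy's column mappings should line up, since the junk row never becomes the header.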