ADO.NET 2.0 Tutorial : SqlBulkCopy Revisited for Transferring Data at High Speeds
by David Hayden ( .NET Developer )
I answered a question in the MSDN Forums today regarding SqlBulkCopy and for some reason I was under the impression that you had to give SqlBulkCopy an IDataReader as a source of data. I don't have a clue as to where that little bit of nonsense came from :)
It turns out the SqlBulkCopy.WriteToServer Method, which is responsible for writing the data to a database table, will accept the following as a source of data:
- DataTable
- DataRow[]
- IDataReader
So, if you have a DataTable to begin with, no sense using the DataTable.CreateDataReader() method when SqlBulkCopy will accept a DataTable directly. Here is a bit of code, but note that I would not waste the time in populating a DataTable if you are pulling the records from an existing database. The question in the MSDN Forums had to deal with the records already existing in a DataTable and I am just creating a DataTable to show it as a data source. See my other post:
for another example.
// Get data in a DataTable
DataTable table = new DataTable();
string northwindConnectionString = "...Northwind...";
using (SqlConnection connection =
new SqlConnection(northwindConnectionString))
{
using (SqlCommand command = connection.CreateCommand())
{
command.CommandText = "SELECT * FROM Categories";
connection.Open();
using (IDataReader dr = command.ExecuteReader
(CommandBehavior.CloseConnection))
{
table.Load(dr);
}
}
}
// Upload DataTable to a Database Table
string destinationConnectionString = "...";
using (SqlBulkCopy copy =
new SqlBulkCopy(destinationConnectionString))
{
copy.DestinationTableName = "Categories";
copy.WriteToServer(table);
}
While I am talking about SqlBulkCopy again, I should mention Pablo Castro's, the Program Manager for the ADO.NET Team at Microsoft, great explanation as to why SqlBulkCopy is faster than using normal insert commands:
"There are a number of reasons why bulk-copy is faster. Here is a summary:
- No per-row statement execution. When you do multiple inserts without bulk-copy, each insert is a statement in itself (regardless of whether it's batched together with other statements). With bulk-copy, we don't incur the cost of executing a statement for each row, the whole copy operation is a single thing.
- No multiple network round-trips. Once the bulk-insert operation is setup, we send rows from the client to the server continously, without going back-and-forth over the wire.
- Server storage engine also can greatly optimize how rows are inserted when performing a bulk-copy operation. How much can be optimized depends a lot on the recovery model the tarder database is set to; in "simple" and "bulk logged" the overhead of logging is greatly reduced during bulk-copy operations, helping a lot with performance.
Now, as to "how" that happens, it's hard to describe without going down to the details of the SQL Server client-server protocol. The short story is that we setup a BCP operation by sending a special statement to the server that includes metadata about the row-set we're about to set; that switches the session to bulk-copy mode, at which point the client and start sending the row stream to the server one after the other; as rows come the server does minimal processing in the upper layers, rows go straight to the storage layer. Once the client is done it sends a "done" marker and the operation is completed in the server. This operation can happen in "batches" of rows or all at once." - Pablo Castro
ADO.NET 2.0 is just plain cool. I could talk about it for days :)
Source: David Hayden ( .NET Developer )