You need to update one or more columns in a table with millions of records with data from another table, what’s a good way of going this with low risk and in an optimized way? One of my favorite ways of dealing with millions of records in a table is processing inserts, deletes, or updates in batches.
Updating data in batches of 10,000 records at a time and using a transaction is a simple and efficient way of performing updates on millions of records. The key is to batch out your data to allow you to process a smaller number of records at a time.
use DatabaseName go DECLARE @batchId INT DECLARE @batchSize INT DECLARE @results INT SET @results = 1 SET @batchSize = 10000 SET @batchId = 0 -- when 0 rows returned, exit the loop WHILE (@results > 0) BEGIN BEGIN TRAN; UPDATE Table1 SET columnA = Table2.columnA FROM Table2 INNER JOIN Table1 ON Table2.Id = Table1.Id WHERE Table1.columnA is null AND (Table2.Id > @batchId AND Table2.Id <= @batchId + @batchSize) SET @results = @@ROWCOUNT -- next batch SET @batchId = @batchId + @batchSize COMMIT TRAN; END -- the sql below is just to measure the performance of this update, it is not needed to update your data. DECLARE @startTime DATETIME SELECT @startTime = GETDATE() SET STATISTICS PROFILE ON SELECT COUNT(*) FROM Table1 first OPTION (RECOMPILE) SET STATISTICS PROFILE OFF SELECT 'It took ' + CONVERT(VARCHAR(8), DATEDIFF(ms, @startTime, GETDATE())) + ' ms' GO
In the query above we declare the following variables to do the batching:
- @results – this variable will hold the number of records updated; when zero, the query will stop. This is to avoid an infinite loop.
- @batchId – this is set to zero initially, and it is used to compare the table id against it, and after each update, it is set to the id plus the batch size. This allows splitting the millions of records into batches.
- @batchSize – the number of records to update at a time.
It is important to know that the above algorithm will only work when your table’s primary keys are of type int. If your table’s primary keys are guids, the above will not work.
The section at the end of the query is only used to review the performance of your updates, it is not necessary to update the data and it is a nice way to see how the query performs overall.
I hope the information in this post is helpful, and it helps you learn a simple way to update millions of records in batches, making it easier and lower risk than attempting to do it all at once.