Recently, we had to make some space available in one of our SQL Express instances that was getting close to its 10 GB limit of stored data, so I set out to delete some old data from two of our largest tables. One contained about half a million rows, the other a bit over 21 million.
Simple Deletion Would Take… Forever
The simplest sql statement to delete all rows that were created before 2012 would be the following:
- DELETE FROM [dbo].[Message] WHERE DateCreated < '20120101'
I can't even tell you how long this took because at 14 minutes I just cancelled the query execution (which took another 7 minutes to finish). This was the table with less than 500,000 rows where we wanted to delete a bit more than 200,000 rows.
Break Delete Operation Into Chunks
Searching for a solution to the problem, I came across this blog post on breaking large delete operations into chunks. It shows in good detail how the simple version above behaves against running a loop of a few tens of thousand deletes per iteration. An interesting aspect I hadn't thought of at that point was the transaction log growth that can become a problem with large delete operations. Running a loop allows you to do a log backup (in full recovery mode) or a checkpoint (in simple mode) at the end of each iteration so that the log will grow much more slowly.
Unfortunately, though, this didn't help with the execution time of the delete itself, as you can also see from the graphs presented in above post.
Disable Those Indexes!
It turns out, our [Message] table had six non-clustered indexes on them which all had to be written to for every row that was deleted. Even if those operations are fast, their processing time will add up over a few hundred thousand iterations. So let's turn them off! In fact, let's turn only those off that won't be used during out delete query. [We have one index on the DateCreated column which will be helpful during execution.]
This stackoverflow answer shows how to create some dynamic SQL to disable all non-clustered indexex in a database. I've modified it slightly to disable only indexes of a given table:
Disable/Enable Table Indexes
- DECLARE @table AS VARCHAR(MAX) = 'Message';
- DECLARE @sqlDisable AS VARCHAR(MAX) = '';
- DECLARE @sqlEnable AS VARCHAR(MAX) = '';
-
- SELECT
- @sqlDisable = @sqlDisable + 'ALTER INDEX ' + idx.name + ' ON '
- + obj.name + ' DISABLE;' + CHAR(13) + CHAR(10),
- @sqlEnable = @sqlEnable + 'ALTER INDEX ' + idx.name + ' ON '
- + obj.name + ' REBUILD;' + CHAR(13) + CHAR(10)
- FROM sys.indexes idx
- JOIN sys.objects obj
- ON idx.object_id = obj.object_id
- WHERE idx.type_desc = 'NONCLUSTERED'
- AND obj.type_desc = 'USER_TABLE'
- AND obj.name = @table;
-
- RAISERROR(@sqlDisable, 0, 1) WITH NOWAIT;
- RAISERROR(@sqlEnable, 0, 1) WITH NOWAIT;
- --EXEC(@sqlDisable);
- --EXEC(@sqlEnable);
Now, with those indexes disabled, the simple delete operation took a lot less than a minute! Even in the case of our 21 million rows table, deleting 7 million rows took only 1:02 on my machine. Of course, after deleting the unwanted rows, you need to re-enable the indexes again which took another minute, but all in all I'm happy with the result.
Copy Data to New Table and Drop Old Table
One other way of deleting rows that I've used in combination with changing the table schema at the same time is the following:
- use a temporary table into which you copy all the rows you want to keep (the schema of which I modified to meet our new needs)
- delete the original table
- rename the temporary table to the original table's name
- recreate all indexes you had defined before
This is basically what SSMS generates for you when you change the schema of a table, except for the indexes – you have to recreate them yourself.
As you can imagine, this approach becomes faster and creates smaller transaction log footprint with a growing amount of data to delete. It won't have any benefit if you delete less than half of the table's rows.
Choose the right tool for the job
There are quite a few other approaches and tips out there on how to speed up your deletion process. It depends a lot on your concrete situation which of those will actually help you get your deletion job done faster. I had to experiment quite a bit to find the sweet spot but now that I've seen a few approaches I'm able to take a better decision in the future.