Delete a Large Number of Rows from a Table in SQL Server

by Oliver 28. May 2014 12:09

Recently, we had to make some space available in one of our SQL Express instances that was getting close to its 10 GB limit of stored data, so I set out to delete some old data from two of our largest tables. One contained about half a million rows, the other a bit over 21 million. Simple Deletion Would Take… Forever The simplest sql statement to delete all rows that were created before 2012 would be the following: DELETE FROM [dbo].[Message] WHERE DateCreated < '20120101' I can't even tell you how long this took because at 14 minutes I just cancelled the query execution (which took another 7 minutes to finish). This was the table with less than 500,000 rows where we wanted to delete a bit more than 200,000 rows. Break Delete Operation Into Chunks Searching for a solution to the problem, I came across this blog post on breaking large delete operations into chunks. It shows in good detail how the simple version above behaves against running a loop of a few tens of thousand deletes per iteration. An interesting aspect I hadn't thought of at that point was the transaction log growth that can become a problem with large delete operations. Running a loop allows you to do a log backup (in full recovery mode) or a checkpoint (in simple mode) at the end of each iteration so that the log will grow much more slowly. Unfortunately, though, this didn't help with the execution time of the delete itself, as you can also see from the graphs presented in above post. Disable Those Indexes! It turns out, our [Message] table had six non-clustered indexes on them which all had to be written to for every row that was deleted. Even if those operations are fast, their processing time will add up over a few hundred thousand iterations. So let's turn them off! In fact, let's turn only those off that won't be used during out delete query. [We have one index on the DateCreated column which will be helpful during execution.] This stackoverflow answer shows how to create some dynamic SQL to disable all non-clustered indexex in a database. I've modified it slightly to disable only indexes of a given table: Disable/Enable Table Indexes DECLARE @table AS VARCHAR(MAX) = 'Message'; DECLARE @sqlDisable AS VARCHAR(MAX) = ''; DECLARE @sqlEnable AS VARCHAR(MAX) = '';   SELECT     @sqlDisable = @sqlDisable + 'ALTER INDEX ' + idx.name + ' ON '                     + obj.name + ' DISABLE;' + CHAR(13) + CHAR(10),     @sqlEnable = @sqlEnable + 'ALTER INDEX ' + idx.name + ' ON '                     + obj.name + ' REBUILD;' + CHAR(13) + CHAR(10) FROM sys.indexes idx JOIN sys.objects obj     ON idx.object_id = obj.object_id WHERE idx.type_desc = 'NONCLUSTERED'     AND obj.type_desc = 'USER_TABLE'     AND obj.name = @table;   RAISERROR(@sqlDisable, 0, 1) WITH NOWAIT; RAISERROR(@sqlEnable, 0, 1) WITH NOWAIT; --EXEC(@sqlDisable); --EXEC(@sqlEnable); Now, with those indexes disabled, the simple delete operation took a lot less than a minute! Even in the case of our 21 million rows table, deleting 7 million rows took only 1:02 on my machine. Of course, after deleting the unwanted rows, you need to re-enable the indexes again which took another minute, but all in all I'm happy with the result. Copy Data to New Table and Drop Old Table One other way of deleting rows that I've used in combination with changing the table schema at the same time is the following: use a temporary table into which you copy all the rows you want to keep (the schema of which I modified to meet our new needs) delete the original table rename the temporary table to the original table's name recreate all indexes you had defined before This is basically what SSMS generates for you when you change the schema of a table, except for the indexes – you have to recreate them yourself. As you can imagine, this approach becomes faster and creates smaller transaction log footprint with a growing amount of data to delete. It won't have any benefit if you delete less than half of the table's rows. Choose the right tool for the job There are quite a few other approaches and tips out there on how to speed up your deletion process. It depends a lot on your concrete situation which of those will actually help you get your deletion job done faster. I had to experiment quite a bit to find the sweet spot but now that I've seen a few approaches I'm able to take a better decision in the future.

Retrieving random content items (rows) from a SQL database in Orchard with HQL queries

by Oliver 22. February 2014 12:37

We're adding some Premium functionality to discoverize right now, and part of that is the so-called Premium block which is a showcase of six Premium entries. Now, choosing the right entries for that block is the interesting part: as long as we don't have six Premium entries to show, we want to fill up the left over space with some random entries that haven't booked our Premium feature, yet. Get random rows from SQL database There are plenty of articles and stackoverflow discussions on the topic of how to (quickly) retrieve some random rows from a SQL database. I wanted to get something to work simply and quickly, not necessarily high performance. Incorporating any kind of hand-crafted SQL query was really the last option since it would mean to get hold of an ISessionLocator instance to get at the underlying NHibernate ISession to then create a custom SQL query and execute it. Not my favorite path, really. Luckily, the IContentManager interface contains the method HqlQuery which returns an IHqlQuery containing these interesting details: /// <summary> /// Adds a join to a specific relationship. /// </summary> /// <param name="alias">An expression pointing to the joined relationship.</param> /// <param name="order">An order expression.</param> IHqlQuery OrderBy(Action<IAliasFactory> alias, Action<IHqlSortFactory> order); …and IHqlSortFactory contains a Random() method. This finally got me going! HQL queries in Orchard HQL queries are a great feature in (N)Hibernate that allow you to write almost-SQL queries against your domain models. I won't go into further detail here, but be sure to digest that! Orchard's IContentManager interface contains the method HqlQuery() to generate a new HQL query. Unfortunately, there's almost no usage of this feature throughout the whole Orchard solution. So let me document here how I used the HqlQuery to retrieve some random entries from our DB: // retrieve count items of type "Entry" sorted randomly return contentManager.HqlQuery()     .ForType("Entry")     .OrderBy(alias => alias.ContentItem(), sort => sort.Random())     .Slice(0, count)     .Select(item => item.Id); And one more: // retrieve <count> older items filtered by some restrictions, sorted randomly return contentManager.HqlQuery()     .ForPart<PremiumPart>()     .Where(alias => alias.ContentPartRecord<PremiumPartRecord>(),            expr => expr.Eq("Active", true))     .Where(alias => alias.ContentPartRecord<PremiumPartRecord>(),            expr => expr.Lt("BookingDateTime", recentDateTime))     .OrderBy(alias => alias.ContentItem(), sort => sort.Random())     .Slice(0, count)     .Select(item => item.Id); Even with the source code at hand, thanks to Orchard's MIT license, the implementation of this API in the over 600 lines long DefaultHqlQuery is not always straight-forward to put into practice. Most of all I was missing a unit test suite that would show off some of the core features of this API and I'm honestly scratching my head of how someone could build such an API without unit tests! Random() uses newid() : monitor the query performance The above solution was easy enough to implement once I've got my head around Orchard's HQL query API. But be aware that this method uses the newid() approach (more here) and thus needs to a) generate a new id for each row in the given table and b) sort all of those ids to then retrieve the top N rows. Orchard has this detail neatly abstracted away in the ISqlStatementProvider implementation classes. Here's the relevant code from SqlServerStatementProvider (identical code is used for SqlCe): public string GetStatement(string command) {     switch (command) {         case "random":             return "newid()";     }     return null; } For completeness, here's the generated SQL from the first query above (with variable names shortened for better readability): select content.Id as col_0_0_ from Test_ContentItemVersionRecord content     inner join Test_ContentItemRecord itemRec         on content.ContentItemRecord_id = itemRec.Id     inner join Test_ContentTypeRecord typeRec         on itemRec.ContentType_id = typeRec.Id where ( typeRec.Name in ('Entry') )     and content.Published = 1 order by newid() OFFSET 0 ROWS FETCH NEXT 3 ROWS ONLY This approach works well enough on small data sets but may become a problem if your data grows. So please keep a constant eye on all your random queries' performance. Happy HQL-ing!

IRIs and URIs; or: Internet Explorer does not decode encoded non-ASCII characters in its address bar

by Oliver 24. October 2013 23:03

Some facts about IE and its address bar IE can display non-ASCII characters in the address bar if you put them there by hand or click a link that contains such in unencoded form, e.g. http://marinas.info/marina/fürther-wassersportclub. IE sends a request for the correctly encoded URL, which is http://marinas.info/marina/marina/f%C3%BCrther-wassersportclub. Now, if you're in IE and click on the second link above, IE will not decode the URL back to the unencoded version – it will just keep the encoded URL in the address bar. If, instead, you're reading this page in FF or Chrome, the encoded URL above will be gracefully decoded into its unencoded counterpart. URIs and IRIs Disclaimer First off, let me tell you that I'm by no means an expert in this field. I'm trying to get my around URIs, IRIs, encodings and beautiful web sites and URLs just like probably half of the web developer world out there. So please, verify what you read here and correct me where I am mistaken. What the RFCs have to say By today, more than a handful of RFC documents have been published concerning URIs: RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax, which is the current Internet Standard (using IETF vocabulary) on URIs it updates RFC 1738, and it obsoletes RFCs 2732, 2396, and 1808 RFC 3987 - Internationalized Resource Identifiers (IRIs), which adds Unicode support for resource identifiers RFC 3986 states the following about a URI: A URI is an identifier consisting of a sequence of characters matching the syntax rule named <URI> in Section 3. See the examples section, or refer to Appendix A for the ABNF for URIs. RFC 3987 states the following about an IRI: An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646). In short, IRIs may contain Unicode characters while URI must not. Moreover, every URI is a valid IRI and every IRI can be encoded into a valid URI. Let's see an example again: IRI: http://marinas.info/marina/marina/fürther-wassersportclub URI: http://marinas.info/marina/marina/f%C3%BCrther-wassersportclub A great read on IRIs and their relationship to URIs can be found here by the W3C. Support for IRIs IRIs are not supported in HTTP as per RFC 2616. This implies that before requesting a resource identified by an IRI over HTTP it must be encoded as a URI first. This is what all mainstream browsers seem to do correctly – when you click on http://marinas.info/marina/marina/fürther-wassersportclub and inspect the request sent from your browser you will see that it actually requests http://marinas.info/marina/marina/f%C3%BCrther-wassersportclub. HTML5 support IRIs as URLs: http://www.w3.org/html/wg/drafts/html/CR/infrastructure.html#urls. Use IRIs today It looks like you can safely use IRIs in your HTML pages today already. And doing so will actually persuade IE into displaying the correct non-ASCII characters. So why don't we?

Setting up NGINX as Reverse Proxy for Camping.Info

by Oliver 24. June 2013 23:08

worker_processes 1; # http://nginx.org/en/docs/ngx_core_module.html#worker_processes Thanks to http://www.theunixtips.com/how-to-find-number-of-cpus-on-unix-system for helping me find out how many processor cores our VPS is using. Fixing the painful "24: Too many open files" error in NGINX When this error appears in your /var/etc/nginx/error.log file, you really want to do something about it. It basically tells you that some of your users are not being served content but instead receiving HTTP 500 errors generated by NGINX. Not a good thing! To investigate the open file descriptors on your linux (we're running NGINX on Ubuntu 12.04), I followed advice from this post: Linux: Find Out How Many File Descriptors Are Being Used. Then, to fix the low limits I found, 1024 and 4096 for the soft and hard limits, respectively, the post Set ulimit parameters on ubuntu provided a good walkthrough to changing those limits persistently, i.e. even after a reboot. But I somehow had the feeling that in case of NGINX there had to be a simpler solution. Turns out, there is. Let me introduce you to: worker_rlimit_nofile This thread in the NGINX forums contained the hint I needed: Re: Handling nginx's too many open files even I have the correct ulimit. I had actually posted to that thread before and received the helpful answer over a month ago, but I somehow hadn't got around to implementing it. Until today. So here's what you need to do: Set the worker_rlimit_nofile (in the main context) to something significant (I used 65536 = 2^16) instead of the default 1024 as soft and 4096 as the hard limit. Also set worker_connections to the same number to allow for a lot of parallel connections, both from clients and to upstream servers. Your nginx.conf file should now contain these lines: 1: user www-data; 2: worker_processes 1; 3: worker_rlimit_nofile 65536; 4: pid /var/run/nginx.pid; 5:  6: events { 7:     worker_connections 65536; ## default: 1024 8: # multi_accept on; 9: } Reload NGINX after configuration change In a shell, tell NGINX to reload the configuration. You don't need to restart the whole process to get a configuration change live. Just run: /usr/sbin/nginx -s reload or service nginx reload Now enjoy thousands and thousands of parallel connections to your proxy server :-)

Optimize Images for Your Website

by Oliver 17. June 2013 13:59

This is just a short post to draw your attention to a sweet tool I've just discovered: PNGGauntlet. It runs on Windows using the .NET 4 framework and is as easy to use as you could possibly wish. Also: it's completely free to use. Convert Your Existing PNGs For starters, we'll just convert some existing PNGs – can't really do any harm with that. In the Open File dialog, there's an option to filter for only .png files. You can choose many of them at once: If you provide an Output directory, the optimized files will be written to that destination. But: the tool also has the option to overwrite the original files, which is awesome if you use some kind of source control (and thus have a backup) and just want to get the job done. During my first run, using the 8 processing threads my CPU has to offer, … … I got savings from 3% to 27%: PNGGauntlet also tells me, that in total I saved 4,52 KB. If those were all images on your web site, that would be a not so bad improvement, especially when you get it investing about 2 min of your time and no extra expenses! Real Savings Running PNGGauntlet on the sprites that we use for Camping.Info, we were really surprised: out of 172 KB it saved us over 31%, a whole 54 KB! Now that's an improvement that on a slightly slower connection will already be noticeable. We'll definitely check the rest of our images for more savings. Convert Other Image Formats You can also choose to change your images format to PNG if you're in the mood. I tried converting all GIFs in the Orchard CMS Admin theme to PNGs and went from a total of 24 KB for 20 files to less than 17 KB with no loss of quality – an over 30% saving! Just beware that you'll need to change the file references in your project to pick up the new PNGs. Roundup Easy, fast and cheap (as in free) image optimization doesn't have to be magic anymore – today anyone can do it. Check out PNGGauntlet to try for yourself. There's really no excuse not to!

Orchard Harvest: Responsive Design

by Oliver 13. June 2013 15:30

These are just quick notes from one of the sessions at the Orchard Harvest conference currently taking place in Amsterdam. Fluid Grids bootstrap-responsive SimpleGrid 1140Grid FooTable hides columns, shows content in expandible elements Fluid Fonts FitText: fittextjs.com makes text fit into header elements SlabText adjusts text sizes to make page look beautiful Toggle navigation using Twitter bootstrap: put .nav-collapse and .collapse on the nav element Frameworks Foundation (uses SASS) INK Pure by Yahoo: new, small - 5KB

Revoke Access to Applications using Google OpenID

by Oliver 9. May 2013 12:20

During automatic frontend testing, some of our tests recently broke, which were trying to connect a Google account to our new TeamReview application using OpenID. Those tests used to make sure that on Google's confirmation page the checkbox to remember my choice was unchecked. I'd like to show a screenshot of what it used to look like, but unfortunately, it seems as if that old page has died. On the new confirmation page, no such checkbox is available. This means that during a test run I cannot temporarily accept access to our application and later revoke that access by simply deleting my Google cookies. I now have to go to Google's Authorizing Applications & Sites page and revoke access manually. Just for the record.

HTTP Error 500.19 - Internal Server Error: 0x8007007b Cannot read configuration file

by Oliver 20. March 2013 15:24

While setting up a specification tests project for our new TeamReview tool, I was facing an HTTP 500.19 error when hosting our site in IIS Express. There are lots of questions on stackoverflow concerning this error, Microsoft has a whole page on it, but there is a whole bunch of suberrors that this error addresses. Error 0x8007007b: Cannot read configuration file Unfortunately, none of the above mentioned links contained or solved the specific error code I was seeing: Error Code    0x8007007b Config Error    Cannot read configuration file Config File    \\?\C:\Projects\_teamaton\teamreview\TeamReview.Specs\bin\Debug\..\..\..\TeamReview.Web\web.config After some reading, trying, fiddling, it appeared to me that maybe the path to the config file somehow messed up IIS Express. I admit that it was at least a bit unusual to use the parent directory dots. But it came from my test harness code where I wanted to use relative paths and used Path.Combine() to do that: var webPath = Path.Combine(Environment.CurrentDirectory, "..", "..", "..", "TeamReview.Web"); Pitfall: .. in path Well, it turns out IIS Express didn't like it. Once I called it with a cleaned up path string, everything just worked: "C:\Program Files (x86)\IIS Express\iisexpress.exe" /path:"C:\Projects\_teamaton\teamreview\TeamReview.Web" /port:12345 /systray:false So, watch out for your physical path values when using IIS Express! Use DirectoryInfo to navigate up your directory tree To get the correct path without using absolute paths but also avoiding the .. I used the DirectoryInfo class: var webPath = Path.Combine( new DirectoryInfo(Environment.CurrentDirectory).Parent.Parent.Parent.FullName, "TeamReview.Web");

Test Driving AppHarbor – A Walkthrough and Review

by Oliver 15. March 2013 20:50

For some time now, I've wanted to check out AppHarbor, a cloud service to host .NET applications that includes a build environment, executes tests and deploys successful builds to one or more app servers. They use Amazon's cloud computing infrastructure as their backend. The smallest package is free so there's no good reason not to check it out. Getting my first application up and running First, you need to Sign up, confirm the link in the confirmation email, and log in. This part took about 2 minutes. (Created a new KeePass entry with an uncrackable password on the way.) Then, create an application, entering a name and the geographical region you want your application to be hosted at: Once you're done with that, you can choose where your code is hosted – this assumes you version control your source code using e.g. BitBucket, CodePlex, or Git. They also have a solution for the situation where you don't host your code anywhere, using a built-in Git repository. I didn't use that option, though, since I have an account at GitHub. Clicking on "Configure GitHub to deploy to AppHarbor" directs you to the GitHub logon screen (if you're logged out) where you simply sign in. Now, the following dialog was a bit spooky: What I read between the lines is something like: all your base are belong to us! I mean, it basically says that they can do to all of my projects … well, anything, really. Since I didn't want to create a new account just to try out AppHarbor and, honestly, because I somehow felt that they wouldn't destroy all of my work, I clicked "Authorize app". Phew! Remark: You might choose to use a different GitHub account for your deployments, using e.g. copies of your repositories locally where you just copy everything you need from your dev repo. Then you can grand AppHarbor access to that account without much ado. Now, the AppHarbor app took over and I chose a repository for the application I created earlier: Once I chose a repo for my first app, I got to see the first Build status message – here AppHarbor is building my app for the first time: A few seconds after AppHarbor was done building, and testing, and deploying my app after I had clicked the Deploy button, my app was ACTIVE :-) Under the Hostnames link I found that they had given it http://mathie.apphb.com/ and after a couple of seconds I saw my app online on AppHarbor. That was easy – how about deploying a new version? I'm quite surprised at how easy (and fast!) it was to get my first app up and running. For completeness, I wanted to check how AppHarbor would handle my pushing some changes to the master branch of my repo. Here we go: The commit was picked up within seconds! Another click on Deploy gets the new version out there. Remark: On their homepage they say that apps get automatically deployed once build and test runner finish successfully. This was not the case here, and I didn't find any setting to enable this. If you know how that works, please leave a comment. Update: Looks like this works out of the box, you just need to wait a minute or two for their deployment agent to pick up the new version. I updated my app just now and it got deployed by itself :-) Go back in time – it's easy, too Now, this is a nice feature: you can deploy any version of your application with a click of a button! So, if for some reason, you discover that the new version has some flaw, go back to an older one: What else do they offer? AppHarbor contains an add-on infrastructure and already offers several add-ons that you can install with your application. Most of them charge an extra monthly fee, some of them also offer a free plan. There are mostly analytics add-ons and DB engines, including dedicated MS SQL Server, RavenDB, MySQL, a PostgrSQL flavor, and a few more. Interesting platform with big ease of setup and a free plan Should be good for any smallish app that you just want to set up and forget about! For $10/month you can also assign your own host names which makes this a viable solution. It saves you from installing and maintaining a separate build server with something like TeamCity or CruiseControl running and is also almost easier to set up. Where are you gonna host your next app?

Test Driving AppHarbor – A Walkthrough and Review

by Oliver 15. March 2013 20:50

For some time now, I've wanted to check out AppHarbor, a cloud service to host .NET applications that includes a build environment, executes tests and deploys successful builds to one or more app servers. They use Amazon's cloud computing infrastructure as their backend. The smallest package is free so there's no good reason not to check it out. Getting my first application up and running First, you need to Sign up, confirm the link in the confirmation email, and log in. This part took about 2 minutes. (Created a new KeePass entry with an uncrackable password on the way.) Then, create an application, entering a name and the geographical region you want your application to be hosted at: Once you're done with that, you can choose where your code is hosted – this assumes you version control your source code using e.g. BitBucket, CodePlex, or Git. They also have a solution for the situation where you don't host your code anywhere, using a built-in Git repository. I didn't use that option, though, since I have an account at GitHub. Clicking on "Configure GitHub to deploy to AppHarbor" directs you to the GitHub logon screen (if you're logged out) where you simply sign in. Now, the following dialog was a bit spooky: What I read between the lines is something like: all your base are belong to us! I mean, it basically says that they can do to all of my projects … well, anything, really. Since I didn't want to create a new account just to try out AppHarbor and, honestly, because I somehow felt that they wouldn't destroy all of my work, I clicked "Authorize app". Phew! Remark: You might choose to use a different GitHub account for your deployments, using e.g. copies of your repositories locally where you just copy everything you need from your dev repo. Then you can grand AppHarbor access to that account without much ado. Now, the AppHarbor app took over and I chose a repository for the application I created earlier: Once I chose a repo for my first app, I got to see the first Build status message – here AppHarbor is building my app for the first time: A few seconds after AppHarbor was done building, and testing, and deploying my app after I had clicked the Deploy button, my app was ACTIVE :-) Under the Hostnames link I found that they had given it http://mathie.apphb.com/ and after a couple of seconds I saw my app online on AppHarbor. That was easy – how about deploying a new version? I'm quite surprised at how easy (and fast!) it was to get my first app up and running. For completeness, I wanted to check how AppHarbor would handle my pushing some changes to the master branch of my repo. Here we go: The commit was picked up within seconds! Another click on Deploy gets the new version out there. Remark: On their homepage they say that apps get automatically deployed once build and test runner finish successfully. This was not the case here, and I didn't find any setting to enable this. If you know how that works, please leave a comment. Update: Looks like this works out of the box, you just need to wait a minute or two for their deployment agent to pick up the new version. I updated my app just now and it got deployed by itself :-) Go back in time – it's easy, too Now, this is a nice feature: you can deploy any version of your application with a click of a button! So, if for some reason, you discover that the new version has some flaw, go back to an older one: What else do they offer? AppHarbor contains an add-on infrastructure and already offers several add-ons that you can install with your application. Most of them charge an extra monthly fee, some of them also offer a free plan. There are mostly analytics add-ons and DB engines, including dedicated MS SQL Server, RavenDB, MySQL, a PostgrSQL flavor, and a few more. Interesting platform with big ease of setup and a free plan Should be good for any smallish app that you just want to set up and forget about! For $10/month you can also assign your own host names which makes this a viable solution. It saves you from installing and maintaining a separate build server with something like TeamCity or CruiseControl running and is also almost easier to set up. Where are you gonna host your next app?

About Oliver

shades-of-orange.com code blog logo I build web applications using ASP.NET and have a passion for jQuery. Enjoy MVC 4 and Orchard CMS, and I do TDD whenever I can. I like clean code. Love to spend time with my wife and our daughter. My profile on Stack Exchange, a network of free, community-driven Q&A sites

About Anton

shades-of-orange.com code blog logo I'm a software developer at teamaton. I code in C# and work with MVC, Orchard, SpecFlow, Coypu and NHibernate. I enjoy beach volleyball, board games and Coke.