8. Don't cluster on GUID
After all these years, I can't believe we're still fighting this issue. But I still run into clustered GUIDs at least twice a year.
A GUID (globally unique identifier) is a 16-byte randomly generated number. Ordering your table's data on this column will cause your table to fragment much faster than using a steadily increasing value like DATE or IDENTITY. I did a benchmark a few years ago where I inserted a bunch of data into one table with a clustered GUID and into another table with an IDENTITY column. The GUID table fragmented so severely that the performance degraded by several thousand percent in a mere 15 minutes. The IDENTITY table lost only a few percent off performance after five hours. This applies to more than GUIDs — it goes toward any volatile column.
9. Don't count all rows if you only need to see if data exists
It's a common situation. You need to see if data exists in a table or for a customer, and based on the results of that check, you're going to perform some action. I can't tell you how often I've seen someone do a SELECT COUNT(*) FROM dbo.T1 to check for the existence of that data:
SET @CT = (SELECT COUNT(*) FROM dbo.T1);
If @CT > 0
It's completely unnecessary. If you want to check for existence, then do this:
If EXISTS (SELECT 1 FROM dbo.T1)
Don't count everything in the table. Just get back the first row you find. SQL Server is smart enough to use EXISTS properly, and the second block of code returns superfast. The larger the table, the bigger difference this will make. Do the smart thing now before your data gets too big. It's never too early to tune your database.
In fact, I just ran this example on one of my production databases against a table with 270 million rows. The first query took 15 seconds, and included 456,197 logical reads, while the second one returned in less than one second and included only five logical reads. However, if you really do need a row count on the table, and it's really big, another technique is to pull it from the system table. SELECT rows from sysindexes will get you the row counts for all of the indexes. And because the clustered index represents the data itself, you can get the table rows by adding WHERE indid = 1. Then simply include the table name and you're golden. So the final query is SELECT rows from sysindexes where object_name(id) = 'T1' and indexid = 1. In my 270 million row table, this returned sub-second and had only six logical reads. Now that's performance.
Sign up for CIO Asia eNewsletters.