Tuesday, April 28, 2009

SQL Server 2005 - Tips for Optimizing SQL Server Query Performance

The following are some tips to improve the performance of SQL Server queries.

1. Turn on the execution plan and statistics, then analyze query performance
The first step is to use the tools that help you determine whether a query done one way is better than another. Comparing the original query to a new query you come up with is the best way to evaluate the benefits of any changes.

To do this, go into SQL Server Management Studio, open the Query menu, and select “Include Actual Execution Plan.”

[SqlServer1.JPG]

Next, turn on statistics. Type the following statement:

SET STATISTICS IO ON;

To complement the above two, you should also compare the actual execution times. To do this, execute the following statement:

SET STATISTICS TIME ON;

After running a simple query, analyze its execution plan for optimization opportunities, as below.

[sqlserver2.JPG]

After getting the execution plan, do the following:

  • Analyze the execution plan
  • Review the estimated cost of each operation
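
For example, a minimal comparison session might look like the following sketch (the table and the two query versions are hypothetical placeholders):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Original version of the query
SELECT OrderId, Quantity, UnitPrice
FROM salestable
WHERE Quantity > 10;

-- Candidate rewrite to compare against the original
SELECT OrderId, Quantity, UnitPrice
FROM salestable
WHERE Quantity >= 11;

-- Compare the logical reads and CPU/elapsed times in the Messages tab,
-- and the relative cost of each plan in the Execution Plan tab.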

2. Use Clustered Indexes on appropriate columns
Having the clustered index on the primary key is sometimes not the most efficient choice, even though a clustered index is the most performant type of index.
Consider the following:

  • Generally, the whole table is sorted according to the clustered index, which is created by default when the primary key is created.
  • If the table is involved in lots of joins based on the primary key, that is probably the right place for the clustered index to be.
  • But if you are continually filtering or grouping on other columns in the table, you should consider changing the primary key index to nonclustered and putting the clustered index on those filtered or grouped columns.


In such cases, drop the clustered index from the primary key and recreate the primary key as a nonclustered index:

ALTER TABLE testtable
DROP CONSTRAINT PK_testtable
GO
ALTER TABLE testtable
ADD CONSTRAINT PK_testtable
PRIMARY KEY NONCLUSTERED (testcolumn);

The following statement then adds a new clustered index on the non-key column that is involved in lots of joins or filters:

CREATE CLUSTERED INDEX MyClusteredIndex
ON testtable (testcolumnID)

3. Use Indexed views

Indexed views have been around for a while: you improve performance by creating a view with the SCHEMABINDING option and then creating an index on the view.
Imagine that you have created a view without the SCHEMABINDING option and then altered the schema of an underlying table (say, deleted a column). The next time you run your view, it will fail. This is where SCHEMABINDING comes into the picture: creating a view with the SCHEMABINDING option prevents any changes to the underlying tables that would alter the schema the view depends on.

CREATE VIEW dbo.testview
WITH SCHEMABINDING
AS
SELECT a.testcolumn1, b.testcolumn2
FROM dbo.testtable a INNER JOIN dbo.testtable2 b
ON (a.column1 = b.column1)

Note the use of the SCHEMABINDING attribute. It prevents you from changing the underlying tables while this view exists, and it is necessary if you want to add an index to the view.

CREATE UNIQUE CLUSTERED INDEX Idxtestview
ON dbo.testview (testcolumn1, testcolumn2)
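
Once the index exists, queries against the view can use it directly. One caveat: outside Enterprise edition, the optimizer will not match an indexed view automatically, so the NOEXPAND hint may be needed. A sketch, reusing the view above (the filter value is a placeholder):

-- Force the optimizer to read from the indexed view itself
SELECT testcolumn1, testcolumn2
FROM dbo.testview WITH (NOEXPAND)
WHERE testcolumn1 = 1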

4. Use Covering Indexes

Support for covering indexes with included columns (the INCLUDE clause) was added in SQL Server 2005. Basically, you can create an index optimized for the query itself, based on its joins, filters and grouping, and then add extra columns that can be retrieved directly from the index for use in SELECT statements, as follows:

CREATE NONCLUSTERED INDEX TestIndex
ON salestable (OrderId)
INCLUDE (Quantity, UnitPrice)

If queries filter on the OrderId column, the index will be used, and if the only other columns being retrieved are Quantity and UnitPrice, the query optimizer doesn’t need to retrieve any extra columns from the underlying table. It can just use the index. Because the optimizer never needs to touch the original table, performance is improved.
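
For example, a query like the following (the OrderId value is a placeholder) can be answered entirely from TestIndex, since every referenced column lives in the index:

-- Covered query: OrderId is the index key, Quantity and UnitPrice are included columns
SELECT OrderId, Quantity, UnitPrice
FROM salestable
WHERE OrderId = 12345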

5. Keep your clustered index key small

One thing you need to consider when deciding where to put your clustered index is how big its key will be. The clustered key is duplicated in every nonclustered index on the table, so a large clustered key on a table with a decent number of rows can blow out the size of all its indexes significantly. Having no clustered index at all can be just as bad, because nonclustered indexes then use the row pointer (RID), which is 8 bytes per row.
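
As an illustrative sketch (the table and column names are hypothetical), compare a wide character key against a narrow integer key:

-- Wide clustered key: up to 100 bytes repeated in every nonclustered index row
CREATE TABLE orders_wide (
    OrderCode varchar(100) NOT NULL PRIMARY KEY CLUSTERED,
    OrderDate datetime
)

-- Narrow clustered key: only 4 bytes repeated in every nonclustered index row
CREATE TABLE orders_narrow (
    OrderId int IDENTITY NOT NULL PRIMARY KEY CLUSTERED,
    OrderCode varchar(100),
    OrderDate datetime
)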

6. Avoid cursors

Cursors are less performant because every FETCH statement executed is equivalent to executing another SELECT statement that returns a single row. The optimizer can’t optimize a CURSOR statement as a whole; instead, it optimizes the queries within each iteration of the cursor loop, which is undesirable. Each FETCH also causes a round trip to the server, further degrading performance. Given that most CURSOR statements can be rewritten using set logic, they should generally be avoided.
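
As a sketch of such a rewrite (the table and column names are hypothetical), a cursor that updates rows one at a time can usually be replaced by a single set-based statement:

-- Cursor version: the optimizer handles each FETCH/UPDATE pair separately
DECLARE @id int
DECLARE c CURSOR FOR
    SELECT SALESORDERID FROM dbo.sales WHERE QUANTITY = 0
OPEN c
FETCH NEXT FROM c INTO @id
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.sales SET UNITPRICE = 0 WHERE SALESORDERID = @id
    FETCH NEXT FROM c INTO @id
END
CLOSE c
DEALLOCATE c

-- Set-based version: one statement the optimizer can plan as a whole
UPDATE dbo.sales SET UNITPRICE = 0 WHERE QUANTITY = 0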

7. Archive old data using Replication or Backup
If you want to improve query performance, give the optimizer less work to do. If you can cut down the number of rows the query has to deal with, performance will improve.

This can be done in the following ways (a sketch follows this list):

  • Create audit triggers that move historical data into separate archive tables.
  • If you don’t need your data after a certain period of time, back up your database and remove the old rows.
  • You can also use SQL Server Replication to keep a copy of the data elsewhere before removing it.
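
As a minimal sketch (the table names, the ORDERDATE column, and the two-year retention period are all hypothetical assumptions), moving old rows into an archive table could look like:

BEGIN TRANSACTION

-- Copy rows older than two years into an archive table with the same structure
INSERT INTO dbo.sales_archive (SALESORDERID, UNITPRICE, QUANTITY, ORDERDATE)
SELECT SALESORDERID, UNITPRICE, QUANTITY, ORDERDATE
FROM dbo.sales
WHERE ORDERDATE < DATEADD(year, -2, GETDATE())

-- Then remove them from the live table
DELETE FROM dbo.sales
WHERE ORDERDATE < DATEADD(year, -2, GETDATE())

COMMIT TRANSACTION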


8. Partition your data correctly
These days, you don’t actually have to move old data out of a table to improve query performance. You can partition your table into a number of data segments based on a partition function. The query optimizer can use the partition function to look at rows only in the most appropriate partition. To create partitions, you need a partition function and a partition scheme.

CREATE PARTITION FUNCTION TestPartitionFunction (int) AS RANGE RIGHT FOR VALUES (1, 100, 1000)

Once the partition function is created, you can then apply the function to a partition scheme for a table.
CREATE PARTITION SCHEME TestPartitionScheme
AS PARTITION TestPartitionFunction
TO (filegrp1, filegrp2, filegrp3, filegrp4)

Then it’s just a matter of creating the table to use the partition scheme on the column you decided to partition on:
CREATE TABLE myPartitionTable
( Col1 int, Col2 varchar(100) )
ON TestPartitionScheme (Col1)
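
To check which partition a given value falls into, you can call the partition function through $PARTITION (the value 150 is just an example):

SELECT $PARTITION.TestPartitionFunction(150) AS PartitionNumber
-- Returns 3: with RANGE RIGHT and boundaries (1, 100, 1000),
-- 150 falls into the third partition, [100, 1000)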

9. Remove user-defined inline scalar functions
Scalar functions are convenient if you want to return a single value, but they come at the expense of performance. They look somewhat like stored procedures, and they can be used inline in SQL statements. The problem is that they are not expanded into the query plan and therefore not optimized by the query optimizer. Queries using them may appear performant in the execution plan and in the IO statistics, but when you run the query, it can perform really badly.


CREATE FUNCTION dbo.fnGetPostalCode
( @Suburb varchar(100), @State varchar(10) )
RETURNS int
AS
BEGIN
    RETURN ISNULL((SELECT PostalCode
                   FROM dbo.PostalCode
                   WHERE Suburb = @Suburb AND State = @State), -1);
END

The following statement will only perform a clustered index scan, not a seek, because the function has to be executed once per row; on a big table this can seriously affect performance.

SELECT s.SalesPersonID, s.SuburbName, s.State,
       dbo.fnGetPostalCode(s.SuburbName, s.State) AS PostalCode
FROM dbo.SalesPerson s

You can have a look at the details by clicking on SQL Server Management Studio’s Query menu, and selecting “Include Actual Execution Plan”
One way to get around this is to simply inline the underlying queries from the function, as follows:
SELECT s.SalesPersonID, s.SuburbName, s.State,
       ISNULL((SELECT PostalCode
               FROM dbo.PostalCode
               WHERE Suburb = s.SuburbName AND State = s.State), -1) AS PostalCode
FROM dbo.SalesPerson s

Inlining the SQL statement performs significantly better than calling the scalar function.

10. Use APPLY in SQL queries
The APPLY operator was created for situations where you would otherwise put multiple inline nested queries in the one statement. For example, take the following statement:

SELECT a.SalesOrderID,
       Quantity = (SELECT TOP 1 Quantity FROM SalesOrderDetail WHERE SalesOrderID = a.SalesOrderID),
       UnitPrice = (SELECT TOP 1 UnitPrice FROM SalesOrderDetail WHERE SalesOrderID = a.SalesOrderID)
FROM SalesOrderHeader a

This performs two extra subqueries per row, retrieving data from another table using the same criterion. It can now be replaced with the following:

SELECT b.SalesOrderID, b.OrderDate, a.*
FROM SalesOrderHeader b
CROSS APPLY (
    SELECT TOP (1) c.UnitPrice AS UnitPrice, c.Quantity AS Quantity
    FROM SalesOrderDetail c
    WHERE c.SalesOrderId = b.SalesOrderId
    ORDER BY c.Quantity DESC
) AS a

The a.* in the above query returns Quantity and UnitPrice; you can also refer to them explicitly as a.UnitPrice and a.Quantity.

11. Use computed columns

Computed columns are derived from other columns in a table. By creating and indexing a computed column, you can turn what would otherwise be a scan into a seek.
For example, suppose you need to calculate SalesPrice and you have Quantity and UnitPrice columns. Multiplying them inline in the SQL would cause a scan, since the two columns are multiplied together for every single row. Create a computed column called SalesPrice, then index it, and the query optimizer will no longer need to retrieve the UnitPrice and Quantity data and do the calculation for each row.
The syntax for creating a computed column is:

CREATE TABLE sales (
    SALESORDERID int,
    UNITPRICE int,
    QUANTITY int,
    SALESPRICE AS (UNITPRICE * QUANTITY)
)

where SALESPRICE is a computed column

Inserting values into the other columns automatically populates SALESPRICE, the computed column:
INSERT INTO sales (SALESORDERID, UNITPRICE, QUANTITY) VALUES (1, 100, 2)
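
To get the seek described above, index the computed column. This works here because the expression is deterministic and precise (integer multiplication); the index name is a placeholder:

CREATE INDEX IX_SALES_SALESPRICE ON sales (SALESPRICE)

-- Queries filtering on the computed value can now seek the index
SELECT SALESORDERID FROM sales WHERE SALESPRICE = 200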

12. Use the correct transaction isolation level

If there are a lot of rows in your table, multiple concurrent requests to that table can cause contention if the correct transaction isolation level is not set. If requests are repeatedly blocked, it may be time to consider changing the isolation level.

For example, READ UNCOMMITTED is equivalent to dirty reads, or NOLOCK. That is, if a transaction is in the middle of processing and you read a row, the data may not be valid, especially if multiple inserts/updates that require atomicity are occurring. This is the most performant level because it ignores locking altogether, but it is generally ruled out by good design and reserved for special cases.

With READ_COMMITTED_SNAPSHOT, any data read by a statement is the transactionally consistent version of the data that existed at the start of that statement. Internally, SQL Server keeps a versioned copy of the data in tempdb until the transaction has completed. Except when the database is being recovered, snapshot reads do not request locks and therefore do not block other transactions from writing data; transactions writing data likewise do not block transactions reading data.

There are various other isolation levels, including REPEATABLE READ and SERIALIZABLE among others, that you can look at to determine whether they are appropriate for your needs.
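
As a sketch, the isolation level is set per session, while READ_COMMITTED_SNAPSHOT is a database-level option (the database name TestDB is a placeholder):

-- Per-session setting: subsequent reads in this session ignore locks
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Database-level option: READ COMMITTED readers use row versions from tempdb
-- instead of taking shared locks
ALTER DATABASE TestDB SET READ_COMMITTED_SNAPSHOT ON;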

A checklist suggested by Microsoft for SQL Server performance is available here.
