The most important factor in Sql Server to achieve good query performance is indexing.
The index provides a fast way to look up data based on the values within those columns.For example, if you create an index on the primary key and then search for a row of data based on one of the primary key values, SQL Server first finds that value in the index, and then uses the index to quickly locate the entire row of data. Without the index, a table scan would have to be performed in order to locate the row, which can have a significant effect on performance.
Note: PRIMARY KEY constraints create clustered indexes automatically if no clustered index already exists on the table and a non-clustered index is not specified when you create the PRIMARY KEY constraint.
An index is made up of set of pages (index nodes) that are organized in a B-tree structure.
When a query fires against an index column, the query engine starts at the root node and navigates down through the intermediate nodes. It continues down through the index nodes until it reaches the leaf node.
For example if you are searching for the value 123 in an index column, query engine would look for the root node to determine which page to reference in the top intermediate level. In this example the first page points the value 1-100 and second point to value 101-200, so the query engine would go to the second page on that level. From there, the query engine would determine to go the next intermediate level. From there query engine would navigate to the leaf node to for value 123.
The leaf node will contain either the entire row of data or a pointer to that row, depending on whether the index is clustered or non-clustered.
A cluster index store the actual row of data on it its leaf node. According to the above example, that would mean that entire row of data associated with the key value 123 is found on its leaf node.
- When a cluster index defined on a table, it tells the Sql Server how to order the table data pages. It organizes the data pages into a B-tree structure where root pages at the highest level and leaf pages at the lowest level.
- Index pages in the root and intermediate level contain the cluster key value and a page pointer down into the next level of the B-tree. This pattern continues till it reaches the leaf page. Leaf level of the cluster index is where the actual data stored.
- In cluster index, index values are sorted either on ascending or descending order.
It’s impossible to physically arrange same data in 2 different ways without having the separate structure that store information. This is where non clustered index comes into picture.
Non-clustered index tree structure is same as the cluster index where instead of having the base data at the bottom or leaf level of tree, we have set up pointer or references back to the base table data. With this type of structure we can use any index key order we want because orders are independent of the base table data. This entire structure called non-cluster index.
- Index pages in the root and intermediate level contain the key value and a page pointer down into the next level of the B-tree. This pattern continues till it reaches the leaf page.
- Non-cluster index leaf pages store both the index key value AND a pointer to locate the actual data rows in table where actually data stored.
- Whereas in the clustered index the leaf level contains the actual data rows, in a non-clustered index, the leaf level contains the clustered key or Row ID (RID), which SQL Server uses to find the rest of the data.
Let’s says we have defined a non-clustered index on the EmployeeID column in Employee table, if we are running a query to retrieve data (empname, empaddress, etc) for an EmployeeID 18, it uses the cluster index to retrieve other column data since actual data stored in the leaf level of cluster index.
Summary of clustered and non-clustered index:
- A table can have multiple non-clustered indexes where as a table can have only one clustered index.
- Non clustered index store both a value and a pointer to the actual row that holds that value. Cluster index actually stores the row- level data in its leaf nodes.
- Cluster index speeds up the data retrieval since physical data stored on the disk is sorted in the same order as the index. A non-clustered has no effect on which order of the rows will be stored.
- Frequent updates on cluster index column cause performance issues.
- Non-clustered index is the optimal choice for exact match queries by using the cluster key value or row ID (RID) depending upon the underlying table order.
- Non-clustered index is considered for queries that do not return large result sets.
- Use of covering index eliminates accessing the table or cluster index together.
- It is important to define the cluster index key with as few columns as possible because the non-cluster index entries contain the clustering key.
- Cluster index considered for queries that return a range of values using operator such as BETWEEN,>,>=, < and <=.
- Considered for column that are accessed sequentially and queries that return large result set.
- Cluster index eliminates the need for Sql Server to sort the data when there is GROUP BY and ORDER BY clause in a query. Since data is already sorted in a cluster index hence improve query performance.