Indexes in databases are similar to indexes in books. In a book, an index allows you to find information quickly without reading the entire book. In a database, an index allows the database program to find data in a table without scanning the entire table. An index in a book is a list of words with the page numbers that contain each word. An index in a database is a list of values in a table with the storage locations of rows in the table that contain each value. Indexes can be created on either a single column or a combination of columns in a table and are implemented in the form of B-trees. An index contains an entry with one or more columns (the search key) from each row in a table. A B-tree is sorted on the search key, and can be searched efficiently on any leading subset of the search key. For example, an index on columns A, B, C can be searched efficiently on A, on A, B, and A, B, C.
Most books contain one general index of words, names, places, and so on. Databases contain individual indexes for selected types or columns of data: this is similar to a book that contains one index for names of people and another index for places. When you create a database and tune it for performance, you should create indexes for the columns used in queries to find data.
When SQL Server executes a statement to find data in the employee table based on a specified emp_id value, it recognizes the index for the emp_id column and uses the index to find the data. If the index is not present, it performs a full table scan starting at the beginning of the table and stepping through each row, searching for the specified emp_id value.
SQL Server automatically creates indexes for certain types of constraints (for example, PRIMARY KEY and UNIQUE constraints). You can further customize the table definitions by creating indexes that are independent of constraints.
The performance benefits of indexes, however, do come with a cost. Tables with indexes require more storage space in the database. Also, commands that insert, update, or delete data can take longer and require more processing time to maintain the indexes. When you design and create indexes, you should ensure that the performance benefits outweigh the extra cost in storage space and processing resources.
Using Clustered Indexes
A clustered index determines the physical order of data in a table. A clustered index is analogous to a telephone directory, which arranges data by last name. Because the clustered index dictates the physical storage order of the data in the table, a table can contain only one clustered index. However, the index can comprise multiple columns (a composite index), like the way a telephone directory is organized by last name and first name.
A clustered index is particularly efficient on columns that are often searched for ranges of values. After the row with the first value is found using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent. For example, if an application frequently executes a query to retrieve records between a range of dates, a clustered index can quickly locate the row containing the beginning date, and then retrieve all adjacent rows in the table until the last date is reached. This can help increase the performance of this type of query. Also, if there is a column(s) that is used frequently to sort the data retrieved from a table, it can be advantageous to cluster (physically sort) the table on that column(s) to save the cost of a sort each time the column(s) is queried.
Clustered indexes are also efficient for finding a specific row when the indexed value is unique. For example, the fastest way to find a particular employee using the unique employee ID column emp_id is to create a clustered index or PRIMARY KEY constraint on the emp_id column.
Note: PRIMARY KEY constraints create clustered indexes automatically if no clustered index already exists on the table and a non-clustered index is not specified when you create the PRIMARY KEY constraint.
Alternatively, a clustered index could be created on lname, fname (last name, first name), because employee records are often grouped and queried in this way rather than by employee ID.
Clustered indexes are not a good choice for:
ü Columns that undergo frequent changes: This results in the entire row moving (because SQL Server must keep the data values of a row in physical order). This is an important consideration in high-volume transaction processing systems where data tends to be volatile.
ü Wide keys: The key values from the clustered index are used by all non-clustered indexes as lookup keys and therefore are stored in each non-clustered index leaf entry
Using Non-clustered Indexes
A non-clustered index is analogous to an index in a textbook. The data is stored in one place, the index in another, with pointers to the storage location of the data. The items in the index are stored in the order of the index key values, but the information in the table is stored in a different order (which can be dictated by a clustered index). If no clustered index is created on the table, the rows are not guaranteed to be in any particular order. Some books contain multiple indexes. For example, a gardening book can contain one index for the common names of plants and another index for the scientific names because these are the two most common ways in which the readers find information. The same is true for non-clustered indexes. You can define a Non-clustered index for each of the columns commonly used to find the data in the table.
Using Unique Indexes
A unique index ensures that the indexed column contains no duplicate values. In the case of multicolumn unique indexes, the index ensures that each combination of values in the indexed column is unique. For example, if a unique index full_name is created on a combination of last_name, first_name, and middle_initial columns, no two people could have the same full name in the table.
Both clustered and non-clustered indexes can be unique. Therefore, provided that the data in the column is unique, you can create both a unique clustered index and multiple-unique non-clustered indexes on the same table.
When you create a clustered index, the table is copied, the data in the table is sorted, and then the original table is deleted. Therefore, enough empty space must exist in the database to hold a copy of the data.
By default, the data in the table is sorted when the index is created. However, if the data is already sorted because the clustered index already exists and is being re-created using the same name and columns, the sort operation can be automatically skipped by rebuilding the index, rather than creating the index again. The rebuild operation checks that the rows are sorted while building the index. If any rows are not correctly sorted, the operations cancels and the index is not created.
Creating a unique index ensures that any attempt to duplicate key values fails. If a single query is created that causes duplicate and nonduplicate key values to be added, SQL Server rejects all rows, including the nonduplicate key values. For example, if a single insert statement retrieves 20 rows from table A and inserts them into table B, and 10 of those rows contain duplicate key values, by default all 20 rows are rejected. However, the IGNORE_DUP_KEY clause can be specified when creating the index that causes only the duplicate key values to be rejected; the nonduplicate key values are added. In the previous example, only the 10 duplicate key values would be rejected; the other 10 nonduplicate key values would be inserted into table B.
A unique index cannot be created if there are any duplicate key values. For example, if you want to create a unique, composite index on columns a and b, but there are two rows in the table that contain the values 1 and 2 for a and b respectively, the unique index cannot be created.