Indexes
in databases are similar to indexes in books. In a book, an index allows you to
find information quickly without reading the entire book. In a database, an
index allows the database program to find data in a table without scanning the
entire table. An index in a book is a list of words with the page numbers that
contain each word. An index in a database is a list of values in a table with
the storage locations of rows in the table that contain each value. Indexes can
be created on either a single column or a combination of columns in a table and
are implemented in the form of B-trees. An index contains an entry with one or
more columns (the search key) from each row in a table. A B-tree is sorted on
the search key, and can be searched efficiently on any leading subset of the
search key. For example, an index on columns A, B, C can be searched
efficiently on A, on A, B, and A, B, C.
Most
books contain one general index of words, names, places, and so on. Databases
contain individual indexes for selected types or columns of data: this is
similar to a book that contains one index for names of people and another index
for places. When you create a database and tune it for performance, you should
create indexes for the columns used in queries to find data.
When
SQL Server executes a statement to find data in the employee table based on a
specified emp_id value, it recognizes the index for the emp_id column and uses
the index to find the data. If the index is not present, it performs a full
table scan starting at the beginning of the table and stepping through each
row, searching for the specified emp_id value.
SQL
Server automatically creates indexes for certain types of constraints (for
example, PRIMARY KEY and UNIQUE constraints). You can further customize the
table definitions by creating indexes that are independent of constraints.
The
performance benefits of indexes, however, do come with a cost. Tables with
indexes require more storage space in the database. Also, commands that insert,
update, or delete data can take longer and require more processing time to
maintain the indexes. When you design and create indexes, you should ensure
that the performance benefits outweigh the extra cost in storage space and
processing resources.
Using Clustered Indexes
A
clustered index determines the physical order of data in a table. A clustered
index is analogous to a telephone directory, which arranges data by last name.
Because the clustered index dictates the physical storage order of the data in
the table, a table can contain only one clustered index. However, the index can
comprise multiple columns (a composite index), like the way a telephone
directory is organized by last name and first name.
A
clustered index is particularly efficient on columns that are often searched
for ranges of values. After the row with the first value is found using the
clustered index, rows with subsequent indexed values are guaranteed to be
physically adjacent. For example, if an application frequently executes a query
to retrieve records between a range of dates, a clustered index can quickly
locate the row containing the beginning date, and then retrieve all adjacent
rows in the table until the last date is reached. This can help increase the
performance of this type of query. Also, if there is a column(s) that is used
frequently to sort the data retrieved from a table, it can be advantageous to
cluster (physically sort) the table on that column(s) to save the cost of a
sort each time the column(s) is queried.
Clustered
indexes are also efficient for finding a specific row when the indexed value is
unique. For example, the fastest way to find a particular employee using the
unique employee ID column emp_id is to create a clustered index or PRIMARY KEY
constraint on the emp_id column.
Note:
PRIMARY KEY constraints create clustered indexes automatically if no clustered
index already exists on the table and a non-clustered index is not specified
when you create the PRIMARY KEY constraint.
Alternatively,
a clustered index could be created on lname, fname (last name, first name),
because employee records are often grouped and queried in this way rather than
by employee ID.
Clustered
indexes are not a good choice for:
ü
Columns that undergo
frequent changes: This results in the entire row moving (because SQL Server
must keep the data values of a row in physical order). This is an important
consideration in high-volume transaction processing systems where data tends to
be volatile.
ü
Wide keys: The key
values from the clustered index are used by all non-clustered indexes as lookup
keys and therefore are stored in each non-clustered index leaf entry
Using Non-clustered Indexes
A
non-clustered index is analogous to an index in a textbook. The data is stored
in one place, the index in another, with pointers to the storage location of
the data. The items in the index are stored in the order of the index key
values, but the information in the table is stored in a different order (which
can be dictated by a clustered index). If no clustered index is created on the
table, the rows are not guaranteed to be in any particular order. Some books
contain multiple indexes. For example, a gardening book can contain one index
for the common names of plants and another index for the scientific names because
these are the two most common ways in which the readers find information. The
same is true for non-clustered indexes. You can define a Non-clustered index
for each of the columns commonly used to find the data in the table.
Using Unique Indexes
A
unique index ensures that the indexed column contains no duplicate values. In
the case of multicolumn unique indexes, the index ensures that each combination
of values in the indexed column is unique. For example, if a unique index
full_name is created on a combination of last_name, first_name, and
middle_initial columns, no two people could have the same full name in the
table.
Both
clustered and non-clustered indexes can be unique. Therefore, provided that the
data in the column is unique, you can create both a unique clustered index and
multiple-unique non-clustered indexes on the same table.
Clustered Indexes
When
you create a clustered index, the table is copied, the data in the table is
sorted, and then the original table is deleted. Therefore, enough empty space
must exist in the database to hold a copy of the data.
By
default, the data in the table is sorted when the index is created. However, if
the data is already sorted because the clustered index already exists and is
being re-created using the same name and columns, the sort operation can be
automatically skipped by rebuilding the index, rather than creating the index
again. The rebuild operation checks that the rows are sorted while building the
index. If any rows are not correctly sorted, the operations cancels and the
index is not created.
Unique
Indexes
Creating
a unique index ensures that any attempt to duplicate key values fails. If a
single query is created that causes duplicate and nonduplicate key values to be
added, SQL Server rejects all rows, including the nonduplicate key values. For
example, if a single insert statement retrieves 20 rows from table A and
inserts them into table B, and 10 of those rows contain duplicate key values,
by default all 20 rows are rejected. However, the IGNORE_DUP_KEY clause can be
specified when creating the index that causes only the duplicate key values to
be rejected; the nonduplicate key values are added. In the previous example,
only the 10 duplicate key values would be rejected; the other 10 nonduplicate key
values would be inserted into table B.
A
unique index cannot be created if there are any duplicate key values. For
example, if you want to create a unique, composite index on columns a and b, but
there are two rows in the table that contain the values 1 and 2 for a and b
respectively, the unique index cannot be created.
No comments:
Post a Comment