When and in which columns should indexes be used?

Asked

Viewed 673 times

16

Reading an article that has nothing to do with database I came across information that use of indexes can bring great improvements to database performance.

I have two questions about that:

  • When indexes should be created and in which columns they should be created?
  • Creating indexes has some cost to the bank?
  • I think this answers: http://answall.com/q/35088/101, http://answall.com/q/23348/101, http://answall.com/q/55118/101

2 answers

12


The importance of indexes

Imagine searching a word in a dictionary that is not ordered. Worse than that, imagine searching in a dictionary where words are randomly arranged.

In the worst case scenario, it would be necessary to go through the entire dictionary to find a given word. The worst case is the one where the word searched is the last page of the dictionary. Obviously, nothing a brute-force algorithm can’t solve. But imagine now millions/billions of records in a table or file. It would be extremely expensive to find, agree?

What are the indexes

Indexes are mechanisms that allow queries to data to happen more quickly when compared to the same queries without the use of index. Can be used in databases, isolated files, RAM.

This happens because the index is stored in a data structure whose premise is to keep the fields that make up the index ordered, thus allowing the application of search algorithms built to operate when the data are properly ordered. An example of these algorithms is binary search. Binary search is a search algorithm that operates on RAM memory and requires the search key to be ordered for its functioning. It is much more efficient than a brute force search algorithm, as it reduces the search space, thus reducing the search time. While a brute-force algorithm has complexity O(n), the binary search algorithm has complexity O(log(n)), the log in this case is in base 2.

Example:

If there are 10000 keys to query, a brute-force algorithm, in the worst case, would make 10000 comparisons to find what you want. The binary search algorithm would, in the worst case, make 14 comparisons (actually, 13.29, however, rounded to 14).

See how binary search is much more efficient than brute-force search algorithm.

Note, however, that this is only possible because the search key is sorted. If not, it would not be possible to use the binary search algorithm.

When to use

Indices are very useful and you can actually reduce the time of many queries using this functionality present in virtually all database Engins. However, there is no free lunch in computing and it is necessary to be aware of two questions:

  1. There is a computational cost to maintain the ordered index. Returning to the example of the dictionary, imagine that a new word is incorporated into Portuguese and should be added to the dictionary. There is a cost to know where to insert this word and another cost to move (reorganize) the other words for inserting the new one. Therefore, if a given table suffers many insertions every second, an index could harm rather than help.

  2. As the goal is to increase performance, that is, decrease the time spent to consult, so it will be necessary to spend more space. In other words, in the case of the database, disk space is used to store an index. This is the traditional commitment of computing: if you want to run an algorithm faster, you have to pay with memory. If you want to use less memory, then you should accept a not-so-efficient algorithm.

See the image below, showing the space spent for the data, the number of lines and the space spent by the index (real example):

Exemplo real do espaço gasto pelo índice de uma tabela

In the case of databases, indexes are created in one or more fields of a given table. If the index is composed of more than one field, then it is called a composite index. Generally, you include indexes for those fields that are always present in common Where system clauses or that should be executed quickly.

For the primary key, indexes are set automatically, since they must always be present in an INSERT as well as being present in JOINS.

The image above is from an audit table with more than 1 billion records. The index is composed using two varchar fields. It would be impracticable to consult in this table if this index did not exist.

One can hardly tell if using the index helps or harms without analyzing the context of the situation in detail. This is part of more detailed analysis of database tuning (optimization).

  • Great explanation.

4

Creating indexes has cost, yes. Each time you insert or change a column, the index needs to be rebuilt.

The index is used to record the elements of the indexed column in another location. When you search for, say, 'Roberto', instead of searching in the table USERS, line by line and see where the search term appears, the Oracle searches in the index, which is ordered, and finds the line or lines where the term appears.

Think of it as a remissive index in a book. The performance impact exists: when you change a page, you need to update the index. But this impact is distributed among the changes, which are usually small. Without indexes, the impact of the search would be greater, and would occur in each search (think about having to search the book every time you want to find where a word appears).


Some details:

  • Some scenarios are improved amazingly with the application of indexes. Some not so much.
  • Sometimes the impact of creating indexes can have measurable consequences. It is recommended to avoid unnecessary indexes.
  • In addition to the impact on time, indexes occupy space.
  • There are other optimizations in a database that are important (normalization), and deserve even more attention than the question of indexes.

At the end of the day, the subject is complex. But the basic tip is to create indexes in fields that will be searched frequently.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.