The importance of indexes
Imagine searching for a word in a dictionary that is not in alphabetical order. Worse still, imagine a dictionary whose words are arranged at random.
In the worst case, you would have to go through the entire dictionary to find a given word; the worst case is when the word you are looking for is on the last page. Of course, that is nothing a brute-force algorithm can't solve. But now imagine millions or billions of records in a table or file. Finding one would be extremely expensive, wouldn't it?
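To make the worst case concrete, here is a minimal sketch of a brute-force (linear) search over an unordered word list that also counts comparisons. The function name `brute_force_search` and the sample words are hypothetical, chosen only for illustration.

```python
def brute_force_search(words, target):
    """Check every word in order; return (index, comparisons), index -1 if absent."""
    comparisons = 0
    for i, word in enumerate(words):
        comparisons += 1  # one comparison per word examined
        if word == target:
            return i, comparisons
    return -1, comparisons

# Hypothetical unordered "dictionary"
words = ["zebra", "apple", "mango", "kiwi", "banana"]
print(brute_force_search(words, "banana"))  # (4, 5): last position, every word compared
print(brute_force_search(words, "grape"))   # (-1, 5): absent, every word compared
```

With n words, the worst case (target last or absent) always costs n comparisons.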
What indexes are
Indexes are mechanisms that make data queries faster than the same queries run without an index. They can be used in databases, standalone files, and RAM.
This works because an index is stored in a data structure designed to keep the fields that make up the index in order, which allows the use of search algorithms that only work on properly ordered data. Binary search is one example: it typically operates on data held in RAM and requires the search key to be sorted. It is much more efficient than brute-force search because each comparison halves the remaining search space, which reduces the search time. While a brute-force search has complexity O(n), binary search has complexity O(log n), with the logarithm in base 2.
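The halving described above can be sketched as a classic iterative binary search over a sorted Python list. This is an illustrative version, not how any particular database engine implements its indexes; the name `binary_search` and the comparison counter are assumptions added for the example.

```python
def binary_search(sorted_keys, target):
    """Return (index, comparisons); index is -1 if target is absent.

    Assumes sorted_keys is in ascending order.
    """
    lo, hi = 0, len(sorted_keys) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1  # one three-way comparison against the middle key
        if sorted_keys[mid] == target:
            return mid, comparisons
        if sorted_keys[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1, comparisons

keys = [2, 3, 5, 7, 11, 13, 17, 19]
print(binary_search(keys, 13))  # (5, 2)
print(binary_search(keys, 4))   # (-1, 3)
```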
Example:
If there are 10,000 keys to search, a brute-force algorithm would, in the worst case, make 10,000 comparisons to find the one you want. Binary search would, in the worst case, make 14 comparisons (log2 of 10,000 is about 13.29, which rounds up to 14).
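The arithmetic in the example can be checked with a short sketch. For n ≥ 1 sorted keys, the worst case of binary search is floor(log2 n) + 1 comparisons, which Python exposes exactly as `int.bit_length()`; the helper name `worst_case_comparisons` is hypothetical.

```python
import math

def worst_case_comparisons(n):
    """Worst-case comparisons for n keys: (brute force, binary search)."""
    brute_force = n          # may have to look at every key
    binary = n.bit_length()  # equals floor(log2(n)) + 1 for n >= 1
    return brute_force, binary

n = 10000
print(round(math.log2(n), 2))     # 13.29
print(worst_case_comparisons(n))  # (10000, 14)
```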
Notice how much more efficient binary search is than brute-force search.
Note, however, that this only works because the search key is sorted; otherwise binary search could not be used.
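A quick way to see why ordering matters is to run Python's standard `bisect` module (a binary search) on unsorted data. The `contains` helper and the sample values are hypothetical; the point is that the search silently misses a value that is actually present.

```python
import bisect

def contains(sorted_keys, target):
    """Binary-search membership test; only correct if sorted_keys is ordered."""
    i = bisect.bisect_left(sorted_keys, target)
    return i < len(sorted_keys) and sorted_keys[i] == target

unordered = [40, 10, 30, 20, 50]
print(contains(unordered, 10))          # False, even though 10 is in the list
print(contains(sorted(unordered), 10))  # True once the keys are sorted
```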
When to use
Indexes are very useful, and this feature, present in virtually every database engine, can significantly reduce the time of many queries. However, there is no free lunch in computing, and you need to be aware of two issues:
There is a computational cost to keeping the index ordered. Going back to the dictionary example, imagine that a new word enters the Portuguese language and must be added to the dictionary. There is one cost to find where the word goes and another to move (reorganize) the other words to make room for it. Therefore, if a table receives many insertions every second, an index could hurt rather than help.
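The two costs of an insertion can be sketched with a sorted Python list: finding the slot is a binary search, and inserting there shifts every later element by one position. Real database indexes use other structures (typically trees), so this only illustrates the idea; `insert_word` and the word list are hypothetical.

```python
import bisect

def insert_word(sorted_words, word):
    """Insert word keeping the list ordered; return how many words had to move."""
    pos = bisect.bisect_left(sorted_words, word)  # cost 1: find the slot, O(log n)
    shifted = len(sorted_words) - pos             # cost 2: every word after pos moves
    sorted_words.insert(pos, word)                # list.insert performs the shift, O(n)
    return shifted

words = ["abacate", "banana", "caju", "manga", "uva"]
print(insert_word(words, "acerola"))  # 4
print(words)  # ['abacate', 'acerola', 'banana', 'caju', 'manga', 'uva']
```

Every insertion pays both costs, which is why a table with a very high insertion rate can be slowed down by its indexes.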
Since the goal is to improve performance, that is, to reduce query time, more space has to be spent. In the case of a database, the index takes up disk space. This is the classic trade-off in computing: if you want an algorithm to run faster, you pay with memory; if you want to use less memory, you have to accept a less efficient algorithm.
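The space cost can be observed with SQLite through Python's built-in `sqlite3` module: `PRAGMA page_count` reports how many database pages are in use, and it grows once an index is created. The table `audit`, its columns, and the generated rows are hypothetical sample data.

```python
import sqlite3

def page_count(conn):
    """Number of database pages currently in use (PRAGMA page_count)."""
    return conn.execute("PRAGMA page_count").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (user_name TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO audit VALUES (?, ?)",
    ((f"user{i}", f"action{i % 50}") for i in range(20000)),  # hypothetical rows
)
conn.commit()

before = page_count(conn)
conn.execute("CREATE INDEX idx_audit ON audit (user_name, action)")
after = page_count(conn)
print(before, after)  # after > before: the index occupies pages of its own
```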
The image below shows the space used by the data, the number of rows, and the space used by the index (a real example):
In databases, indexes are created on one or more fields of a table. An index made up of more than one field is called a composite index. Generally, you add indexes on fields that frequently appear in the system's WHERE clauses, or on fields used by queries that must run quickly.
Primary keys are indexed automatically, since the primary key is involved in every INSERT (the engine must check that it is unique) and is frequently used in JOINs.
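In SQLite, for instance, declaring a non-integer PRIMARY KEY is enough for the engine to create an index on its own, which can be seen in `sqlite_master`. This sketch uses a hypothetical `product` table; other engines handle primary keys in their own ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No CREATE INDEX here: the index comes from the PRIMARY KEY constraint alone.
conn.execute("CREATE TABLE product (code TEXT PRIMARY KEY, name TEXT)")

indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'product'"
)]
print(indexes)  # e.g. ['sqlite_autoindex_product_1']

detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM product WHERE code = ?", ("A1",)
).fetchone()[-1]
print(detail)   # the lookup by primary key uses that automatic index
```

In SQLite, an INTEGER PRIMARY KEY is an alias for the rowid, so lookups on it use the table's own B-tree rather than a separate index.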
The image above comes from an audit table with more than 1 billion records. Its index is made up of two varchar fields. Querying this table would be impractical without that index.
Whether an index helps or hurts can rarely be judged without analyzing the specific situation in detail; that is part of the broader work of database tuning (optimization).
I think this answers: http://answall.com/q/35088/101, http://answall.com/q/23348/101, http://answall.com/q/55118/101
– Maniero