Indexes in Mysql queries

Question

Indexes in Mysql queries

Asked 10 years, 11 months ago

Viewed 2,914 times

23

Using indexes in mysql queries really makes the result come quickly for some large queries or tables.

For example in the query below:

SELECT * FROM table WHERE status = 1;

For a table with 1 million records if I give the command below:

DELETE FROM table WHERE status = 1;

This query takes a long time. So create the index for the status column and I was able to remove 700,000 lines quickly (about 0.8s)

In the select of the example I did not put the use index xyz but the database was smart to use the index you create for the column.

But what about consultations with various WHERE? Something like

SELECT * FROM table WHERE status = 1 AND account = 12 AND data > '2014-09-09 00:00:00';

If I create an index with the columns status, Account and date MYSQL will use in the above query automatically or I will have to explicitly put the use index xpto?

UPDATE

After implementing only one index in a table that had a complicated query. Look at what resulted in using CPU on my RDS on Amazon.

Desempenho RDS Amazon

I’ll even be able to decrease the type of instance :D

Index creation example for the query example above:

ALTER TABLE `table` ADD INDEX `status_idx` (`status`);

If you added an index for the status column, then Mysql will automatically use this index.

– Nicolas Dias

2014/09/10 at 14:16

1 answer

Browser other questions tagged mysql database optimizing índices

You are not signed in. Login or sign up in order to post.

by Maniero • **444,682** points · Answer 1 · 2014-09-10T14:45:14+00:00

For this query specific will use index if there is an appropriate index for it. This index should include columns status, account and data. Preferably in this order. There is no reason why Mysql does not use the index automatically. Anyway to be sure create and see what happens. Not always what we deduce is correct.

Thus the index will classify the status increasingly, so he can quickly find the lines with the value 1. Then he can quickly find the lines that have the value 12 for account, among the lines already selected that meet the above condition. And finally select all dates above the value indicated in query, again between previously selected lines in the previous conditions. Obviously this selection will not consider other data ranges that do not contain the previous values.

Note that there needs to be a single track selection to take full advantage of the index on query. This is why order can be important to determine whether or not to use the index. That’s why I said that in this query specified the index with the specified composition would be used. Another query may not be able to use the index or may partially use to select a track through it and the rest of the selection have to be scanning the database row by row between the lines already selected in the index.

Usually the secret is to have indexes that meet the main darlings and avoid darlings that cannot benefit from the indices.

Indexes work well when you can mount sub-tracks with them. In a query very complex, which makes it difficult to catch these sub-bands there is no index that solves. Here the ideal is to try to at least reduce enough the lines that were obtained through the index and leave little for the final selection.

To try to facilitate understanding, think about how this search works. When you have something ordered according to what you want to search for it is easy to find through a binary search that has a logarithmic complexity O(log n), ie, it divide and conquer. Even with billions of lines, in just over 30 steps he finds what you want. When there is no order, the way is to look one by one, linear complexity O(n). If a query look at several columns it will try to use the first method and make a selection very fast. When it is no longer possible, only between the lines that already managed to select fast it will apply the condition determined individually row by row. To understand a little more about Big O read this answer.

When the optimizer does not use an existing index it is because it failed to put together a way to "understand" its query appropriately for the index. This may be the fault of how the index is mounted or it may be because the optimizer is not as good. Almost always the optimizer does its job well.

In fact, it may be that the index is not used even if it has a suitable key. There are small volumes of data that the direct search in the data has a lower cost. The optimizer can identify that even if it can select sub-bands, subsequent analysis can be so costly that eliminating the index query can bring a better result. But in these cases the optimizer has renounced its use in a "conscious" way for its benefit. It can not be bad. Although there are cases that it errs.

But be careful not to create too many indices. Updates to them can make the software very slow.

What the AP did in editing the question is exactly what needs to be done. Try and see what really happens. With a simple key using only the most relevant column he got a huge gain. He used the column status. I don’t know if he tried, but he could try it with just the spine account, column only data, and make the possible combinations between the three columns and see how each result looks. As the gain seems to be fundamental, it is worth the small effort.