Index limit with LIMIT


I have never been an SQL expert, but I always get by; nothing a quick Google search can't solve. But I have a question I couldn't find any information on.

Recently I did some testing to find out why an SQL query was slow and came across the following situation:

select campo from tabela limit 0,150

The table had over 10,000 records. I used EXPLAIN to find out why it is sometimes slow and sometimes not, and found that depending on the value of LIMIT the select used the index, and above a certain value (I can't remember now what it was) the index was not used.

Can someone explain to me why, and whether there are any limits to the use of LIMIT?

explain SELECT id, nome, email FROM tabela ORDER BY id LIMIT 0,20

Output of EXPLAIN:

id  select_type  table   type   possible_keys  key      key_len  ref   rows  Extra
1   SIMPLE       tabela  index  NULL           PRIMARY  3        NULL  20

explain SELECT id, nome, email FROM tabela ORDER BY id LIMIT 0,820

Output of EXPLAIN:

id  select_type  table   type  possible_keys  key   key_len  ref   rows   Extra
1   SIMPLE       tabela  ALL   NULL           NULL  NULL     NULL  14132  Using filesort
  • I don't understand the question; I don't think there is any limit to LIMIT!

  • The question sounds interesting, but give those who might answer more to work with. LIMIT itself does not seem to have a limit, but why it decides to use the index or not is very curious. It can only be answered with enough information about the situation: show the tables, what the data looks like, the statistics, the result of EXPLAIN. The way it is, only you can answer.

  • OK, I'm editing the question with the exact examples and EXPLAIN outputs.

  • I could not edit; the completed question is below.

  • Is this the slow query? Given the number of records and the nature of the query, it shouldn't have this kind of problem.

  • But why is the index not used when the LIMIT value increases? Is there some reason for this?


1 answer


Often the query planner works out that a sequential disk read comes out "cheaper" for the system than consulting the index and doing random reads.

Without an index, the DB can scan the records (even out of order) and keep only what matters (possibly already sorting on output):

SELECT * FROM x WHERE c BETWEEN 3 AND 10

0 > 1 > 8 > 5 > 128 > 2 > 882 > 9 > 7 > 11 > 92 > 6 > 4 > 10 > 3 ...
        |   |                   |   |             |   |   |    |
3 4 5 6 7 8 9 10 ...

In the case above, the query "looks at" the records one by one, compares each, and accumulates what matters.
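
To reproduce this first case, here is a minimal sketch (the table x and its values are hypothetical, mirroring the diagram above):

-- Hypothetical table x with no index on c: a full scan is the only plan
CREATE TABLE x (c INT);
INSERT INTO x VALUES (0),(1),(8),(5),(128),(2),(882),(9),(7),(11),(92),(6),(4),(10),(3);

-- EXPLAIN reports type = ALL: every row is read sequentially
EXPLAIN SELECT * FROM x WHERE c BETWEEN 3 AND 10;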

When you use a low limit, for example 4, using the index pays off despite the lookup overhead and the random reads:

SELECT * FROM x WHERE c BETWEEN 3 AND 10 LIMIT 4

INDEX:  0, 1, 2, 3 (start), 4, 5, 6 (stop), 7, 8, 9, 11, 92 ...
LOOKUP:          ->         -> <-  ->
DB:     0 > 1 > 8 > 5 > 128 > 2 > 882 > 9 > 7 > 11 > 92 > 6 > 4 > 10 > 3 ...

In the case above, since only a few rows are needed, it pays to start with the index: take the first entry, find where the row is on disk, and read that part of the data (remembering that if only the value of the indexed column were needed, the DB file wouldn't be touched at all); then move to the next index entry, locate the data in the DB, read it, go back to the index for the next one, read the DB data again... and so on.

Note that in the first case there is none of this "back and forth" with the index: for that quantity it doesn't pay to read a thousand separate pieces of the file that contain the data, because a sequential read of the whole file is usually faster.

In the second case, little information is needed, so the "back and forth" pays off: even with random reads, far less data is read than the whole DB.
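
That parenthesis about needing only the value of the indexed column can be seen directly: if the index covers the query, the "back and forth" disappears entirely. A sketch, assuming the asker's table with its primary key on id:

-- Covering query: only the indexed column is selected, so the data
-- rows never need to be fetched; Extra shows "Using index", and the
-- index tends to be used even for larger LIMIT values
EXPLAIN SELECT id FROM tabela ORDER BY id LIMIT 0, 820;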

I am assuming the most common databases, running on ordinary HDs; there are DBs specialized for in-memory access or SSDs. Moreover, the query planner makes these decisions based on a number of statistics, which may not give the best result in every situation, but in most of them the gain justifies it. What is important to understand is that someone programmed it this way, very probably after an absurd amount of testing to refine its behavior.
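
If you want to experiment with those statistics yourself, MySQL lets you refresh them and override the planner's choice for comparison. A sketch against the question's table (FORCE INDEX is a diagnostic here, not a recommendation):

-- Refresh the statistics the cost estimates are based on
ANALYZE TABLE tabela;

-- Force the index and compare: if this runs slower than the plain
-- query, the planner's preference for the full scan was correct
EXPLAIN SELECT id, nome, email FROM tabela FORCE INDEX (PRIMARY)
ORDER BY id LIMIT 0, 820;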


From the MySQL manual

See, for example, these links:
https://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html
https://dev.mysql.com/doc/refman/5.7/en/how-to-avoid-table-scan.html

Note this line in particular:

You are comparing indexed columns with constant values and MySQL has calculated (based on the index tree) that the constants cover too large a part of the table and that a table scan would be faster.


And this:

You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it probably will do many key lookups and that a table scan would be faster.


Although these are not the exact LIMIT scenario, the same logic applies.
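
As a practical check on the original question, the crossover point can be probed by varying only the LIMIT (the values below are hypothetical; the real threshold depends on the table's statistics):

EXPLAIN SELECT id, nome, email FROM tabela ORDER BY id LIMIT 0, 100;  -- likely still type = index
EXPLAIN SELECT id, nome, email FROM tabela ORDER BY id LIMIT 0, 800;  -- may flip to type = ALL, Using filesort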

