Cast in MYSQL X Performance

Asked

Viewed 32 times

2

I’m trying to improve my Sqls in Mysql and PHP, in the analysis I ended up coming to a CAST function is increasing by at least 5X the search time as follows below:

CUSTOMERS table where the DATACAD field is DATETIME to store date and time: Index created in the DATACAD field

SELECT * FROM CLIENTES WHERE (CAST(DATACAD AS DATE) BETWEEN '2021-01-27' AND '2021-01-27')

The above example takes 0.178 seconds

SELECT * FROM CLIENTES WHERE (DATACAD BETWEEN '2021-01-27' AND '2021-01-27 23:59:59')

Example 2 takes 0.031 seconds

In the second example I am taking into account that the registrations were created between 00:00:00 and 23:59:59 of the day 27 so I took the CAST function

It seems to me that when using the CAST function the engine ignores the INDEX of the field in question, that’s right?

How to build a good Query when we need to bring part of the field?

Note: These fields already exist at a time and are already well populated, so separating date and time is out of the question :-(

It may seem derisory this type of example, but I have several Sqls that use this type of field and are starting to get in the way of performance.

1 answer

3


"It seems to me that when using the CAST function the engine ignores the INDEX of the field in question, that’s right?"

Answer: no, do not ignore, what happens is that to get to the INDEX you must first do the conversion, which costs time, so the first query is slower.

To prove this, I replicated your scenario and used the EXPLAIN to display the execution plans of both queries, and both use the INDEX.

See here: https://www.db-fiddle.com/f/9DKhXgoXTd4u1yqUf8nXp7/0

Here the result of the first and second query, in the same sequence as the question:

**Query #1**

    EXPLAIN SELECT * FROM CLIENTES WHERE (CAST(DATACAD AS DATE) BETWEEN '2021-01-27' AND '2021-01-27');

| id  | select_type | table    | partitions | type  | possible_keys | key     | key_len | ref | rows | filtered | Extra                    |
| --- | ----------- | -------- | ---------- | ----- | ------------- | ------- | ------- | --- | ---- | -------- | ------------------------ |
| 1   | SIMPLE      | CLIENTES |            | index |               | ix_data | 6       |     | 6    | 100      | Using where; Using index |

---
**Query #2**

    EXPLAIN SELECT * FROM CLIENTES WHERE (DATACAD BETWEEN '2021-01-27' AND '2021-01-27 23:59:59');

| id  | select_type | table    | partitions | type  | possible_keys | key     | key_len | ref | rows | filtered | Extra                    |
| --- | ----------- | -------- | ---------- | ----- | ------------- | ------- | ------- | --- | ---- | -------- | ------------------------ |
| 1   | SIMPLE      | CLIENTES |            | range | ix_data       | ix_data | 6       |     | 2    | 100      | Using where; Using index |

View on DB Fiddle

Note in the "key" column result, both queries have the value "ix_data" which is the name of the index that was created, so both use the index.

Now note that in the "type" column, which shows the type of "Join" between the tables. The query does not have Join, but the WHERE influences this, because it is possible to do Join using Where, and in the first query the type is "index" and the second the type is "range". If you join this query they will be affected by this plan. See what the Mysql documentation talks about Join-types:

  • range: Only Rows that are in a Given range are retrieved, using an index to select the Rows
  • index: The index Join type is the same as ALL, A full table scan is done for each Combination of Rows from the Previous Tables

If translated, the crease will restrict the returned lines to a range based on the index, and the index makes a "full table scan", that is, a complete survey.

In summary, if there are joins this time difference should worsen using the CAST, but both queries use the index.

  • Bacana @Ricardo Pontual, in this case it would be more "profitable" to add the time "23:59:59" the final date for a better performance right? Since the win comes to 5x in my case!

  • Yes it would, and avoid CAST. I added more information, see that because of the CAST, each date has to be computed, which has a much lower performance. I really like these analyses :)

  • That time, I’m not DBA so turn and move I google or here in Stack to improve my searches, and this tips were a hand in the wheel, thanks!

  • Just adding, you say then that the best way to make select with indexed DATETIME would be by putting date and time start and end right?

  • 1

    this, no between use "00:00:00" until "23:59:59"

  • So @Ricardo Punctual, I have doubts about COALESCE and IN too, I think I’ll put a post here, example, select * from Where table (cpf_cnpj in(taxpayer_id, ein)) or select * from Where table (cpf_cnpj coalesce(taxpayer_id, ein)), is a table that stores CPF or CNPJ in separate fields one of the two will always be null, in my tests it seems that IN and COALESCE have the same performance.

  • @Marcelo Put your question about coalesce in a new question, is better for the site.

  • as @Piovezan commented, post another question, it is easier for the community to find, as it will have words like COALESCE and IN

  • @Marcelo Isso, for kindness.

Show 4 more comments

Browser other questions tagged

You are not signed in. Login or sign up in order to post.