Optimization of SQL code

Asked

Viewed 312 times

12

How can I optimize the following code to not use 3 Selects and not plaster the query to only 3 status

(
    SELECT * 
    FROM historico 
    WHERE 
        his_status = 'FRACASSO' 
    ORDER BY his_data DESC 
    LIMIT 50
)
UNION ALL
(
    SELECT * 
    FROM historico 
    WHERE 
        his_status = 'REAVALIAR' 
    ORDER BY his_data DESC 
    LIMIT 50
)
UNION ALL
(
    SELECT * 
    FROM historico 
    WHERE 
        his_status = 'SUCESSO' 
    ORDER BY his_data DESC 
    LIMIT 50
)

At the moment I have 2562 records in the historical table and I am using them all to train an RNA. It costs 3 to 4 minutes. This query will allow the user to indicate how many samples of each status you want to use. Quantity 50 is an example. Query optimization is also to decrease the cost of time.

  • 1

    Yes, it is valid to use in, as long as I inform the interval by the application, but it can only be used if the query is optimized. With only 1 select, at least.

  • 1

    I didn’t pay attention to the tripo limit , 50 of each case.

  • 1

    Related: https://answall.com/q/254540/57801

  • An index talves help , using the same UNION , index his_data,his_status

  • 1

    Normally UNION is a reasonable solution, dynamism can be produced in the client language (when available). Perhaps it was the case to use a function or precedent, but usually it is not justified.

  • Updated response.

Show 1 more comment

2 answers

10

You can do it this way. It will only return records with "line" less than or equal to 50 (ie, 50 records max. of each type):

SELECT *
FROM 
(SELECT *,
    (@numero:=IF(@status = `his_status`,
     @numero+1,
     IF(@status := `his_status`, 1, 1))) linha 
  FROM historico a
  CROSS JOIN (SELECT @numero:=0) b
  ORDER BY his_status, his_data DESC
) AS c 
WHERE c.linha <= 50;

Sqlfiddle by friend’s indication @Everson in the comments: http://sqlfiddle.com/#! 9/e6166e/5/0

  • @Everson Updated.

  • 1

    @DVD If you want to use in your reply: http://sqlfiddle.com/#! 9/e6166e/3

  • 1

    Your answer is partially correct, but consider that a number will have to be assigned to all lines. What will happen to the performance in this case?

  • 1

    @Sorack The answer is not about performance. It’s just a functional suggestion. If someone has another answer that can cover functionality and performance, it would be interesting.

  • Updated question.

  • @Billlyjow Speak up! If the question was edited because of performance, see this print with a result I made using the answer query with a table of more than 51 thousand records, and it took only 0.18 seconds: https://i.stack.Imgur.com/cmjn3.png

  • My historical table has 32 fields. Anyway, I will try to use your code.

Show 2 more comments

10


Here is an alternative capable of bringing up to 50 records of each status sorted by date. In case of a tie between two records the result is unpacked by id (the smallest id prevails).

SELECT h.*
FROM historico AS h LEFT JOIN historico AS h2
    ON h.his_status = h2.his_status 
       AND (h.his_data < h2.his_data OR (h.his_data = h2.his_data AND h.id > h2.id))
WHERE h.his_status IN ('FRACASSO', 'REAVALIAR', 'SUCESSO')
GROUP BY h.id
HAVING COUNT(*) < 50
ORDER BY h.his_status, h.his_data DESC, h.id;

See working on DB Fiddle

This version is more compact. You can filter how many status want. You can even remove the clause where to return the first 50 results of all status of your table.

I can’t compare the performance of this solution with the original version with UNION ALL. This has to be done with your dataset complete, with the exact same table and index structure, in the same version of Mysql with the same settings. That said the link above shows the execution plan for a small amount of data in the absence of any index other than that created implicitly by PK, using Mariadb 10.2. Under these conditions the original version is scanning the full table for each status desired. The above version, although it also scans the full table twice and uses a temporary table, does so regardless of the amount of status in the filter.

  • I ran the tests running 5 times the three queries and taking an average. Mine takes around 0.031s. A of the DVD takes around 0.053s. Yours in the 5 rounds showed 0.00. Actually the existence of more Selects in the Query drops the bank’s performance.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.