Query DELETE taking too long to load

1

I’m having a problem with a query whose job is to delete some duplicate records that exist in a table.

Here it is:

DELETE t1 
FROM bid_account t1 
    INNER JOIN bid_account t2 
WHERE t1.id > t2.id AND 
    t1.bidding_price = t2.bidding_price AND 
    t1.auction_id = '" . $obj->auction_id . "' AND 
    t1.bidding_type = 's' AND 
    t1.bid_flag = 'd' AND 
    DATE(t1.bidpack_buy_date) >= DATE(NOW())

The problem is that when I execute it, it takes a very long time to complete and doesn’t actually delete anything; it doesn’t seem to work.

I ran a SELECT with these same conditions and something absurd happened: it repeated several records that don’t exist. For example, the auction with ID 15 has 100 records, but when queried with these conditions it returns more than 80 thousand rows.

How can I make this query work properly?

  • There seems to be a logic error in your SQL: if the query returns more than 80,000 records, either your join conditions are incorrect or that really is what those conditions produce. Either way, I would avoid the JOIN you wrote; maybe use an EXISTS instead to avoid that kind of duplication.

  • I also think you need to improve your SQL; it has a lot of clauses besides the JOIN. If the SELECT already returns that many records, the DELETE will take even longer. Try to simplify it.

2 answers

10


Before answering, let’s clarify a few points:

I ran a SELECT with these same conditions and something absurd happened: it repeated several records that don’t exist. For example, the auction with ID 15 has 100 records, but when queried with these conditions it returns more than 80 thousand rows.

Nothing absurd happened. This result is a Cartesian product, caused by the fact that you joined the table bid_account with itself without any join restriction. So, supposing you had 10 records in it, the initial result would be 10 * 10 = 100 rows. Only after that are the restrictions you stated in the WHERE applied, and they may reduce that number (but not by much).
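
If you want to see this for yourself, counting the pairings produced by the unrestricted self-join before any filtering makes the growth obvious. A minimal sketch (run it on a copy or a small table, since the count itself can be slow):

-- Every row of t1 is paired with every row of t2, so the count is n * n.
SELECT COUNT(*) AS cartesian_pairs
FROM bid_account t1
    INNER JOIN bid_account t2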

When I execute it, it takes a very long time to complete and doesn’t actually delete anything; it doesn’t seem to work.

Of course it takes a long time! And with every new record inserted into the table the delay gets worse, because the number of row pairings grows with the square of the row count. This is almost certainly due to the same reason as the point above.

TL;DR version:

If I understand correctly, you want to delete records that are completely duplicated in bid_account and keep only the one with the smallest id. In that case, your query should look like this*:

DELETE t1 
FROM bid_account t1 
WHERE EXISTS ( SELECT 1 
               FROM bid_account t2 
               WHERE t2.bidding_price = t1.bidding_price AND
                    t2.auction_id = t1.auction_id AND
                    t2.bidding_type  = t1.bidding_type AND 
                    t2.bid_flag = t1.bid_flag AND 
                    DATE(t2.bidpack_buy_date) = DATE(t1.bidpack_buy_date) AND
                    t2.id < t1.id) AND
    t1.auction_id = '" . $obj->auction_id . "' AND 
    t1.bidding_type = 's' AND 
    t1.bid_flag = 'd' AND 
    DATE(t1.bidpack_buy_date) >= DATE(NOW())

* Note: You should evaluate whether the rules that define a "duplicate" are the ones I stated in the subselect; that is where they must be identified. In the outer WHERE you set the criteria for which rows are eligible for deletion (apart from what identifies the duplicate). That way your query stays cleaner and is easier to adjust, for example when changing the criteria.
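
Before running the DELETE, it can be worth previewing exactly which rows the subselect flags as duplicates; the same query turned into a SELECT (with nothing else changed) does that:

-- Lists the rows the DELETE above would remove: each one has an older row
-- (smaller id) with the same duplicate-defining columns.
SELECT t1.*
FROM bid_account t1 
WHERE EXISTS ( SELECT 1 
               FROM bid_account t2 
               WHERE t2.bidding_price = t1.bidding_price AND
                     t2.auction_id = t1.auction_id AND
                     t2.bidding_type = t1.bidding_type AND 
                     t2.bid_flag = t1.bid_flag AND 
                     DATE(t2.bidpack_buy_date) = DATE(t1.bidpack_buy_date) AND
                     t2.id < t1.id) AND
    t1.auction_id = '" . $obj->auction_id . "' AND 
    t1.bidding_type = 's' AND 
    t1.bid_flag = 'd' AND 
    DATE(t1.bidpack_buy_date) >= DATE(NOW())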

A little more about what happened

In your query, given the restrictions declared in the WHERE and assuming the table has the contents below, we would have the following:

----------------------------------------------------------------------------------------------
| id | bidding_price |        auction_id        | bidding_type | bid_flag | bidpack_buy_date |
----------------------------------------------------------------------------------------------
|  1 |           2.0 | " . $obj->auction_id . " | s            | d        |    05/05/2018    | // Would be eliminated from t1 because its id is not greater than any id in t2
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------
|  4 |           1.2 | " . $obj->auction_id . " | s            | a        |    22/02/2019    | // Would be eliminated from t1 because of the flag 'a'
----------------------------------------------------------------------------------------------
|  5 |           1.2 | " . $obj->auction_id . " | t            | d        |    30/09/2018    | // Would be eliminated from t1 because of the type 't'
----------------------------------------------------------------------------------------------
|  6 |           1.2 | " . $obj->auction_id . " | s            | d        |    01/04/2017    | // Would be eliminated from t1 because its date is earlier than today
----------------------------------------------------------------------------------------------

So that leaves two records from t1 and all of t2, since at least one record of t2 has an id smaller than each remaining id of t1 and also an equivalent bidding_price. Given that the processing would already be relatively slow because of the Cartesian product, if we ran a SELECT t1.* ... we would end up with the following (2 * 6 = 12 rows):

----------------------------------------------------------------------------------------------
| id | bidding_price |        auction_id        | bidding_type | bid_flag | bidpack_buy_date |
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------

I hope this helps.

1

The best solution I can come up with is to cap each run: keep the same DELETE, but add, for example, LIMIT 10000 at the end of the query, so the records are removed in batches.

A loop or scheduled routine that runs this query repeatedly should then solve the problem, as sketched below.
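
A minimal sketch of what that could look like on MySQL. Since the multi-table DELETE form does not accept LIMIT, the ids to be removed are first collected into a hypothetical helper table (tmp_dup_ids, a name chosen here for illustration) and then deleted in batches; the extra restrictions from the question (auction id, bidding_type, bid_flag, date) would be added to the staging SELECT if only that subset should be cleaned up:

-- Hypothetical staging step: collect the ids of the duplicate rows once,
-- using the same duplicate criteria as the query in the first answer.
CREATE TEMPORARY TABLE tmp_dup_ids AS
SELECT t1.id
FROM bid_account t1
    INNER JOIN bid_account t2
        ON t2.bidding_price = t1.bidding_price AND
           t2.auction_id = t1.auction_id AND
           t2.bidding_type = t1.bidding_type AND
           t2.bid_flag = t1.bid_flag AND
           DATE(t2.bidpack_buy_date) = DATE(t1.bidpack_buy_date) AND
           t2.id < t1.id;

-- Delete in batches; run this statement repeatedly (from application code
-- or a stored routine) until it reports 0 affected rows.
DELETE FROM bid_account
WHERE id IN (SELECT id FROM tmp_dup_ids)
LIMIT 10000;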
