Delete - Which is the fastest?

Asked

Viewed 461 times

0

I need to delete from a large table some records. What is the best alternative: Use

DELETE Tabela where id in (select id from @temporaria)

or

delete Tabela WHERE exists (select id from @temporaria  where estoque.id = id)

use IN or EXISTS ??? Or is there a better alternative ?

  • 2

    this will depend on the database, of how the engine optimizes the query, as it did not say which database, it would only reply speculation, but the best way to know, for the specific query, would be to run some query parser to see the best option

  • 2

    Possible duplicate of Not IN or Not EXISTS which to use?

  • 1

    Does your table have a fairly large data volume? If you don’t have it, don’t spend your time worrying about it. Instead, analyze the possible side effects of IN/NOT IN on the possible occurrence of NULL.

  • Is it possible ? Run the correct delete and worry about other things. Always running a cursor may be the best solution.

  • The table is large with over 56 million records, I’m deleting something around 1500 records. For what they said when I use IN he charges the bank. I noticed no difference in using IN or EXISTS. I am Using SQL Server Microsoft SQL Server 2016 (SP1) (KB3182545) - 13.0.4001.0 (X64)

  • (1) On the table @temporaria the column id is primary or exclusive key (Unic)? (2) In Tabela, the column id is the primary key?

Show 1 more comment

1 answer

2

Great question.

The article Delete row set in huge tables treats precisely on how to erase set of rows in huge tables avoiding (or trying to...) stop the other processes that are running simultaneously and accessing the same table.

Regarding the two approaches presented by the author of the topic, IN or EXISTS, the solution is always to analyze the implementation plan to evaluate which can be more efficient. I suggest reading the article "Which is faster: NOT IN or NOT EXISTS?”.

Interestingly, sometimes you realize that the query Optimizer generates the same execution plan for queries with different approaches, because everything depends on the predicates existing in the query code, available indexes and some other factors. For example, in the case in question if the column id in both tables is primary key (or has the attribute Unic) may allow for a more efficient implementation plan. Another care is to ensure that statistics are up to date.

It is necessary to keep an eye on the lock Escalation, but as about 1,500 lines per run are deleted, so there is no risk of the table being blocked. Details on Line Version Control and Transaction Blocking Guide.

  • 1

    Great article this Delete set of rows in huge tables, I did not know and I’ve favorite. Thanks!

  • @Victorfreidinger I thank for the return on the article; it was published a few days ago. Celso’s question is great, because it brings to the discussion the theme.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.