Not IN or Not EXISTS which to use?

Asked

Viewed 82,876 times

50

I’ve seen some answers here with the use Not In and some with the use of Not Exists to answer the same question.

I don’t know the difference between the two and would like to know about:

  • Which of the two is best when considering performance?
  • On what occasions should I wear one and the other?

If possible examples with Query!

  • This generated a debate here at the company yesterday same rs, the DBA commenting on the use of not in and of not exist in a script we created

  • @Marcelobonifazio I usually use not in, but I don’t know which two is more efficient.

  • I use the not in also. I can’t answer if you perform better or not.

  • 1

    I use NOT EXISTS. This reply in Soen explains well what may be different in the use of each of them.

  • Here is another reason to consider avoiding the use of NOT IN (or IN): http://stackoverflow.com/questions/4722220/sql-in-clause-1000-item-limit

  • Personal thank you.

  • that article in the Mysql manual (in English) describes some conditions under which it is possible to optimize queries with Subqueries using EXISTS

  • @Penalty Thank you.

  • Each case is a case. I suggest reading the article Which is faster: NOT IN or NOT EXISTS?: https://portosql.wordpress.com/2020/05/02/not-in-x-not-exists/

Show 4 more comments

4 answers

46


This will depend on the situation. Not in is recommended when you already have the values and pass as parameter, for example:

select * from tbl where status_id not in (1,3,7)

Now if you are going to make a subselect I recommend using not exists, because in the first result it will already validate and pass to the next record.

select * from user u where not exists (select 1 from player p where p.user_id = u.id limit 1)

Since I don’t need the player table fields the return 1 is faster than *. The limit 1 (changes the way of writing according to the database) will make finding 1 record is enough to say that it exists. Update: in some situations the limit 1 can leave faster according to this reply.

I hope that’s clear.

Performance: as Paulo mentioned in the answer below, not exists allows the use of index, which in tables with many records can greatly improve performance compared to not in.

  • 1

    Very good the answer.

  • Thank you very much! :)

12

Complementing the above answer. The goal now is to find all employees who are not managers. Let’s see how we can achieve this using the NOT IN and NOT EXISTS.

inserir a descrição da imagem aqui

NOT IN

SQL> select count(*) from emp_master where emp_nbr not in ( select mgr_nbr from emp_master );
COUNT(*)
———-
0

NOT EXISTS

SQL> select count(*) from emp_master T1 where not exists ( select 1 from emp_master T2 where t2.mgr_nbr = t1.emp_nbr );

COUNT(*)
———-
9

Now, there are 9 people who are not managers. So, you can clearly see the difference which values NULL do since they NULL! = NULL in SQL, the clause NOT IN does not return any registro back.

Performance implications:

When using NOT IN, the query performs nested table scans, while for NOT EXISTS, the query can use an index within the sub-query.

6

EXISTS is more performatic than the IN, but it is not a good practice, it depends on your need....

The in "creates" a structure of or’s in its sql condition, Exp:

select * from tabela1
where idtabela in (select idtabela from tabela2 where condicao = 2)

Let’s say there are 5 conditions = 2 of idtable in this table2 , the bd would interpret the in as:

select * from tabela1 t
inner join tabela2 t2 on t2.idtabela = t.idtabela
where idtabela = 1 or idtabela = 2 or idtabela = 3 or idtabela = 4 or idtabela = 5

Now when using the exists bd "force" a relationship between the two tables Getting kind of like this:

select * from tabela1 t
inner join tabela2 t2 on t2.idtabela = t.idtabela
where condicao = 2

1

The answer depends first on which database manager (sgbd) the internal implementations may not be the same for NOT IN and NOT EXISTS.

In the case of SQL Server I recently published the article "Which is faster: NOT IN or NOT EXISTS?" dealing with the issue of "how to know which elements are in set A but are not in set B". In addition to NOT IN and NOT EXISTS they are treated of other options like EXCEPT, LEFT OUTER JOIN and OUTER APPLY.

First, it is always recommended review the implementation plan. And it is necessary to keep in mind that the execution plan generated varies according to several factors, including the characteristics of the data.

In my opinion there is no way to define a priori which method is the fastest.

When analyzing other articles on the subject, it can be seen that in some cases the situation is optimized to obtain a certain conclusion. For example, in the article "Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?", by Aaron Bertrand, he states early in the article that "Instead of NOT IN, use a Correlated NOT EXISTS for this query Pattern. Always". That is, according to him NOT EXISTS is always faster. However, he quotes "for this query Pattern". What standard of consultation? Direct consultation between only two tables and that have indexes that allow the direct joining between the tables. Well, there is papaya with sugar, right?

In the article "Consider using [NOT] EXISTS Instead of [NOT] IN with a subquery"by Phil Streiff, "It used to be that the EXISTS Logical Operator was Faster than IN, when Comparing data sets using a subquery. (…) However, the query Optimizer now Treats EXISTS and IN the same way, Whenever it can, so you’re unlikely to see any significant performance Differences”.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.