Compare a null bit column

I know that the two queries below return the same results, but I would like to know whether SQL Server executes them differently, for example in terms of performance.

SELECT * FROM CLIENTES C WHERE C.FL_EXCLUIDO IS NULL OR C.FL_EXCLUIDO = 0

or

SELECT * FROM CLIENTES C WHERE ISNULL(C.FL_EXCLUIDO, 0) = 0
  • Approximately how many rows does the CLIENTES table have? What percentage of the total has FL_EXCLUIDO equal to 'false' or NULL?

2 answers

I had already read a bit about this; there is even a very interesting post on the English-language Stack Overflow: https://stackoverflow.com/questions/3118213/isnull-vs-is-null

However, the example in that link compares a text column, whereas in your case it is a bit column.

So I decided to run a test on a table here. With the execution plan enabled, I ran a query similar to yours against a nullable bit column. In two tests, one on a table with a few hundred records and the other on a table with a few thousand records, the result was the same: the IS NULL version came with a suggestion to create an index on the table, while the version using the ISNULL function ran without any index suggestion.

As for execution time, there was no relevant difference on the table with about 50 thousand records, nor on a table with a few million records, where the times were:

IS NULL: 01:21 (1m21s)
ISNULL(): 01:19 (1m19s)

Here is the execution plan result for analysis: [image: execution plan]

This might lead to the conclusion that IS NULL would, in theory, run faster, since it was the one that suggested creating an index, but that is not necessarily true.
To settle it, I created the index and measured the execution times again:

IS NULL: 00:08 (8s)
ISNULL(): 00:03 (3s)

This shows that, for a nullable bit column, ISNULL performed about the same as IS NULL without the index and considerably better with it: it took 37.5% of the IS NULL time (3s versus 8s), that is, less than half.
Of course, other scenarios and a different WHERE clause may give different results, so it is always advisable to analyze the execution plan to determine the best query.
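
One way to repeat this kind of comparison is to turn on the session statistics before running both queries. This is only a sketch; the answer does not say how the times above were measured, so the use of SET STATISTICS here is an assumption.

```sql
-- Sketch: compare both predicates in the same session.
-- Assumes the CLIENTES table from the question already exists.
SET STATISTICS IO ON;    -- reports logical and physical reads per table
SET STATISTICS TIME ON;  -- reports CPU time and elapsed time per statement

SELECT * FROM CLIENTES C WHERE C.FL_EXCLUIDO IS NULL OR C.FL_EXCLUIDO = 0;

SELECT * FROM CLIENTES C WHERE ISNULL(C.FL_EXCLUIDO, 0) = 0;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The figures appear in the Messages output of the client tool. Running each query more than once helps separate the effect of data caching from the effect of the predicate itself.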

Rafael, several factors determine the performance of a SELECT statement, including whether indexes exist and, if so, what their characteristics are.

To start, here are some observations on the two constructs you posted in this topic, which I will refer to as follows to make them easier to discuss:

(a) SELECT * FROM CLIENTES C WHERE C.FL_EXCLUIDO IS NULL OR C.FL_EXCLUIDO = 0;

(b) SELECT * FROM CLIENTES C WHERE ISNULL(C.FL_EXCLUIDO, 0) = 0;


(1) In construct (a), the WHERE clause contains the logical operator OR. In general, predicates of the form

WHERE logical_expression1 OR logical_expression2 OR ...

may (or may not) force a sequential read, either through an index (index scan) or directly on the table (table scan). I suggest you review the execution plan to see what the query optimizer chose. There are tips in the article "The Perfect Plan".


(2) In construct (b), the WHERE clause contains a function call that takes a table column as its parameter. This usually prevents SQL Server from seeking on a suitable index (if one exists), because the predicate becomes non-sargable, and again a sequential read occurs, either through an index (index scan) or directly on the table (table scan). Once more, I suggest you review the execution plan. Details are in the article "Building Efficient T-SQL Code: Sargability".


(3) Another factor that can influence performance is the content of the column list in the SELECT clause. Constructs such as

SELECT *

may rule out the choice of a nonclustered index, especially if it is not a covering index. This also determines whether the Key Lookup operator appears in the execution plan.


(4) Assuming that the purpose of the query is to obtain only the customer identifier, we could then have

(c)
SELECT idCliente FROM CLIENTES C WHERE C.FL_EXCLUIDO IS NULL OR C.FL_EXCLUIDO = 'false';

but a sequential read (scan) would still occur, because of (1), unless there is an index on the FL_EXCLUIDO column that also includes the idCliente column:

CREATE nonclustered INDEX I2_CLIENTES on CLIENTES(FL_EXCLUIDO) include (idCliente);

The construction

(d)
SELECT idCliente FROM CLIENTES C WHERE C.FL_EXCLUIDO IS NULL     
union all      
SELECT idCliente FROM CLIENTES C WHERE C.FL_EXCLUIDO = 'false';

can be efficient if the following filtered indices exist:

CREATE nonclustered INDEX I3_CLIENTES on CLIENTES(FL_EXCLUIDO) include (idCliente) where FL_EXCLUIDO = 'false';     
CREATE nonclustered INDEX I4_CLIENTES on CLIENTES(FL_EXCLUIDO) include (idCliente) where FL_EXCLUIDO is null;

Of course it also depends on the volume of data.


(5) I could list several other factors here. In any case, because construct (b) is non-sargable, it will result in a sequential read (scan), unless a technique for making the predicate sargable is used, as described in item 4.2 of the article "Building Efficient T-SQL Code: Sargability".
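
As an illustration only, one commonly used way to make a predicate like the one in (b) indexable is a computed column with its own index. This may or may not be the technique the article describes in item 4.2; the names FL_EXCLUIDO_NN and I5_CLIENTES are hypothetical and do not come from the question.

```sql
-- Sketch under assumptions: FL_EXCLUIDO_NN and I5_CLIENTES are made-up names.
-- GO is the batch separator used by SSMS and sqlcmd; it keeps the new column
-- visible to the statements that follow.
ALTER TABLE CLIENTES
    ADD FL_EXCLUIDO_NN AS ISNULL(FL_EXCLUIDO, 0);  -- NULL is folded into 0
GO

CREATE NONCLUSTERED INDEX I5_CLIENTES
    ON CLIENTES (FL_EXCLUIDO_NN) INCLUDE (idCliente);
GO

-- The filter now compares an indexed column directly with a constant.
SELECT idCliente FROM CLIENTES WHERE FL_EXCLUIDO_NN = 0;
```

Whether the optimizer actually seeks on this index still depends on how selective the value 0 is, so the execution plan should be checked here as well.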


(6) The valid values for the FL_EXCLUIDO column are true and false, plus the absence of information (NULL). That is, construct (a) could be rewritten as

(e)
SELECT * from CLIENTES
except
SELECT * from CLIENTES where FL_EXCLUIDO = 'true';

The efficiency (or not) of this solution depends on the characteristics of the column FL_EXCLUIDO and the existing indexes.
