When should a database table column be allowed to accept NULL?

Viewed 7,903 times

48

This is a conceptual question that always creates confusion when deciding whether or not a column should accept NULL.

There is a school of thought that considers any use of NULL a mistake and holds that the database should always be organized, usually through normalization, so that NULL values are never needed.

I am especially unsure about character columns (in any of their variants), where an empty string is already a way to represent that there is no information. But that value may not be enough to express what you want from the column.

I have already received some answers on Stack Exchange from the DBA point of view. But besides not being completely satisfying, those answers had an administrator's bias rather than a developer's. I also ended up restricting myself to one type of field, which left open whether the same behavior holds for other column types that may or may not have a natural value that could represent the absence of a value. In some fields you can use 0 or a negative number, by convention, to mean that there is no relevant value.

Each database system treats NULL differently, and it is very hard to move a model created in one database to another with different semantics. When you define your own semantics, it gets easier.

So, are there scenarios where NULL is really beneficial and its advantages outweigh the problems it can bring?

  • 2

    The ISO SQL standard has NULL, the COALESCE() function and the "IS NULL" operator... It is a state, meaning "uninformed value" or "non-existent value", not a value. NULL is widely used, and it makes no sense to question its usefulness. There is a minor dispute over concepts and modeling approaches that can replace the use of ISO NULL. I think the question and answers emphasize a dichotomy ("is NULL good or evil?") that does not exist, and that distracts from an objective discussion of modeling concepts.

  • How about channeling this enthusiasm into improving the Portuguese Wikipedia, which is poor on this kind of discussion? One can draw on the broad overview in the English article, part of which already has a great translation and didactic explanation in the answers below(!)

6 answers

40


A bit of history

The use of null is debated by developers on various platforms. I've heard a lot about it in Java as well.

On one occasion, talking to a colleague after a NullPointerException had blown up in production, he argued that null should not exist, that someone, I don't know where, was trying to remove null from Java, etc. For object-oriented languages, some people suggest using a design pattern called Null Object.

Besides, I have run into problems in several procedures and queries caused by failing to handle null fields, mainly null values in Boolean expressions, which result in unexpected behavior (see the "table" @mgibsonbr posted in his answer).

An interesting historical account of null is on Wikipedia (in English).

A case where not using null creates problems

Imagine a patients table in a system for medical offices, whose idade (age) field is unsigned integer not null. Let's analyze the following PHP code:

$idade = empty($campo_idade) ? 0 : intval($campo_idade);
if (!$idade) throw new Exception('A idade deve ser preenchida.');
insere_cliente(..., $idade, ...);

The code above tries to check whether the age field has a value that PHP considers truthy (true) before inserting it into the database. Does this work? Yes, as long as the office does not serve infants under 1 year of age.

Now let's suppose we sell the system to a pediatric clinic. What do we do now that the value 0 (zero) must be valid? We simply remove the validation (the if), since the database is "protected" against null and negative values.

Then we start getting a lot of complaints that certain patients are showing up with age zero. We find that secretaries often forget to fill in the age and the record is saved with age zero by default. This is easy to solve: just block the registration if the field is empty, isn't it?

But let's assume that some of the client clinics have an emergency room and sometimes a patient must be registered without knowing his exact age. Clinic managers need to know which records do not have the age filled in so they can collect this information later. And now? Now we are missing a value that says there is no value, and we know that zero is not an option.

What does null mean to you?

A null value makes a lot of sense when the business rules have to deal with a value that was not provided or does not exist.

The truth is that using other values, such as negative numbers or textual constants, to represent non-existent values mixes concepts and causes various problems. If you ran out of salt, would you put sugar in the salt shaker to spare a guest the disappointment of finding the container empty?

The Null Object design pattern, for example, can avoid an exception in OO languages, but if the value is not handled properly it can produce very strange behavior. Imagine a user running a search and the screen opening with all fields empty because the record was not found and a Null Object was returned and "displayed" in the system interface.

Some treatments for null in SQL

Every programming language has its methods for treating nulls, but I’m not going to go into that here.

DBMSs may handle null values differently, but in my experience following a few good development practices is enough to avoid these problems. One of them is to use not only the = operator but also IS NULL or IS NOT NULL when searching columns that can be null, for example:

select * from tabela where (nome is null or nome = '')

Another example, according to the case outlined in the previous topic, which returns patients with no age filled:

select * from paciente where idade is null
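As a rough sketch of the clinic scenario, the snippet below runs that query from Python. It assumes SQLite (via the standard sqlite3 module) as a stand-in for the real DBMS, and the table layout and sample patients are invented for illustration. With a nullable idade, a newborn (0) and a patient of unknown age (NULL) stay distinguishable.

```python
import sqlite3

# Assumption: SQLite stands in for the clinic's DBMS; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table paciente ("
    " id integer primary key,"
    " nome text not null,"
    " idade integer check (idade >= 0)"  # nullable: NULL means 'age not known yet'
    ")"
)
conn.executemany(
    "insert into paciente (nome, idade) values (?, ?)",
    [("Ana", 34), ("Bebe", 0), ("Paciente do PS", None)],
)

# Patients whose age still has to be collected.
sem_idade = [r[0] for r in conn.execute(
    "select nome from paciente where idade is null order by id")]

# Newborns: zero is a real age here, not a placeholder.
recem_nascidos = [r[0] for r in conn.execute(
    "select nome from paciente where idade = 0 order by id")]

print(sem_idade, recem_nascidos)
```

The check constraint still rejects negative ages, while NULL passes it, so the column needs no "magic" number.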

The same logic applies to any type of field or variable. Example in Transact-SQL:

create procedure atualiza_paciente_sem_idade(
    @id int, 
    @nova_idade int
) 
as 

declare @idade_atual int
set @idade_atual = (select idade from paciente where id = @id)
if @idade_atual is null
begin
    update paciente set idade = @nova_idade where id = @id
    insert into log (...)
end

Also, it is important to use the proper routine to handle null fields before trying to manipulate the data. Example in SQL Server:

select upper(isnull(campo_possivelmente_nulo, '')) from tabela

Unfortunately, each database system seems to have implemented its own solution for returning a default value in the case of a null. Oracle uses NVL, Microsoft SQL Server and Access use ISNULL, MySQL uses IFNULL and PostgreSQL does not have an equivalent function. See some examples at this link.

However, an alternative that seems to work in all the DBMSs mentioned is COALESCE. This function returns the first non-null value from a list of values. For example:

select coalesce(campo_possivelmente_nulo1, 'valor inexistente') from tabela

The effect of the above command is analogous to the previous one: a default value takes the place of the null.
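To see the behavior concretely, here is a small sketch that runs COALESCE from Python. It assumes SQLite as the engine (it is not one of the systems listed above, but it supports COALESCE too), and the three sample rows are invented.

```python
import sqlite3

# Assumption: SQLite as the engine; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("create table tabela (campo_possivelmente_nulo text)")
conn.executemany("insert into tabela values (?)", [("abc",), (None,), ("",)])

def coluna(sql):
    """Run a single-column query and return its values as a list."""
    return [r[0] for r in conn.execute(sql)]

# COALESCE swaps only the NULL for the default; the empty string is left alone.
com_padrao = coluna(
    "select coalesce(campo_possivelmente_nulo, 'valor inexistente')"
    " from tabela order by rowid")

# Without handling, a function applied to NULL just yields NULL again.
sem_tratamento = coluna(
    "select upper(campo_possivelmente_nulo) from tabela order by rowid")

# Handling the NULL first keeps the result a string in every row.
com_tratamento = coluna(
    "select upper(coalesce(campo_possivelmente_nulo, '')) from tabela order by rowid")

print(com_padrao, sem_tratamento, com_tratamento)
```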

Another approach is to create a custom function to handle this. For example, create an ISNULL function on Oracle with the same implementation as the existing NVL. However, since there are several other divergences between SQL implementations, I don't think this approach is worth it. For systems that support several types of database, I prefer the abstraction provided by an ORM (Object-Relational Mapping) framework.

Another situation that deserves attention is when joins are needed. But it is nothing too complicated. Just know the difference between INNER JOIN and OUTER JOIN and when to use each. Knowing about LEFT JOIN and RIGHT JOIN also helps.

Questions like this are very common on the internet: "my query does not list a clinic when it has no registered patients, what could it be?"

This happens because INNER JOIN was used instead of OUTER JOIN.

Let’s take an example:

select clinica.id, avg(paciente.idade) as average
from clinica
join paciente on paciente.id_clinica = clinica.id
group by clinica.id

The query above aims to calculate the average age of the patients in each clinic. The problem is that only clinics that have at least one patient will be listed.

Let’s change this query a little bit:

select clinica.id, avg(paciente.idade) as average
from clinica
left outer join paciente on paciente.id_clinica = clinica.id
group by clinica.id

Now, with the LEFT OUTER JOIN, a clinic without patients will return a null average. Note that in this case we do not want the average to be 0 (zero) when there are no patients, because if the clinic saw only newborns the average would genuinely be zero years of age!
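The two queries can be compared side by side with a small script. This is only a sketch: it assumes SQLite as the engine and three invented clinics, one with adults, one with only newborns and one with no patients at all.

```python
import sqlite3

# Assumption: SQLite as the engine; clinics and ages are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table clinica (id integer primary key);
create table paciente (id integer primary key, id_clinica integer, idade integer);
insert into clinica (id) values (1), (2), (3);
insert into paciente (id_clinica, idade) values (1, 30), (1, 50), (2, 0), (2, 0);
""")

# Inner join: clinic 3, which has no patients, disappears from the result.
media_inner = conn.execute("""
    select clinica.id, avg(paciente.idade) as average
    from clinica
    join paciente on paciente.id_clinica = clinica.id
    group by clinica.id
    order by clinica.id
""").fetchall()

# Left outer join: clinic 3 is listed, with a NULL (None) average.
media_left = conn.execute("""
    select clinica.id, avg(paciente.idade) as average
    from clinica
    left outer join paciente on paciente.id_clinica = clinica.id
    group by clinica.id
    order by clinica.id
""").fetchall()

print(media_inner)
print(media_left)
```

Clinic 2 averages a real 0.0 while clinic 3 averages NULL, which is exactly the distinction a zero default would erase.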

Anyone who wants to test these queries can use this sqlfiddle.

Considerations

The main issue is that, from the user's point of view, an error or strange behavior caused by null is just as bad as an error, strange behavior or limitation caused by not using null.

A big fallacy behind some arguments against null is: if there were no null, fewer runtime errors would occur. In fact, errors occur because developers in general do not write truly "safe" code, that is, code that handles input values properly. So some want to work around the code quality problem by removing the values that make the code fail.

Validating any input, including user input and data read from files and databases, is (and always will be) good programming practice. The "lazy" programmer is the one thinking: "my routine only threw an unexpected exception because so-and-so sent the value XPTO".

Finally, null fields, which you could call optional, are a necessity in many kinds of situation. The effort to limit their use does not actually improve the quality of systems. If developers do not write proper logic for the system's routines, it does not matter whether they use nulls or other values to simulate empty fields.

And what if, after a change in business rules, a field becomes nullable? Will bugs appear everywhere?

  • 3

    In college I had fierce discussions with the database teacher. I've always advocated the use of NULL, short names and case-sensitive table names, which by default is an affront to Oracle fanboys, as he was. An hour and a half per class was too little for so much discussion. I wish I had known this clinic/emergency room case back then to use as an example.

  • 2

    I agree. NULL is my friend for expressing the absence of a value. Using codes (0, "", -1...) is just a confusing way of saying the same thing. Trying to avoid system errors by getting rid of NULL is like eliminating bugs with an empty catch block. The Null Object pattern, in turn, is very limited. It is born with a defect: it has to know what the consumer will do with it.

14

Theory

You need to be very careful when using NULL, because it works according to ternary logic. That is, strictly speaking, the semantics of NULL is not "no value" but "unknown value". See some truth tables for ternary logic:

    A    |  NOT A
---------+---------
True     |  False
False    |  True
Unknown  |  Unknown

A AND B  |            B
    A    | True     False  Unknown
---------+-------------------------
True     | True     False  Unknown
False    | False    False  False
Unknown  | Unknown  False  Unknown

A OR B   |            B
    A    | True  False    Unknown
---------+------------------------
True     | True  True     True
False    | True  False    Unknown
Unknown  | True  Unknown  Unknown

In addition to boolean comparisons, other kinds of operation also give "surprising" results once NULL is allowed. For example, the following expression:

Cidade = 'Porto Alegre' OR Balanco < 0.0

will return TRUE if the city is "Porto Alegre" or the balance is negative; FALSE if the city is defined and different from "Porto Alegre" and the balance is defined and greater than or equal to zero; otherwise it returns UNKNOWN.

The expression NULL = NULL - contrary to common sense - will not return TRUE, but UNKNOWN.
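These rules can be checked directly. The sketch below assumes SQLite, which reports TRUE as 1, FALSE as 0 and UNKNOWN as NULL (None in Python); the helper names avalia and cidade_ou_balanco are invented for this illustration.

```python
import sqlite3

# Assumption: SQLite as the engine. It shows TRUE as 1, FALSE as 0, UNKNOWN as None.
conn = sqlite3.connect(":memory:")

def avalia(expr, params=()):
    """Evaluate one SQL expression and return its single result (hypothetical helper)."""
    return conn.execute("select " + expr, params).fetchone()[0]

null_igual_null = avalia("null = null")   # UNKNOWN, not TRUE
null_is_null = avalia("null is null")     # IS NULL is the reliable test

def cidade_ou_balanco(cidade, balanco):
    """Evaluate the example expression for one (Cidade, Balanco) pair."""
    return avalia("? = 'Porto Alegre' or ? < 0.0", (cidade, balanco))

casos = {
    "city matches, balance unknown": cidade_ou_balanco("Porto Alegre", None),
    "city unknown, balance negative": cidade_ou_balanco(None, -5.0),
    "both defined, neither holds": cidade_ou_balanco("Canoas", 10.0),
    "city differs, balance unknown": cidade_ou_balanco("Canoas", None),
    "city unknown, balance positive": cidade_ou_balanco(None, 10.0),
}

print(null_igual_null, null_is_null)
for descricao, resultado in casos.items():
    print(descricao, "->", resultado)
```

In a WHERE clause, rows for which the condition is UNKNOWN are filtered out just like FALSE ones, which is why the distinction often goes unnoticed.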

In practice

In general, the language that consumes the results of an SQL query does not follow ternary logic. For it, true is true and anything else is false. This lets you use NULL in a variety of situations without causing any problem. But the pitfalls remain, such as the comparison with NULL mentioned above. Keep this in mind when deciding to work with NULL.

Use Cases

Personally, I do not consider it good practice to use nullable text columns, because:

  1. if it is an open field, an empty string (or even one with only blank spaces) is semantically equivalent to the absence of information (i.e. for a human, it makes no difference);
  2. if it is a "choice" field (i.e. the column type is text, but there is a finite set of strings that are accepted), nothing prevents you from creating an additional entry for "not informed", "unknown", "not applicable", etc;
  3. if it is a more complex entity (which may or may not be absent), the ideal would be to create a new table to represent it (replacing the textual field with a foreign key).

Regarding the use of NULL for other types of simple data (numbers, dates), I see no problems: sometimes it is interesting to have a value that represents "missing data" - and to choose some value of the domain itself seems strange. However, I have no objective argument, neither against nor in favour...

In foreign keys, NULL is a convenient way to represent "missing value" or "not applicable": using a particular row to represent that value would require the programmer to test an additional condition in every query (i.e. to test whether that row is the "special row"). That is no problem if you already do this kind of test routinely (for example, if your system does not delete rows but marks them as deleted), but in general it adds complexity and one more source of error (forgetting to test for the special value). This matters even more in aggregations. The standard behavior of NULL under ternary logic is "close enough" to what the programmer intuitively expects, so the queries stay simpler.

One particular case where the difference between using NULL or not is a little more significant is self-referencing tables: a non-nullable foreign key combined with an auto-increment primary key would require a workaround (a "gambiarra") just to add the first row (i.e. make the column nullable, add the "empty" row, point its foreign key at its own primary key, make the column not nullable again). There is also the potential for an infinite loop if one tries to "follow" the references back to the source (if the programmer forgets to test for the self-reference), but in practice, if you are doing that, your modeling is less than ideal (a closure table would be more appropriate in this case, whether the structure is tree-like or not). Still, none of this is unmanageable...

Conclusion

I know of no strong arguments and no specific use cases that demand the use of NULL. If it is important that your database be portable and/or the use of NULL is causing problems in your current DBMS, then it is feasible to eliminate it in any practical situation. Otherwise, using it makes queries somewhat simpler and reduces the possibility of certain errors (although it introduces the possibility of others).

  • 4

    Reminders: 1) NULL is a state, not a value, hence the false "surprise". Algebraically NULL is "contaminating", like Infinity or NaN. 2) the so-called "NOT NULL constraint", available in every version of SQL and fundamental to any modeling, is precisely what lets you bring the "ternary Boolean" type back to plain Boolean.

  • This is a new idea for me. I see NULL as the absence of a value and not as an unknown value. The result of NULL == 0 is UNKNOWN not because the first variable is unknown but because it is impossible to compare a value with an absence of value - as in nature: it is impossible to know whether one star is brighter than another, or whether one alligator is as heavy as another, if there is no other star or other alligator to compare with. I think NULL is great for expressing no value and I use it widely for this purpose both in code and in the database.

  • @Caffé The null value as used in programming languages really does have this semantics of absence of value. But according to the references I cited (there are others, but I don't remember them now), in the context of a relational database the semantics adopted is in fact ternary logic. I did not make this up, I guarantee... :P By the way, you are correct in stating that NULL == 0 is unknown, but this "unknown" ends up being represented by... NULL! ;) P.S. I see no problem at all in using NULL in a table to mean no value. I do that too.

  • @mgibsonbr I wouldn't say that UNKNOWN is represented by NULL. I would say that if I try to use the result of that comparison, the result is converted into no value, NULL, which is a convention, since representing UNKNOWN separately would add more complexity than simplicity. But I understood your point and appreciated reading your answer.

12

Not as a full answer, but as a contribution to the subject:

One possible use of Null is to distinguish fields where numeric zero and the empty string are meaningful values from fields where the information is missing. A simplified example, just to clarify:

You have a registration application, and one of the fields is the guarantor of the contract. In your UI you have fields that distinguish information provided from information missing. Null in this case would indicate that the information is missing (we do not know whether there is a guarantor or not; the user has yet to state it explicitly through the UI), while the empty string would indicate that the contract is known to have no guarantor. In this second case the information was provided by the system user, and it "happens" to be an empty string.
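A minimal sketch of that three-way distinction follows, assuming SQLite as the engine and a hypothetical contrato table with a fiador (guarantor) column; the helper situacao_fiador is invented for the example.

```python
import sqlite3

# Assumption: SQLite as the engine; 'contrato' and 'fiador' are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("create table contrato (id integer primary key, fiador text)")
conn.executemany(
    "insert into contrato (id, fiador) values (?, ?)",
    [(1, "Maria Souza"), (2, ""), (3, None)],
)

def situacao_fiador(id_contrato):
    """Describe what is known about the guarantor of one contract."""
    (fiador,) = conn.execute(
        "select fiador from contrato where id = ?", (id_contrato,)).fetchone()
    if fiador is None:
        return "not informed yet"   # NULL: nobody answered the question
    if fiador == "":
        return "no guarantor"       # empty string: answered, and there is none
    return "guarantor: " + fiador

situacoes = [situacao_fiador(i) for i in (1, 2, 3)]
print(situacoes)
```

This relies on the engine keeping NULL and the empty string apart. Oracle, for instance, treats an empty VARCHAR2 string as NULL, so there the design would need a separate flag column.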


As you have already noticed, there are also uses specific to the type of DB in question, as in MySQL:

One of the possible uses of Null is to make up for the lack of a "bit" field. The motivation of those who favor this technique is the following:

To store a mere yes or no in a numeric field, however small, you use 8 bits. In turn, if you create a zero-length character field that accepts Null, you will in practice have a single bit (the Null flag) used to tell whether it is an empty string or a null.

However, in practice, since storage space is generally not the bottleneck of current applications, I can't see that it is really worth saving space in exchange for the additional logic needed to implement this economy (including the fact that, to actually save space, you would need several Null fields in use, because the Null bitfield is padded to 8 bits anyway).

8

I usually use NULL in a column when I need an FK in some situations and not in others. Example: I have a table called evento that records events occurring in the system. Certain records of evento are related to a record in the tarefa table through a foreign key that stores the ID of the related record.
On some occasions there is a task related to the event, and on others there is not. If there is none, I store NULL. I could not store any other value: there is no ID 0, so the foreign key would not let me insert the record, and any numeric value greater than 0 would relate the event to some record.

With that field in evento set to NULL, I can still do a LEFT JOIN, for example, bringing in data from tarefa when there is a related task.
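Here is a sketch of that setup, assuming SQLite as the engine. The tables evento and tarefa come from the description above, but the column names (descricao, titulo, id_tarefa) and the sample rows are invented.

```python
import sqlite3

# Assumption: SQLite as the engine; column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces FKs only when enabled
conn.executescript("""
create table tarefa (id integer primary key, titulo text not null);
create table evento (
    id integer primary key,
    descricao text not null,
    id_tarefa integer references tarefa(id)  -- nullable foreign key
);
insert into tarefa (id, titulo) values (1, 'Backup');
insert into evento (descricao, id_tarefa) values ('tarefa iniciada', 1), ('login', null);
""")

# LEFT JOIN keeps the event without a task and fills the task columns with NULL.
linhas = conn.execute("""
    select evento.descricao, tarefa.titulo
    from evento
    left join tarefa on tarefa.id = evento.id_tarefa
    order by evento.id
""").fetchall()

# A placeholder such as 0 is rejected, because no tarefa has that ID.
try:
    conn.execute("insert into evento (descricao, id_tarefa) values ('orfao', 0)")
    fk_rejeitou_zero = False
except sqlite3.IntegrityError:
    fk_rejeitou_zero = True

print(linhas, fk_rejeitou_zero)
```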

3

I look favorably on the use of Null; it has its place. Beyond the matter of saving bits, however small the saving, what draws me to using null in a field is avoiding errors and speeding up data insertion. For example, take a table with several fields for credit values: Créditocomprax, Créditocompray, Créditocompraw. If the user acquired only the credit of Compra X, my code only needs to INSERT INTO Creditocomprax, and the procedure that computes the sum of credits minus the sum of debits runs without any problem. If these fields were not nullable, the INSERT INTO would have to mention the other fields and set their values to '0'.

  • 1

    What you posted is a little confusing. What is your question?

0

Ideally a column should never accept NULL. The primary use of NULL is so that you can do a LEFT OUTER JOIN, for example:

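SELECT CLIENTES.* FROM CLIENTES
LEFT OUTER JOIN PEDIDOS ON CLIENTES.ID = PEDIDOS.ID_CLIENTE
-- keep only the customers for whom the join found no order;
-- each of them yields a single row, so no GROUP BY is needed
WHERE PEDIDOS.ID_PEDIDO IS NULL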

With this query I find the customers who have no orders.
