How to solve special situations of unnatural value for a domain?

Asked

Viewed 256 times

16

The question has a slightly more conceptual bias, so I’m going to use a simple generic example.

There are cases where some column may be in a special situation, this because an information:

  • is not available,
  • is invalid at any time,
  • is unknown,
  • is undetermined,
  • has no relevance,
  • may be in a special situation that only makes sense for the domain specific.

One of the ways of indicating the special situation is to use of null, which is already questionable whether it is something good or bad, although everything has its merit in certain circumstances. The use of null is a clear case of choosing unnatural special values to the domain of the column to indicate one special situation.

But you may need to indicate more precisely what the special situation is, the reason for the current state. In question about the null there is a definition of how to solve a special situation. But there may be more than one reason, and worse, it may be that new reasons are included in the future.

For example, in an academic control there is a column with students' grades in disciplines may not be enough to have a zero grade. A zero note not the same as no note. And a null It may not be enough since it only indicates that there is no grade. You need to know why there is no note, since each circumstance may require a different treatment. You need to know if the note:

  • has not yet been given,
  • does not exist because the student is excused from that evaluation,
  • he did not make the assessment (which in this case is different from him having zero grade).

I know that in some cases we can avoid that there are certain reasons improving the normalization. That’s not the point here.

There are basically two solutions to treat this:

  1. Documents that certain values are considered as these reasons (e.g., values above 10 or negative in a note are special situations broken down into each of these values outside the normal range in the example -1 would indicate the reason a quoted above, -2 the reason b and -3 for the reason c).
  2. An auxiliary column is used to indicate the reason for the special situation (Ex.: there will be an indication of the reasons a, b or c cited above, and of course the indication that the note is valid and should be considered normally, even if it is zero).

In a domain with special situation we should use auxiliary values or columns to indicate the situation? That is, there is more perks in one than the other? Particularly prefer the second.

How to deal with these cases? After all, darlings can give wrong results without considering these cases. And this holds for either of the two options.

A typical example of the problem is the calculation of the average values of selected lines. But all the problems that involve null, also apply in this case. Database systems often have a solution to deal with the null.

Bonus point: is there a way to abstract (make transparent) this? Or the user (maybe a developer) will have to know this need and handle it properly in each query?

  • You turned my head. I didn’t understand anything you wanted to know.

  • @Jorgeb. Do you understand the subject I’m talking about? If you have any specific questions, can I try to help clarify it for you.

  • 1

    Yes I understand, then I read better at home and say something. Meanwhile I delete these comments.

  • 1

    I would like to understand why you are considering the very broad question.

  • 3

    @Jorgeb. If I understand correctly, the crux of the question is we should use auxiliary values or columns to indicate the situation? "The situation" being that a NULL can represent several things.

  • 4

    I also did not notice the bigown too ample, nor any other reason for closure. Sometimes I think people don’t even read the questions.

  • 3

    @bfavaretto is this, but obviously I want the reason and not a simple "this or that". I don’t want a chapter in a book, I don’t want multiple answers where anyone can be right. A simple explanation of why to use one or the other option. It’s very similar to the question about the null which was not considered broad.

  • 4

    @Jorgeb. It’s true. If you really have a problem, I’m willing to improve it. I just need to understand what, 'cause I may have failed, but I worked really hard not to leave her in trouble.

  • 1

    I read the question several times and didn’t really understand it. I didn’t have time to comment on why I didn’t understand, so I voted to close. Now with the editions, with the example of the student’s note and the answer of @Bacco I understood the question perfectly. Vote to close withdrawn.

  • The use of null in optional fields also fits in special situations? For example I have a scheme with tables 'Question' and 'Choice', but when the user answers he can make a free answer. When his answer is 'free' the choice is null, or when he chooses a 'choice' his free answer is null. Following the reasoning of the question, I could put another value in the place of null, for example, 'does not apply'... It would really be nice if it were abstracted.

  • @Earendul When you only need to indicate exists or not, the null solves well, even in one more column. That is when you have only one special situation for the domain of that column. The fact that you have a special situation in another column, is a separate problem. I understood that they are mutually exclusive, but there is still only one reason for the absence of value in each column. There are no two reasons for the same column.

  • 5

    @Earendul Not understanding a question is an excellent reason to not vote to close. Others may have (and seem to have) understood. In these cases, ask for clarification in the comments. Vote to close as "not clear" when relevant information is missing from the question.

  • 2

    @bfavaretto - I get it, I just failed to be too hasty, I really could have waited for more details before voting. It won’t happen again...

  • I think that this is a very broad question that depends on the analysis of each situation and that there will be no right or wrong

Show 9 more comments

2 answers

11


All situations, like almost everything that allows for this type of discussion, have advantages and disadvantages. Here’s a sketch of some of the most common ones I come across. Note that I am using very basic references and using pseudocode for clarity only:

The use of null

For the obvious example of school grades, the null is a good option, in order to differentiate a zero note from a note not yet launched in the system, if the special condition is only the absence of a note.

The caution to be taken in these cases is that to use this data, the darlings will need special conditions, such as a WHERE not_null( nota ) to calculate an average, for example.

A very common error is when the user makes one of these: WHERE coluna = null. This is a very common problem of those who are not used to the concept of null. Null cannot be used as value in normal operations without the result also being null.

Always remembering that semantically the null is treated as unknown value, which is not always the same as missing value.

The use of "magical values"

Still in the note example, we could do differently: say that note -1 (or note 100, whatever is agreed) means unfilled note.

In this case we’d still have to worry about a WHERE to do all the operations, and in addition we would have to have one of these values as default field when creating the table, otherwise make a big mess when developing the application.

nota INT() DEFAULT 100

Whereas some DB connectors have problems returning the null, a possible advantage of using magic values is the ease of treating these conditions on the application side.

Another advantage of this solution is that magic values can be used to indicate more than one situation, not just a special value (which has already been said in Edit of the question, but remember). For example, 100 is "waiting", 101 is "dismissed", 102 is "substitutive work" and so on.

Is it not the case that Enum?

Still taking the example of the notes, an output very similar to the "magic values" is in "give name" to these values, defining the field this way:

nota ENUM( '0','1','2','3','4','5','6','7','8','9','10','aguardando','dispensado')

The advantages are the readability of the data when analyzing and editing the DB with native tools, and the fact of the values of Enum can still be used for averages and sums, provided they are created in the correct order.

The disadvantage is the same as the magic values, the need for conditions to obtain aggregated values, so that only the values from 0 to 10 are considered. You may even need some type conversion in the aggregators, and there may also be an offset between the zero note and the Enum value, depending on the DBMS and the internal value of the initial item.

The use of extra columns

This case should probably only be used when the separation of special and "conventional" values can actually help the darlings or the application in any way. More rarely, in cases where it is not acceptable any of the aforementioned possibilities, either by performance or limitation of the technologies being used.

If you have many exceptions, the extra indexed column can speed up the query, returning only what is needed, speeding up, for example the use of aggregator functions like SUM() and AVG().

Data collections beyond the table structure

Let’s forget the grades now, this is for completely different data. If you face situations where the need for special values is great, and which could occur in several columns in the models previously proposed, you may even think about using a special column instead.

For example, a column of metadata, where you could store collections verbatim, such as "collation of degree:12/03/1960, marriage:21/05/1967, death:03/08/2011" in the same field. Remembering that this case makes more sense in relational Sgdbs.

The problem here is that these data are almost unfounded outside the main application.

Always remembering that this case is for very specific situations only, I only mentioned as a complement. Probably the developer will notice when he needs an output of this type.

9

The answer describing the decision of the ideal procedure will depend on two factors:

  • How resistant your model is to changes, and
  • How resistant to change dependent processes are.

Let’s go to a simulation. In the case mentioned, we can assume a model like this:

inserir a descrição da imagem aqui

Which translates as follows::

The domain where a student’s grade is expressed in a given subject.

We now have a need to Overload of that domain:

How to express the reason why a given student has no grade in a given subject?

What it entails in:

How to express an absence of note?

That’s where the value Null may be important, serving as a marker indicating that the context does not apply to the expected domain.:

We have a situation where I cannot express a note, but it is not my responsibility to describe the reason. Your princess is in another castle.

It is at this time that the two questions above should be evaluated.

Where both the model and the processes are flexible

The recommended is to continue maintaining your physical model as an expression of the logical model (which obviously requires the domain Reason why the student has no grade) and refactoring both the model and the processes:

inserir a descrição da imagem aqui

If your model is flexible, but the processes are not

You can implement the new domain by maintaining the interface expected by the non-defactoring processes.

One possibility is to change the name of the original table (which will be consumed by the new processes that know of the presence of a new domain) and create a view with the same name as the old table (which will feed the legacy processes):

inserir a descrição da imagem aqui

This is a process that pays attention to any reading method; several Dbms also allow inferred data manipulation by a view (such as MS SQL Server and Oracle, for example), thus allowing methods UPDATE, INSERT and DELETE are executed via reference.

If both the model and the processes are rigid

The option is to create the domain in a separate repository (another template, or directly in the code via enumeration, static or similar collections).

Always keep in mind that the closer your data model is to the domains it seeks to express, the better; this translates into shorter deadlines for maintenance, process mapping and model expansion.

Solutions involving out-of-scope domains ('tables' in code or similar) obfuscate aspects of the model and tend to Spaghettize the solution.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.