The short answer is "Because they defined it that way".
To elaborate and give a firmer foundation, below is an adaptation of an answer from SOen (Stack Overflow in English), whose author claims to have been a member of the IEEE 754 committee, which is responsible for the IEEE 754 standard that defines how floating-point numbers work (NaN being part of it).
In addition to translating and adapting it, I have combined the text with other sources (from SOen and elsewhere) and some additions of my own, to give a broader overview of why NaN is not equal to itself.
Adaptation of the SOen answer to the question "What is the justification for all comparisons with NaN returning false?"
First, floating-point numbers are not real numbers, and floating-point arithmetic does not satisfy the axioms of real arithmetic. The Law of Trichotomy (that every real number is either negative, or positive, or zero) is not the only property of real arithmetic that does not hold for floating-point numbers, nor is it the most important. There are other cases, such as:
- Addition is not associative.
- The distributive law does not apply.
- There are floating point numbers that do not have inverses.
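The three properties listed above can be checked directly. The following is a minimal sketch in Python, assuming the interpreter uses IEEE 754 double precision for `float` (true on common platforms); the specific values are illustrative choices, not taken from the original answer.

```python
# Addition is not associative: grouping changes how intermediate results round.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# The distributive law does not hold either.
factored = 100 * (0.1 + 0.2)
expanded = 100 * 0.1 + 100 * 0.2
print(factored == expanded)  # False

# Some values have no exact multiplicative inverse: x * (1 / x) is not 1.
# The search range 1..99 is an arbitrary choice for illustration.
no_inverse = [x for x in range(1, 100) if x * (1.0 / x) != 1.0]
print(no_inverse)
```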
This list could continue for hours... In any case, it is not possible to specify a fixed-size arithmetic type that satisfies all the properties of real arithmetic we know. The IEEE 754 committee had to decide which rules to bend or break. The decision was guided by the following principles:
- When possible, have the same behavior as real arithmetic.
- When not possible, try to make violations predictable and easy to diagnose (or as close as possible).
For example, the predicate (y < x) asks whether y is less than x. If y is NaN, then it is not less than any other floating-point value, so the answer is necessarily false, for any value of x.
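A short Python sketch of this behavior, assuming IEEE 754 semantics for `float` comparisons; the list `values` is an arbitrary sample chosen for illustration.

```python
import math

nan = float("nan")
values = [-math.inf, -1.0, 0.0, 1.0, math.inf, nan]

# Every ordered comparison (and equality) involving NaN evaluates to False.
rows = [(nan < x, nan <= x, nan > x, nan >= x, nan == x) for x in values]
all_false = not any(any(row) for row in rows)
print(all_false)  # True

# The only comparison that is True is "not equal", even against NaN itself.
all_different = all(nan != x for x in values)
print(all_different)  # True
```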
I said that the Law of Trichotomy does not hold for floating-point values. However, there is a similar property that does. Clause 5.11, paragraph 2 of the 754-2008 standard reads as follows:
Four mutually exclusive relations are possible: "less than", "equal", "greater than" and "unordered". The last case arises when at least one operand is NaN. Every NaN shall compare unordered with everything, including itself.
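As a rough illustration of these four relations, here is a sketch in Python. The helper `relation` is hypothetical (it is not part of the standard or of any library); it only relies on `math.isnan` and the ordinary comparison operators.

```python
import math

def relation(x: float, y: float) -> str:
    """Hypothetical helper: name the one relation that holds between x and y."""
    if math.isnan(x) or math.isnan(y):
        return "unordered"
    if x < y:
        return "less than"
    if x > y:
        return "greater than"
    return "equal"

nan = float("nan")
print(relation(1.0, 2.0))   # less than
print(relation(2.0, 2.0))   # equal
print(relation(3.0, 2.0))   # greater than
print(relation(nan, 2.0))   # unordered
print(relation(nan, nan))   # unordered
```

Exactly one of the four answers applies to any pair of values, which is the replacement for trichotomy that the clause describes.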
Although handling NaN may require extra code, it is usually possible (though not always easy) to structure the code so that NaNs are handled correctly. When that is not possible, some additional code may be needed, but that is a small price to pay for the convenience that algebraic closure has brought to floating-point arithmetic.
Many may argue that it would have been more useful to preserve the Law of Trichotomy and the reflexive property of equality (which says that "anything is equal to itself"), since defining NaN as different from itself does not seem to preserve any axiom we are familiar with. It is understandable that many sympathize with this idea, but I think it is worth giving a little more context.
My understanding from talking to Professor William Kahan (author of the article quoted below and considered the "Father of Floating Point") is that the definition NaN != NaN is based on two pragmatic considerations:
- x == y should be equivalent to x - y == 0, to the extent possible. In addition to being a theorem of real arithmetic, this makes the hardware implementation of comparison more space-efficient, which was of utmost importance at the time the standard was created. It is worth noting, however, that this rule is violated when x = y = infinity, so it is not a strong reason by itself; it could reasonably have been relaxed to (x - y == 0) or (x and y are both NaN).
- Most importantly, there was no predicate like isnan() at the time NaN was formalized in the arithmetic of the 8087 processor. It was necessary to give programmers a convenient and efficient means of detecting NaN that did not depend on programming languages implementing an operation such as isNaN() (which could take years; remember, it was another time). On this, Kahan wrote in the cited article:
If there were no way to get rid of NaNs, they would be as useless as the Indefinites (a similar concept on Cray computers). As soon as a NaN was found, it would be better to stop the computation rather than continue for an indefinite time to an indefinite conclusion. That is why some operations on NaN must return results that are not NaN. Which operations?
It is inevitable that people will disagree about which operations these should be, but that does not give anyone license to settle the disagreement by making arbitrary choices. Any real (non-logical) function that produces the same floating-point result for all finite and infinite numerical values of an argument should produce the same result when that argument is NaN.
The exceptions are the predicates x == x and x != x. These are respectively 1 and 0 for every infinite or finite number x, but the reverse if x is NaN. They provide the only simple distinction between NaN and numbers, without raising an exception, in languages that lack a word for NaN and a predicate such as isNaN(x).
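The two considerations above can be sketched in Python. The helper `is_nan` is hypothetical and only illustrates the self-comparison trick Kahan describes; modern code would normally call `math.isnan` instead.

```python
import math

def is_nan(x: float) -> bool:
    """Hypothetical helper: detect NaN using only the != operator."""
    return x != x

nan = float("nan")
print(is_nan(nan))       # True
print(is_nan(1.5))       # False
print(is_nan(math.inf))  # False

# x == y versus x - y == 0: the two agree for ordinary numbers and for NaN...
print((2.5 == 2.5) == (2.5 - 2.5 == 0))  # True
print((nan == nan) == (nan - nan == 0))  # True (both sides are False)

# ...but not for infinity, because inf - inf is NaN.
x = y = math.inf
print(x == y)      # True
print(x - y == 0)  # False
```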
Perhaps this pragmatism was misguided, and the standard could simply have required a predicate/operation such as isnan(). But that would have made it almost impossible to use NaN efficiently and conveniently, since the world would have had to wait several years for programming languages to adopt it. I don’t believe that would have been a reasonable choice.
To put it bluntly: the result of NaN == NaN will not change. It is better to learn to live with it than to complain on the internet (note: the original question has a tone that can be read as a "complaint").
From here on, this is no longer a translation of the SOen answer, but my own conclusion after reading it.
My understanding
From what I understand, a set of decisions was made taking into account the context of the time (hardware/software, mathematical principles, design, etc.). NaN could have been implemented in several ways, for example as an error/exception that stopped the execution of the program. Instead, a "special" value with unusual behavior was chosen, in order to allow the program to continue running (it would then be enough to check whether the result was NaN and take appropriate action, such as interrupting the algorithm).
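To illustrate the idea of letting the computation run and checking for NaN only at the end, here is a minimal Python sketch. The function `mean_of_differences` and its input data are invented for this example.

```python
import math

def mean_of_differences(pairs):
    """Hypothetical computation: average of (a - b) over all pairs."""
    total = 0.0
    for a, b in pairs:
        total += a - b  # inf - inf yields NaN here; no exception is raised
    return total / len(pairs)

good = mean_of_differences([(3.0, 1.0), (5.0, 2.0)])
bad = mean_of_differences([(3.0, 1.0), (math.inf, math.inf)])

# The NaN propagates silently through the arithmetic; one check at the end
# is enough to decide what to do with the result.
for result in (good, bad):
    if math.isnan(result):
        print("invalid result, stopping here")
    else:
        print("result:", result)
```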
In the end, they decided that NaN == NaN should be false for the reasons explained above. In a way it was an "arbitrary" choice, but at least there was reasoning and a whole technical basis behind it (and of course they could also have defined NaN == NaN to be true, but the fact is they didn’t, and now we have to live with it).
For me, this is the only correct answer to "Why is it so?": NaN is not equal to itself because they decided it that way. Any other "answer" is really talking about the reasons that, in my view, potentially led to the decision:
- "NaN is not a number, so it doesn’t make sense for it to be equal to itself"
- "NaN is equivalent to an indeterminate value (it has no actual value), so it cannot be equal to anything"
- "NaN is a special case"
- etc.
These phrases, in my view, explain the reasoning that may have led to the decision more than the real reason it was taken. The real reason is: considering everything already said above (the context of the time, the principle of remaining true to real arithmetic as far as possible, etc.), they eventually decided that NaN being different from itself was the "best" option.
<speculation>If they had decided that NaN is equal to itself, maybe today we would be discussing whether it should be different instead, using the same arguments above :-)</speculation>
Other answers and links I’ve researched always follow this line of explaining that NaN is a "special" or "different" value and therefore behaves differently from "normal" numbers. In the end, it is not considered an actual number (hence the name "Not a Number") but a kind of placeholder representing an undefined value or state. This answer, for example, argues that the name is poor because it is confusing, and that it should be called "Numerical Exception" or something similar. That would make its distinctive behavior clearer and perhaps cause less surprise when it is used (or not; we now have no way of knowing).
NaN would, in a way, be the equivalent of the mathematical concept of Undefined, when an expression does not have an associated value. So much so that many mathematical operations that are considered undefined produce NaN in programming languages that use IEEE 754, such as dividing zero by zero or subtracting one infinity from another (in addition to the others already mentioned in one of the answers). This, moreover, is the line of reasoning many use to explain its peculiar behavior (if it has no definite value, it cannot be compared to anything, not even to itself; but again, this explains one of the reasons that may have weighed on the final decision, it is not "the answer" itself).
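A few of these undefined operations can be tried in Python, as a rough sketch. Note one Python-specific detail that is not part of IEEE 754: Python raises `ZeroDivisionError` for `0.0 / 0.0` instead of returning NaN, so the division by zero is shown inside a `try` block.

```python
import math

inf = math.inf

# Undefined operations on infinities quietly produce NaN.
results = [inf - inf, inf * 0.0, inf / inf]
print(all(math.isnan(r) for r in results))  # True

# Python chooses to raise an exception for 0.0 / 0.0 rather than return NaN.
zero_division_raised = False
try:
    0.0 / 0.0
except ZeroDivisionError:
    zero_division_raised = True
print(zero_division_raised)  # True
```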
Ultimately, the answer is "Because they defined it that way" (well founded, of course, but still "arbitrary").
confusing binary values preventing arithmetic operations
This makes no sense. Every binary value is a discrete numerical value. There is no way for it to be confusing, nor for it to make arithmetic impossible or impractical. – Oralista de Sistemas
I changed the term "confusing" to "arbitrary", which makes more sense in this context, since it relates to the arbitrariness in the precision arithmetic of the conversions that generate NaN. – LeonanCarvalho