How does the truncation of a float or double number occur?

I have the following snippet:

double valor = 1.5;

while(valor >= 0) {
    System.out.println(valor);
    valor -= 0.15;
}

The output contains truncated numbers because a double is stored in 8 bytes, correct?

(...)
0.30000000000000016
0.15000000000000016
1.6653345369377348E-16

How do languages handle these truncation situations internally? Is the operation simply stopped for that number, returning the value at the point where the truncation occurred? What would the remaining digits of 0.30000000000000016 be if it were not truncated?
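One way to look at the digits that are normally hidden is sketched below. It relies on the fact that `new BigDecimal(double)` converts the stored binary value exactly, while `println` prints only the shortest decimal that still identifies the double. The class name `ExactDoubleValue` and the helper `exact` are made up for this illustration.

```java
import java.math.BigDecimal;

public class ExactDoubleValue {
    // Returns every decimal digit of the binary value actually stored in the double.
    static String exact(double d) {
        return new BigDecimal(d).toPlainString();
    }

    public static void main(String[] args) {
        double valor = 1.5;
        while (valor >= 0) {
            // Left: what println shows. Right: the exact stored value.
            System.out.println(valor + " -> " + exact(valor));
            valor -= 0.15;
        }
    }
}
```

Nothing is "stopped" halfway: each stored value has a finite, exact expansion, and the printed form is just a shorter rendering of it. With this sketch, 1.5 comes back as exactly 1.5, while the constant 0.15 turns out to be stored as a value slightly below 0.15.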

  • I think this is actually a duplicate. If it is not, the author can try to clarify. The answer to the question is there.

  • @bfavaretto, the answers helped a lot, but I had already read some explanations, such as the binary representation, the fact that 0.1 becomes a repeating fraction in binary, and the question of storage. However, my question is about how languages (if it is in fact up to them) handle this representation. I’m sorry, but I couldn’t think of a way to make the question more precise. When I wrote it, I wasn’t thinking about why the number is truncated, but about how the truncation is carried out. In PHP, for example, the numbers appear rounded down to the result 0.15, but for the last number, which should be 0, it returns 1.665334...

  • Sorry Caesar, but I still don’t understand! Think about it: you have 1.5 represented in 64-bit binary, and 0.15 represented the same way. Put one under the other and subtract, as you would by hand with decimal numbers, and you have the result in binary, with the precision that a 64-bit double allows.

  • So, theoretically, the more precise a number, the more bits have to be reserved? Is 1.5 already represented in 64-bit binary 'originally'? I imagined there would be a loss of information in the process, you see?

  • Yes, 1.5 is represented in binary (but the format is more complicated than I indicated in the previous comment; see the details in Miguel Angelo’s answer to the question linked as a duplicate). For example, 0.1 is not exactly representable in binary as a float or double, so when you assign this value to a double variable, what you are storing is an approximation. So there is indeed a loss of information, but it’s not in the calculation, it’s in the computer’s representation of the numbers.

  • Finally: if you think you can detail the question and make it distinct enough from the one it was considered a duplicate of, you can do so, and it will be eligible for reopening (it needs 5 votes to reopen).
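To make the point from the comments about representation concrete, here is a small sketch that prints the 64 bits of a double split into sign, 11-bit exponent and 52-bit fraction (the IEEE 754 layout Java uses for `double`). The names `DoubleBits` and `bits` are invented for this example.

```java
public class DoubleBits {
    // Formats the raw 64 bits of a double as "sign exponent fraction".
    static String bits(double d) {
        long raw = Double.doubleToLongBits(d);
        // Left-pad the binary string to 64 characters with zeros.
        String s = String.format("%64s", Long.toBinaryString(raw)).replace(' ', '0');
        return s.substring(0, 1) + " " + s.substring(1, 12) + " " + s.substring(12);
    }

    public static void main(String[] args) {
        System.out.println("1.5  = " + bits(1.5));   // fraction is 1 followed by zeros: exact
        System.out.println("0.15 = " + bits(0.15));  // fraction repeats 0011 and is cut off at 52 bits
    }
}
```

The fraction of 1.5 ends in zeros, so nothing is lost when it is stored. The fraction of 0.15 is a repeating pattern that has to be cut off after 52 bits, so the constant is already an approximation before the loop runs. Each subtraction result is in turn rounded to the nearest representable double, which is how a small leftover such as 1.6653345369377348E-16 can appear where 0 was expected.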

No answers
