The structure of the compiler
Internally, the compiler is divided into several parts: lexical analysis, syntactic analysis, semantic analysis, code generation and code optimization.
The first of these parts, lexical analysis, is responsible for breaking the source code into tokens. For example, given public static void main(String[] args) {, the lexical analyzer sees 11 tokens: public, static, void, main, (, String, [, ], args, ) and {. In addition, lexical analysis already makes a basic classification of each token: public, static and void are keywords of the language; main, String and args are identifiers; (, ), [, ] and { are special symbols. Indentation and comments are discarded by lexical analysis and do not become tokens.
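To make the tokenization step concrete, here is a minimal sketch of a toy lexer. The class ToyLexer, the Kind enum and the tokenize method are hypothetical names invented for this illustration; it knows only the three keywords from the example, treats every other non-identifier character as a one-character symbol, and does not handle comments or literals. It is not how javac is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy lexer for illustration; not how javac is implemented.
class ToyLexer {
    enum Kind { KEYWORD, IDENTIFIER, SYMBOL }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Only the keywords needed for the example, not the full Java list.
    private static final Set<String> KEYWORDS = Set.of("public", "static", "void");

    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace (including indentation) produces no token
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < source.length()
                        && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
                String word = source.substring(start, i);
                Kind kind = KEYWORDS.contains(word) ? Kind.KEYWORD : Kind.IDENTIFIER;
                tokens.add(new Token(kind, word));
            } else {
                // Anything else is treated as a one-character special symbol.
                tokens.add(new Token(Kind.SYMBOL, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize("public static void main(String[] args) {");
        System.out.println(tokens.size() + " tokens"); // prints "11 tokens"
        for (Token t : tokens) {
            System.out.println(t.kind + " " + t.text);
        }
    }
}
```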
In syntactic analysis, the tokens are grouped so that the compiler can work out the structure of the program. At this stage, it sees that modifiers (public and static) followed by a type (void), followed by a name (main), followed by a parameter list in parentheses correspond to a method declaration.
Parsing turns the code into a tree-like structure in which, below the node representing the class, there are nodes representing fields, constructors and methods. Within the nodes that represent methods, there are nodes for the return type, the modifiers, the parameters, the exceptions and the body. Within each node that corresponds to the body of a method, there are further nodes, one for each statement of the method.
Semantic analysis is the step responsible for verifying that the program obtained from syntactic analysis makes sense: that all variables used have been declared and initialized, that all methods called exist and receive arguments of the correct types, that there are no variables with repeated names in the same scope, and so on.
The literals int and long
When lexical analysis finds 9797, it emits an int literal token, and when it finds 9797L, it emits a long literal token. So the answer to your question is that the distinction is made in lexical analysis. The lexical specification of the language covers this part.
So that the lexical analyzer can distinguish an int literal from a long literal, the language designers decided that if the literal has the suffix L or l, it is a long literal; otherwise, it is an int literal. This rule is very simple and easy to understand.
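The suffix rule is simple enough to sketch in a few lines. The class LiteralKind and the method classify are hypothetical names for this illustration, and the sketch assumes its input is already a well-formed integer literal; a real lexer would also validate the digits.

```java
// Hypothetical sketch of the suffix rule; assumes a well-formed integer literal.
class LiteralKind {
    // Returns "long" when the literal ends in l or L, "int" otherwise.
    static String classify(String literal) {
        char last = literal.charAt(literal.length() - 1);
        return (last == 'L' || last == 'l') ? "long" : "int";
    }

    public static void main(String[] args) {
        System.out.println(classify("9797"));  // prints "int"
        System.out.println(classify("9797L")); // prints "long"
    }
}
```

Note that the decision needs only the characters of the literal itself, with no knowledge of where it is used.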
Couldn’t it be done differently?
It is true that it could have been done otherwise, but the compiler design is simpler if the lexical analyzer can already separate int literals from long literals, at the cost of putting this detail into the language with the suffix l or L. The same is true of float literals, which require the suffix f or F to be distinguished from double literals.
The need to be able to express these literals explicitly is justified in particular by autoboxing:
Object a = 555;
Object b = 555L;
System.out.println(a.getClass().getName()); // java.lang.Integer
System.out.println(b.getClass().getName()); // java.lang.Long
Without the suffix, building 555 as a long would require a cast.
If there were no L suffix, you would have to write this to create a long:
long y = (long) 922337203685477807;
But this does not work, because the number 922337203685477807 is already outside the valid range of int, so it cannot be built before the cast is applied. It must necessarily be built as a long. That is why the L suffix exists.
The language could have made these numbers long by default, but then when writing this:
int x = 555;
you would have a problem, because the literal is a long and the variable is an int. To solve this, either you would have to put a suffix on ints, which would be much worse (having to write 555i instead of 555), or you would always have to use explicit casts to int, which would be horrible, since ints are everywhere.
Another possibility would be for the compiler to do a contextual analysis to determine whether the number fits in an int or not. But this is not practical. For example:
int f = 150096 * g - h / 5;
How can the compiler know whether this fits in an int without casts or specific suffixes? It is possible to do, but it complicates the syntactic and semantic analysis of the compiler in order to solve a simple detail of the language. That is, it would make the structure of the compiler considerably more complicated.
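As a rough illustration of why context does not settle the question, the sketch below evaluates the same expression with int and with long arithmetic. The class ContextDependent, its methods and the sample values of g and h are assumptions made up for this example. Whether the result fits in an int depends on values that are only known at run time.

```java
// Hypothetical example; the values passed for g and h are arbitrary.
class ContextDependent {
    // Same expression as in the text, evaluated with int arithmetic.
    static int asInt(int g, int h) {
        return 150096 * g - h / 5;
    }

    // Same expression, but the literal is a long, so the math is done in long.
    static long asLong(int g, int h) {
        return 150096L * g - h / 5;
    }

    public static void main(String[] args) {
        System.out.println(asInt(10, 5));      // prints 1500959
        System.out.println(asInt(20000, 5));   // prints -1293047297 (int overflow)
        System.out.println(asLong(20000, 5));  // prints 3001919999
    }
}
```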
Another possibility would be for the lexical analyzer to check whether the number fits in an int, emitting an int literal token if it does and a long literal token if it does not. But this would have somewhat confusing side effects:
Object a = 2147483647; // java.lang.Integer
Object b = (long) 2147483647; // java.lang.Long - the cast is required
Object c = 2147483648; // java.lang.Long - Surprise! Now the cast is no longer needed.
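That alternative rule could be sketched as follows. This is the hypothetical rule being discussed, not what Java actually does, and RangeBasedRule and classify are names invented for the illustration. The sketch assumes a string of decimal digits that fits in a long.

```java
// Hypothetical range-based rule; Java does NOT classify literals this way.
class RangeBasedRule {
    // Assumes non-negative decimal digits whose value fits in a long.
    static String classify(String digits) {
        long value = Long.parseLong(digits);
        return value <= Integer.MAX_VALUE ? "int" : "long";
    }

    public static void main(String[] args) {
        System.out.println(classify("2147483647")); // prints "int"
        System.out.println(classify("2147483648")); // prints "long"
    }
}
```

Under such a rule, adding 1 to a literal would silently change its type, which is the surprise shown in the lines above.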
In the case of byte and short, there are no literals for them, which is quite annoying, and therefore casts from int, long or char are often needed. For example:
byte b = (byte) 123;
short s = (short) 1234L;
However, since int and long are larger than byte and short, the value can be built in the larger type first and then narrowed with a cast, unlike the case of building a long from an int literal.
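A small sketch of these narrowing casts follows. The class NarrowingCasts and its helper methods are hypothetical names for the illustration; the last call shows what happens when the value does not fit in the smaller type.

```java
// Hypothetical helpers showing narrowing casts from int and long.
class NarrowingCasts {
    static byte toByte(int value) {
        return (byte) value; // keeps only the low 8 bits
    }

    static short toShort(long value) {
        return (short) value; // keeps only the low 16 bits
    }

    public static void main(String[] args) {
        System.out.println(toByte(123));    // prints 123
        System.out.println(toShort(1234L)); // prints 1234
        System.out.println(toByte(200));    // prints -56, since 200 does not fit in a byte
    }
}
```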
In the case of float and double, the reverse occurs: the smaller type requires the suffix, which frees the larger type from needing one. Adopting the same approach for int and long would not be practical, because it would mean that int is the one that would need the suffix (555i).
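The same autoboxing check used earlier for int and long can be applied to floating-point literals. In this sketch, FloatVsDouble and typeOf are hypothetical names introduced only for the example.

```java
// Hypothetical example mirroring the Integer/Long autoboxing check.
class FloatVsDouble {
    static String typeOf(Object boxed) {
        return boxed.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(typeOf(10.0));  // prints java.lang.Double
        System.out.println(typeOf(10.0f)); // prints java.lang.Float
        float x = 10.0f;                   // the suffix makes the literal a float
        // float y = 10.0;                 // does not compile: 10.0 is a double literal
        System.out.println(x);             // prints 10.0
    }
}
```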
Just to complement: float values have the same behavior. If you write float x = 10.0; the 10.0 will be a double literal and not a float. You need to write float x = 10.0F; or float x = 10.0f; for the value to be "truly" a float. – igventurelli