The structure of the compiler
Internally, the compiler is divided into several parts: lexical analysis, syntactic analysis, semantic analysis, code generation and code optimization.
The first of these parts, lexical analysis, is responsible for breaking the source code into tokens. For example, when you write public static void main(String[] args) {, the lexical analyzer sees 11 different tokens: public, static, void, main, (, String, [, ], args, ) and {. In addition, lexical analysis already performs a basic classification of the tokens: public, static and void are keywords of the language; main, String and args are identifiers; and (, ), [, ] and { are special symbols. Indentation and comments are discarded by lexical analysis and do not become tokens.
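To make the idea concrete, here is a minimal sketch of a toy lexer for this single line. It is not how javac is implemented: the class ToyLexer, its tiny keyword set and its symbol set are hypothetical, and it ignores comments, literals and operators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy lexer, only good enough for the example line (not javac's design).
class ToyLexer {
    enum Kind { KEYWORD, IDENTIFIER, SYMBOL }

    record Token(Kind kind, String text) {}

    // A tiny subset of Java keywords, enough for the example.
    private static final Set<String> KEYWORDS = Set.of("public", "static", "void", "class", "int", "long");
    private static final String SYMBOLS = "()[]{};,";

    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens; it never becomes a token
            } else if (SYMBOLS.indexOf(c) >= 0) {
                tokens.add(new Token(Kind.SYMBOL, String.valueOf(c)));
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < source.length() && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
                String word = source.substring(start, i);
                // Basic classification: a reserved word is a keyword, anything else an identifier.
                tokens.add(new Token(KEYWORDS.contains(word) ? Kind.KEYWORD : Kind.IDENTIFIER, word));
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("public static void main(String[] args) {")) {
            System.out.println(t.kind() + " " + t.text());
        }
    }
}
```

Running main prints one line per token, 11 in total.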
In syntactic analysis, the tokens are grouped so that the compiler can understand the structure of the program. At this stage, it recognizes that modifiers (public and static) followed by a type (void), a name (main) and a parameter list in parentheses correspond to a method declaration.
Parsing transforms the code into a tree-like structure, in which the node representing the class has child nodes representing fields, constructors and methods. Within the nodes that represent methods, there are nodes that represent the return type, the modifiers, the parameters, the declared exceptions and the body. Within each node that corresponds to the body of a method, there are several other nodes, one for each statement in the method.
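The tree described above can be sketched with a few Java records. This is only an illustration under simplifying assumptions: the node types (ClassNode, MethodNode and so on) are hypothetical, constructors are omitted for brevity, and statements are kept as raw text, whereas javac's real syntax tree classes are far richer.

```java
import java.util.List;

// Hypothetical, heavily simplified syntax-tree node types (not javac's real tree classes).
class ToyAst {
    record ClassNode(String name, List<FieldNode> fields, List<MethodNode> methods) {}
    record FieldNode(List<String> modifiers, String type, String name) {}
    record ParameterNode(String type, String name) {}
    record MethodNode(List<String> modifiers, String returnType, String name,
                      List<ParameterNode> parameters, List<String> thrownExceptions,
                      List<StatementNode> body) {}
    // Each statement of a method body becomes its own node; here it only keeps its source text.
    record StatementNode(String source) {}

    // Builds the tree for: class Hello { public static void main(String[] args) { System.out.println("Hello"); } }
    static ClassNode example() {
        MethodNode main = new MethodNode(
                List.of("public", "static"), "void", "main",
                List.of(new ParameterNode("String[]", "args")),
                List.of(),
                List.of(new StatementNode("System.out.println(\"Hello\");")));
        return new ClassNode("Hello", List.of(), List.of(main));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```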
Semantic analysis is the step responsible for verifying that the program obtained from syntactic analysis makes sense: that all the variables used have been declared and initialized, that all the methods called exist and receive parameters of the correct types, that there are no variables with duplicate names in the same scope, etc.
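A minimal sketch of two of these checks (undeclared variables and duplicate names in the same scope), assuming the program has already been reduced to a hypothetical flat list of scope and variable actions. It does not check initialization or method calls, and real Java is stricter than this toy: a local variable may not even be redeclared inside a nested block within its scope.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy checker: the "program" is a flat list of already-parsed actions.
class ToyScopeChecker {
    interface Action {}
    record Enter() implements Action {}              // opening brace of a block
    record Exit() implements Action {}               // closing brace of a block
    record Declare(String name) implements Action {} // variable declaration
    record Use(String name) implements Action {}     // variable reference

    static List<String> check(List<Action> program) {
        List<String> errors = new ArrayList<>();
        Deque<Set<String>> scopes = new ArrayDeque<>();
        scopes.push(new HashSet<>()); // outermost scope
        for (Action action : program) {
            if (action instanceof Enter) {
                scopes.push(new HashSet<>());
            } else if (action instanceof Exit) {
                scopes.pop();
            } else if (action instanceof Declare d) {
                // Set.add returns false when the name already exists in the innermost scope.
                if (!scopes.peek().add(d.name())) {
                    errors.add("duplicate variable: " + d.name());
                }
            } else if (action instanceof Use u) {
                // A name is visible if any enclosing scope declares it.
                boolean declared = scopes.stream().anyMatch(s -> s.contains(u.name()));
                if (!declared) {
                    errors.add("undeclared variable: " + u.name());
                }
            }
        }
        return errors;
    }
}
```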
The int and long literals
When the lexical analyzer finds 9797, it emits an int literal token, and when it finds 9797L, it emits a long literal token. The answer to your question is that the distinction is made during lexical analysis. The lexical specification of this part is in the Java Language Specification (section 3.10.1, Integer Literals).
So that the lexical analyzer can distinguish int literals from long literals, the language designers decided that if a literal has the suffix L or l, it is a long literal; if not, it is an int literal. This is a very simple rule that is easy to understand.
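The suffix rule can be sketched as follows. This hypothetical IntLiteralLexer only handles plain decimal literals (no hexadecimal, octal, binary or underscores) and ignores the special case in which 2147483648 is allowed as the operand of unary minus; the point is that the type comes from the suffix alone, and an int literal that is too large is an error rather than being silently promoted to long.

```java
import java.math.BigInteger;

// Hypothetical sketch of the suffix rule for plain decimal integer literals only.
class IntLiteralLexer {
    enum LiteralType { INT, LONG }

    record IntegerLiteralToken(LiteralType type, long value) {}

    private static final BigInteger INT_MAX = BigInteger.valueOf(Integer.MAX_VALUE);
    private static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);

    static IntegerLiteralToken lex(String text) {
        // The type is decided purely by the suffix, never by the magnitude.
        boolean isLong = text.endsWith("L") || text.endsWith("l");
        String digits = isLong ? text.substring(0, text.length() - 1) : text;
        if (digits.isEmpty() || !digits.chars().allMatch(c -> c >= '0' && c <= '9')) {
            throw new IllegalArgumentException("not a decimal integer literal: " + text);
        }
        BigInteger value = new BigInteger(digits);
        BigInteger max = isLong ? LONG_MAX : INT_MAX;
        if (value.compareTo(max) > 0) {
            // Too large for its type: reported as an error, not switched to long.
            throw new IllegalArgumentException("integer number too large: " + text);
        }
        return new IntegerLiteralToken(isLong ? LiteralType.LONG : LiteralType.INT, value.longValue());
    }
}
```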
Couldn't it be done differently?
It is true that they could have done it otherwise, but the compiler design is simpler if the lexical analyzer can already separate int literals from long literals, at the cost of adding this detail to the language in the form of the l or L suffix. The same is true of float literals, which require the suffix f or F to be distinguished from double literals.
The need for these distinct literals is justified in particular by autoboxing:
Object a = 555;
Object b = 555L;
System.out.println(a.getClass().getName()); // java.lang.Integer
System.out.println(b.getClass().getName()); // java.lang.Long
Without the suffix, building 555 as a long would require a cast.
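A self-contained version of the example above, with the cast alternative added (the class name BoxingDemo is just for illustration):

```java
// Runnable version of the autoboxing comparison, plus the cast alternative for 555.
class BoxingDemo {
    static String boxedTypeNames() {
        Object a = 555;          // int literal, boxed as Integer
        Object b = 555L;         // long literal, boxed as Long
        Object c = (long) 555;   // int literal widened by the cast, then boxed as Long
        return a.getClass().getName() + " " + b.getClass().getName() + " " + c.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(boxedTypeNames()); // java.lang.Integer java.lang.Long java.lang.Long
    }
}
```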
If there were no L suffix, you would have to write something like this to create a long:
long y = (long) 922337203685477807;
But this does not work, because the number 922337203685477807 is already outside the valid range of int, so it cannot be built before being cast. It must necessarily be built as a long, and that is the reason the L suffix exists.
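A short sketch of the consequence: the type of a literal is fixed by the lexer before any cast or assignment is considered. The class LongSuffixDemo is hypothetical; the overflow behavior shown is standard Java int arithmetic.

```java
// Shows that a literal's type is decided before any assignment or widening happens.
class LongSuffixDemo {
    // Compiles only because of the L suffix; without it, the literal is rejected as too large for int.
    static final long BIG = 922337203685477807L;

    static long intArithmetic() {
        return 2147483647 + 1;   // both operands are int: wraps to -2147483648, then widened to long
    }

    static long longArithmetic() {
        return 2147483647 + 1L;  // one operand is long: the sum is computed in 64 bits
    }
}
```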
They could have made those numbers long by default, but then, when writing this:
int x = 555;
you would have a problem, because the literal is long and the variable is int. To solve this, you would either have to put a suffix on ints, which would be much worse (having to write 555i instead of 555), or you would always have to use explicit casts to int, which would be horrible, since ints are everywhere.
Another possibility would be for the compiler to do a contextual analysis to find out whether the number fits in an int or not. But this is not feasible. For example:
int f = 150096 * g - h / 5;
How would it know whether this fits in an int without using casts or specific suffixes? It is possible to do, but it complicates the compiler's syntactic and semantic analysis just to solve a small detail of the language. In other words, it would make the structure of the compiler considerably more complicated.
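The expression above can be run to see why a value-based decision is not possible at compile time: g and h are only known at run time, and the expression is typed by its operands, not by its result. The class and method names below are hypothetical, and the input values are chosen only to trigger an overflow.

```java
// The result type depends on operand types; the values of g and h exist only at run time.
class ContextDemo {
    static int f(int g, int h) {
        return 150096 * g - h / 5;   // computed entirely in int: overflow wraps silently
    }

    static long fAsLong(int g, int h) {
        return 150096L * g - h / 5;  // the L suffix makes the multiplication 64-bit
    }

    public static void main(String[] args) {
        System.out.println(f(10, 5) + " " + fAsLong(10, 5));           // 1500959 1500959
        System.out.println(f(100000, 0) + " " + fAsLong(100000, 0));   // 2124698112 15009600000
    }
}
```

For small inputs both versions agree; for g = 100000 the int version silently wraps, which no compile-time check on the literal 150096 alone could predict.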
Another possibility would be for the lexical analyzer to check whether the number fits in the int range, emitting an int literal token if it does or a long literal token if it does not. But this would have somewhat confusing side effects:
Object a = 2147483647; // java.lang.Integer
Object b = (long) 2147483647; // java.lang.Long - needs the cast
Object c = 2147483648; // java.lang.Long - Surprise! Now the cast is no longer needed.
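The hypothetical magnitude-based rule can be sketched in a few lines to show the discontinuity at Integer.MAX_VALUE. This is NOT how Java works; MagnitudeRuleLexer is an invented name used only to illustrate the rejected alternative.

```java
// Hypothetical alternative rule (not Java's): pick int or long by the literal's magnitude.
class MagnitudeRuleLexer {
    static String boxedTypeFor(String digits) {
        long value = Long.parseLong(digits);
        // Crossing Integer.MAX_VALUE silently changes the type of the literal.
        return value <= Integer.MAX_VALUE ? "java.lang.Integer" : "java.lang.Long";
    }
}
```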
There are no literals for byte and short, which is quite annoying, so casts from int, long or char are often needed. For example:
byte b = (byte) 123;
short s = (short) 1234L;
However, since int and long are larger than byte and short, every byte or short value can first be written as an int or long literal and then cast, unlike the case of a long value that does not fit in an int.
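A small sketch of how byte and short values are written in practice (ByteShortDemo and its methods are hypothetical). An int constant that fits in the target type can be assigned without a cast, while method arguments and long values need one:

```java
// Hypothetical demo of casts to byte and short.
class ByteShortDemo {
    static int sum(byte b, short s) {
        return b + s;
    }

    static int demo() {
        byte b = 123;             // allowed without a cast: an int constant that fits in a byte
        short s = (short) 1234L;  // a long value always needs an explicit cast
        // sum(1, 2) would not compile: method arguments get no implicit narrowing.
        return sum(b, s) + sum((byte) 1, (short) 2);
    }

    static byte wrap() {
        return (byte) 200;        // keeps only the low 8 bits, so 200 becomes -56
    }
}
```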
In the case of float and double, the reverse occurs: the smaller type requires the suffix, which frees the larger type from needing one. Doing the same for int and long would not be practical, because it would mean that int is the type that would need the suffix (555i).
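The same suffix logic for floating-point literals, as a small sketch (FloatDoubleDemo is a hypothetical name):

```java
// Hypothetical demo: without a suffix a floating-point literal is a double.
class FloatDoubleDemo {
    static String boxedTypeNames() {
        Object d = 10.0;   // no suffix: double literal, boxed as Double
        Object f = 10.0f;  // f suffix: float literal, boxed as Float
        return d.getClass().getName() + " " + f.getClass().getName();
    }

    static float narrowed() {
        // float x = 10.0; would not compile (possible lossy conversion from double to float).
        return (float) 10.0; // an explicit cast works, but 10.0f is the idiomatic form
    }
}
```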
Just to complement: float values have the same behavior. If you write float x = 10.0;, the 10.0 will be a double literal and not a float, so the assignment does not even compile. You need to write float x = 10.0F; or float x = 10.0f; for the value to be "truly" float. – igventurelli