Interpreting
An interpreted language executes code directly from the source.
Interpretation works much like compilation (translation): it goes through lexical, syntactic and semantic analysis, but it does so on demand. The source code is read (line by line, or in some other unit), analyzed with these processes, and then whatever was written is executed.
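To make the "analyze a piece, then run it right away" idea concrete, here is a minimal sketch of a line-by-line interpreter. The three-command toy language (`set`, `add`, `print`) and the function names `interpret` and `resolve` are invented for this illustration; no real interpreter is this simple.

```python
def resolve(token, variables, line_number):
    """Semantic step: turn a token into a value, or fail at this line."""
    if token.isdigit():
        return int(token)
    if token in variables:
        return variables[token]
    raise NameError(f"line {line_number}: unknown name {token!r}")


def interpret(source, output=None):
    """Analyze and execute a toy program one line at a time."""
    variables = {}
    output = [] if output is None else output
    for line_number, line in enumerate(source.splitlines(), start=1):
        tokens = line.split()                      # lexical analysis
        if not tokens:
            continue                               # blank line
        command, args = tokens[0], tokens[1:]
        # Syntactic analysis: does this line have a known shape?
        if command in ("set", "add") and len(args) == 2:
            name, value = args
            number = resolve(value, variables, line_number)
            if command == "add":
                number += resolve(name, variables, line_number)
            variables[name] = number               # executed immediately
        elif command == "print" and len(args) == 1:
            output.append(resolve(args[0], variables, line_number))
        else:
            raise SyntaxError(f"line {line_number}: cannot parse {line!r}")
    return output


program = "set x 2\nadd x 40\nprint x"
print(interpret(program))  # prints [42]
```

Note that an invalid line is only discovered when execution reaches it, after the earlier lines have already run. That is the practical difference from a compiler, which would reject the whole program before running anything.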
Java
Java worked like this in early versions.
There is still some confusion about this because the code generated by the compiler still goes through a process of "interpretation". But it is normally not considered interpreted code, since even this "interpretation" does not occur instruction by instruction.
To understand this better, note that Java code goes through the same analysis processes mentioned above. What changes is how the code is traversed and what is done at the end, and that is what distinguishes interpretation from compilation:
Interpretation takes place on short excerpts of the program, possibly line by line, and at the end executes whatever that excerpt determined.
Compilation takes place on larger parts (functions, classes, packages), trying to understand the whole, and at the end code is generated. There is a translation to another form.
In this case it is code for the Java virtual machine (JVM). It is like the machine code a computer understands, but it is specific to the Java platform rather than to a processor. So the program is compiled, but it cannot run directly on the processor, unlike languages such as C or Pascal, whose compilers normally generate code the processor understands directly.
This virtual machine code, called bytecode, is compiled as well, but that is an extremely simple process: bytecode is in a format that is easy to read and understand for this second compiler, which is completely different from the compiler for the language's source code. It also does not have to worry about whether the code is correct, since that was checked earlier. And, most importantly, this compilation does not occur instruction by instruction.
This is done by a JIT (Just-in-Time) compiler, which generates the processor's machine code, called native code. In Java's case the JIT compiler transforms bytecode into native code, applying some optimizations that are only possible when the execution environment is well known: not only the computer, operating system and settings, but also the other components (packages) being used together.
JIT compilation understands all of the intermediate code and generates native code on demand, as it becomes necessary. There is, however, a way to force this compilation to happen a little earlier.
This JIT compiler did not exist in early versions of Java. A JIT compiler normally does not affect the semantics of the language, so any language previously interpreted or compiled to bytecode can be JIT-compiled later. In fact this is increasingly common. Examples include JavaScript, Lua and PHP, which gained JIT compilers later in independent implementations.
A JIT compiler usually only has to understand the bytecode format and the code of the processor it will run on; it does not need to know anything about the language. But there are JIT compilers that work on top of the source code, so in a sense there is compilation on demand (at the moment the code is about to run), as opposed to the better-known ahead-of-time compilation. Even this on-demand compilation is not interpretation, because it generates code to be executed rather than executing directly.
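The "translate on demand, then reuse" behavior can be modeled in a few lines. This is only a rough sketch: Python's built-in `compile` and `exec` stand in for native code generation, and the class name `LazyCompiledFunction` is invented for the example. A real JIT compiler emits processor instructions and decides what to compile based on profiling.

```python
class LazyCompiledFunction:
    """Toy model of JIT behavior: translate on first call, then reuse."""

    def __init__(self, name, source):
        self.name = name
        self.source = source        # stands in for the intermediate code
        self.compiled = None        # stands in for the native code
        self.compile_count = 0

    def __call__(self, *args):
        if self.compiled is None:
            # The translation cost is paid here, once, at first use.
            namespace = {}
            code = compile(self.source, f"<{self.name}>", "exec")
            exec(code, namespace)
            self.compiled = namespace[self.name]
            self.compile_count += 1
        # Every later call goes straight to the already translated code.
        return self.compiled(*args)


square = LazyCompiledFunction("square", "def square(x):\n    return x * x\n")
print(square(3), square(4), square.compile_count)  # prints 9 16 1
```

Nothing is translated until the function is first needed, and the second call does not translate again. Code that is never called is never compiled at all.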
Compiled languages without machine code
There are languages that, strictly speaking, cannot be considered interpreted. They execute bytecode (sometimes called pseudo-code, or p-code) but are not JIT-compiled. Execution is faster than pure interpretation but not as fast as JIT-compiled code, because in a sense this bytecode is "interpreted": it runs directly, without being transformed into native code. Lua (pure, without LuaJIT) is an example today.
This is not new. One of the first mainstream languages, very successful in several parts of the world, including Brazil, was Clipper (a dialect that survives in modern form is Harbour). It worked this way, but because it generated an executable many programmers believed it generated code equivalent to C's. In fact it was just p-code embedded in the .exe. This is similar to what .NET does today: its programs appear to be native executables, but internally they contain bytecode.
But this technique has been around since the 1950s.
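A minimal sketch of this model: the source is compiled and validated once into a simple instruction list, and a small loop then executes those instructions without ever producing native code. The postfix mini-language, the opcode names and the functions `compile_rpn` and `run` are all invented for the illustration and do not correspond to Lua's, Clipper's or .NET's real formats.

```python
PUSH, ADD, MUL = range(3)   # made-up opcodes for this sketch


def compile_rpn(source):
    """Translate a postfix expression like '2 3 + 4 *' into bytecode.

    All checking happens here, once, before anything runs.
    """
    code = []
    depth = 0                          # simulated stack depth
    for token in source.split():
        if token.isdigit():
            code.append((PUSH, int(token)))
            depth += 1
        elif token in ("+", "*"):
            if depth < 2:
                raise SyntaxError(f"operator {token!r} lacks operands")
            code.append((ADD if token == "+" else MUL, None))
            depth -= 1
        else:
            raise SyntaxError(f"unknown token {token!r}")
    if depth != 1:
        raise SyntaxError("expression must leave exactly one value")
    return code


def run(code):
    """Execute the bytecode directly; it is trusted to be valid."""
    stack = []
    for opcode, operand in code:
        if opcode == PUSH:
            stack.append(operand)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if opcode == ADD else left * right)
    return stack.pop()


bytecode = compile_rpn("2 3 + 4 *")
print(run(bytecode))  # prints 20
```

The `run` loop does no parsing and no error checking, which is why it is cheaper than interpreting source text. It still dispatches on every instruction, which is why it is slower than native code.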
There are languages that generate not bytecode but an AST (Abstract Syntax Tree). This is a step before code generation. A compiler normally (in virtually all known implementations) builds an AST after lexical and syntactic analysis, and the subsequent processes work on top of this tree. The default Ruby implementation uses (or used, I may be out of date) this AST to run. Interpretation still occurs on the AST, but it is not the usual process of interpretation. Either way, there was a prior compilation step.
Of course there are Ruby implementations that work differently, including some that run on top of the Java platform. That is, in the end JRuby generates the same kind of bytecode that Java does, and it is then JIT-compiled by the JVM. This shows the flexibility of this JIT infrastructure.
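Executing straight from the tree, as described for the AST case, can be sketched with Python's standard `ast` module. This only illustrates the technique and says nothing about how Ruby implements it internally. The function name `evaluate` is invented, and only a few node types are handled.

```python
import ast


def evaluate(node):
    """Walk the syntax tree and compute the result directly from it."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp):
        left = evaluate(node.left)
        right = evaluate(node.right)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Sub):
            return left - right
        if isinstance(node.op, ast.Mult):
            return left * right
    raise ValueError(f"unsupported node: {type(node).__name__}")


# Parsing (the "compilation" part) happens once, up front.
tree = ast.parse("2 + 3 * 4", mode="eval")
print(evaluate(tree))  # prints 14
```

The parser has already resolved operator precedence and rejected malformed input, so the walker only has to follow the tree. No bytecode or native code is produced at any point.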
Some people consider these languages still interpreted (or semi-interpreted) since they do not execute native machine code. The interpretation is lighter because part of the necessary work was done beforehand by a compiler, which generated something simple to manipulate with the "guarantee" that it has no errors. But a program is still needed that understands this code and has something executed indirectly. That would be interpretation.
As far as I remember, early Java was like this; I think source code was never interpreted directly. That is, there was always javac, and the JVM interpreted the bytecode.
So it’s quite complicated to classify languages or even implementations as interpreted or compiled.
Languages are not interpreted
We cannot say that languages are interpreted, compiled or even JIT-compiled. At most we can say that implementations have these characteristics. And they are not mutually exclusive. Although some people will say these are different implementations shipped together, it is fair to say that all three forms can exist in one implementation.
Conclusion
Obviously an interpreted program runs much slower than a compiled program whose machine code was generated in advance. JIT-compiled code pays a cost to generate machine code, but that cost is much lower than direct interpretation. Besides, this is done once and the machine code is then reused every time.
Pure interpretation today only makes sense at development time or for running very short scripts. Hence any language used to build systems must have some form of compilation, even if optional.
+1 You have no idea how much this answer has helped me, eternally grateful.
– Renan Gomes
@bigown do you have a degree in Java?
– Marcus Becker
@Marcusbecker how does one get a degree in Java?
– Maniero