How does a Java Virtual Machine written in Java work?

Viewed 418 times

16

After seeing the Jikes RVM, I was curious to know how it works (in theory), but I only found material in English.

Am I correct in assuming that the JVM today is written in C/C++, which in turn is written in Assembler?

How is it possible for a language to "interpret itself"? Has this already been done in other languages?

2 answers

13

It’s called Bootstrapping.

Compilers

Languages are only specifications. Though related, languages and compilers are different things.

Compilers and libraries implement what the specification says. Obviously, the first implementation of a language has to be written in another language. After that, you can use the language itself to write a new implementation in it.

Compilers are relatively basic algorithms, full of specific complexities, of course. A compiler takes text as input, processes it (that is where the complexity lies), and generates output, possibly binary, that a virtual or physical machine knows how to run. It’s just a transformation algorithm following specific rules. So a compiler generates a program that can run, and that program can be a compiler, a virtual machine, an operating system, anything.
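To make the "transformation algorithm" idea concrete, here is a minimal sketch of a toy compiler in Java. Everything in it is hypothetical and only for illustration: the `TinyCompiler` class, the tiny expression grammar (non-negative integers, `+` and `*`), and the `PUSH`/`ADD`/`MUL` instruction names of an imaginary stack machine. It has nothing to do with how javac or Jikes RVM are actually built.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy compiler: turns text such as "1 + 2 * 3" into
// instructions for an imaginary stack machine.
final class TinyCompiler {
    private final String src;
    private int pos = 0;
    private final List<String> out = new ArrayList<>();

    private TinyCompiler(String src) {
        this.src = src.replace(" ", ""); // ignore spaces
    }

    // Text in, instruction list out: the whole "compilation".
    static List<String> compile(String source) {
        TinyCompiler c = new TinyCompiler(source);
        c.expr();
        if (c.pos != c.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + c.pos);
        }
        return c.out;
    }

    // expr := term ('+' term)*
    private void expr() {
        term();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++;
            term();
            out.add("ADD");
        }
    }

    // term := number ('*' number)*
    private void term() {
        number();
        while (pos < src.length() && src.charAt(pos) == '*') {
            pos++;
            number();
            out.add("MUL");
        }
    }

    // number := one or more decimal digits
    private void number() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("number expected at " + pos);
        }
        out.add("PUSH " + src.substring(start, pos));
    }

    public static void main(String[] args) {
        // prints [PUSH 1, PUSH 2, PUSH 3, MUL, ADD]
        System.out.println(compile("1 + 2 * 3"));
    }
}
```

Nothing here depends on the language the compiler is written in; the same rules could be coded in C++ or in the toy language itself once it is expressive enough.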

I understand the curiosity, but I find it strange that this seems like something very difficult to achieve. I think the only "secret" is knowing that the first compiler has to be written in another language.

Actually, some languages are built incrementally. You write a compiler that handles the bare minimum and add functionality later. That way you have an initial compiler written in the language itself almost from the start. Of course, in the first iterations of this development the language will be a little different from the one desired at the end, and a little limited.

Programming languages can produce anything, so it is no secret that a language can produce a compiler for itself, as long as there is a first implementation.

Of course, some languages are not the most suitable for writing compilers.

The best implementations of Java are actually written in C++. Java today doesn’t seem very well suited to writing compilers, though it used to be worse. Java is fundamentally compiled, not interpreted, although it is possible to have an interpreter.

Interpreters

An interpreter is nothing more than a compiler that, instead of generating an executable at the end, immediately executes what it has parsed. An interpreter is an executable program like any other. In this case, though, the compiler generates bytecode, and that is what is "interpreted" by the virtual machine.
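As a rough illustration of a virtual machine "interpreting" bytecode, here is a minimal sketch of a dispatch loop in Java. The `TinyVm` class and its four opcodes are invented for this example; real JVM bytecode and real virtual machines are far richer, so treat this only as a picture of the idea.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy virtual machine: executes a made-up bytecode
// stored as an int array. Not related to real JVM bytecode.
final class TinyVm {
    static final int PUSH = 0; // followed by one operand: the value to push
    static final int ADD = 1;  // pop two values, push their sum
    static final int MUL = 2;  // pop two values, push their product
    static final int HALT = 3; // stop and return the top of the stack

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack.push(code[pc++]);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Bytecode for 1 + 2 * 3
        int[] program = {PUSH, 1, PUSH, 2, PUSH, 3, MUL, ADD, HALT};
        System.out.println(run(program)); // prints 7
    }
}
```

The loop itself is an ordinary program; the machine never sees the source text, only the instructions a compiler produced from it.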

Of course, if the interpreter runs itself it needs to be reentrant, which is not simple.

What I can guarantee is that there is a piece of code in another language, a piece that does precisely the bootstrap of the virtual machine. The Wikipedia article mentions this:

A small C loader is responsible for loading the boot image at runtime

The interpreter’s compiler is separate from the virtual machine, even if it lives in the same executable. The virtual machine does not interpret Java; it runs bytecode, and from what I understand Jikes doesn’t even use the standard bytecode.

Obviously I don’t know the details of this project, and I can’t speak to every specific of how it works.

More information

C and C++ compilers have been written in C++ for quite some time. Some still have a good part written in C. Someone could write one in Assembly ("Assembler" is not the correct name for the language), but no one has seriously done this for years.

C# today has its compiler written in C#, and it works better than the original written in C++. The runtime is written basically in C++; there are experiments with writing it in C#, with improving results, but still short of what is desired.

I’ve talked about it in The first programming language.

  • I thought about writing something like this and already had an answer half-written, but there’s a problem: he’s asking about an interpreter, not a compiler. That makes it a little more complicated.

  • @Victorstafusa why does it get more complicated?

  • 2

    Because you have to make a program interpret itself without running on another interpreter, which is what Jikes RVM claims to do. The part about not running on another interpreter is what complicates things; according to their website, "i.e., its Java code runs on itself without requiring a second virtual machine."

  • @Victorstafusa I didn’t know this; I will look into it and improve the answer. You can also post an answer if you have something that addresses this specifically. But at first glance I don’t see any problem either.

  • This idea of bootstrapping without needing another VM is like what Roslyn does, isn’t it?

  • 1

    @jbueno actually no; I edited the answer to address this. See whether it reads better now.

  • @Well, I posted my additional answer.

12


First, to build a compiler for language X that is itself written in language X, do this:

  1. Using language Y, write a compiler C1 for language X that produces executable code for platform P. Compile it with the pre-existing compiler K of language Y.

  2. Using language X, write a compiler C2 for language X that produces executable code for platform P. Compile it with compiler C1. Note that this produces a compiler for language X written in language X itself.

  3. Recompile C2 using C2 itself, generating a compiler C2b.

  4. Make sure that C2 is identical to C2b (or at least that there is no difference you care about). If not, adjust the source code of C1 and C2 until it is.

  5. Throw away C1.
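Step 4 above, in its strictest form, is a byte-for-byte comparison of two compiler binaries. The following is a minimal sketch of such a check in Java, assuming both binaries are available as files; the `BootstrapCheck` class name and its command-line usage are hypothetical, and real projects may tolerate differences such as embedded timestamps, which this sketch does not handle.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for step 4: checks whether the compiler built by C1
// (C2) and the compiler built by C2 itself (C2b) are the same binary.
final class BootstrapCheck {

    // True when both binaries have exactly the same bytes.
    static boolean identical(byte[] c2, byte[] c2b) {
        return Arrays.equals(c2, c2b);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java BootstrapCheck <C2 binary> <C2b binary>");
            return;
        }
        byte[] c2 = Files.readAllBytes(Paths.get(args[0]));
        byte[] c2b = Files.readAllBytes(Paths.get(args[1]));
        System.out.println(identical(c2, c2b)
                ? "C2 and C2b are identical: the bootstrap reached a fixed point"
                : "C2 and C2b differ: adjust the sources of C1 and C2 and rebuild");
    }
}
```

If the two files match, compiling the compiler with itself changes nothing, and C1 is no longer needed.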

You can repeat this process several times to build language X2 from language X1, X3 from X2, and so on. That’s why javac and eclipsec, the only two mature and active Java compilers currently, are themselves written in Java. This process is called bootstrapping.

However, the case here is an interpreter, not a compiler. But the reasoning is similar. Jikes RVM needs a small, minimal C loader to start the bootstrap process. Everything else that is not essential to get things started is done in Java. The idea is that everything that does not have to be done in C is done in Java, leaving in C only what cannot be implemented in Java.

However, most publicly available VMs have large parts developed in C and even C++, mainly for reasons of performance, memory consumption, and code criticality. On the other hand, Jikes RVM is not a commercial JVM and is therefore used only in specific niches, since its goal is not to compete with other JVMs in running user programs.

Also, this concept is not very new. LISP has done this since it was conceived. Many LISP interpreters are written in LISP itself, with only a small, minimal part responsible for the most basic functionality written in some other language.

  • I can understand how an application that talks "directly" to the OS can create a compiler using itself, or a Java program that produces a .class (a Java program that reads a .java file and compiles it), but I can’t picture a virtual machine made in Java that doesn’t need a JVM.

  • 2

    Marcus, I couldn’t picture it either, so I went to look it up. The trick is in the small core in C++. It’s actually a hybrid JVM, with parts written in C++ and parts in Java (in fact, all JVMs are like this: hybrids of Java and a native language that is almost always C++). The only difference is that in the case of Jikes RVM, they trimmed the C++ part down until there was nothing left to take out.

  • 1

    Removing all of the C++ would be impossible, as it would mean that execution would have to start in Java, and that would require another JVM.
