How do alternative language implementations work, such as Python in the JVM?


I always see people talking about implementations of a language X written in another language Y. For example:

  • JRuby, a Ruby implementation in Java
  • Jython, an implementation of Python in Java
  • IronPython, an implementation of Python in .NET
  • Rhino, an implementation of JavaScript in Java

How does this work? What advantages are gained by "implementing" one language into another? Is it really an implementation in another language, or just another platform, compiler or something? What is the name of this technique?

  • Every language is implemented in another language.

  • @LINQ every byte is scrubbed to generate more bytes :D

3 answers


What is a programming language, after all?

To understand this whole subject, you need to understand the concept of a programming language. Whether BASIC, FORTRAN, Java, Ruby, C# or Crystal, a programming language is a notation for expressing an algorithm.

It is like a spoken language: each one has its own peculiarities, but they all share one characteristic, communication. You can change the grammar, the alphabet, the origin, or the medium used to speak (such as the voice or one's own hands), but communication remains the goal.

Note that a programming language is only notation, not execution. That is why, strictly speaking, there are no compiled languages or interpreted languages. What exist are compiled implementations and interpreted implementations of a language.

Yes, but what is a language implementation?

All this confusion arises because every programming language comes with an implementation (or at least should, to be useful). When we say "I run Ruby on my machine", what we probably mean is "I write Ruby code, and I run CRuby on my machine".

The default Ruby implementation is CRuby (also known as MRI), which is an interpreter. This default implementation is called the reference implementation. The same happens with Python, whose reference implementation is called CPython (not to be confused with Cython).

This is a little easier to see in the Java world. One version of the language is called Java SE 12, and its reference implementation is OpenJDK 12. Other implementations of the Java language have existed as well, such as the GNU Compiler for Java (GCJ), although GCJ only ever covered much older versions of the language.

It is even clearer with JavaScript. What is JavaScript, after all? Essentially, a set of text documents defining its specification (ECMAScript). Engines such as Chrome's V8 and Firefox's SpiderMonkey are written from that specification.

As @LINQ mentioned in the comments, every language is implemented in another language. I would add that every language implementation runs on some (other) execution system, be it a compiler or an interpreter, be it GCC or the JVM.

And what is the advantage of these alternative implementations?

It varies from one reference implementation to another. CRuby and CPython, for example, do not support true parallelism. They support multithreaded code, but those threads do not execute Ruby or Python code at the same time, because of the GIL (Global Interpreter Lock). By running these languages on the JVM, as Jython and JRuby do, you can get truly parallel threads while keeping the language's notation.
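JRuby and Jython run their threads as ordinary JVM threads, and the JVM has no global interpreter lock. As a minimal sketch of what that buys, the plain Java program below (the class and method names are made up for illustration) splits a CPU-bound loop across a thread pool; on the JVM those threads can run on several cores at once, which is the kind of parallelism a JRuby or Jython program inherits from its host.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    // Sum of i*i for i in [from, to): CPU-bound work with no shared mutable state.
    static long sumOfSquares(long from, long to) {
        long total = 0;
        for (long i = from; i < to; i++) {
            total += i * i;
        }
        return total;
    }

    // Splits [0, n) into one chunk per worker and runs the chunks on a thread pool.
    // The JVM has no global interpreter lock, so these threads may run on
    // different cores at the same time.
    static long parallelSumOfSquares(long n, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long chunk = (n + workers - 1) / workers;
            for (int w = 0; w < workers; w++) {
                long from = w * chunk;
                long to = Math.min(n, from + chunk); // empty range if from >= n
                Callable<Long> task = () -> sumOfSquares(from, to);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for each chunk to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSumOfSquares(1_000_000, cores));
    }
}
```

On CRuby or CPython, the equivalent threaded Ruby or Python loop would still be serialized by the GIL, even though the code looks the same.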

If you need Python and Java code to interoperate for some reason, it might be worth using a Python implementation on the JVM, since it can call Java code directly.
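Part of what makes that direct interoperability possible is that, on the JVM, a language runtime can look up and call Java methods while the program is running. The sketch below is deliberately naive and is not how JRuby or Jython actually do it (they resolve overloads by argument types, convert values between the two languages and cache call sites); the `callMethod` helper is hypothetical and simply finds a public method by name and argument count using reflection.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;

public class DynamicCall {
    // Hypothetical helper: finds a public method by name and argument count at
    // run time and invokes it. A very simplified picture of how a dynamic language
    // hosted on the JVM can call a Java object without knowing its type up front.
    static Object callMethod(Object target, String name, Object... args) throws Exception {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == args.length) {
                return m.invoke(target, args); // boxed arguments are unboxed as needed
            }
        }
        throw new NoSuchMethodException(target.getClass().getName() + "." + name
                + " with " + args.length + " argument(s)");
    }

    public static void main(String[] args) throws Exception {
        // "hello".toUpperCase(), resolved by name at run time
        System.out.println(callMethod("hello", "toUpperCase"));
        // size() on a plain java.util.ArrayList
        ArrayList<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        System.out.println(callMethod(list, "size"));
        // picks the one-argument overload of String.substring
        System.out.println(callMethod("interpreter", "substring", 5));
    }
}
```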

You can also do some things that still blow my mind just thinking about them.

And to blow my mind even more, I ran a Ruby script on JRuby, compiled or interpreted on the JVM, on Ubuntu running on the Windows kernel (WSL).

And how does it all work?

It's a huge rewrite. In JRuby's case, all the code that was written in C in CRuby was ported to Java. CRuby also contains code written in Ruby itself; that code can be kept as-is, since it will now call the Java-based core underneath. This is why pure-Ruby gems written and tested on CRuby have a good chance of being fully compatible with JRuby.

And, as I said, it is a rewrite (or rather, a reimplementation). Here is the String#upcase! method in JRuby:

private IRubyObject upcase_bang(ThreadContext context, int flags) {
    modifyAndKeepCodeRange();
    Encoding enc = checkDummyEncoding();
    if (((flags & Config.CASE_ASCII_ONLY) != 0 && (enc.isUTF8() || enc.maxLength() == 1)) ||
            (flags & Config.CASE_FOLD_TURKISH_AZERI) == 0 && getCodeRange() == CR_7BIT) {
        int s = value.getBegin();
        int end = s + value.getRealSize();
        byte[] bytes = value.getUnsafeBytes();
        while (s < end) {
            int c = bytes[s] & 0xff;
            if (Encoding.isAscii(c) && 'a' <= c && c <= 'z') {
                bytes[s] = (byte)('A' + (c - 'a'));
                flags |= Config.CASE_MODIFIED;
            }
            s++;
        }
    } else {
        flags = caseMap(context.runtime, flags, enc);
        if ((flags & Config.CASE_MODIFIED) != 0) clearCodeRange();
    }

    return ((flags & Config.CASE_MODIFIED) != 0) ? this : context.nil;
}

And here is the same method in CRuby:

static VALUE
rb_str_upcase_bang(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_UPCASE;

    flags = check_case_options(argc, argv, flags);
    str_modify_keep_cr(str);
    enc = STR_ENC_GET(str);
    rb_str_check_dummy_enc(enc);
    if (((flags&ONIGENC_CASE_ASCII_ONLY) && (enc==rb_utf8_encoding() || rb_enc_mbmaxlen(enc)==1))
        || (!(flags&ONIGENC_CASE_FOLD_TURKISH_AZERI) && ENC_CODERANGE(str)==ENC_CODERANGE_7BIT)) {
        char *s = RSTRING_PTR(str), *send = RSTRING_END(str);

        while (s < send) {
            unsigned int c = *(unsigned char*)s;

            if (rb_enc_isascii(c, enc) && 'a' <= c && c <= 'z') {
                *s = 'A' + (c - 'a');
                flags |= ONIGENC_CASE_MODIFIED;
            }
            s++;
        }
    }
    else if (flags&ONIGENC_CASE_ASCII_ONLY)
        rb_str_ascii_casemap(str, &flags, enc);
    else
        str_shared_replace(str, rb_str_casemap(str, &flags, enc));

    if (ONIGENC_CASE_MODIFIED&flags) return str;
    return Qnil;
}
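Both versions share the same ASCII fast path: walk the bytes in place and shift 'a'..'z' to 'A'..'Z'. To make that correspondence easier to see, here is a stripped-down, self-contained sketch of just that loop. It is not JRuby code: a plain `byte[]` with an offset and length stands in for the string's `value` buffer, the encoding and case-option checks are dropped, and the method name `upcaseAsciiInPlace` is made up.

```java
import java.nio.charset.StandardCharsets;

public class AsciiUpcase {
    // Simplified ASCII fast path of String#upcase!: converts 'a'..'z' to 'A'..'Z'
    // in place. Returns true if anything changed (upcase! returns self in that
    // case and nil otherwise). Assumes 7-bit ASCII; encodings are ignored.
    static boolean upcaseAsciiInPlace(byte[] bytes, int begin, int length) {
        boolean modified = false;
        int end = begin + length;
        for (int s = begin; s < end; s++) {
            int c = bytes[s] & 0xff; // read the byte as unsigned, like both originals
            if (c >= 'a' && c <= 'z') {
                bytes[s] = (byte) ('A' + (c - 'a'));
                modified = true;
            }
        }
        return modified;
    }

    public static void main(String[] args) {
        byte[] text = "hello, World!".getBytes(StandardCharsets.US_ASCII);
        boolean changed = upcaseAsciiInPlace(text, 0, text.length);
        System.out.println(new String(text, StandardCharsets.US_ASCII) + " " + changed);
    }
}
```

The boolean plays the role of the `CASE_MODIFIED` / `ONIGENC_CASE_MODIFIED` flag in the two originals.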


Essentially, all of this has already been answered in other questions, and some points in the other answer are somewhat ill-defined. So I suggest first reading the following questions/answers:

How does it work?

This is a bit broad to answer in detail, but there is nothing very special about it, and I believe the links above cover it well. Besides, "how does it work" is a little vague; actually, the question as a whole is.

What advantages are gained by "implementing" one language into another?

It seems to me that this is not quite what you meant to ask here. See the answer below.

The biggest advantage is being able to run on a different platform, if that was the goal, and possibly to interoperate with other code on that platform.

If it is an implementation on the same platform, the second big reason may be that it does something previous implementations could not do, or could not do well: for example, they may have performed poorly, lacked access to certain APIs, or offered only limited interoperability.

Although the example cited in the other answer is generally plausible, it is not that fundamental, because the language was not designed with it in mind and will not be able to use it in the best possible way unless it becomes a dialect. In fact, it is common for this kind of implementation to become a dialect, so that code written for one platform does not run on another implementation.

It's just another implementation. Some alternative implementations have advantages, others do not, depending on what was done.

Is it really an implementation in another language, or just another platform, compiler or something?

From what I understand, you are asking about implementing a language on another platform, which means using a different compiler. Interestingly, in some cases you would not even need a different compiler, just a different backend, that is, the code generator; but for some reason they preferred to replace everything, and there are some reasons to do so.
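To illustrate the front end / back end split, here is a toy sketch in Java (all names are hypothetical, and it uses records, so Java 16 or later). The same syntax tree, as a front end would produce it, is handed either to a tree-walking interpreter or to a code generator for a made-up stack machine. Swapping the back end changes what the program runs on without touching the parsing side; real compilers are, of course, far more involved.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoBackends {
    // Front end output: a tiny AST for integer expressions.
    interface Expr { <T> T accept(Backend<T> backend); }

    record Num(int value) implements Expr {
        public <T> T accept(Backend<T> b) { return b.num(value); }
    }
    record Add(Expr left, Expr right) implements Expr {
        public <T> T accept(Backend<T> b) { return b.add(left, right); }
    }
    record Mul(Expr left, Expr right) implements Expr {
        public <T> T accept(Backend<T> b) { return b.mul(left, right); }
    }

    // A back end decides what to do with the AST; the front end does not care which one runs.
    interface Backend<T> {
        T num(int value);
        T add(Expr left, Expr right);
        T mul(Expr left, Expr right);
    }

    // Back end 1: a tree-walking interpreter that computes the value directly.
    static class Interpreter implements Backend<Integer> {
        public Integer num(int value) { return value; }
        public Integer add(Expr l, Expr r) { return l.accept(this) + r.accept(this); }
        public Integer mul(Expr l, Expr r) { return l.accept(this) * r.accept(this); }
    }

    // Back end 2: emits instructions for a made-up stack machine
    // (instruction names are illustrative, loosely in the spirit of JVM bytecode).
    static class StackCodeGen implements Backend<List<String>> {
        public List<String> num(int value) { return List.of("push " + value); }
        public List<String> add(Expr l, Expr r) { return binary(l, r, "add"); }
        public List<String> mul(Expr l, Expr r) { return binary(l, r, "mul"); }

        // Post-order: code for both operands first, then the operator.
        private List<String> binary(Expr l, Expr r, String op) {
            List<String> code = new ArrayList<>(l.accept(this));
            code.addAll(r.accept(this));
            code.add(op);
            return code;
        }
    }

    public static void main(String[] args) {
        // 2 + 3 * 4, as a parser (the front end) would produce it
        Expr program = new Add(new Num(2), new Mul(new Num(3), new Num(4)));
        System.out.println(program.accept(new Interpreter()));  // 14
        System.out.println(program.accept(new StackCodeGen())); // [push 2, push 3, push 4, mul, add]
    }
}
```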

What is the name of this technique?

You seem unsure which technique you are asking about. If it is using the language itself to write its compiler, and maybe other parts of what makes up the language, that is called bootstrapping, which has also been answered already. Creating an alternative implementation has no specific term: we usually say we are porting the implementation when we take an existing implementation and move it to another platform, or that we are simply writing an alternative implementation, without porting anything, when we implement a specification from scratch.

The other answer talks about rewriting, but that may not be the best term. It may be, but it may also be a new implementation written from scratch, and in some cases there is no rewrite at all, just a slight adaptation, either in the compiler or in the standard library, which seems to be what you posted in your answer.


Rhino's website says the following:

Rhino is an open-source implementation of JavaScript written entirely in Java. It is typically embedded into Java applications to provide scripting to end users. It is embedded in J2SE 6 as the default Java scripting engine.

In this case, the implementation aims to provide JavaScript scripting for Java programs. It is a way to extend applications written in the host language.
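The embedding idea is that the host program exposes some of its own data to a small language its users can write. Rhino does this for JavaScript; as a much smaller, self-contained sketch of the same pattern (this is not Rhino's API, and `TinyScript` is a made-up name), here is a Java host that evaluates user-written arithmetic formulas against variables it supplies.

```java
import java.util.Map;

public class TinyScript {
    // A minimal embeddable expression evaluator (recursive descent): the host
    // application supplies variables through a Map, and end users write formulas
    // such as "price * qty + 5". Supports + - * /, parentheses, numbers, identifiers.
    private final String src;
    private final Map<String, Double> bindings;
    private int pos = 0;

    private TinyScript(String src, Map<String, Double> bindings) {
        this.src = src;
        this.bindings = bindings;
    }

    public static double eval(String src, Map<String, Double> bindings) {
        TinyScript p = new TinyScript(src, bindings);
        double result = p.expr();
        p.skipSpaces();
        if (p.pos != src.length()) throw new IllegalArgumentException("Unexpected input at " + p.pos);
        return result;
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // factor := number | identifier | '(' expr ')'
    private double factor() {
        skipSpaces();
        if (accept('(')) {
            double value = expr();
            if (!accept(')')) throw new IllegalArgumentException("Expected ')' at " + pos);
            return value;
        }
        int start = pos;
        if (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) {
            while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
            return Double.parseDouble(src.substring(start, pos));
        }
        while (pos < src.length() && Character.isLetter(src.charAt(pos))) pos++;
        String name = src.substring(start, pos);
        Double value = bindings.get(name); // variable provided by the host program
        if (value == null) throw new IllegalArgumentException("Unknown variable '" + name + "'");
        return value;
    }

    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        Map<String, Double> vars = Map.of("price", 2.5, "qty", 4.0);
        System.out.println(eval("price * qty + 5", vars));     // 15.0
        System.out.println(eval("(price + 1.5) * qty", vars)); // 16.0
    }
}
```

In a real application, the host would more likely load an existing engine through the standard `javax.script` API (JSR 223), which is how Java 6 exposed its bundled Rhino engine.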
