What is a programming language, after all?
To understand this whole subject, you need to understand the concept of a programming language. Whether it is BASIC, FORTRAN, Java, Ruby, C# or Crystal, a programming language is a notation for expressing an algorithm.
It is like a spoken language: each one has its peculiarities, but they all share one purpose, which is communication. You can change the grammar, the alphabet, the origin, or the tool used to speak (such as the voice or one's own hands), but communication remains the goal.
Note that a programming language is only notation, not execution. That is why there are no compiled languages or interpreted languages. What exist are compiled implementations and interpreted implementations of languages.
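To make the distinction between notation and execution concrete, here is a minimal sketch in Java. The mini-language (`Expr`, `Num`, `Add`) and both functions are hypothetical names invented for this illustration. The same program is run by two different implementations: one walks the tree every time, the other translates it once into Java closures.

```java
import java.util.function.IntSupplier;

class TwoImplementations {
    // Hypothetical mini-language: an expression is a number or a sum.
    // This is only the notation; it says nothing about how it is executed.
    interface Expr {}

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }

    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Implementation 1: an interpreter that walks the tree on every run.
    static int interpret(Expr e) {
        if (e instanceof Num) return ((Num) e).value;
        Add a = (Add) e;
        return interpret(a.left) + interpret(a.right);
    }

    // Implementation 2: a "compiler" that translates the tree once into
    // Java closures, which can then run without inspecting the tree again.
    static IntSupplier compile(Expr e) {
        if (e instanceof Num) {
            int v = ((Num) e).value;
            return () -> v;
        }
        Add a = (Add) e;
        IntSupplier l = compile(a.left);
        IntSupplier r = compile(a.right);
        return () -> l.getAsInt() + r.getAsInt();
    }

    public static void main(String[] args) {
        Expr program = new Add(new Num(1), new Add(new Num(2), new Num(3)));
        System.out.println(interpret(program));           // prints 6
        System.out.println(compile(program).getAsInt());  // prints 6
    }
}
```

The program itself never changes. Only the way it is executed does, which is why "compiled" and "interpreted" describe the implementation and not the language.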
Yes, but what is an implementation of a language?
All this confusion arises because every programming language comes with an implementation (or at least it should, to be useful). When we say "I run Ruby on my machine", we most likely mean "I write Ruby code, but I run CRuby on my machine".
The default Ruby implementation is CRuby/MRI, which is an interpreter. This default implementation is called the reference implementation. The same thing happens in Python, whose reference implementation is called CPython (not to be confused with Cython).
This is a little easier to see in the Java world. One version of the language is named Java SE 12, and its reference implementation is OpenJDK 12. There are other implementations of the Java language as well, such as the GNU Compiler for Java (GCJ), although GCJ was discontinued before Java SE 12 and never implemented that version.
It is even clearer in JavaScript. By the way, what is JavaScript? It is just a pile of text documents defining its specification. Engines such as Chrome's V8 and Firefox's SpiderMonkey are written from this specification.
As @LINQ mentioned in the comments, every language is implemented in another language. I would add that every language is implemented on top of some (other) execution system, be it a compiler or an interpreter, be it GCC or the JVM.
And what is the advantage of these alternative implementations?
It varies from implementation to implementation. CRuby and CPython, for example, have no support for true parallelism. They support multi-threaded code, but the threads do not execute Ruby or Python code at the same time because of the GIL (Global Interpreter Lock). By implementing these languages on the JVM, as Jython and JRuby do, you get real parallelism while keeping the language's notation.
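The effect of a global lock can be sketched in plain Java. This is not how CRuby or CPython implement their lock. It is a simulation under stated assumptions: a `ReentrantLock` stands in for the GIL, and the class and method names are made up for this example. The code counts how many threads are ever inside the "interpreter" section at the same instant.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class GlobalLockSketch {
    // Runs `threads` workers and returns the highest number of workers that
    // were ever observed inside the "interpreter" section at the same time.
    static int maxSimultaneous(int threads, boolean useGlobalLock) {
        ReentrantLock globalLock = new ReentrantLock(); // stand-in for a GIL
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();

        Runnable worker = () -> {
            for (int i = 0; i < 10_000; i++) {
                if (useGlobalLock) globalLock.lock();
                try {
                    int now = running.incrementAndGet();
                    maxSeen.accumulateAndGet(now, Math::max);
                    running.decrementAndGet();
                } finally {
                    if (useGlobalLock) globalLock.unlock();
                }
            }
        };

        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(worker);
            pool[i].start();
        }
        try {
            for (Thread t : pool) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        // With the global lock the result is always 1.
        System.out.println("with global lock:    " + maxSimultaneous(4, true));
        // Without it the result depends on the machine and the scheduler.
        System.out.println("without global lock: " + maxSimultaneous(4, false));
    }
}
```

With the lock, the threads exist but only one of them makes progress at any moment. Without it, the JVM is free to run them on several cores at once.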
If you need Python and Java code to interoperate for some reason, it might be worth using a Python implementation that runs on the JVM, since it can call Java code directly.
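Interoperation also works in the other direction: Java code can ask the JVM for a scripting engine through the standard `javax.script` API. The sketch below assumes that an implementation such as JRuby or Jython is on the classpath and registers itself under a name like `"jruby"` or `"python"`. Those engine names are assumptions, and without the corresponding JAR the lookup simply finds nothing. `InteropSketch` and `evalWith` are names invented for this example.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class InteropSketch {
    // Evaluates `source` with the engine registered under `engineName` and
    // returns the result as text, or a short message if that is not possible.
    static String evalWith(String engineName, String source) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return "no engine named '" + engineName + "' found";
        }
        try {
            return String.valueOf(engine.eval(source));
        } catch (ScriptException e) {
            return "script failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // "jruby" is an assumed engine name; it only resolves when the
        // JRuby JAR is on the classpath.
        System.out.println(evalWith("jruby", "'hello from ruby'.upcase"));
    }
}
```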
You can also do some things that still blow my mind just thinking about them:
And to blow the mind even more, I ran this script, written in Ruby, running on JRuby, compiled or interpreted on the JVM, on Ubuntu with the Windows kernel (WSL).
And how does it all work?
It is a huge rewrite. In the case of JRuby, all the code that was previously C was ported to Java. CRuby also contains Ruby code, and that code can be preserved, since it now ends up calling the Java dependencies. This is why gems written in pure Ruby and tested on CRuby have a great chance of being 100% compatible with JRuby.
As I said, it is a rewrite (or rather, a reimplementation). Here is the method String#upcase! in JRuby:
    private IRubyObject upcase_bang(ThreadContext context, int flags) {
        modifyAndKeepCodeRange();
        Encoding enc = checkDummyEncoding();
        if (((flags & Config.CASE_ASCII_ONLY) != 0 && (enc.isUTF8() || enc.maxLength() == 1)) ||
                (flags & Config.CASE_FOLD_TURKISH_AZERI) == 0 && getCodeRange() == CR_7BIT) {
            int s = value.getBegin();
            int end = s + value.getRealSize();
            byte[] bytes = value.getUnsafeBytes();
            while (s < end) {
                int c = bytes[s] & 0xff;
                if (Encoding.isAscii(c) && 'a' <= c && c <= 'z') {
                    bytes[s] = (byte)('A' + (c - 'a'));
                    flags |= Config.CASE_MODIFIED;
                }
                s++;
            }
        } else {
            flags = caseMap(context.runtime, flags, enc);
            if ((flags & Config.CASE_MODIFIED) != 0) clearCodeRange();
        }
        return ((flags & Config.CASE_MODIFIED) != 0) ? this : context.nil;
    }
And here is the same method in CRuby:
    static VALUE
    rb_str_upcase_bang(int argc, VALUE *argv, VALUE str)
    {
        rb_encoding *enc;
        OnigCaseFoldType flags = ONIGENC_CASE_UPCASE;

        flags = check_case_options(argc, argv, flags);
        str_modify_keep_cr(str);
        enc = STR_ENC_GET(str);
        rb_str_check_dummy_enc(enc);
        if (((flags&ONIGENC_CASE_ASCII_ONLY) && (enc==rb_utf8_encoding() || rb_enc_mbmaxlen(enc)==1))
            || (!(flags&ONIGENC_CASE_FOLD_TURKISH_AZERI) && ENC_CODERANGE(str)==ENC_CODERANGE_7BIT)) {
            char *s = RSTRING_PTR(str), *send = RSTRING_END(str);

            while (s < send) {
                unsigned int c = *(unsigned char*)s;

                if (rb_enc_isascii(c, enc) && 'a' <= c && c <= 'z') {
                    *s = 'A' + (c - 'a');
                    flags |= ONIGENC_CASE_MODIFIED;
                }
                s++;
            }
        }
        else if (flags&ONIGENC_CASE_ASCII_ONLY)
            rb_str_ascii_casemap(str, &flags, enc);
        else
            str_shared_replace(str, rb_str_casemap(str, &flags, enc));

        if (ONIGENC_CASE_MODIFIED&flags) return str;
        return Qnil;
    }
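Stripped of encoding checks and runtime plumbing, both versions share the same ASCII fast path. The standalone sketch below isolates that loop in plain Java. `AsciiUpcaseSketch` and `upcaseAsciiInPlace` are names invented for this illustration and are not part of JRuby or CRuby. The boolean result plays the role of returning the string when something changed and `nil` otherwise.

```java
import java.nio.charset.StandardCharsets;

class AsciiUpcaseSketch {
    // Upper-cases ASCII letters in place and reports whether anything changed,
    // mirroring how upcase! returns nil when the string was left untouched.
    static boolean upcaseAsciiInPlace(byte[] bytes) {
        boolean modified = false;
        for (int i = 0; i < bytes.length; i++) {
            int c = bytes[i] & 0xff; // treat the byte as unsigned
            if ('a' <= c && c <= 'z') {
                bytes[i] = (byte) ('A' + (c - 'a'));
                modified = true;
            }
        }
        return modified;
    }

    public static void main(String[] args) {
        byte[] text = "Hello, world".getBytes(StandardCharsets.US_ASCII);
        boolean changed = upcaseAsciiInPlace(text);
        // prints "HELLO, WORLD true"
        System.out.println(new String(text, StandardCharsets.US_ASCII) + " " + changed);
    }
}
```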
Every language is implemented in another language
– Jéf Bueno
@LINQ every byte is scrubbed to generate more bytes :D
– nullptr