According to the documentation:
Strings are compared based on standard lexicographical ordering, using Unicode values.
That is, the comparison is lexicographic and takes into account the Unicode code points of the characters in each string. To better understand what a code point is, read here.
Very briefly, each character (not only letters, but also digits, spaces, punctuation marks, emojis, etc.) has an associated numerical value, called a code point. When two strings are compared, the numeric values (code points) of their characters are what gets compared.
In this case, the letter a corresponds to the code point U+0061 (61 in hexadecimal, or 97 in decimal), and the letter b to the code point U+0062 (62 in hexadecimal, 98 in decimal). That's why the string 'a' is considered "smaller" than the string 'b'.
And this has nothing to do with the length of the strings:
console.log('abacate' < 'bola'); // true
console.log('abacate' < 'abra'); // true
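According to the algorithm described in the language specification (item 3, if both operands are strings), the first character of each string is compared by its numeric value. If they are equal, the second characters are compared, and so on, until a position is found where they differ. (Strictly speaking, the specification compares UTF-16 code units; for characters in the Basic Multilingual Plane, such as the letters and digits in these examples, the code unit has the same value as the code point.)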
In the first case ('abacate' < 'bola'), the first characters of the strings are a and b, and since a is less than b (the code point of a is less than the code point of b), the string 'abacate' is "smaller" than the string 'bola'.
In the second case ('abacate' < 'abra'), the first and second characters of the strings are equal (both start with 'ab'), but at the third character we have a and r, and since a is "less" than r (the code point of a is U+0061 and that of r is U+0072), the string 'abacate' is smaller than the string 'abra'.
String length is only relevant when one string is a prefix of the other, in which case the shorter one is considered smaller:
console.log('aba' < 'abacate'); // true
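The three cases above can be reproduced by hand with a small sketch. The function name compareStrings is hypothetical (it is not a built-in), and it only imitates what the < operator does with two strings, comparing UTF-16 code units position by position and falling back to length when one string is a prefix of the other:

```javascript
// Hypothetical helper, for illustration only: imitates how `<` orders two strings.
// Returns -1 if a comes first, 1 if b comes first, 0 if they are identical.
function compareStrings(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    const x = a.charCodeAt(i); // numeric value at position i of a
    const y = b.charCodeAt(i); // numeric value at position i of b
    if (x !== y) {
      // the first position that differs decides the result
      return x < y ? -1 : 1;
    }
  }
  // no difference in the shared part: the shorter string (a prefix) comes first
  return Math.sign(a.length - b.length);
}

console.log(compareStrings('abacate', 'bola')); // -1 ('a' < 'b')
console.log(compareStrings('abacate', 'abra')); // -1 ('a' < 'r' at the third position)
console.log(compareStrings('aba', 'abacate'));  // -1 ('aba' is a prefix)
```

A result of -1 here corresponds to the < operator returning true for the same pair.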
Remember that this is not restricted to letters, because every existing character has a code point. So we can have things like:
console.log('💩' > '丵124'); // true
Because the emoji "💩" also has a code point (U+1F4A9), whose value is greater than the code point of the character 丵 (U+4E35).
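One detail worth keeping in mind for characters like this emoji: JavaScript strings are stored as UTF-16, and the relational operators compare code units, not whole code points. For most text the result is the same, but it can differ when a character above U+FFFF is compared with one in the range U+E000 to U+FFFF. The sketch below illustrates this; compareByCodePoint is a hypothetical helper written only for this example:

```javascript
const emoji = '\u{1F4A9}';  // the same emoji as above, stored as the surrogate pair D83D DCA9
const halfwidth = '\uFF61'; // U+FF61, a single code unit

// By code point, U+FF61 is smaller than U+1F4A9...
console.log(halfwidth.codePointAt(0) < emoji.codePointAt(0)); // true
// ...but `<` sees the code units 0xFF61 and 0xD83D, so the order is reversed
console.log(halfwidth < emoji); // false

// Hypothetical helper: compares two strings by whole code points instead
function compareByCodePoint(a, b) {
  const as = Array.from(a, c => c.codePointAt(0));
  const bs = Array.from(b, c => c.codePointAt(0));
  const shared = Math.min(as.length, bs.length);
  for (let i = 0; i < shared; i++) {
    if (as[i] !== bs[i]) return as[i] < bs[i] ? -1 : 1;
  }
  return Math.sign(as.length - bs.length);
}

console.log(compareByCodePoint(halfwidth, emoji)); // -1
```

In the example with 丵 the two approaches agree, because the first code unit of the emoji (0xD83D) is also greater than 0x4E35.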
And remember a classic "trap", which is to compare strings that contain digits:
console.log('10000' > '2'); // false
Since we are comparing strings, the code points of the characters are what count: the character 1 has the code point U+0031, while the character 2 has the code point U+0032, and therefore the string '10000' is considered smaller than the string '2'.
If you want to compare numerical values, you must convert the strings to numbers, for example using parseInt:
console.log(parseInt('10000') > parseInt('2')); // true, because now they are numbers, not strings
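The same trap shows up when sorting, since the default sort of an array of strings uses the same string ordering. A minimal sketch, with made-up sample values, that also passes the radix to parseInt explicitly and shows how it differs from Number for input that is not purely numeric:

```javascript
const values = ['10000', '2', '300']; // sample data, chosen only for illustration

// Default sort compares the elements as strings: '1' < '2' < '3'
const asStrings = [...values].sort();
console.log(asStrings); // [ '10000', '2', '300' ]

// Convert first (radix 10 made explicit), then sort numerically
const asNumbers = values.map(s => Number.parseInt(s, 10)).sort((a, b) => a - b);
console.log(asNumbers); // [ 2, 300, 10000 ]

// parseInt stops at the first invalid character; Number rejects the whole string
console.log(Number.parseInt('12abc', 10)); // 12
console.log(Number('12abc'));              // NaN
```

Which conversion is appropriate depends on how strict you need to be about the input.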
Remember that Unicode also hides its own "traps":
console.log('á' < 'á'); // true
This happens because the first á is in NFD, and the second is in NFC. To better understand what this is, I suggest you read here, here and here. But to summarize, Unicode defines two different ways of representing the letter a with an acute accent:
- composed form (NFC), as a single code point: the character á itself (U+00E1)
- decomposed form (NFD), as two code points: the character a (without the accent, U+0061) and the accent itself (the combining code point U+0301)
Both forms look the same when shown on the screen (á), and only by digging into the strings at a lower level can we see how many code points each one contains:
// show the code points of the string (in hexadecimal)
function codepoints(s) { return Array.from(s).map(c => c.codePointAt(0).toString(16)).join(' '); }
// string in NFD, has 2 code points
console.log(codepoints('á')); // 61 301
// string in NFC, has 1 code point
console.log(codepoints('á')); // e1
So the first string above actually has two code points, the first being the letter a, which we have already seen is U+0061, while the first code point of the second string corresponds to the character á, whose value is U+00E1. That is why the first string is considered "smaller".
This can be solved by normalizing both strings to the same form (something like 'á'.normalize('NFC'), for example), but what exactly to do will depend on each case.
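As a minimal sketch of that idea, the strings below are written with escape sequences so that the two forms cannot be confused when copying the code. The helper name equalNormalized is hypothetical, and choosing NFC as the default is just an assumption for this example:

```javascript
const nfd = 'a\u0301'; // decomposed: a + combining acute accent
const nfc = '\u00e1';  // composed: á as a single code point

console.log(nfd === nfc); // false: different code points
console.log(nfd < nfc);   // true: U+0061 is less than U+00E1

// Hypothetical helper: compares two strings after normalizing both to the same form
function equalNormalized(a, b, form = 'NFC') {
  return a.normalize(form) === b.normalize(form);
}

console.log(equalNormalized(nfd, nfc));        // true
console.log(equalNormalized(nfd, nfc, 'NFD')); // true
```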
There is also the localeCompare method, which compares strings according to a specific locale (that is, according to the rules of a given language, because this varies a lot: accented characters can come before or after the unaccented ones, there are languages in which the alphabetical order is different, etc.). But I believe that already falls a little outside the scope of the question (anyway, you can see more details here).
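Just to give an idea of the difference, here is a short sketch that sorts the same array with the default ordering and with localeCompare. The 'pt' locale and the sample words are assumptions made for this example, and the locale-aware result depends on the collation data available in the runtime:

```javascript
const words = ['b', '\u00e1', 'a']; // 'b', 'á' (in NFC) and 'a'

// Default sort: plain string ordering, so 'á' (U+00E1) ends up after 'b' (U+0062)
const byCodeUnit = [...words].sort();
console.log(byCodeUnit); // [ 'a', 'b', 'á' ]

// Locale-aware sort: localeCompare returns a negative number, zero or a positive number
const byLocale = [...words].sort((x, y) => x.localeCompare(y, 'pt'));
console.log(byLocale); // [ 'a', 'á', 'b' ] in runtimes with the usual ICU collation data
```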
I'll edit your code and change print() to console.log(), because it's bothering me. – Augusto Vasques
Yes! I had actually taken this code from the MDN website (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#Comparing_strings) and forgot to change it.
– marquinho