79
How to invert a string in Javascript? For example:
Entree:
"something here"
Exit:
"iuqa asioc amugla"
79
How to invert a string in Javascript? For example:
Entree:
"something here"
Exit:
"iuqa asioc amugla"
73
Do so:
'texto aqui'.split('').reverse().join(''); // 'iuqa otxet'
More examples:
'♕ ♖ ♗ ♘'.split('').reverse().join(''); // '♘ ♗ ♖ ♕'
'サブリミナルメッセージ'.split('').reverse().join(''); // 'ジーセッメルナミリブサ'
'✓ ✔ ✕ ✖ ✗ ✘ ✙ ✚ ✛ ✜ ✝ ✞'.split('').reverse().join(''); // ✞ ✝ ✜ ✛ ✚ ✙ ✘ ✗ ✖ ✕ ✔ ✓
'STACKOVERFLOW'.split('').reverse().join(''); // WOLFREVOKCATS
'▁▂▃▄▆▇█'.split('').reverse().join(''); // █▇▆▄▃▂▁
'ab'.split('').reverse().join(''); // ba
63
Both Gabriel Santos' and mgibsonbr’s answers are wrong (although they work well in some limited cases that end up being the most common).
Why are they wrong?
Unicode. In ancient times (ASCII), each character occupied a constant space, but this is no longer true.
Javascript uses the UTF-16 system, in which what most people associate the character to (a grapheme) is divided into "code points", which in turn are divided into "code Units".
A character can have one or more code points and each code point can have one or two Units. The first is the code point "main" and the following (optional) are modifiers that add something to the character (example: accents). When a code point occupies two code Units in UTF-16, it is said that the first is the "High Surrogate" (or lead surrogate) and the second is the "Low Surrogate" (or Trail surrogate). When a code point occupies a single Unit code it is said to be in the Basic Multilingual Plane (BMP).
Example of a character that fails in the answers presented: the character . The character I put up consists of two code points: The character and the modifier . The code point, in turn, occupies two Units code.
If you calculate the inverse of "Hello !" with the other functions you will get ! Lo, which is clearly the wrong result.
The solution
The complete solution (jsfiddle) is the following:
// Procurei na norma do Unicode e não encontrei informação
// suficiente sobre como lidar com strings que comecem em
// combining code points.
var ReversalMode = {
//Assume uma string "normal"
'PERMISSIVE': 0,
// Substitui um caracter pelo <?> (U+FFFD REPLACEMENT CHARACTER)
// quando não o reconhece.
'STRICT': 1,
// Por defeito, usar o modo STRICT.
'DEFAULT': 1,
// Semelhante a STRICT, mas pode acrescentar code points ao inicio
// de uma string se esta começar por um combining.
'COMPLETE_COMBINING': 3,
// Semelhante a COMPLETE_COMBINING, mas sem substituir caracteres
// inválidos pelo <?>
'PERMISSIVE_COMPLETE_COMBINING': 2
};
String.prototype.isHighSurrogate = function () {
var charCode = this.charCodeAt(0);
return charCode >= 0xD800 && charCode <= 0xDBFF;
}
String.prototype.isLowSurrogate = function () {
var charCode = this.charCodeAt(0);
return charCode >= 0xDC00 && charCode <= 0xDFFF;
};
String.prototype.isCombining = function () {
if (this.length != 1) {
//Todos os caracteres de combinação estão no BMP
return false;
}
var codePoint = this.charCodeAt(0);
//Combining Diacritical Marks
if (codePoint >= 0x0300 && codePoint <= 0x036F)
return true;
//Combining Diacritical Marks Supplement
if (codePoint >= 0x1DC0 && codePoint <= 0x1DFF)
return true;
//Combining Diacritical Marks for Symbols
if (codePoint >= 0x20D0 && codePoint <= 0x20FF)
return true;
//Combining Half Marks
if (codePoint >= 0xFE20 && codePoint <= 0xFE2F)
return true;
return false;
}
String.prototype.codePoints = function (strictMode) {
var codePoints = [];
var currentPoint = '';
for (var i = 0; i < this.length; ++i) {
var currentUnit = this[i];
if (currentUnit.isHighSurrogate()) {
if (currentPoint.length !== 0 && strictMode) {
codePoints.push('\uFFFD');
currentPoint = '';
}
else {
currentPoint += currentUnit;
}
} else {
if (currentUnit.isLowSurrogate() && strictMode &&
currentPoint.length !== 1) {
codePoints.push('\uFFFD');
currentPoint = '';
} else {
currentPoint += currentUnit;
codePoints.push(currentPoint);
currentPoint = '';
}
}
}
if (currentPoint !== '') {
if (strictMode) {
codePoints.push('\uFFFD');
}
else {
codePoints.push(currentPoint);
}
}
return codePoints;
}
String.prototype.chars = function (strictMode) {
var chars = [];
var codePoints = this.codePoints(strictMode);
var currentChar = '';
for (var i = 0; i < codePoints.length; ++i) {
var codePoint = codePoints[i];
if (!codePoint.isCombining() && currentChar != '') {
chars.push(currentChar);
currentChar = '';
}
currentChar += codePoint;
}
if (currentChar !== '') {
// Tecnicamente, esta verificação não é necessária para o
// reverse, porque o join('') trata disso, mas para outros
// usos de chars() isto evita que nalguns casos fique uma
// string vazia no final do array (quando o último elemento
// não é um combinador)
chars.push(currentChar);
}
return chars;
}
String.prototype.reverse = function (reversalMode) {
if (reversalMode === undefined) {
reversalMode = ReversalMode.DEFAULT;
}
var chars = this.chars(reversalMode & ReversalMode.STRICT);
if (chars.length > 0 && chars[0].isCombining()) {
switch (reversalMode) {
case ReversalMode.COMPLETE_COMBINING:
case ReversalMode.PERMISSIVE_COMPLETE_COMBINING:
chars[0] = '\u00A0' + chars[0];
break;
case ReversalMode.STRICT:
chars[0] = '\uFFFD';
break;
}
}
return chars.reverse().join('');
}
To use the above code, use "String a inverter".reverse()
.
Nice response! Helped me better understand the UTF-16.
Extraordinary answer. Very good your explanation, I did not know anything of what you said, added much to me!
"Both Gabriel Santos' and mgibsonbr’s answer are wrong" wrong? Would you say that Newton’s laws are "wrong" (in the face of Einstein’s relativity)? I would say "limited" - in the sense that they are correct answers if the domain is strings in the BMP using only pre-compound characters. In any case, your answer is indeed more complete, +1 and I will include a link to your answer from mine.
@mgibsonbr One thing is either wrong or right. The other answers are right in some cases, and wrong in others. If I answer that "x + 1 = 2", I will be wrong whenever x != 1. In fact, I would say that Newton’s laws are wrong. They’re useful and they work in some cases, but they’re wrong. I apologize if I offended anyone, but it seemed to me a direct and effective way to convey my message (which is: do not ignore the unicode).
@luiscubal Not offended, after all in an exact science there is no middle ground. However, something that is true in a domain may not necessarily be true in a wider domain (e.g., there is no root of -1
if the domain is Real Numbers).
@mgibsonbr Certo. There are scenarios where the other answers work perfectly (example: interaction with an old system that will never be updated and only supports the English language). In such cases using my answer is unnecessary to the point of being ridiculous.
Just out of curiosity, does that answer support UTF-32? If not, since it has lent itself to writing one that supports UTF-16, it could at least make it clear that it is not suitable for 32. Actually I’m not sure if Asians use UTF-32.
@Emersonrochaluiz Javascript does not use UTF-32. If used, the function codePoints
became simpler (that’s where it was just one split('')
), to isHighSurrogate
disappeared, the isCombining
missed the length check, and the rest was the same. In UTF-32 code point = code Unit.
Answer too much mass. But, is it just for me that the Unicode characters in the answer post (for example, this: ) are not correctly displayed in Chrome? (in Firefox and IE they appear correctly...)
@Luizvieira Not all fonts(fonts) support all letters. And the less common the letter, the less likely it is to be supported. Letters outside the BMP are very unusual.
This answer is correct. For historical reasons, HFS filenames+ sane normalized with a kind of NFD, instead of NFC like the rest of the world. If in OS X you pass a filename to a script, node hello-world.js "Olá.txt"
, this "Hello.txt" has an "a" followed by an acute composer accent (U+0302) (the Terminal atrophies with this).
@Leandroamorim I’m not sure he has the right answer, since the matching characters appear afterward of the character they change. I tried this but Firefox and C#+WPF give different results. But other than that, I think the best approach would be a ́a ́a
. Accents are applied to space, not to a
, so put them in a
in the reversed version would be incorrect.
The junction of a diacritic+letter ́a
generates the feat for the viewer other than the letter+diacritic á
that in this second case the letter would appear accented. So how should the script that reverses the letters behave? it must insert a space between the letter and the diacritic a ́
for the accent to remain outside the letter or it must be accepted that the previously unstressed letter has the accent?
@luiscubal just deleted the previous comment to make it clearer.
Then the ideal to transform the ́a ́a ́a
in a ́ a ́ a ́
? or turn into a ́a ́a ́
?
@Leandroamorim But I don’t always have this behavior. In Firefox, it seems to ignore the first accent (although it has an inconsistent behavior -- sometimes it creates an accent space). In WPF, it makes the first a
has two accents (but looks very different from á́)
Hmmm... " In the Unicode Standard, all sequences of Character codes are permitted. This does not create an Obligation on implementations to support all possible Combinations equally well. " (pag. 42, Unicode Standard 6.2) Translation "In Unicode, all code point sequences are allowed. However, implementations are not required to support all possible combinations equally well"
But right on the next page: "all Combining characters are o be used in Sequence following the base characters to which they apply." Translation: "all matching characters are to be used in sequence after the base characters to which they are applied." Another section mentions code point A0, which can be used for when you want isolated combinatorial characters.
@Leandroamorim, note that the inverse of "\u0301a"
as much can be "a \u0301"
in an attempt to preserve the accent, as may be "a"
by discarding combinatorial characters that do not apply to anything. I now think that "a \u0301"
must, without doubt, become " \u0301a"
.
I would personally assume "Garbage in, Garbage out" (garbage comes in, garbage comes out). That is, if there is a matching character at the beginning of the string - so combining nothing - having that character at the end of the string even if it affects any letter (i.e. change the graphemes) is "good enough". I don’t think there is any correct behavior in this case, so I would keep one that changes the string as little as possible (i.e. I am against entering a new code point - " "
- or remove an existing code point).
@Leandroamorim @mgibsonbr New jsfiddle, I hope it fits. "Garbage-in Garbage-out" that adds nothing is the mode PERMISSIVE
, the "complete" character is the COMPLETE_COMBINING
. Replace all invalid with is STRICT
.
28
There are several ways of implementing the Reverse function and some have higher performance according to the browser.
1. Decremental loop with concatenation
function reverse(s) {
var o = '';
for (var i = s.length - 1; i >= 0; i--) {
o += s[i];
}
return o;
}
2. Incremental/Decremental loop with two arrays
function reverse(s) {
var o = [];
for (var i = s.length - 1, j = 0; i >= 0; i--, j++) {
o[j] = s[i];
}
return o.join('');
}
3. Incremental loop with push and charAt array
function reverse(s) {
var o = [];
for (var i = 0, len = s.length; i <= len; i++)
o.push(s.charAt(len - i));
return o.join('');
}
4. Native functions
function reverse(s) {
return s.split('').reverse().join('');
}
5. Loop with while
concatenating and substring
function reverse(s) {
var i = s.length,
o = '';
while (i > 0) {
o += s.substring(i - 1, i);
i--;
}
return o;
}
6. Simple Loop Declaration using for
concatenating
function reverse(s) {
for (var i = s.length - 1, o = ''; i >= 0; o += s[i--]) { }
return o;
}
7. Recursiveness with substring
and charAt
function reverse(s) {
return (s === '') ? '' : reverse(s.substr(1)) + s.charAt(0);
}
8. Recursion with internal function
function reverse(s) {
function rev(s, len, o) {
return (len === 0) ? o : rev(s, --len, (o += s[len]));
};
return rev(s, s.length, '');
}
9. Intermediary index exchange using for
function reverse(s) {
s = s.split('');
var len = s.length,
halfIndex = Math.floor(len / 2) - 1,
tmp;
for (var i = 0; i <= halfIndex; i++) {
tmp = s[len - i - 1];
s[len - i - 1] = s[i];
s[i] = tmp;
}
return s.join('');
}
10. Intermediate index exchange using recursion
function reverse(s) {
if (s.length < 2) {
return s;
}
var halfIndex = Math.ceil(s.length / 2);
return reverse(s.substr(halfIndex)) +
reverse(s.substr(0, halfIndex));
}
Performance
Using the tool Jsperf, has been compared to the performance of each function. Arriving at the following result
Chrome 15 - Implementation 1 and 6 Firefox 7 - Implementation 6 IE 9 - Implementation 4 Opera 12 - Implementation 9
Completion
Apparently the best implementation is the first, as it showed acceptable results in all browsers. The ninth implementation has good results and is suitable for very long strings.
Beautiful answer!!
Good answer, all right! On the chart, even if the bars are displayed from left to right with the option values from 1 to 10, it might be interesting to put a color label to make the information clearer and easier to compare. :)
14
Note: this answer assumes strings in BMP without matching characters (only pre-compounds) - which in many practical applications is good enough. For a more complete solution, see the response of @luiscubal.
If performance is not crucial, the simplest solution would be the response of @Gabriel Santos. Otherwise, seeing this test in jsperf, I would say that the most efficient means is:
function str_reverse2(str) {
var result = '',
length = str.length;
while (length--) {
result += str[length];
}
return result;
}
P.S. Counter-intuitively, I have observed that simple string concatenation operations perform better in practice than the recommended technique of accumulating smaller strings in an array and then doing join
. I’ve noticed this in other situations too.
+1 because the unused options of split/reverse/join
are even faster. But I would still use the other mode on something other than really required performance, such as a loop
@Emersonrochaluiz I agree! If the bottleneck of the application is not there, why complicate it? 90% of the time the simplest/readable/concise solution will be the best.
9
While I know this question has good answers, both for the usual case and for more exotic cases involving surrogate pairs. I think it’s fair to mention that there are libraries to solve this problem.
The Mathias Bynens, author mentioned in response of João Paraná, responsible, among others, for the jsPerf.com and HTML5 Boilerplate created a library called review for the sole purpose of inverting Strings.
var input = 'alguma coisa aqui';
var reversed = esrever.reverse(input);
console.log(reversed);
// → 'iuqa asioc amugla'
esrever.reverse(reversed) == input;
// → true
The library works with exotic strings and has a large number of unit tests (including passed the test of the Zalgo):
Online demo using the tool.
It’s always nice to use and contribute to a good open source project instead of reinventing the wheel.
@luiscubal, your idea to include "modes" to reverse the String is very interesting, maybe a pull request to the library.
5
A function to do the reverse of text including for UTF-16 Jsfiddle
function Reverso(input) {
var s = input, c = 1, r = '', l = '', h = /[\uD800-\uDFFF]/;
var b = /([\u0300-\u036F|\u1DC0-\u1DFF|\u20D0-\u20FF|\uFE20-\uFE2F])/;
while(s.length){
l = s.substr(0,1);
if(h.test(l)) c++;
while(b.test(s.substr(c,1))&&l!=' ') c++;
r = (b.test(l)?' ':'') + s.substr(0,c) + r;
s = s.substr(c);
c = 1;
}
return r;
}
Apparently we have a problem when ^a
is reversed. The correct answer would be a^
but html when combining a+^
generates â
thus the previous script includes a space for the output to be a^
.
Does not work for characters within BMP with combinatorics afterwards (due to cc>=2
) Demo: http://jsfiddle.net/v98AY/
You were right, thank you.
The new version seems to work well, at least in the cases I tested.
used teste de inversão da frase
and the result was frase da inversão deteste
eating the last space
@Spark Confirmo: http://jsfiddle.net/EC4Qd/1/
5
My solution uses https://raw2.github.com/bestiejs/punycode.js/master/punycode.js and was based in this article excellent (in English) do Mathias Bynens
The solution works for any UTF-16 string including those composed by one or more surrogate pair
See my code below:
HTML
<script src="/js/api/punycode.js"></script>
<div id="utf16" style="color: green;"></div>
Javascript
<script>
(function() {
var reverse = function(str) {
var myStrArray = punycode.ucs2.decode(str)
var result = [], length = myStrArray.length;
var i = 0;
while (length--) {
result[i++] = myStrArray[length];
}
return punycode.ucs2.encode(result);
}
var str = 'Olá ! Cuidado pois tem também esse caracter que é um surrogate pair.';
var div = document.getElementById('utf16');
div.innerHTML = str + ' E SEU REVERSO É : '+ reverse(str);
})();
</script>
Interesting use of Punycode! And this article reminded me of something I already suspected, but thought I had been "overcome": the fact of Javascript not force the use of UTF-16. P.S. Pity that its reverse
does not work with matching characters... (try var str = 'Ola\u0301';
)
really @mgibsonbr, I had checked this but decided to post anyway to keep the solution simple. When I get some time here I can improve it to treat this use case. In practice just invoke a method to interpret the codes in the uxxxx style in the first line of Function.
I wasn’t talking about that, I was talking about matching characters. For example, your code works well for string Ol\u00e1
(who uses a á
preform), even represented in this way. I believe it would be possible to treat combinants using punycode, but the reverse
in fact it would become more complex.
P.S. in practice, most systems only work with pre-composite same (but not all), so that simple solutions in general are good enough. Just "elegi" the response of luiscubal for its completeness even...
2
So many interesting answers, I’ll leave my contribution also, created by the Project staff PHPJS, one port of strrev() of PHP:
function strrev(string) {
// discuss at: http://phpjs.org/functions/strrev/
// original by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
// bugfixed by: Onno Marsman
// reimplemented by: Brett Zamir (http://brett-zamir.me)
// example 1: strrev('Kevin van Zonneveld');
// returns 1: 'dlevennoZ nav niveK'
// example 2: strrev('a\u0301haB') === 'Baha\u0301'; // combining
// returns 2: true
// example 3: strrev('A\uD87E\uDC04Z') === 'Z\uD87E\uDC04A'; // surrogates
// returns 3: true
string = string + '';
// Performance will be enhanced with the next two lines of code commented
// out if you don't care about combining characters
// Keep Unicode combining characters together with the character preceding
// them and which they are modifying (as in PHP 6)
// See http://unicode.org/reports/tr44/#Property_Table (Me+Mn)
// We also add the low surrogate range at the beginning here so it will be
// maintained with its preceding high surrogate
var grapheme_extend =
/(.)([\uDC00-\uDFFF\u0300-\u036F\u0483-\u0489\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7\u0610-\u061A\u064B-\u065E\u0670\u06D6-\u06DC\u06DE-\u06E4\u06E7\u06E8\u06EA-\u06ED\u0711\u0730-\u074A\u07A6-\u07B0\u07EB-\u07F3\u0901-\u0903\u093C\u093E-\u094D\u0951-\u0954\u0962\u0963\u0981-\u0983\u09BC\u09BE-\u09C4\u09C7\u09C8\u09CB-\u09CD\u09D7\u09E2\u09E3\u0A01-\u0A03\u0A3C\u0A3E-\u0A42\u0A47\u0A48\u0A4B-\u0A4D\u0A51\u0A70\u0A71\u0A75\u0A81-\u0A83\u0ABC\u0ABE-\u0AC5\u0AC7-\u0AC9\u0ACB-\u0ACD\u0AE2\u0AE3\u0B01-\u0B03\u0B3C\u0B3E-\u0B44\u0B47\u0B48\u0B4B-\u0B4D\u0B56\u0B57\u0B62\u0B63\u0B82\u0BBE-\u0BC2\u0BC6-\u0BC8\u0BCA-\u0BCD\u0BD7\u0C01-\u0C03\u0C3E-\u0C44\u0C46-\u0C48\u0C4A-\u0C4D\u0C55\u0C56\u0C62\u0C63\u0C82\u0C83\u0CBC\u0CBE-\u0CC4\u0CC6-\u0CC8\u0CCA-\u0CCD\u0CD5\u0CD6\u0CE2\u0CE3\u0D02\u0D03\u0D3E-\u0D44\u0D46-\u0D48\u0D4A-\u0D4D\u0D57\u0D62\u0D63\u0D82\u0D83\u0DCA\u0DCF-\u0DD4\u0DD6\u0DD8-\u0DDF\u0DF2\u0DF3\u0E31\u0E34-\u0E3A\u0E47-\u0E4E\u0EB1\u0EB4-\u0EB9\u0EBB\u0EBC\u0EC8-\u0ECD\u0F18\u0F19\u0F35\u0F37\u0F39\u0F3E\u0F3F\u0F71-\u0F84\u0F86\u0F87\u0F90-\u0F97\u0F99-\u0FBC\u0FC6\u102B-\u103E\u1056-\u1059\u105E-\u1060\u1062-\u1064\u1067-\u106D\u1071-\u1074\u1082-\u108D\u108F\u135F\u1712-\u1714\u1732-\u1734\u1752\u1753\u1772\u1773\u17B6-\u17D3\u17DD\u180B-\u180D\u18A9\u1920-\u192B\u1930-\u193B\u19B0-\u19C0\u19C8\u19C9\u1A17-\u1A1B\u1B00-\u1B04\u1B34-\u1B44\u1B6B-\u1B73\u1B80-\u1B82\u1BA1-\u1BAA\u1C24-\u1C37\u1DC0-\u1DE6\u1DFE\u1DFF\u20D0-\u20F0\u2DE0-\u2DFF\u302A-\u302F\u3099\u309A\uA66F-\uA672\uA67C\uA67D\uA802\uA806\uA80B\uA823-\uA827\uA880\uA881\uA8B4-\uA8C4\uA926-\uA92D\uA947-\uA953\uAA29-\uAA36\uAA43\uAA4C\uAA4D\uFB1E\uFE00-\uFE0F\uFE20-\uFE26]+)/g;
// Temporarily reverse
string = string.replace(grapheme_extend, '$2$1');
return string.split('')
.reverse()
.join('');
}
Browser other questions tagged javascript string
You are not signed in. Login or sign up in order to post.
+1. Personally I do not even know if there is a way best or faster than that
– Emerson Rocha
Doesn’t work...
– Cabeção
Of course it works.
– user622
ah yes, it’s true
– Cabeção
Wrong for characters that occupy more than one Unit code. Experiment
Olá 𝒞!
– luiscubal
@luiscubal A response that meets this requirement (deal correctly with surrogate pairs and matching characters) indeed would be quite interesting!
– mgibsonbr
@mgibsonbr I am working on one. I already have non-BMps working, but I am lacking combinators.
– luiscubal
Almost 'pythonic' :)
– epx
I can’t upload my answer. Until I do, jsfiddle: http://jsfiddle.net/E6yb8/1/
– luiscubal
Done. I finally managed to upload. Note: Stackoverflow does not handle invalid Unic characters well.
– luiscubal
Beautiful examples, I see a great functionality. - Note: I had never noticed that
STACKOVERFLOW
the other way roundWOLFREVOKCATS
It reminded me Thundercats somehow.– Paulo Roberto Rosa
The translation of this Chinese text shows that it is a little offensive, no?
– acelent
@Thank God we’re in SO-PT, not Mandarin. =)
– OnoSendai