How to Reverse a String?

Asked

Viewed 10,820 times

32

A user will enter a sentence and I must display this inverted sentence in upper case. How can I do this?

import javax.swing.JOptionPane;

    public class ExerLar01 {
        public static void main(String args[])
        {
            char vetor[];
            String frase = JOptionPane.showInputDialog(null, "Digite uma frase: ");
            vetor = frase.toCharArray();
        }
    }

5 answers

42


Try to use the method reverse of StringBuilder:

String hi = "Hello world";
System.out.println(new StringBuilder(hi).reverse().toString());

See this example working here on IDEONE.

Adapting to your case:

 String frase = JOptionPane.showInputDialog(null, "Digite uma frase: ");
 String fraseInvertida = new StringBuilder(frase).reverse().toString();
    System.out.println(fraseInvertida);
//ou
 JOptionPane.showMessageDialog(null, fraseInvertida);

This only works if the JDK version is 1.5 or higher. For versions older, just change StringBuilder for StringBuffer.

Reference: Reverse a string in Java.

  • It worked thanks, our so simple I was complicating very vlw

  • Stringbuilder is a class?

  • @Sarah yes, it’s a class for manipulating strings, as well as stringbuffer.

  • 2

    Thanks, it worked out

  • @Sarah I don’t know how JOptionPane handles accented characters or emojis, but anyway put an answer below complementing the subject

  • @hkotsubo that runs away completely to what was asked.

  • @Articunol I think not, the question is "how to invert a String", and a String can have accents, emojis, etc. Unless Joptionpane does not support these characters.

  • @exact hkotsubo, the question did not quote anything that you spoke, so much that answer was accepted. If there was a need to treat accents or emojis, which had never been mentioned by the author, the answer would surely have been edited or the author mentioned.

  • @Articunol I understand your point of view, but in what way, if you search in google "how to invert java string", this question is one of the first results, and so wanted to leave a more "general" solution for future visitors. Please don’t misunderstand me, I didn’t mean that your answer is wrong (I even voted for it), I just wanted to record that, depending on the case, Stringbuilder may not solve.

  • 1

    @hkotsubo I understand, there is no problem in your answer, which even turned out great, by the way. I only commented because you made a point of quoting the author here in the comments, which may lead to a misinterpretation of her that my answer is wrong. I just wanted to make it clear that at no time was mentioned the situation you addressed, and the focus of the site is objectivity, if we stop to address all the possible situations, the answers will get tiresome and no room for others, as you yourself did here, respond with complements.

  • @Articunol Yeah, now I realize that I may have given the wrong impression to the author. A thousand apologies for the misunderstanding, was not my intention

Show 6 more comments

16

Just complementing, the solution proposed with StringBuilder works very well, but there are cases that are not covered, if we take into account all the possibilities of Unicode.

An example is when characters are accented. For example, í (the letter i with acute accent), according to Unicode, can be represented in two ways:

  1. like the code point U+00ED (LATIN SMALL LETTER I WITH ACUTE) (í)
  2. as a combination of two code points:

Code points are numeric values that Unicode defines for each character. To better understand, I recommend this article (in English) and this question (in Portuguese).

The first option is called Normalization Form C (NFC) and the second, Normalization Form D (NFD) - both described here, and there’s also an explanation in this question. In the NFD form, the character that corresponds to the acute accent (COMBINING ACUTE ACCENT) is applied to the previous character (in this case, the letter i - LATIN SMALL LETTER I), and when you print the String, both "come together", forming the í.

It is possible to have Strings in both forms, and when printing them, both will be shown as í. But internally, the code points will be different. The same goes for other accented characters, such as ô, ã and even characters that are not necessarily accentuated, such as the ç, which in NFD form results in the letter c followed by character cedilla (COMBINING CEDILLA).

Knowing this is important because the method StringBuilder::reverse reverses the order of the code points, which means that the result may not be as expected, depending on how the String is. To show what happens when the String is in NFD, I will use the class java.text.Normalizer, to create a String standardised in NFD, and then I will use the StringBuilder to invert it:

// normalizar em NFD
String s = Normalizer.normalize("ímãs", Normalizer.Form.NFD);
System.out.println(s); // ímãs
s = new StringBuilder(s).reverse().toString();
System.out.println(s); // s̃aḿi

The method Normalizer::normalize returns to String standard in NFD. This means that í is stored internally as two code points: the letter i and the acute accent. The same goes for the ã, which in NFD form corresponds to two code points: the letter a and the tilde.

Note that when printing, these code points are shown as one thing (the characters í, ã). But by reversing the String, the tilde was on top of the s and the seat was on top of the m. This happens because the StringBuilder reversed all code points. In other words:

After the reversal made by StringBuilder::reverse, the acute accent was after the letter m, so it is applied in this letter, and the i is without accent. The same goes for the tilde, which after the inversion, was after the s and so it is applied in this letter (and not in a). If your console doesn’t show the output correctly, follow an image of how it looked here:

string invertida com acentos errados

The StringBuilder will only work if the String is standardised in NFC, in which case the accented characters will be í (LATIN SMALL LETTER I WITH ACUTE) and the ã (LATIN SMALL LETTER A WITH TILDE). As they are represented by only a single code point, the accent of these characters does not "change places" after the call to StringBuilder::reverse.

The problem is that you can receive the String in any form, but if you print it, it will always be shown as ímãs. In the above example I forced String to be in NFD and so you may find that this will not happen in "normal conditions", but it may happen to a user to copy (Ctrl+C) a text from somewhere and paste (Ctrl+V) in its application, and if the original text is in NFD, the accents of the String reversed will be changed. There is no way to control this (the original text may be in any other program the user is using, and the user does not even know if it is in NFD or NFC, because the screen always shows the accent correctly) - see an example at this link (copy and paste the words mañana (NFC) and mañana (NFD), and see the difference).

Then a solution would be to use the method Normalizer::isNormalized to check whether the String is standardised in NFC, and if not, normalise before inverting it:

// inverte a String mantendo os acentos
if (! Normalizer.isNormalized(s, Normalizer.Form.NFC)) { // se não está em NFC, normaliza
    s = Normalizer.normalize(s, Normalizer.Form.NFC);
}
s = new StringBuilder(s).reverse().toString(); // sãmí

With this, accents are preserved in the correct characters and the result is sãmí (if you want, you can call normalize direct, because if the string is already in NFC, it will not be changed). But this still does not solve all the cases.


Emojis

Emojis add a new layer of complexity to this story. And as their use is becoming more common, it’s important to know how to work with them.

Usually emojis correspond to a code point, but that’s not always the case. The emojis of families, for example, are combinations of other emojis. For example, a family with father, mother and 2 daughters is actually a combination of the emoji of a man, one female and two emojis of girl. To join them, the character is used ZERO WIDTH JOINER (also called only ZWJ - and these sequences of emojis separated by ZWJ are called Emoji ZWJ Sequences).

That is, the emoji of "family with father, mother and 2 daughters" is actually a sequence of seven code points:

This sequence of codepoints can be displayed in different ways. If the system/program used recognizes this sequence, a single family image is shown:

emoji família com pai, mãe e 2 filhas

But if this sequence is not supported, the emojis are shown next to each other:

emoji família, com membros um do lado do outro, para sistemas que não suportam o emoji de família como uma única imagem

Regardless of how they are shown, because it is a sequence that can correspond to a single symbol, we cannot change the order of the code points, because then it is no longer interpreted as the Emoji ZWJ Sequence of "family with father, mother and 2 daughters". And if we use StringBuilder::reverse (even if we normalize to NFC, as suggested above), the order of these code points will be changed to "GIRL, ZWJ, GIRL, ZWJ, WOMAN, ZWJ, MAN" and we will no longer have the "family emoji".


There are still other sequences that are not separated by ZWJ, such as country flag emojis. Instead of creating a code point for each flag of each country existing in the world, region indicator symbols were created (Regional Indicator Symbols), which are characters that correspond to letters from A to Z (not letters themselves, characters correspondent to these letters, but which are another block of Unicode).

The codes follow the definition of ISO 3166, in which each country has a 2-letter code. Australia, for example, has the code "AU", so to make the emoji of the flag of Australia, just take the characters corresponding to the letters "A" and "U" in the Regional Indicator Symbols, which in this case are:

With this, systems/programs can display these 2 code points as the Australian flag:

bandeira da Austrália

Or, if this is not supported, a representation of the characters themselves is shown:

Regional Indicator Symbols A e U

If your console doesn’t even support the representation of these characters, it might just show ? or blank squares.

Regardless of how it is shown, the two Regional Indicator Symbols correspond to a single "character" (a single "drawing on the screen"), and should be treated as if they were one thing. But if we use StringBuilder::reverse, the code points are reversed and the result is the Regional Indicator Symbols corresponding to "UA". And as "AU" is the code of Ukraine, the resulting emoji is the flag of Ukraine:

// Bandeira da Austrália
int[] cp = { 0x1f1e6, 0x1f1fa };
String australia = new String(cp, 0, cp.length);
System.out.println(australia); // Bandeira da Austrália
String s = new StringBuilder( // normalização não muda nada neste caso
                Normalizer.normalize(australia, Normalizer.Form.NFC))
            // inverte os codepoints, passando a ser "UA" em vez de "AU"
            .reverse().toString();
System.out.println(s); // Bandeira da Ucrânia

If your console is not compatible with emojis, follow an image of what the output of this code would look like:

bandeiras da Austrália e Ucrânia

Or, in case the flag emojis are not supported, it may be that the output is:

Regional Indicator Symbols AU e UA

In order for this not to happen, the two Regional Indicator Symbols ("A" and "U") should be treated as if they were one thing, but with StringBuilder that is not possible.

Technically, the sequence of two Regional Indicator Symbols represents the country/region (not necessarily the flag itself), but many systems often display the respective flags when these characters are printed.


Finally, in addition to the cases already mentioned, there are many others foreseen in Unicode, in which two or more code points are interpreted as a single "character". The name given to these sequences is Grapheme Cluster.

Like StringBuilder::reverse simply inverts the code points, without considering whether they are part of a Grapheme Cluster, the solution to correctly reverse a String is more complicated than simply calling this method.

A solution option is to read the unicode document describing the Grapheme Clusters and implement it. I leave it as exercise for the reader

Another option is to see if someone smarter has ever done it. An example is this project at Github, that I found very interesting, because you recognize Emoji ZWJ Sequences and emojis of flags as a single Grapheme Cluster. With this it is possible to capture these clusters and invert the String correctly.

An example of code with this API:

import com.github.bhamiltoncx.UnicodeGraphemeParsing;
import com.github.bhamiltoncx.UnicodeGraphemeParsing.Result;

// criar uma String com vários Grapheme Clusters
int[] cp = {
        // emoji de família
        0x1f468, 0x200d, 0x1f469, 0x200d, 0x1f467, 0x200d, 0x1f467,
        // bandeira da Austrália
        0x1f1e6, 0x1f1fa,
        // outros emojis
        0x1f4a9, 0x1f4b0 };
String s = new String(cp, 0, cp.length)
        // juntar com caracteres acentuados na forma NFD
        + Normalizer.normalize("áñabc", Normalizer.Form.NFD);
System.out.println(s); // Mostrar a String original (antes de inverter)

// obter os Grapheme Clusters
List<Result> graphemeClusters = UnicodeGraphemeParsing.parse(s);
Collections.reverse(graphemeClusters); // inverter a lista de Grapheme Clusters

// construir a String invertida
StringBuilder reverse = new StringBuilder();
for (UnicodeGraphemeParsing.Result grapheme : graphemeClusters) {
    // obter o trecho da String que corresponde ao Grapheme Cluster
    String str = s.substring(grapheme.stringOffset, grapheme.stringOffset + grapheme.stringLength);
    reverse.append(str);
}
String invertida = reverse.toString();
System.out.println(invertida);

The method UnicodeGraphemeParsing::parse gets the list of grapheme clusters of a String. Then I use it Collections.reverse to invert this list, and I walk it, keeping the respective parts of String in a StringBuilder, to finally have the String reversed. The output of the code is:

Emojis e acentos corretamente invertidos

Remember that if your console does not support ZWJ Sequences or flags, the family can be shown as 4 emojis (man, woman, 2 girls) and the flag can be shown as the Regional Indicator Symbols themselves, as previously mentioned. But the important thing is that these sequences are treated as if they were one thing, not occurring the problems already mentioned (change the flag and undo the ZWJ Sequence).

For comparison purposes, see the result with StringBuilder::reverse, normalizing to NFC and NFD:

System.out.println("NFC:" + new StringBuilder(Normalizer.normalize(s, Normalizer.Form.NFC)).reverse().toString());
System.out.println("NFD:" + new StringBuilder(Normalizer.normalize(s, Normalizer.Form.NFD)).reverse().toString());

The exit is:

Saída com StringBuilder, emojis errados

Note that converting to NFC only corrects accents, but none of the ways prevents the problem of reversing the flag and ZWJ Sequence code points.


Regex

If you do not want to use an external library, starting from Java 9 it is possible to use \X to check a Unicode Extended grapheme cluster - which is explained in detail here.

However, in the tests I did, he doesn’t recognize a Emoji ZWJ Sequence, because using \X, each emoji of the sequence is considered a Grapheme Cluster separate. The flag emojis, however, are recognized correctly. Using as example the same String of the previous example:

int[] cp = {
        // emoji de família
        0x1f468, 0x200d, 0x1f469, 0x200d, 0x1f467, 0x200d, 0x1f467,
        // bandeira da Austrália
        0x1f1e6, 0x1f1fa,
        // outros emojis
        0x1f4a9, 0x1f4b0 };
String s = new String(cp, 0, cp.length)
        // juntar com caracteres acentuados na forma NFD
        + Normalizer.normalize("áñabc", Normalizer.Form.NFD);
Pattern p = Pattern.compile("\\X");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group());
}

In this example I am looking for all Grapheme Clusters with \X (remembering that inside Strings the backslash is written as \\). The result, however, shows each emoji of ZWJ Sequence separately (the flag, however, is correct):

regex X mostra ZWJ sequence quebrada em vários emojis

In the Java 13 improved a little, but still not perfect: the \X recognizes the emojis of man and woman as a Grapheme Cluster, and the two girl emojis as other (instead of considering all one thing only).


One solution would be to try to modify the regex to get a ZWJ Sequence or one \X:

Pattern p = Pattern.compile("\\p{InMiscellaneous_Symbols_and_Pictographs}(?:\\u200d\\p{InMiscellaneous_Symbols_and_Pictographs})*|\\X");

\p{InMiscellaneous_Symbols_and_Pictographs} serves to catch any code point that is on block Miscellaneous Symbols and Pictographs (click on the link and see that the list is great). Next I see if there are one or more occurrences of ZWJ (\u200d) followed by another block symbol Miscellaneous Symbols and Pictographs. Using this regex, the Grapheme Cluster family is recognized as a single "character".

Then it would be enough to save the results of m.group() on a list, invert it and build the String inverted:

List<String> grafemas = new ArrayList<>();
while(m.find()) {
    grafemas.add(m.group());
}
Collections.reverse(grafemas);
String invertida = String.join("", grafemas);

With that, the String reversed is the same as in the previous example with UnicodeGraphemeParsing::parse. But there’s a catch.

The block Miscellaneous Symbols and Pictographs does not contain all emojis. There are several that are in other blocks, such as the heart, who’s on the block Dingbats, and many others scattered over other blocks, such as the Block Emoticons, which contains the Smiles and other symbols (plus there are blocks can have other characters that are not emojis). Finally, try to make a regex that contemplates all possible emojis it’s not that simple. Another option is to limit yourself to ZWJ Sequences described in Unicode, making a regex that contemplates only those sequences (even so it is a considerable job).


Java <= 8

In Java 8 there is no support for \X, but I found an alternative in this reply by Soen. Unfortunately, it’s nothing simple:

String extendedGraphemeCluster = "(?:(?:\\u000D\\u000A)|(?:[\\u0E40\\u0E41\\u0E42\\u0E43\\u0E44\\u0EC0\\u0EC1\\u0EC2\\u0EC3\\u0EC4\\uAAB5\\uAAB6\\uAAB9\\uAABB\\uAABC]*(?:[\\u1100-\\u115F\\uA960-\\uA97C]+|([\\u1100-\\u115F\\uA960-\\uA97C]*((?:[[\\u1160-\\u11A2\\uD7B0-\\uD7C6][\\uAC00\\uAC1C\\uAC38]][\\u1160-\\u11A2\\uD7B0-\\uD7C6]*|[\\uAC01\\uAC02\\uAC03\\uAC04])[\\u11A8-\\u11F9\\uD7CB-\\uD7FB]*))|[\\u11A8-\\u11F9\\uD7CB-\\uD7FB]+|[^[\\p{Zl}\\p{Zp}\\p{Cc}\\p{Cf}&&[^\\u000D\\u000A\\u200C\\u200D]]\\u000D\\u000A])[[\\p{Mn}\\p{Me}\\u200C\\u200D\\u0488\\u0489\\u20DD\\u20DE\\u20DF\\u20E0\\u20E2\\u20E3\\u20E4\\uA670\\uA671\\uA672\\uFF9E\\uFF9F][\\p{Mc}\\u0E30\\u0E32\\u0E33\\u0E45\\u0EB0\\u0EB2\\u0EB3]]*)|(?s:.))";

I did a test with this regex and unfortunately she does not recognize the flag emojis as if it were a single Grapheme Cluster (each Regional Indicator Symbol is considered a separate cluster) - in addition to also breaking the ZWJ Sequence into several emojis, in the same way that \X. That is, in addition to trying to understand this regex, you will still have to modify it to contemplate these cases. An attempt:

String extendedGraphemeCluster = "(?:(?:\\u000D\\u000A)|(?:[\\u0E40\\u0E41\\u0E42\\u0E43\\u0E44\\u0EC0\\u0EC1\\u0EC2\\u0EC3\\u0EC4\\uAAB5\\uAAB6\\uAAB9\\uAABB\\uAABC]*(?:[\\u1100-\\u115F\\uA960-\\uA97C]+|([\\u1100-\\u115F\\uA960-\\uA97C]*((?:[[\\u1160-\\u11A2\\uD7B0-\\uD7C6][\\uAC00\\uAC1C\\uAC38]][\\u1160-\\u11A2\\uD7B0-\\uD7C6]*|[\\uAC01\\uAC02\\uAC03\\uAC04])[\\u11A8-\\u11F9\\uD7CB-\\uD7FB]*))|[\\u11A8-\\u11F9\\uD7CB-\\uD7FB]+|[^[\\p{Zl}\\p{Zp}\\p{Cc}\\p{Cf}&&[^\\u000D\\u000A\\u200C\\u200D]]\\u000D\\u000A])[[\\p{Mn}\\p{Me}\\u200C\\u200D\\u0488\\u0489\\u20DD\\u20DE\\u20DF\\u20E0\\u20E2\\u20E3\\u20E4\\uA670\\uA671\\uA672\\uFF9E\\uFF9F][\\p{Mc}\\u0E30\\u0E32\\u0E33\\u0E45\\u0EB0\\u0EB2\\u0EB3]]*)|(?s:.))";
String zwjSequence = "\\p{InMiscellaneous_Symbols_and_Pictographs}(?:\\u200d\\p{InMiscellaneous_Symbols_and_Pictographs})*";
String flag = "[\\x{1f1e6}-\\x{1f1ff}]{2}";
// regex com ZWJ Sequence, ou sequência de bandeira, ou grapheme cluster
Pattern p = Pattern.compile(zwjSequence + "|" + flag + "|" + extendedGraphemeCluster);

With that, the ZWJ Sequences and the flag emojis are recognized as a single Grapheme Cluster (with all the aforementioned holds on the block Miscellaneous Symbols and Pictographs not having all emojis), but unfortunately it doesn’t work for Korean characters: for example, the character in the NFD form consists of 3 codepoints: (HANGUL CHOSEONG KIYEOK), (HANGUL JUNGSEONG A) and (HANGUL JONGSEONG KIYEOK ). But the above regex does not recognize these 3 code points as a single Grapheme Cluster. Already the previous solutions (UnicodeGraphemeParsing::parse and \X from Java 9) recognize this Grapheme Cluster correctly.

In that case, I think I’d better go to a library, like the one I indicated above - did not test for all possible cases, such as characters of different languages (tested only with Japanese and Korean characters and worked), but everything depends on their use cases.


That doesn’t mean that the solution with StringBuilder::reverse is wrong. If you are only dealing with ASCII characters (or texts in Portuguese/English/any language that uses the Latin alphabet), just remember to normalize the String for NFC that will not have problems (maybe there is a case or other, will know, but for most cases that only use ASCII should work).

I just wanted to show that there are several other cases that, should they arise, will require a more comprehensive (and complicated) solution. And I recognize that my answer is not complete (in the sense of contemplating all the possibilities foreseen by Unicode), because I am still beginning to deepen myself in the subject (Unicode is much more complicated than I thought) and surely there are many more details that I still do not know (I only know that I know nothing...)

  • 1

    If you can explain why I voted no, I will be happy to correct the problem...

5

When the language does not provide a method to do this, I usually use the following algorithm:

public String reverse(String str) {
  char[] chars = str.toCharArray();
  int numChars = chars.length - 1;
  for (int i = 0; i < numChars / 2; i++) {
    char temp = chars[i];
    chars[i] = chars[numChars - i];
    chars[numChars - i] = temp;
  }
  return new String(chars);
}

It works by replacing the first char with the last char, the second with the penultimate, the third with the antepenultimate and so on.

I don’t know if you’re the best, or if you have any limitations, but you always took what I needed. (The only times I needed to reverse a string were in exercises.)

This is more or less how Stringbuilder Verse works.

  • The advantage of Stringbuilder is that it manages the string in its "memory scope". The way you programmed the VM will create multiple strings in the string pool until you get on final result.

  • Actually no. The code I made will only create A String instance on the return of the method. All operation is done on top of a SINGLE character array and not on Strings.

  • I’m sorry Leonardo, you’re right. I hadn’t seen the conversion to arrays. I said m***

  • Come on, no need to apologize! Your concern is legitimate. If they were Strings, this what you said would actually happen, and performance would not be good ahaha.

  • 1

    @fernandosavio and Leonardo, just to complement, a character (in the sense of "a single symbol shown on the screen") will not always occupy only one char, then invert the array of char not always works for all cases. I detail a little of this in the my answer

3

Java is not my strong suit but I’ll leave here the recursive version to invert a string:

class Main {
  public static void main (String[] args) throws java.lang.Exception {
      String str = "miguel";
      System.out.println(new Main().str_reverse(str).toUpperCase()); // LEUGIM
  }
  public String str_reverse(String str) {
      if(str.isEmpty()) {
          return str;
      }
      // concatenar ultimo caracter com o resto string excepto o ultimo
      return str.substring(str.length() - 1) + str_reverse(str.substring(0, str.length() - 1));
  }
}

DEMONSTRATION

-5

public static void main(String[] args) {           
    Scanner entrada= new Scanner(System.in);
    String x =entrada.next();
    char charAt = x.charAt(0);
    String invertido="";

    for(int i=0;i<x.length();i++) {
        invertido= x.charAt(i)+invertido;
    }

    System.out.println(invertido);
}
  • 2

    In a robust, and "old" language like Java, it’s very likely that you have a lot of ready-made methods. It is great to know how to create an algorithm to do something, without using things already ready, but it is better to use everyday already use what the language offers not to reinvent the wheel, for reasons of Clean Code and maintainability of code.

  • For your code to work properly you’d have to use entrada.nextLine(), if you type more than one word. And you can simplify by iterating the characters from the last index to the first: public static void main(String[] args) {&#xA; Scanner entrada = new Scanner(System.in);&#xA; String x = entrada.nextLine();&#xA; String invertido = "";&#xA; &#xA; for (int i = x.length() - 1; i >= 0; i--) {&#xA; invertido += x.charAt(i);&#xA; }&#xA;&#xA; System.out.println(invertido);&#xA; }

Browser other questions tagged

You are not signed in. Login or sign up in order to post.