Return text between braces, without returning the braces themselves

Question

Asked 6 years, 1 month ago

Viewed 333 times

3

I used the following Java regular expression to return the strings that are between braces:

\\{[^\\}]+?\\}

My program works almost correctly, but it also returns the braces.

Code:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String data = "{Papel A}{Papel B}{} não é papapel";
        List<String> papeis = new ArrayList<String>();
        Pattern p = Pattern.compile("\\{[^\\}]+?\\}");
        Matcher matcher = p.matcher(data);
        while (matcher.find()) {
            // group() returns the whole match, braces included
            String result = matcher.group();
            papeis.add(result);
        }

        for (String string : papeis) {
            System.out.println(string);
        }
    }
}

Output:

{Papel A}
{Papel B}

I would like to remove the braces from the matched results.

I believe this can be solved with the regex alone, without having to use the replace method of the String class.

3 answers

4

One solution is to put the part you want in parentheses, as this forms a capture group.

Then just pass the group number to the group method:

String data = "{Papel A}{Papel B}{} não é papapel";
List<String> papeis = new ArrayList<String>();
Pattern p = Pattern.compile("\\{([^\\}]+?)\\}"); // the part I want is wrapped in parentheses
Matcher matcher = p.matcher(data);
while (matcher.find()) {
    String result = matcher.group(1); // get the first capture group
    papeis.add(result);
}
for (String string : papeis) {
    System.out.println(string);
}

Note that I left only the braces outside the parentheses, so the group captures only the content inside them. And since this is the first pair of parentheses in the regex, it corresponds to group 1, so I call matcher.group(1) to get it.

The output is:

Papel A
Papel B

Keep in mind that this regex only works as long as there is no pair of braces nested inside another. If you have, for example, {abc{xxx}aaa}, the regex will capture abc{xxx, because it keeps going only until it finds the first }. You could change it to Pattern.compile("\\{([^{}]+)\\}") so that it captures only xxx, for example.

Building a regex that detects nested braces and captures (using the example above) abc{xxx}aaa is more complicated, as it requires recursive regex, which Java does not support.

In this case, the alternative is to walk through the string and count the braces manually (see a version of this algorithm in this answer):
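To make the difference between the two patterns concrete, here is a minimal sketch that runs both against the nested input. The class name NestedBracesDemo and the helper extract are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
public class NestedBracesDemo {

    // Collects group 1 of every match of the given regex in the input.
    static List<String> extract(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String data = "{abc{xxx}aaa}";
        // Only '}' is excluded, so the match starts at the outer '{'
        // and stops at the first '}'.
        System.out.println(extract("\\{([^\\}]+?)\\}", data)); // [abc{xxx]
        // Both braces are excluded, so only the innermost pair can match.
        System.out.println(extract("\\{([^{}]+)\\}", data));   // [xxx]
    }
}
```

Neither pattern returns abc{xxx}aaa, which is why nested input needs a different approach.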

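String data = "{Papel A}{Papel B}{abc{xxx}aaa} não é papapel";
List<String> papeis = new ArrayList<String>();
int chave = 0; // current brace nesting depth
StringBuilder sb = new StringBuilder();
for (char c : data.toCharArray()) {
    if (c == '{') {
        chave++;
    }
    if (chave > 0) {
        sb.append(c); // we are inside at least one pair of braces
    }
    if (c == '}') {
        chave--;
        if (chave == 0) {
            // outermost pair closed: strip only the outer braces and store the content
            // (an empty pair "{}" is stored unchanged, because .+ needs at least one character)
            papeis.add(sb.toString().replaceAll("\\{(.+)\\}", "$1"));
            sb.setLength(0);
        }
        if (chave < 0) {
            chave = 0; // stray '}' with no matching '{': ignore it
        }
    }
}
for (String string : papeis) {
    System.out.println(string);
}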
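For reference, here is a self-contained variant of the same counting idea, offered as a sketch rather than as the answer's exact code. It assumes you want to skip empty pairs such as {} (as the regex in the question does) and it remembers where the outermost brace opened instead of stripping the braces with replaceAll. The names BraceCounter and topLevelGroups are hypothetical, and the text after the last group is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
public class BraceCounter {

    // Returns the content of every top-level {...} group, keeping nested braces intact.
    static List<String> topLevelGroups(String input) {
        List<String> groups = new ArrayList<>();
        int depth = 0;   // number of currently open braces
        int start = -1;  // index just after the outermost '{'
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '{') {
                if (depth == 0) {
                    start = i + 1;
                }
                depth++;
            } else if (c == '}' && depth > 0) { // a stray '}' at depth 0 is ignored
                depth--;
                if (depth == 0 && i > start) {  // assumption: skip empty groups such as "{}"
                    groups.add(input.substring(start, i));
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String data = "{Papel A}{Papel B}{abc{xxx}aaa} outside";
        for (String group : topLevelGroups(data)) {
            System.out.println(group);
        }
        // Prints:
        // Papel A
        // Papel B
        // abc{xxx}aaa
    }
}
```

Using substring avoids running a second regex over each group, and an unclosed group at the end of the input is simply dropped.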

by Lucas Miranda • 1,314 points · Answer 1 · 2019-06-25T19:26:20+00:00

I believe this regex solves your problem:

(?<=\{).+?(?=\})

Testing: http://refiddle.com/refiddles/5d12729475622d756b0a0000
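As a rough sketch of how this lookaround pattern could be used from Java (the backslashes must be doubled inside a string literal), the example below runs it against input similar to the question's. The class LookaroundDemo, the helper extract, and the stricter [^{}]+ variant are assumptions added for illustration and are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
public class LookaroundDemo {

    // The lookbehind (?<=\{) and lookahead (?=\}) assert the braces without
    // consuming them, so group() already excludes them.
    static final String LAZY = "(?<=\\{).+?(?=\\})";

    // Assumed stricter variant: the content itself may not contain braces.
    static final String STRICT = "(?<=\\{)[^{}]+(?=\\})";

    // Collects the full match of every occurrence of the regex in the input.
    static List<String> extract(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract(LAZY, "{Papel A}{Papel B}{} tail")); // [Papel A, Papel B]
        // Because the braces are not consumed, .+? can start on the '}' of an empty pair:
        System.out.println(extract(LAZY, "{}{a}"));   // [}{a]
        System.out.println(extract(STRICT, "{}{a}")); // [a]
    }
}
```

The second call shows a caveat of the lazy version: an empty pair followed by another group can produce a match that contains braces, which the stricter character class avoids.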

by João Vitor Fontes • 84 points · Answer 2 · 2019-06-25T19:34:40+00:00

-1

This may help you:

String suastringtratada = Normalizer.normalize(suastring, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");

The question asks to return a string without the braces ({abc} should result in abc), but the code presented does not do this (see here an example). What this code actually does is remove the accents (for example, {áéí} becomes {aei}; note that it removes only the accents, not the braces), and I don't see how this relates to the problem presented in the question. If you can edit the answer to explain it better...

– hkotsubo

2019/06/26 at 10:49
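The behavior described in the comment can be checked with a small sketch like the one below. The class name NormalizerDemo and the helper stripAccents are hypothetical, and the accented letters are written as Unicode escapes only to keep the source file ASCII.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of the original answer or comment.
public class NormalizerDemo {

    // Same expression as in the answer: decompose accented letters (NFD),
    // then drop every non-ASCII character, which removes the combining accents.
    static String stripAccents(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        String input = "{\u00e1\u00e9\u00ed}"; // the string {áéí}
        // The braces are ASCII characters, so they are left untouched.
        System.out.println(stripAccents(input)); // {aei}
    }
}
```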