Regex to split on comparison operators

I’m having trouble putting together a regular expression that meets the following requirement:

String formula = " 100 != (50 + 20 + 30) ";

String[] arr = formula.split(" only the characters: '=', '!=', '<', '<=', '>=' and '>' ");

What would that expression look like? I would also like to keep the operator after the split.

1 answer


An alternative is:

String formula = " 100 != (50 + 20 + 30) ";
String[] partes = formula.split("!?=|[<>]=?");
for (String s : partes) {
    System.out.println(s);
}

The regex uses alternation (the | character, which means "or") and has two options:

  • !?=: the character !, which is optional because it is followed by a ?, and then the equals sign (=). Thus, !?= matches both != and =
  • [<>]=?: a character class ([<>], which matches either of the two characters < or >), followed by an optional equals sign (=?). So it matches >, >=, < or <=

The output is:

 100 
 (50 + 20 + 30) 

One detail is that this regex also matches inside cases like 10 ==== 1 + 9 (each = is treated as a separate separator), so the array returned by split can contain several elements that are just an empty string.
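To make the empty-string detail concrete, here is a minimal sketch using the same regex. The class and method names (EmptyPartsDemo, splitOnOperators) are only illustrative and are not part of the original answer.

```java
import java.util.Arrays;

class EmptyPartsDemo {
    // Same regex as above: matches =, !=, <, <=, > and >=
    static final String OPERATORS = "!?=|[<>]=?";

    // Hypothetical helper: splits a formula on every operator match
    static String[] splitOnOperators(String formula) {
        return formula.split(OPERATORS);
    }

    public static void main(String[] args) {
        // Each of the four '=' characters is matched as its own separator,
        // so the pieces between consecutive '=' are empty strings.
        String[] parts = splitOnOperators("10 ==== 1 + 9");
        System.out.println(parts.length);
        System.out.println(Arrays.toString(parts));
    }
}
```

The array has 5 elements: "10 ", three empty strings, and " 1 + 9".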

To avoid this, we can assume that there is always a space before and after the separator:

String[] partes = formula.split(" (!?=|[<>]=?) ");

Notice that there is now a space before and after the regex. I also grouped everything in parentheses, because the spaces must come before and after both alternatives. Without the parentheses, the regex would be interpreted as "a space before !?= (and anything after it), or a space after [<>]=? (and anything before it)".

But this only works if there is a space before and after the separator. If that is your case, you can use this one.
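As a rough illustration of why the parentheses matter, the sketch below compares the grouped regex with an ungrouped variant. The class name, constant names and the sample input "10> 5" are my own assumptions, chosen only to show the difference.

```java
import java.util.Arrays;

class SpacedSplitDemo {
    // Grouped: a space is required on BOTH sides of any operator
    static final String GROUPED = " (!?=|[<>]=?) ";
    // Ungrouped: "space then !?=" OR "[<>]=? then space"
    static final String UNGROUPED = " !?=|[<>]=? ";

    static String[] split(String formula, String regex) {
        return formula.split(regex);
    }

    public static void main(String[] args) {
        // Splits normally when the operator is surrounded by spaces
        System.out.println(Arrays.toString(split(" 100 != (50 + 20 + 30) ", GROUPED)));
        // No " = " with a space on both sides, so nothing is split
        System.out.println(Arrays.toString(split("10 ==== 1 + 9", GROUPED)));
        // The ungrouped version accepts ">" followed by a space only
        System.out.println(Arrays.toString(split("10> 5", UNGROUPED)));
        // The grouped version leaves the same input untouched
        System.out.println(Arrays.toString(split("10> 5", GROUPED)));
    }
}
```

With the grouped regex, a formula without spaces such as 10!=20 is also returned as a single element, which is the limitation mentioned above.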


But if you want to accept formulas without spaces (such as 10!=20), you can check that the characters immediately before and after the separator are not themselves separator characters:

String[] partes = formula.split("(?<=[^!=<>])(!?=|[<>]=?)(?=[^!=<>])");

Now I use lookarounds: the lookbehind (the part with (?<=) and the lookahead (the part with (?=), which check whether something exists before and after the current position. In this case, I am checking [^!=<>] (any character that is not !, =, < or >). The ^ right after the opening bracket negates the character class.

The trick with lookbehind and lookahead is that they only check whether something exists before or after; that text is not part of the match, so it is not removed by the split.

Thus, the expression ignores cases such as 2===1+1, and also splits correctly even when there are no spaces before and after the separator.
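A small sketch to check the behavior described above, using the lookaround regex on inputs with and without spaces. The class and method names are illustrative only.

```java
import java.util.Arrays;

class LookaroundSplitDemo {
    // The operator must have a non-operator character right before and right after it
    static final String OPERATOR = "(?<=[^!=<>])(!?=|[<>]=?)(?=[^!=<>])";

    static String[] split(String formula) {
        return formula.split(OPERATOR);
    }

    public static void main(String[] args) {
        // No spaces around the operator: still split
        System.out.println(Arrays.toString(split("10!=20")));
        // "===" is not a valid operator: no split happens at all
        System.out.println(Arrays.toString(split("2===1+1")));
        // Spaces are not part of the match, so they stay in the pieces
        System.out.println(Arrays.toString(split(" 100 != (50 + 20 + 30) ")));
    }
}
```

Note that, because both lookarounds require a character to exist, an operator at the very start or very end of the string is not matched by this regex.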


After the split, the operator is lost. To recover it, you have two options:

  1. get it separately by using the same regex:
String formula = " 100>=(50 + 20 + 30) ";
Matcher matcher = Pattern.compile("(?<=[^!=<>])(!?=|[<>]=?)(?=[^!=<>])").matcher(formula);
while (matcher.find()) {
    System.out.println(matcher.group()); // >=
}

I use while in case there is more than one occurrence of the operator in the string. But if you only want the first occurrence, you can switch to if.
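The while/if distinction could be wrapped in two small helpers, sketched below. OperatorFinder, findOperators and firstOperator are hypothetical names, and the sample input with two operators is my own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OperatorFinder {
    // Same lookaround regex used for the split, compiled once
    private static final Pattern OPERATOR =
            Pattern.compile("(?<=[^!=<>])(!?=|[<>]=?)(?=[^!=<>])");

    // "while" version: collects every operator found in the formula
    static List<String> findOperators(String formula) {
        List<String> found = new ArrayList<>();
        Matcher matcher = OPERATOR.matcher(formula);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // "if" version: returns only the first operator, or null if there is none
    static String firstOperator(String formula) {
        Matcher matcher = OPERATOR.matcher(formula);
        if (matcher.find()) {
            return matcher.group();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findOperators("1<2 && 3>=4")); // [<, >=]
        System.out.println(firstOperator("1<2 && 3>=4")); // <
    }
}
```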

  2. get it in the same array returned by split. This is more complicated, because it requires another lookahead and another lookbehind around the whole expression:
String regexTemplate = "((?<=%1$s)|(?=%1$s))";
String formula = " 100>=(50 + 20 + 30) ";
String[] partes = formula.split(String.format(regexTemplate, "(?<=[^!=<>])(!?=|[<>]=?)(?=[^!=<>])"));
for (String s : partes) {
    System.out.println(s);
}

The idea for this regex was taken from an answer on SOen.
Basically, you use ((?<= expression )|(?= expression )). That is, it looks for the positions in the string that have the regex immediately before or immediately after them, and splits at those positions. Since the lookahead and lookbehind are not part of the match, the operator is also returned by split.

The output is:

 100
>=
(50 + 20 + 30) 
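Since the pieces keep their surrounding spaces (as in the output above), a possible follow-up is to trim each element. The sketch below assumes that trimming is wanted, which the question does not say; the class and method names are illustrative.

```java
import java.util.Arrays;

class KeepOperatorSplit {
    static final String OPERATOR = "(?<=[^!=<>])(!?=|[<>]=?)(?=[^!=<>])";
    // Split at positions that have the operator right before OR right after them
    static final String TEMPLATE = "((?<=%1$s)|(?=%1$s))";

    // Hypothetical helper: returns left side, operator and right side, trimmed
    static String[] splitKeepingOperator(String formula) {
        String[] parts = formula.split(String.format(TEMPLATE, OPERATOR));
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim(); // assumption: surrounding spaces are not needed
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingOperator(" 100>=(50 + 20 + 30) ")));
        // prints [100, >=, (50 + 20 + 30)]
        System.out.println(Arrays.toString(splitKeepingOperator("10 != 20")));
        // prints [10, !=, 20]
    }
}
```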

If you want to include the operator ==, change the regex to:

String formula = " 100 == (50 + 20 + 30) ";
String[] partes = formula.split("[!=]?=|[<>]=?");
for (String s : partes) {
    System.out.println(s);
}

Now, instead of just !, I use [!=], which accepts either ! or = before the other = (that is, it matches both != and ==).

If you want to adapt options 1 and 2 above, they would be:
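One quick way to compare the two operator regexes is to test each candidate as a whole string with Pattern.matches, as in the sketch below. The class, constant and method names are illustrative.

```java
import java.util.regex.Pattern;

class OperatorRegexCheck {
    static final String WITHOUT_DOUBLE_EQUALS = "!?=|[<>]=?";
    static final String WITH_DOUBLE_EQUALS = "[!=]?=|[<>]=?";

    // True only if the WHOLE candidate is matched by the regex
    static boolean isOperator(String regex, String candidate) {
        return Pattern.matches(regex, candidate);
    }

    public static void main(String[] args) {
        String[] candidates = {"=", "==", "!=", "<", "<=", ">", ">=", "===", "=!"};
        for (String c : candidates) {
            System.out.println(c + " -> old: " + isOperator(WITHOUT_DOUBLE_EQUALS, c)
                    + ", new: " + isOperator(WITH_DOUBLE_EQUALS, c));
        }
    }
}
```

Only == changes from false to true between the two versions; === and =! are rejected by both.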

String formula = " 100==(50 + 20 + 30) ";
Matcher matcher = Pattern.compile("(?<=[^!=<>])([!=]?=|[<>]=?)(?=[^!=<>])").matcher(formula);
while (matcher.find()) {
    System.out.println(matcher.group()); // ==
}

String regexTemplate = "((?<=%1$s)|(?=%1$s))";
String formula = " 100==(50 + 20 + 30) ";
String[] partes = formula.split(String.format(regexTemplate, "(?<=[^!=<>])([!=]?=|[<>]=?)(?=[^!=<>])"));
for (String s : partes) {
    System.out.println(s);
}
  • After performing the split, the operator " != " is lost. Can I recover it?

  • @Brennosegolin Yes, you can. I updated the answer

  • Thank you very much for your help !

  • I noticed a mistake in my question: following the logic of the regex above, it is not possible to get the result if the expression contains "==", correct?

  • @Brennosegolim I updated the answer

  • @hkotsubo is it possible to follow this same logic for an expression in JavaScript?

  • @Victorhenrique Yes: https://jsfiddle.net/3ugj9txk/ (but if your question is a different one, I suggest you post it as a new question, because then it is visible to everyone on the main page and you have a better chance of someone answering. If you comment here, probably only I will see it, and lately I don't have much time to dedicate to the site...)
