Regular expression to retrieve strings starting with a colon (:)

4

I need a regular expression to retrieve a list of strings that start with the colon character (":") and end with a space or a closing parenthesis (")").

Example:

String texto = "(:TEXTOQUALQUER NADA DO FOI :TEXTOQQDENOVO SERÁ DE NOVO :TEXTOQQMAIS DO JEITO QUE UM DIA :TEXTO3343)";

Note

There is no fixed pattern for the "keywords" (the words preceded by the colon character): they vary in length and are followed by a space or a closing parenthesis, as already mentioned. How can I get only the list of these strings?

Expected result

[TEXTOQUALQUER, TEXTOQQDENOVO, TEXTOQQMAIS, TEXTO3343]

  • 3

    Suggestion: https://regex101.com/r/dM1vW1/2

  • 3

    Regex is usually overkill for this kind of thing. In fact, Regex should only be used once the ordinary ways of solving the problem have really been exhausted.

  • @Bacco I agree that it is overkill, with the caveat that, depending on the case, a regex may be the most robust way to handle the input. After all, what counts as text? Any character other than : and )? Or is there more to it? When a complete parse is not feasible (and that can be even more overkill than a regex), this technique helps isolate small, simple fragments within a complex structure without having to interpret the whole structure.

  • 2

    @mgibsonbr In this case I honestly can't imagine any reason to use Regex. If you are dealing with two simple, static delimiters, just advance the pointer to the next one (something the regex will have to do anyway). Although, this being Java, the regex may even turn out to be more efficient, given the "consistency" with which each thing is implemented internally... PS: I still find the answers perfectly valid given what the OP asked for, by the way.

2 answers

5


The expression Sergio suggested in the comments seems to be the simplest approach, except for the " (which was not mentioned in the question) and the missing space (as Gustavo Cinque pointed out in the comments). My suggestion is to use it to find all matches:

List<String> resultado = new ArrayList<String>();
Matcher m = Pattern.compile(":([^:\\) ]+)").matcher(texto);
while ( m.find() )
    resultado.add(m.group(1));

Note: my previous answer (see the edit history) does not apply here: trim is no longer needed, because the captured strings no longer contain whitespace, and for the same reason there are no inner spaces to remove.
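
For anyone who wants to paste and run it, here is a minimal self-contained sketch of the same idea. The class name ColonKeywords and the method extract are hypothetical names chosen for illustration; the pattern is the one discussed above, written without the optional escape before the parenthesis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, not part of the original answer.
class ColonKeywords {
    // A colon followed by one or more characters that are not a colon,
    // a closing parenthesis, or a space. Group 1 holds the keyword itself.
    private static final Pattern KEYWORD = Pattern.compile(":([^:) ]+)");

    static List<String> extract(String texto) {
        List<String> resultado = new ArrayList<>();
        Matcher m = KEYWORD.matcher(texto);
        while (m.find()) {
            resultado.add(m.group(1));
        }
        return resultado;
    }

    public static void main(String[] args) {
        // \u00C1 is the letter Á from the question, escaped so the source stays ASCII-only.
        String texto = "(:TEXTOQUALQUER NADA DO FOI :TEXTOQQDENOVO SER\u00C1 DE NOVO "
                     + ":TEXTOQQMAIS DO JEITO QUE UM DIA :TEXTO3343)";
        System.out.println(extract(texto));
        // prints [TEXTOQUALQUER, TEXTOQQDENOVO, TEXTOQQMAIS, TEXTO3343]
    }
}
```

Compiling the pattern once in a static field is only a design choice for reuse; calling Pattern.compile inside the method, as in the snippet above, behaves the same.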

  • 1

    That Regex will not work... It is capturing the following words too and returning this result: [TEXTOQUALQUERNADADOFOI, TEXTOQQDENOVOSERÁDENOVO, TEXTOQQMAISDOJEITOQUEUMDIA, TEXTO3343]. Just add a space after the ) and before the ], as in Pattern.compile(":([^:) ]+)"). I tested it here and it works. Also, you cannot put a lone backslash inside the Regex string, because in a Java string literal every backslash must be followed by one of \b \t \n \f \r \" \' \\

  • And this foreach is weird too, huh?

  • 1

    @Gustavocinque Thanks for the corrections! I had indeed misinterpreted the expected result: I thought it was all the text up to the next :, but it is only up to the next space. As for the forEach, when you know many languages you always end up confusing one with another hehe...

  • No worries. When I pasted it into the IDE I didn't even notice; it looked right to me too =)

  • @mgibsonbr, the answer is perfect. The motivation for creating this question arose from the need to recover the "named parameters".
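
To make the points raised in these comments concrete, the sketch below compares a character class that excludes only the colon with one that also excludes ")" and the space, and checks that "\\)" and ")" behave the same inside a character class. The class EscapeDemo, the helper findAll, and the short sample string are all hypothetical, chosen only to keep the output easy to read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
class EscapeDemo {
    // Collects group 1 of every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String texto = "(:ABC DEF :GHI)";
        // Only the colon is excluded, so each match runs on to the next colon.
        System.out.println(findAll(":([^:]+)", texto));     // prints [ABC DEF , GHI)]
        // Excluding ')' and the space stops each match at the end of the keyword.
        System.out.println(findAll(":([^:) ]+)", texto));   // prints [ABC, GHI]
        // "\\)" in Java source reaches the regex engine as \) which, inside a
        // character class, matches the same thing as a bare ')'.
        System.out.println(findAll(":([^:\\) ]+)", texto)); // prints [ABC, GHI]
    }
}
```

Writing "\)" with a single backslash would not compile at all, since \) is not a valid escape sequence in a Java string literal.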

1

Answer without using Regex:

import java.util.*;

class Program {
    public static void main (String[] args) {
        String texto = "(:TEXTOQUALQUER NADA DO FOI :TEXTOQQDENOVO SERÁ DE NOVO :TEXTOQQMAIS DO JEITO QUE UM DIA :TEXTO3343)";
        List<String> textos = new ArrayList<String>();
        while (texto.length() > 0) {
            int inicio = texto.indexOf(":");
            if (inicio == -1) break; // no more keywords in the remaining text
            // Drop everything up to and including the colon.
            texto = texto.substring(inicio + 1);
            int posicaoParentese = texto.indexOf(")");
            int posicaoEspaco = texto.indexOf(" ");
            // The keyword ends at the nearest delimiter, or at the end of the string if there is none.
            int posicaoFinal = Math.min(texto.length(), Math.min((posicaoParentese == -1 ? Integer.MAX_VALUE : posicaoParentese), (posicaoEspaco == -1 ? Integer.MAX_VALUE : posicaoEspaco)));
            textos.add(texto.substring(0, posicaoFinal));
            // Skip the delimiter without running past the end of the string.
            texto = texto.substring(Math.min(posicaoFinal + 1, texto.length()));
        }
        for (String item : textos) System.out.println(item);
    }
}

See it working on ideone and on repl.it. I also put it on GitHub for future reference.


I'm leaving the previous attempts below to help anyone with a similar problem. The question was quite confusing, which forced the answers (not only mine) to be edited to reach the desired result. I hope it is OK now.

Reading your question more carefully, I think you want something else. I think it would just be this:

import java.util.*;

class Program {
    public static void main (String[] args) {
        String texto = "(:TEXTOQUALQUER NADA DO FOI :TEXTOQQDENOVO SERÁ DE NOVO :TEXTOQQMAIS DO JEITO QUE UM DIA :TEXTO3343)";
        List<String> textos = new ArrayList<String>();
        while (texto.length() > 0) {
            texto = texto.substring(texto.indexOf(":") + 1);
            int posicaoParentese = texto.indexOf(")");
            int posicaoEspaco = texto.indexOf(" ");
            int posicaoFinal = Math.min((posicaoParentese == -1 ? Integer.MAX_VALUE : posicaoParentese), (posicaoEspaco == -1 ? Integer.MAX_VALUE : posicaoEspaco));
            textos.add(texto.substring(0, posicaoFinal));
            texto =  texto.substring(posicaoFinal + 1);
        }
        for (String item : textos) System.out.println(item);
    }
}

See it working on ideone and on repl.it. I also put it on GitHub for future reference.

If it has not been answered yet: you do not need Regex for this, just a split():

class Program {
    public static void main (String[] args) {
        String texto = "(:TEXTOQUALQUER NADA DO FOI :TEXTOQQDENOVO SERÁ DE NOVO :TEXTOQQMAIS DO JEITO QUE UM DIA :TEXTO3343)";
        String[] textos = texto.split(":");
        for (String item : textos) System.out.println(item);
    }
}
 

See it working on ideone and on repl.it. I also put it on GitHub for future reference.

If you don't want what comes before the first :, simply ignore element 0 of the array (textos[0]).
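
Note that split(":") alone returns everything between one colon and the next, not just the keywords. A possible way to finish the job, sketched here under the assumption that a keyword always ends at the first space or closing parenthesis, is to skip element 0 and cut each remaining piece at its first delimiter. The class name SplitKeywords and the method extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, shown only to illustrate post-processing the split() result.
class SplitKeywords {
    static List<String> extract(String texto) {
        List<String> textos = new ArrayList<>();
        String[] partes = texto.split(":");
        // Start at 1: element 0 is whatever came before the first colon.
        for (int i = 1; i < partes.length; i++) {
            // Cut at the first space or ')'; the limit of 2 stops after the first delimiter.
            String chave = partes[i].split("[ )]", 2)[0];
            // A colon followed directly by a delimiter yields an empty keyword; skip it.
            if (!chave.isEmpty()) {
                textos.add(chave);
            }
        }
        return textos;
    }

    public static void main(String[] args) {
        String texto = "(:TEXTOQUALQUER NADA DO FOI :TEXTOQQDENOVO DE NOVO :TEXTO3343)";
        System.out.println(extract(texto));
        // prints [TEXTOQUALQUER, TEXTOQQDENOVO, TEXTO3343]
    }
}
```

Keep in mind that the argument of String.split is itself a regular expression, so this variant avoids Matcher and Pattern in the source but not the regex engine.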
