How to select a text snippet in a String?

Viewed 6,231 times

12

I’m creating a text file that uses delimiters (<# and #>), and I need to select what’s inside them, for example <# texto.delimitado #>, using JavaScript’s split() function.

What would be the best regular expression for this? I tried the regular expression (<#|#>), but it did not give the desired result.

4 answers

13


With this expression:

/<#(.*?)#>/

You can capture the text between <# and #>.

To get all the matches, use it with the global flag as follows:

// create a RegExp object with the global flag
var regex = new RegExp("<#(.*?)#>", "g");

var teste = "<# Meu primeiro teste aqui é # bem esperto #> "
            + "<# Este é meu # segundo # teste #>";

And to run the regex:

var match;
while ((match = regex.exec(teste))) // exec returns null when there are no more matches
{
    console.log(match[1]); // match[1] = what the parentheses captured
}

Output:

Meu primeiro teste aqui é # bem esperto
Este é meu # segundo # teste 
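
As an alternative sketch, assuming an environment that supports String.prototype.matchAll (ES2020), the same captures can be collected into an array in one step. The helper name extrairDelimitados is hypothetical, and the trim() call is an addition that strips the spaces next to the delimiters:

```javascript
// Hypothetical helper: returns every snippet found between <# and #>.
function extrairDelimitados(texto) {
    // matchAll requires the global flag; each match carries its capture groups
    return Array.from(texto.matchAll(/<#(.*?)#>/g), function (m) {
        return m[1].trim(); // m[1] = what the parentheses captured
    });
}

var teste = "<# Meu primeiro teste aqui é # bem esperto #> "
          + "<# Este é meu # segundo # teste #>";

console.log(extrairDelimitados(teste));
```

Note that . does not match line breaks, so a snippet that spans several lines would need [\s\S]*? (or the s flag) in place of .*?.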

12

Your regular expression <#|#> can be used without problems. Using the split() method, as requested, you can do the following:

/* General declarations */
var er = new RegExp("<#|#>","g");
var dados_arquivo = "<#texto.delimitado.1#><#texto.delimitado.2#>";
var i;
var resultado;

/* Get the data that matters */
resultado = dados_arquivo.split(er);

/* Remove the unwanted items (created by the split method).
   Iterating backwards keeps splice() from skipping the item
   that shifts into the removed position. */
for(i = resultado.length - 1; i >= 0; i--)
{
    if(resultado[i] === "")
    {
        resultado.splice(i,1);
    }
}

The result is an array with the values "texto.delimitado.1" and "texto.delimitado.2".

At the end of the code there is a for loop that removes the empty items from the array created by split(). To explain:

split() throws away everything that matches and returns whatever does not match as an array. However, because split() takes what lies to the left and to the right of each match, where there is nothing it simply takes that "nothing" (an empty string) and adds it as another item of the resulting array.
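
To make this concrete, here is a small sketch of what split() returns for the string used above. The names partes and limpo are illustrative, and filter() is shown only as one possible way to drop the empty strings:

```javascript
var er = /<#|#>/g;
var partes = "<#texto.delimitado.1#><#texto.delimitado.2#>".split(er);

// The empty strings come from the "nothing" before, between and after the matches
console.log(partes);
// ["", "texto.delimitado.1", "", "texto.delimitado.2", ""]

// One possible way to keep only the non-empty items
var limpo = partes.filter(function (item) {
    return item !== "";
});
console.log(limpo);
// ["texto.delimitado.1", "texto.delimitado.2"]
```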

Note what happens to text that is not between "<#" and "#>" (in that order): it is treated as adjacent to the delimiters (as explained above), even though it is not enclosed by them, so it also ends up in the result. This is because the regular expression does not see the delimiters as a pair, but as two independent separators, since they are joined by "or" (|). Example:

  • change the declaration in the code above to

    var dados_arquivo = "a<#texto.delimitado.1#>b<#texto.delimitado.2#>c";
    
  • the final result will be 5 items: "a", "texto.delimitado.1", "b", "texto.delimitado.2" and "c"

So, if this can happen, it is important to first remove the unwanted text with an extra step. In that case, you can use the code below:

/* General declarations */
var er = new RegExp("<#|#>","g");
var dados_arquivo = "a<#texto.delimitado.1#>b<#texto.delimitado.2#>c";
var i;
var resultado;

/* Auxiliary algorithm // START */
var er_auxiliar = new RegExp("<#.*?#>","g");
/* match() returns null when nothing matches, hence the fallback */
var texto_delimitado = dados_arquivo.match(er_auxiliar) || [""];

/* Concatenate every match into the first item */
while(texto_delimitado.length > 1)
{
    texto_delimitado[0] = texto_delimitado[0] + texto_delimitado[1];
    texto_delimitado.splice(1,1);
}
/* Auxiliary algorithm // END */

/* Get the data that matters */
resultado = texto_delimitado[0].split(er); /* <- The variable was changed */

/* Remove the unwanted items (created by the split method),
   iterating backwards so that splice() does not skip items */
for(i = resultado.length - 1; i >= 0; i--)
{
    if(resultado[i] === "")
    {
        resultado.splice(i,1);
    }
}

The added algorithm is marked in the code. Variable names were changed to fit the new code.

What the added algorithm does is as follows: it searches the data obtained from the original file (with the delimiters) and gets everything that is between "<#" and "#>", delimiters included, by means of an auxiliary regular expression passed to the match() method. The result is an array. The while loop then joins all of those results into a single string, so that the algorithm that already existed can separate everything with its own regular expression.
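
The same steps can be sketched more compactly, under the assumption that only the delimited snippets matter: match() collects every <#...#> block, and each block then has its two-character delimiters sliced off. The names blocos and trechos are illustrative and not part of the code above:

```javascript
var dados_arquivo = "a<#texto.delimitado.1#>b<#texto.delimitado.2#>c";

// match() returns null when nothing matches, so fall back to an empty array
var blocos = dados_arquivo.match(/<#.*?#>/g) || [];

// Drop the leading "<#" and the trailing "#>" of every block
var trechos = blocos.map(function (bloco) {
    return bloco.slice(2, -2);
});

console.log(trechos);
// ["texto.delimitado.1", "texto.delimitado.2"]
```

One difference from the loop-based version: an empty snippet such as <##> is kept here as an empty string instead of being removed.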

That’s it; I hope I’ve helped!

  • A simpler way would be to use this regex as the delimiter in the first example: ^<#|#>$|#><#. That way there is no need for the loop at the end, and it covers all cases. However, I do not know whether it is valid to assume that the delimiters will come in sequence, without spaces or line breaks between them.

  • @Guilhermebernal that regular expression, ^<#|#>$|#><#, doesn’t work in either example: a) it only removes the inner "" (empty strings), leaving the outer "" at both ends of the array (example: ,texto.delimitado.1,texto.delimitado.2,). b) if there is text other than what is delimited by "<#" and "#>", that text would end up in the final array (first example). The algorithm I added in the second example already handles that case. As for the loop (for or while, in both cases), can you explain better?

  • In fact, I didn’t take into account the items at the beginning and end of the array, you’re right. I also assumed that there would be no items outside the delimiters.
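
The behaviour discussed in these comments can be checked with a short sketch (the variable names are illustrative):

```javascript
var er_alternativa = /^<#|#>$|#><#/;

// Only delimited text: the inner empty strings are gone, but the outer ones remain
var soDelimitado = "<#texto.delimitado.1#><#texto.delimitado.2#>".split(er_alternativa);
console.log(soDelimitado);
// ["", "texto.delimitado.1", "texto.delimitado.2", ""]

// Text outside the delimiters: none of the alternatives match, so nothing is split
var comTextoExterno = "a<#texto.delimitado.1#>b<#texto.delimitado.2#>c".split(er_alternativa);
console.log(comTextoExterno);
// ["a<#texto.delimitado.1#>b<#texto.delimitado.2#>c"]
```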

2

Another way using filter and map:

var string = "bla bla <# texto.delimitado #> bla bla bla<# texto.delimitado2#>";
var resultado = string.split(/<#/).filter(function(v){
   return ~v.indexOf("#>");
}).map(function(v){
   return v.match(/(.*)#>/)[1].trim();
})
console.log(resultado);

  • 1

    I had difficulty following the expression in the filter. Is the ~v.indexOf there so that, when the result is -1, inverting all the bits gives 0, which counts as false?

  • 1

    Dude, I use it as a short form of indexOf() != -1

  • 2

    Got it now. That looks like code golf, pretty minimal. I’m not used to it at all. Thanks for clarifying =D
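
For readers unfamiliar with the idiom, this small sketch shows why ~v.indexOf(...) works as a test. The sample strings are illustrative, and the includes() alternative assumes an ES2015 or later environment:

```javascript
var v = " texto.delimitado #> bla bla";

// indexOf returns -1 when the substring is absent, and ~(-1) is 0 (falsy)
console.log(~"abc".indexOf("#>"));      // 0
// Any other index gives a non-zero (truthy) result: here the index is 18, and ~18 is -19
console.log(~v.indexOf("#>"));          // -19

// The more explicit equivalents
console.log(v.indexOf("#>") !== -1);    // true
console.log(v.includes("#>"));          // true
```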

0

A simpler way:

var regex = /\[(.*?)\]/g;
var texto = '[Palavra chave 1 = 296] Se refere ao item do produto para direcionamento. [Palavra chave 2 = 1234]'
alert(texto.match(regex));
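
Note that with the g flag, match() returns the whole matches, brackets included, rather than the captured groups. A minimal sketch of one way to get only the inner text, assuming the same [ ] delimiters as above (the name conteudos is illustrative, and console.log replaces alert so that it also runs outside a browser):

```javascript
var regex = /\[(.*?)\]/g;
var texto = '[Palavra chave 1 = 296] Se refere ao item do produto para direcionamento. [Palavra chave 2 = 1234]';

// With the g flag, match() returns full matches, brackets included
console.log(texto.match(regex));
// ["[Palavra chave 1 = 296]", "[Palavra chave 2 = 1234]"]

// Strip the one-character delimiters from each match
var conteudos = (texto.match(regex) || []).map(function (m) {
    return m.slice(1, -1);
});
console.log(conteudos);
// ["Palavra chave 1 = 296", "Palavra chave 2 = 1234"]
```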
