Separate text by space except inside quotation marks

Viewed 409 times

6

I'm trying to use a regex to split text on spaces, except for text inside quotation marks. For example:

Input: texto1 texto2 "texto3 texto4" texto5
Output: Array("texto1", "texto2", "texto3 texto4", "texto5");

Input: "texto0 texto1 texto2" texto3 "texto4"
Output: Array("texto0 texto1 texto2", "texto3", "texto4");

Input: "texto0 texto1" texto2
Output: Array("texto0 texto1", "texto2");

Input: texto0 texto1 "texto2 texto3"
Output: Array("texto0", "texto1", "texto2 texto3");

4 answers

17


You can use this regex:

/".*?"|\w+/g

Explanation:

".*?" → matches whatever is between double quotes (including the quotes)
|     → "or"
\w+   → matches one or more consecutive alphanumeric characters
        (including the underscore)
/g    → flag that matches all occurrences
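To see what the pattern returns before any cleanup, it may help to look at the raw result of `match`. A minimal sketch (the names `sample` and `rawMatches` are illustrative and not part of the answer):

```javascript
// Raw matches: quoted segments still carry their double quotes.
const sample = 'texto1 texto2 "texto3 texto4" texto5';
const rawMatches = sample.match(/".*?"|\w+/g);

console.log(rawMatches);
// [ 'texto1', 'texto2', '"texto3 texto4"', 'texto5' ]
```

The third element keeps its surrounding quotes, because the first alternative matches the quotes themselves.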

Because the quotes are also matched, use .map() with replace to strip the remaining double quotes:

entrada.match(/".*?"|\w+/g).map(function(e){ return e.replace(/"/g,''); });

Examples:

const entrada1 = 'texto1 texto2 "texto3 texto4" texto5';
const entrada2 = '"texto0 texto1 texto2" texto3 "texto4"';
const entrada3 = '"texto0 texto1" texto2';
const entrada4 = 'texto0 texto1 "texto2 texto3"';

const saida1 = entrada1.match(/".*?"|\w+/g).map(function(e){ return e.replace(/"/g,''); });
const saida2 = entrada2.match(/".*?"|\w+/g).map(function(e){ return e.replace(/"/g,''); });
const saida3 = entrada3.match(/".*?"|\w+/g).map(function(e){ return e.replace(/"/g,''); });
const saida4 = entrada4.match(/".*?"|\w+/g).map(function(e){ return e.replace(/"/g,''); });

console.log(saida1);
console.log(saida2);
console.log(saida3);
console.log(saida4);
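Note that `match` with the `g` flag returns `null` when nothing matches, so calling `.map()` directly on the result would throw for an empty or whitespace-only input. One possible way to wrap the approach above in a reusable function is sketched below; the name `splitOutsideQuotes` is hypothetical, not something from the answer.

```javascript
// Hypothetical helper: same regex as above, with a guard for "no matches".
function splitOutsideQuotes(input) {
  // match() returns null when there are no matches, so fall back to [].
  const tokens = input.match(/".*?"|\w+/g) || [];
  // Strip the double quotes that the first alternative captured.
  return tokens.map(function (token) {
    return token.replace(/"/g, '');
  });
}

console.log(splitOutsideQuotes('"texto0 texto1" texto2'));
// [ 'texto0 texto1', 'texto2' ]
console.log(splitOutsideQuotes('   '));
// []
```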

  • 5

    No disrespect to the other answers, but I think every answer involving regex should be like this one: explaining in detail how it works helps laypeople (like me) understand what is happening, rather than just copying and using it.

  • 1

    I get lost in these parameters every time.

  • 1

    For those who like blocks and diagrams (people in programming are usually comfortable with them), Debuggex is a recommended site for visualizing regexes. An example can be seen at this link.

  • 1

    @danieltakeshi I don't know why, but I like this one better: https://regexr.com/3ls5g haha

  • I also like that one and Regex101, but for those who are learning, I think using both kinds of tool is well worth it. I used both when I was learning, because the color formatting and explanations of Regex101/Regexr are complemented by Debuggex's visualization.

1

Try this, using lookarounds (a lookbehind and a lookahead):

(?<=")[\w\s]+(?=")\b|\w+

https://regex101.com/r/Ccy9h2/2

Example of the JavaScript code generated at the link above:

const regex = /(?<=")[\w\s]+(?=")\b|\w+/g;
const str = `"texto0 texto1 texto2" texto3 "texto4"

texto1 texto2 "texto3 texto4" texto5

"texto0 texto1 texto2" texto3 "texto4"

"texto0 texto1" texto2

texto0 texto1 "texto2 texto3"`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
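The loop above only prints each match. To collect the pieces into an array, as the question asks, `String.prototype.match` with the `g` flag may be enough. A minimal sketch, assuming a JavaScript engine that supports lookbehind (added in ES2018); the names `lookaroundRegex` and `parts` are illustrative:

```javascript
// Same pattern as above; the lookarounds keep the quotes out of the match,
// so no extra replace() step is needed afterwards.
const lookaroundRegex = /(?<=")[\w\s]+(?=")\b|\w+/g;
const parts = '"texto0 texto1 texto2" texto3 "texto4"'.match(lookaroundRegex) || [];

console.log(parts);
// [ 'texto0 texto1 texto2', 'texto3', 'texto4' ]
```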

0

I believe this also solves the problem:

((\w+)|("[\w\s]+"))
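As a rough sketch of how this pattern might be applied: with the `g` flag, `match` returns only the full matches (the capture groups are ignored), so the quotes still have to be stripped afterwards. The function name `splitWithGroupsPattern` is hypothetical.

```javascript
// Hypothetical helper applying the pattern above with the global flag.
function splitWithGroupsPattern(input) {
  const tokens = input.match(/((\w+)|("[\w\s]+"))/g) || [];
  // Quoted tokens come back with their quotes, so remove them here.
  return tokens.map((token) => token.replace(/"/g, ''));
}

console.log(splitWithGroupsPattern('texto0 texto1 "texto2 texto3"'));
// [ 'texto0', 'texto1', 'texto2 texto3' ]
```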

-1

The REGEX you are looking for is this:

[^\s"']+|"([^"]*)"|'([^']*)'
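A sketch of how a pattern of this shape could be used, assuming each quoted group carries a `*` quantifier (`[^"]*` and `[^']*`) so that quoted text of any length matches. The capture groups hold the text without its quotes, so no `replace` step is needed. The function name `splitByQuoteGroups` is hypothetical.

```javascript
// Hypothetical helper: prefer the captured group (text without quotes),
// falling back to the whole match for unquoted tokens.
function splitByQuoteGroups(input) {
  const regex = /[^\s"']+|"([^"]*)"|'([^']*)'/g;
  return Array.from(input.matchAll(regex), (m) => {
    if (m[1] !== undefined) return m[1]; // double-quoted content
    if (m[2] !== undefined) return m[2]; // single-quoted content
    return m[0];                         // unquoted token
  });
}

console.log(splitByQuoteGroups('texto1 \'a b\' "texto3 texto4"'));
// [ 'texto1', 'a b', 'texto3 texto4' ]
```

Unlike the `\w+` alternatives in the other answers, `[^\s"']+` also accepts punctuation and accented characters in unquoted tokens.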
