Regex to catch single word

Question

Asked 4 years, 4 months ago

Viewed 87 times

3

I’m trying to create a regex that matches an exact word within a text, for example:

let texto = "troquei meu carro por um outro carro uma carroça"

console.log(texto.replace(/carro/g, 'barco'))

The problem is that the output is as follows:

troquei meu barco por um outro barco uma barcoça

How can I replace or match only the exact, standalone word? The same thing happens if I have dez and dezembro in the same sentence: when I want to replace only dez, dezembro is changed as well.

I’m new to regex, so I’m having a hard time.

Thank you in advance if you can help me.

\bcarro\b The \b shortcut delimits a word

– Lucas

2021/03/23 at 06:23

2 answers

3

You used the g flag (the letter "g" after the second slash), which causes all occurrences to be replaced. So the first step is to remove it. But that doesn’t solve the whole problem, for example:

let texto = "troquei minha carroça por um carro";
// doesn't work: replaces "carroça" with "barcoça"
console.log(texto.replace(/carro/, 'barco'));

When the regex doesn’t have the g flag, the String.prototype.replace method replaces only the first occurrence found. In the case above, "carroça" is turned into "barcoça", which does not seem to be what you want.

If the idea is to match "carro" only when it is exactly this word, and not part of another word, you could use the \b shortcut, which indicates a word boundary. It just doesn’t always work:

let texto = "troquei meus carros por um único carro";
// works: replaces "carro", but not "carros"
console.log(texto.replace(/\bcarro\b/, 'barco')); // troquei meus carros por um único barco

texto = "troquei minha carroça por um carro";
// doesn't work, because \b doesn't recognize "ç"
console.log(texto.replace(/\bcarro\b/, 'barco')); // troquei minha barcoça por um carro

In JavaScript, \b does not recognize accented letters or "ç": it is defined in terms of \w, which only covers [A-Za-z0-9_]. So "carroça" still becomes "barcoça", because the "ç" is not treated as alphanumeric.
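A quick check of that behaviour, as a minimal sketch (the variable names are only illustrative and not part of the original answer):

```javascript
// \w only covers ASCII letters, digits and underscore,
// so "ç" is treated as a non-word character.
const isWordChar = (ch) => /\w/.test(ch);

const oIsWord = isWordChar('o');       // true
const cedilhaIsWord = isWordChar('ç'); // false

// Since "o" is a word character and "ç" is not, there is a word
// boundary between them, and \bcarro\b matches inside "carroça".
const matchesInsideCarroca = /\bcarro\b/.test('carroça'); // true

console.log(oIsWord, cedilhaIsWord, matchesInsideCarroca); // true false true
```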

In this case, an alternative is to state that there is no letter before or after the word, using lookarounds:

let texto = "troquei minha carroça por um carro";
console.log(texto.replace(/(?<![a-záéíóúãõâêôûç])carro(?![a-záéíóúãõâêôûç])/i, 'barco')); // troquei minha carroça por um barco

That is, I use a negative lookbehind (indicated by (?<! )), which says that there cannot be a letter before the word "carro" (you list all the letters you need there), and a negative lookahead (indicated by (?! )), which does the same for what comes after the word.

I also used the i flag (case insensitive) so that uppercase and lowercase are both considered (that way I only need to list "á", "ç", etc. in the regex). But this also means "Carro" would be replaced with "barco", so check whether that is what you need (if not, add the uppercase letters to the list as well: "Á", "Ç", etc., and remove the i flag).
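To make the difference concrete, here is a small sketch comparing the two variants. The sample sentence and the variable names are made up for illustration, and the uppercase list is just my assumption of which letters you might need:

```javascript
const textoExemplo = 'Carro novo, carro velho';

// With the i flag, "Carro" at the start also counts as a match.
const comFlagI = textoExemplo.replace(
    /(?<![a-záéíóúãõâêôûç])carro(?![a-záéíóúãõâêôûç])/i, 'barco');

// Without the i flag, both cases must be listed explicitly in the
// lookarounds, and only the lowercase "carro" is matched.
const semFlagI = textoExemplo.replace(
    /(?<![a-zA-ZáéíóúãõâêôûçÁÉÍÓÚÃÕÂÊÔÛÇ])carro(?![a-zA-ZáéíóúãõâêôûçÁÉÍÓÚÃÕÂÊÔÛÇ])/, 'barco');

console.log(comFlagI); // barco novo, carro velho
console.log(semFlagI); // Carro novo, barco velho
```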

Another alternative is to use Unicode property escapes (remember that the u flag is required to enable this feature):

let texto = "troquei minha carroça por um carro";
console.log(texto.replace(/(?<!\p{L})carro(?!\p{L})/u, 'barco')); // troquei minha carroça por um barco

Here, \p{L} is any letter defined by Unicode (including accented letters and "ç"), so the lookarounds check whether there is a letter before or after the word.
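If you do want to replace every standalone occurrence (as in the question), the same lookarounds can be combined with the g flag. The helper below is only a sketch: replaceWholeWord is a hypothetical name, and escaping the word is my own addition so that it is always matched literally:

```javascript
// Hypothetical helper, not part of the original answer.
function replaceWholeWord(text, word, replacement) {
    // Escape regex metacharacters so the word is matched literally.
    const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // "g" replaces every occurrence; "u" enables \p{L}.
    const regex = new RegExp(`(?<!\\p{L})${escaped}(?!\\p{L})`, 'gu');
    // A function replacement avoids special handling of "$" in the replacement text.
    return text.replace(regex, () => replacement);
}

console.log(replaceWholeWord('troquei meu carro por um outro carro e uma carroça', 'carro', 'barco'));
// troquei meu barco por um outro barco e uma carroça
console.log(replaceWholeWord('dia dez de dezembro', 'dez', '10'));
// dia 10 de dezembro
```

Note that lookbehind and \p{L} require a reasonably recent JavaScript engine, so check the environments you need to support.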

A simpler option, if the words are separated only by spaces, is to split the text with split, replace what you need, and then join the parts again:

let texto = "troquei minha carroça por um carro";
let partes = texto.split(' '); // split on spaces
for (let i = 0; i < partes.length; i++) {
    if (partes[i] === 'carro') {
        partes[i] = 'barco';
        break; // already found it, no need to keep searching
    }
}

// join with spaces
console.log(partes.join(' ')); // troquei minha carroça por um barco

But of course, if the text has commas or other punctuation, this no longer works, and you end up back at the same problem: detecting which characters are part of a word, including accented letters and "ç". At that point splitting may not even be worth it, because the split itself would need a more complex regex that does not treat accented letters as separators.
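For instance, with a comma right after the word, the comparison with 'carro' fails. A tiny sketch (the sentence and variable names are made up for illustration):

```javascript
// Splitting only on spaces leaves the comma attached to the word.
const partesComVirgula = 'troquei um carro, uma moto'.split(' ');
console.log(partesComVirgula); // [ 'troquei', 'um', 'carro,', 'uma', 'moto' ]

// So an exact comparison with 'carro' never succeeds.
const encontrou = partesComVirgula.includes('carro');
console.log(encontrou); // false
```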

Or you can choose a simplified list of separators (e.g., space, period, exclamation mark, question mark and comma):

let texto = "troquei minha carroça por um carro, uma moto e um caminhão";
let partes = texto.split(/([ ,.!?]+)/);
for (let i = 0; i < partes.length; i++) {
    if (partes[i] === 'carro') {
        partes[i] = 'barco';
        break; // already found it, no need to keep searching
    }
}

console.log(partes.join('')); // troquei minha carroça por um barco, uma moto e um caminhão

In the regex passed to split, I put everything in parentheses, forming a capturing group. With this, the separators are also included in the partes array, so I can join everything back together at the end. But this only works if you know all the characters that can appear between words, since you have to include them all in the split.
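To see what the capturing group does to the array, and how the same idea could replace every occurrence instead of only the first, here is a rough sketch (the sentence and the names frase, pedacos and resultado are just for illustration):

```javascript
const frase = 'um carro, uma carroça e outro carro!';

// With the capturing group, the separators are kept as array items.
// Because the text ends with a separator, the last item is an empty string.
const pedacos = frase.split(/([ ,.!?]+)/);
console.log(pedacos);
// [ 'um', ' ', 'carro', ', ', 'uma', ' ', 'carroça', ' ', 'e', ' ', 'outro', ' ', 'carro', '!', '' ]

// Replace every item that is exactly 'carro', then join without a separator.
const resultado = pedacos.map((p) => (p === 'carro' ? 'barco' : p)).join('');
console.log(resultado); // um barco, uma carroça e outro barco!
```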

by Leo Letto • **3,303** points · Answer 1 · 2021-03-23T12:13:08+00:00

The g flag makes the search "global", so every occurrence in the string is replaced. You could simply leave the g flag out, which makes replace change only the first match found.

let texto = "troquei meu carro por um outro carro uma carroça"
console.log(texto.replace(/carro/, 'barco'))

This site is very useful for testing regexes and understanding the meaning of each part of the expression.