str_replace_all - how to replace words based on the first 3 letters of a string?

I have the following data frame:

library(stringr)

filtro_palavras <- structure(list(palavras = c("cultivo", "produtos", "atacadista", 
"papel", "madeira", "água", "agrícola", "vestuário", "calçados", 
"fumo", "agricultura", "bebidas", "agropecuária", "florestas", 
"abate")), row.names = c(NA, 15L), class = "data.frame")

I would like to replace the words that start with "agr" (in this case: agrícola, agricultura, agropecuária) with "agr".

For that, I’m trying the following:

filtro_palavras$palavras <- str_replace_all(filtro_palavras$palavras, "^agr", "agr")

But no change happens.

2 answers


You don't need to load an external package for this replacement; base R has the functions sub and gsub, which solve the problem.

sub("^agr.*\\b", "agr.", filtro_palavras$palavras)
# [1] "cultivo"    "produtos"   "atacadista" "papel"     
# [5] "madeira"    "água"       "agr."       "vestuário" 
# [9] "calçados"   "fumo"       "agr."       "bebidas"   
#[13] "agr."       "florestas"  "abate" 

Explanation of the regex

  1. "^" matches the start of the string.
  2. "^agr": the string starts with "agr".
  3. "^agr.*": the "agr" at the start of the string is followed by zero or more characters (any character).
  4. "^agr.*\\b": the match from the previous point ends at a word boundary. Instead of \\b, which matches at either edge of a word, here you could also use \\>, which matches only at the end of a word.
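A minimal base R sketch that applies each stage of the pattern above to a few words. The vector palavras here is a hypothetical ASCII-only sample (not the question's full data), chosen to avoid locale issues. The \\> variant assumes R's default regex engine (TRE), which documents \< and \>; it may not behave the same with perl = TRUE.

```r
# Hypothetical sample words (ASCII only, not the question's full data)
palavras <- c("cultivo", "agricultura", "papel", "agronomia")

# Stage 2: which words start with "agr"?
grepl("^agr", palavras)
# returns c(FALSE, TRUE, FALSE, TRUE)

# Stage 3: "^agr.*" covers the prefix plus the rest of the word,
# so the whole word is replaced
sub("^agr.*", "agr.", palavras)
# returns c("cultivo", "agr.", "papel", "agr.")

# Stage 4: the same match ending at a word boundary (\\b)
# or at an end of word (\\>, default TRE engine only)
sub("^agr.*\\b", "agr.", palavras)
sub("^agr.*\\>", "agr.", palavras)
# both return c("cultivo", "agr.", "papel", "agr.")
```

Because "^" anchors the pattern to the start of the string, there is at most one match per element, so gsub would give the same result as sub here.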
  • Interesting! Can you explain in more detail what a word boundary is, and what \\b does? Thank you!

  • @Rxt There is an explanation about \b here.

  • @hkotsubo Or here, in the paragraph before "A regular expression may be followed by one of several repetition quantifiers".
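To illustrate what a word boundary is, here is a small base R sketch. The string frase is a made-up example, not from the question. \\b matches the empty position between a word character and a non-word character (or the start/end of the string), and \\B matches any position that is not a boundary. This assumes R's default regex engine.

```r
# Hypothetical example string: "agr" appears at the start of two words
# and in the middle of "magro"
frase <- "agro magro agricultura"

# Plain pattern: every occurrence of "agr" is replaced
gsub("agr", "X", frase)
# returns "Xo mXo Xicultura"

# \\b before "agr": only occurrences at the start of a word
gsub("\\bagr", "X", frase)
# returns "Xo magro Xicultura"

# \\B before "agr": only occurrences NOT at the start of a word
gsub("\\Bagr", "X", frase)
# returns "agro mXo agricultura"
```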

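I figured out what was wrong.

The symbol "^" only indicates that the word must start with these letters.

What was missing was the symbol ".", which matches any character, and the symbol "+", which repeats the preceding element (in this case, ".") one or more times. Without them, the pattern matched only the prefix "agr" itself, so replacing it with "agr" left every word unchanged.

So, to keep only the first three letters, the code became: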

filtro_palavras$palavras <- str_replace_all(filtro_palavras$palavras, '^agr.+', 'agr')

In plain words: the string starts with "agr", followed by any character repeated one or more times.
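A short sketch of the reasoning above. It uses base R sub instead of str_replace_all so that it runs without stringr; for these simple patterns str_replace_all should behave the same, although stringr uses the ICU regex engine rather than R's default one. The words in palavras are hypothetical. It shows why the original pattern changed nothing and how ".+" differs from ".*".

```r
# Hypothetical test words (not from the question's data)
palavras <- c("agricultura", "agr", "papel")

# The question's pattern matches only the prefix "agr" and replaces it
# with the same text, so nothing changes
identical(sub("^agr", "agr", palavras), palavras)
# returns TRUE

# ".+" needs at least one character after "agr",
# so the bare word "agr" is not matched
sub("^agr.+", "agr.", palavras)
# returns c("agr.", "agr", "papel")

# ".*" also accepts zero characters after the prefix
sub("^agr.*", "agr.", palavras)
# returns c("agr.", "agr.", "papel")
```

With "agr" as the replacement (as in the code above) the difference does not show, since a bare "agr" stays "agr" either way.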
