How to remove a word from a string without changing larger words that contain it

Question

Asked 9 years, 5 months ago

8

I would like to remove a word from a string in R. I was doing it as follows:

> s <- "ele esta bem mas tambem esta triste"
> stringr::str_replace_all(s, "tambem", "")
[1] "ele esta bem mas  esta triste"

So far, so good. The problem arises when I want to remove just the word "bem" from the text.

> stringr::str_replace_all(s, "bem", "")
[1] "ele esta  mas tam esta triste"

In this case the word "tambem" gets cut down to "tam", which I don’t want.

I thought of searching for the word surrounded by spaces:

> stringr::str_replace_all(s, " bem ", " ")
[1] "ele esta mas tambem esta triste"

But then, if I looked for the word "ele", it wouldn’t be removed, because it sits at the start of the string with no space before it. Is there a way to remove whole words without having to think through every such case?

2 answers

8

I don’t know R, but I do know a little regex. In this specific case you can use the word-boundary anchor (\b) to match exactly the word bem:

stringr::str_replace_all(s, "\\bbem\\b", " ")

Related:

What is a boundary (\b) used for in a regular expression?
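
To illustrate what the boundary does, here is a minimal sketch in base R. It assumes gsub() with perl = TRUE instead of stringr (the pattern itself is the same), and the variable names without_bem and without_ele are only illustrative.

```r
s <- "ele esta bem mas tambem esta triste"

# \\b matches only where a word character meets a non-word character
# (or the start/end of the string), so the "bem" inside "tambem" is skipped.
without_bem <- gsub("\\bbem\\b", "", s, perl = TRUE)
without_bem
# "ele esta  mas tambem esta triste"   (two spaces remain where "bem" was)

# The same pattern also handles a word at the very start of the string.
without_ele <- gsub("\\bele\\b", "", s, perl = TRUE)
without_ele
# " esta bem mas tambem esta triste"   (a leading space remains)
```

The leftover spaces are expected: the pattern removes only the word itself, not the whitespace around it.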

I think the expression "\\b\\s?bem\\s?\\b" would be better, because with the answer's version three spaces in a row are left in the final string (or two, if you replace with "" instead of " "). With this expression you remove the spaces before and after the word (if any) and replace everything with a single space. That still wouldn’t fully handle the word at the end of a sentence followed by punctuation (bem.), but in that case perhaps the ideal would be a second expression to clean up what is left over.

– Molx

2016/02/24 at 22:58
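
One possible reading of the "second expression" idea in the comment above, as a sketch only: remove the word first, then tidy up in separate passes. The input string and the set of punctuation marks are assumptions made for the example.

```r
s <- "ele esta bem mas tambem esta bem."

# Step 1: drop the whole word only.
out <- gsub("\\bbem\\b", "", s, perl = TRUE)
# Step 2: collapse any run of whitespace into a single space.
out <- gsub("\\s+", " ", out, perl = TRUE)
# Step 3: remove a space left in front of punctuation, then trim both ends.
out <- gsub("\\s+([.,;:!?])", "\\1", out, perl = TRUE)
out <- trimws(out)
out
# "ele esta mas tambem esta."
```

Keeping the cleanup separate from the removal means the removal pattern stays simple and the same tidy-up works no matter where in the sentence the word appeared.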

by Jean • **1,187** points · Answer 1 · 2016-03-20T20:53:06+00:00

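Using the expression suggested by @Molx, "\\b\\s?bem\\s?\\b", but with the base R function gsub(). Because the pattern also consumes the space on each side of the word, the replacement should be a single space rather than an empty string; otherwise "esta" and "mas" would be joined together:

gsub("\\b\\s?bem\\s?\\b", " ", s)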
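
For the general case raised in the question (removing several words without handling each position by hand), something along these lines might work. remove_words is a hypothetical helper, not part of base R or stringr, and it assumes the words contain no regex metacharacters.

```r
# Hypothetical helper: remove every whole-word occurrence of `words` from `s`.
remove_words <- function(s, words) {
  # Build \b(?:w1|w2|...)\b; assumes the words need no regex escaping.
  pattern <- paste0("\\b(?:", paste(words, collapse = "|"), ")\\b")
  out <- gsub(pattern, "", s, perl = TRUE)
  # Tidy up the spaces left behind and trim both ends.
  trimws(gsub("\\s+", " ", out, perl = TRUE))
}

s <- "ele esta bem mas tambem esta triste"
remove_words(s, c("ele", "bem"))
# "esta mas tambem esta triste"
```

Collapsing whitespace after the removal avoids having to decide, per word, whether the space before or after it should go.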