Searching strings in R language

Question

Asked 7 years, 9 months ago

I need to search a data frame column where the text may not match exactly. For example, df$titulo == "SE" & df$titulo == "projeto de pesquisa" returns nothing. I’ve tried using like instead of ==, and I’ve tried df$titulo == "%projeto de pesquisa%", but neither works. The subset function doesn’t return anything either.

To make clearer what I mean: SQL has a LIKE operator that matches part of a string, instead of requiring equality as = does.

Have you tried agrep("projeto de pesquisa", df$titulo)? See the help page for agrep, which does approximate matching. If you want to search for alternative strings, as the question suggests, please give an example of the table: edit the question to include the output of dput(head(df, 30)).

– Rui Barradas

2017/11/05 at 08:23
@Rui Barradas the function you mentioned returned the row numbers where the string occurs, but I want the records for those rows. Example: id_project, title, advisor, year, beginning, end, ... When the function finds the string, it should return the data in these columns for the matching rows.

– André Nascimento

2017/11/05 at 14:16
@Rui Barradas got it. I put the value=TRUE parameter in the function.

– André Nascimento

2017/11/05 at 14:28

2 answers

I got it this way:

# Approximate match; value = TRUE returns the matching strings, not their indices
x <- agrep(pattern = "projeto de pesquisa", df$titulo, ignore.case = TRUE,
           value = TRUE, fixed = TRUE)

ignore.case = TRUE makes the match case-insensitive, and value = TRUE returns the matching strings themselves instead of their indices.
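To get the full records rather than only the matching titles, as asked in the comments, one option is agrepl, the variant of agrep that returns a logical vector. The following is a minimal sketch; the data frame contents and the id_project column are made up for illustration and are not the asker's real data.

```r
# Hypothetical data frame standing in for the asker's df (contents are assumptions)
df <- data.frame(
  id_project = 1:4,
  titulo = c("SE - Projeto de Pesquisa", "projeto de ensino",
             "Plano do progeto de pesquisa", "curso de R"),
  stringsAsFactors = FALSE
)

# agrepl() returns one TRUE/FALSE per element of df$titulo
hits <- agrepl("projeto de pesquisa", df$titulo, ignore.case = TRUE)

# Index the rows with that vector to keep every column of the matching records
resultado <- df[hits, ]
print(resultado)
```

Because the logical vector has one entry per row, df[hits, ] keeps all columns for the matching rows, which value = TRUE alone does not do.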

by Carlos Cinelli • **16,826** points · Answer 1 · 2017-11-07T05:02:23+00:00

For a simple match, you can use the str_subset function from the stringr package:

library(stringr)
texto <- c("abc projeto de pesquisa cdf", "123 projeto de pesquisa", "progeto de pesquisa")
str_subset(texto, pattern = regex("projeto de pesquisa", ignore_case = TRUE))

Note, however, that the third case, which contains a spelling mistake ("progeto"), is not detected. agrep, which you are using, is more permissive in that sense: it does approximate matching based on Levenshtein distance and can capture the third case, if that’s what you want.
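The difference can also be seen with base R alone. This is a sketch that reuses the texto vector above: grepl with fixed = TRUE does an exact substring test, roughly what LIKE '%...%' does in SQL, while agrepl tolerates a small number of edits (its max.distance argument defaults to 0.1, a fraction of the pattern length).

```r
texto <- c("abc projeto de pesquisa cdf", "123 projeto de pesquisa", "progeto de pesquisa")

# Exact substring test; tolower() handles case because ignore.case
# is not used together with fixed = TRUE
exato <- grepl("projeto de pesquisa", tolower(texto), fixed = TRUE)

# Approximate test with the default max.distance
aproximado <- agrepl("projeto de pesquisa", texto, ignore.case = TRUE)

# Side-by-side view of which strings each approach accepts
print(data.frame(texto, exato, aproximado))
```

Choosing between the two is a trade-off: the exact test never returns false positives, while the approximate one also catches misspelled entries at the cost of possibly matching unrelated text.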