Searching strings in R language

2

I need to search a df column where the text may not match exactly. For example, df$titulo=="SE" & df$titulo=="projeto de pesquisa" doesn't find anything. I've tried using like instead of ==, and I've tried df$titulo == "%projeto de pesquisa%", but neither works. The subset function doesn't return anything either.

To make it clearer what I mean: in SQL there is a LIKE operator that matches part of a string, instead of requiring an exact match as = does.

  • Have you tried agrep("projeto de pesquisa", df$titulo)? See the help page for agrep, which does approximate matching. If you want to search for alternative strings, as the question suggests, give an example of the table: edit the question with the output of dput(head(df, 30)).

  • @Rui Barradas the function you mentioned returned the row numbers where the string occurs, but I want the records related to that string. Example: id_project, title, advisor, year, beginning, end, ... When the function finds the string, it should return the data from these columns for the matching rows.

  • @Rui Barradas got it. I put the value=TRUE parameter in the function.

2 answers

2

For a simple match, you can use the function str_subset from the stringr package:

library(stringr)
texto <- c("abc projeto de pesquisa cdf", "123 projeto de pesquisa", "progeto de pesquisa")
# Keep only the elements that contain the pattern, ignoring case
str_subset(texto, pattern = regex("projeto de pesquisa", ignore_case = TRUE))

Note, however, that the third case, which contains a spelling error ("progeto"), is not detected. agrep, which you are using, is more lenient in that sense because it does approximate matching based on the Levenshtein distance, so it can capture the third case, if that's what you want.
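To make the difference concrete, here is a minimal sketch in base R that compares an exact substring search with an approximate one on the same vector. It assumes the default max.distance of agrepl is acceptable for your data; the variable names exato and aprox are only illustrative.

```r
texto <- c("abc projeto de pesquisa cdf", "123 projeto de pesquisa", "progeto de pesquisa")

# Exact (case-insensitive) substring search: the misspelled entry is missed
exato <- grepl("projeto de pesquisa", texto, ignore.case = TRUE)

# Approximate search: a small number of edits is tolerated,
# so the misspelled "progeto" entry is matched as well
aprox <- agrepl("projeto de pesquisa", texto, ignore.case = TRUE)

print(exato)
print(aprox)
```

If the approximate search turns out to be too permissive on real titles, lowering max.distance is the first thing to try.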

2


I got it this way:

x <- agrep(pattern="projeto de pesquisa", df$titulo, ignore.case = TRUE, 
  value = TRUE, fixed = TRUE)  

ignore.case = TRUE makes the search case-insensitive, and value = TRUE returns the matching strings themselves instead of their positions.
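If you want the whole records rather than just the matching titles, one option is to leave value at its default so that agrep returns row positions, and use those to index the data frame. The sketch below uses a small made-up data frame (the columns id_projeto and ano and all the values are hypothetical); it also shows a grepl-based filter that behaves roughly like SQL's LIKE '%...%', and how to combine two conditions.

```r
# Hypothetical example data; only the column name titulo comes from the question
df <- data.frame(
  id_projeto = 1:4,
  titulo = c("SE - Projeto de Pesquisa A", "Relatorio anual",
             "progeto de pesquisa B", "projeto de pesquisa C"),
  ano = c(2015, 2016, 2017, 2018),
  stringsAsFactors = FALSE
)

# Approximate match: agrep returns row positions when value = FALSE (the default)
idx <- agrep("projeto de pesquisa", df$titulo, ignore.case = TRUE)
resultado <- df[idx, ]

# Substring match, similar in spirit to LIKE '%projeto de pesquisa%'
parecido <- df[grepl("projeto de pesquisa", df$titulo, ignore.case = TRUE), ]

# Both conditions at once: the title contains "SE" and the phrase
ambos <- df[grepl("SE", df$titulo, fixed = TRUE) &
            grepl("projeto de pesquisa", df$titulo, ignore.case = TRUE), ]

print(resultado)
```

Indexing with df[idx, ] keeps every column of the matching rows, which is what the comments above were asking for.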
