How to run str_detect (stringr) on more than one variable at once?

I want to filter my data frame on two variables, via and city, by matching substrings of the values in those variables. For example, I want to keep the people who took the first route (1via) and who live in Santa Monica (Santa Monica).

The substrings would be 1v for the variable via and ca for the variable city.

I tried to do this:

library(dplyr)
library(stringr)
library(magrittr)

df1<-data%>%
    filter(stringr::str_detect(via,'1v')%>%
               filter(stringr::str_detect(city,'ca')))

but it didn’t work out.

Actually, I tried several combinations, but I couldn’t get to the expected result.

dput output to help with answering:

data=structure(list(bin = c(0, 0, 0, 0, 1, 1, 0, 0, 1, 1), group1 = c(1, 
2, 2, 1, 2, 1, 2, 1, 2, 1), missing = c(NA, 4, 5, NA, 7, 6, NA, 
NA, 4, 5), score1 = c(3, 2, 4, 4, 7, 6, 4, 3, 6, 7), valor = c(100, 
200, 321, 34, 3424, 2344, 4232, 43, 22, 22), gender = c("M", 
"M", "M", "M", "M", "F", "F", "F", "F", "F"), via = structure(c(2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("1via", "2via"
), class = "factor"), income = c(1605.52545357496, 1957.10460608825, 
3463.77286640927, 2241.49697413668, 2575.95523679629, 3004.28174249828, 
3458.30937661231, 1786.68619645759, 2065.093211364, 1561.55416276306
), city = c("San Francisco", "Santa Monica", "Santa Monica", 
"Santa Monica", "Santa Monica", "Hollywood", "Hollywood", "Hollywood", 
"Hollywood", "Hollywood"), desbloq = structure(c(10553, 9537, 
10553, 10553, 9212, 10658, 10957, 11822, 11822, 10188), class = "Date"), 
trans = structure(c(10556, 9541, 10555, 10554, 9218, 10660, 
10958, 11823, 11826, 10190), class = "Date")), .Names = c("bin", 
"group1", "missing", "score1", "valor", "gender", "via", "income", 
"city", "desbloq", "trans"), row.names = c(NA, -10L), class = "data.frame")

1 answer

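There is an error in the code: the parenthesis opened by the first filter( is only closed at the end of the second filter, so the second filter call ends up nested inside the first. I use asterisks to highlight the misplaced parentheses in the code below: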

library(dplyr)
library(stringr)
library(magrittr)

df1<-data%>%
    filter*(*stringr::str_detect(via,'1v')%>%
               filter(stringr::str_detect(city,'ca'))*)*
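
To see what the misplaced parenthesis does, here is a minimal sketch on a small stand-in data frame (toy is hypothetical, not the data from the question). The pipe hands the logical vector returned by str_detect to the inner filter, which has nothing to filter, so the whole call should fail. The example only records that an error is raised and does not rely on its exact message.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the question's data (assumption, for illustration)
toy <- data.frame(
  via  = c("1via", "2via", "1via"),
  city = c("Santa Monica", "Santa Monica", "Hollywood"),
  stringsAsFactors = FALSE
)

# The second filter() is nested inside the first, so it receives the
# logical vector from str_detect(via, "1v") instead of a data frame
result <- tryCatch(
  toy %>%
    filter(str_detect(via, "1v") %>%
             filter(str_detect(city, "ca"))),
  error = function(e) "error"
)
result
```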

The correct version is:

df1<-data%>%
    filter(stringr::str_detect(via,'1v'))%>%
    filter(stringr::str_detect(city,'ca'))
df1
  bin group1 missing score1 valor gender  via   income         city    desbloq      trans
1   0      2       4      2   200      M 1via 1957.105 Santa Monica 1996-02-11 1996-02-15
2   0      1      NA      4    34      M 1via 2241.497 Santa Monica 1998-11-23 1998-11-24

Also, it is redundant to load a package with library(stringr) and then call its functions as stringr::str_detect. Once the package is loaded, the code is cleaner if the function is called directly by name:

df1<-data%>%
    filter(str_detect(via,'1v'))%>%
    filter(str_detect(city,'ca'))
df1
  bin group1 missing score1 valor gender  via   income         city    desbloq      trans
1   0      2       4      2   200      M 1via 1957.105 Santa Monica 1996-02-11 1996-02-15
2   0      1      NA      4    34      M 1via 2241.497 Santa Monica 1998-11-23 1998-11-24
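
Since the question asks about testing both variables at once, another option is to pass both conditions to a single filter call. filter combines comma-separated conditions with a logical AND, so this should give the same rows as the two chained calls. The sketch below uses a small stand-in data frame (toy is hypothetical, not the data from the question).

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the question's data (assumption, for illustration)
toy <- data.frame(
  via  = c("2via", "1via", "1via", "1via"),
  city = c("San Francisco", "Santa Monica", "Hollywood", "Santa Monica"),
  stringsAsFactors = FALSE
)

# Conditions separated by commas are combined with a logical AND
both <- toy %>%
  filter(str_detect(via, "1v"), str_detect(city, "ca"))

# Equivalent form with two chained filter() calls
chained <- toy %>%
  filter(str_detect(via, "1v")) %>%
  filter(str_detect(city, "ca"))

both
```

The single-call form keeps the two conditions side by side, which can be easier to read when more variables are added.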
