How to use the filter_ functions (filter_all, filter_at, filter_if)?

I am trying to use the filter_all, filter_at, and filter_if functions, without success, mainly on string columns. Consider the data set below:

set.seed(1234)

data_1 <- data.frame(
  a = c(paste('group', 1:6, sep = '_')), 
  b = c(paste('new', 1:6, sep = '_')), 
  d = c(rnorm(6, 10, 1))
)

Questions:

  • How do I filter, in one call, every row that contains the substring 1? (filter_all)

  • How do I filter, in one call, every row that contains 1 and 3 in the variables a and b? (filter_at)

  • How do I filter, in one call, every row that contains 1 and 3 in the variables a and b and is greater than (>) 10 in the variable d? (filter_at)

  • How do I filter on every character column, keeping the rows that contain the substrings 1 and 3? (filter_if)

A quick sketch of what I tried:

library(dplyr)

filter_at(data_1, c('a', 'b'), any_vars('1'))

Error: No tidyselect variables were registered

I tried to filter on the variables a and b, but it did not work.

I have never used filter with these suffixes, hence the question.

1 answer

First, I will recreate the data with set.seed so the results are reproducible, and with stringsAsFactors = FALSE so that a and b are character columns rather than factors, which the last question requires.

set.seed(1234)

data_1 <- data.frame(
  a = c(paste('group', 1:6, sep = '_')), 
  b = c(paste('new', 1:6, sep = '_')), 
  d = c(rnorm(6, 10, 1)),
  stringsAsFactors = FALSE
)

As for the questions, I will make one small change to the way you approached them: I will use the pipe operator %>%.

All the solutions use grepl, since columns a and b are of class "character".

1.
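This one looks like the easiest, but it is not entirely clear whether you want the rows where '1' occurs in all of the columns or in at least one of them. Note that grepl coerces the numeric column d to character, so a digit 1 anywhere in d also counts as a match.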

library(tidyverse)

data_1 %>%
   filter_all(all_vars(grepl('1', .)))
#        a     b        d
#1 group_1 new_1 8.792934

data_1 %>%
   filter_all(any_vars(grepl('1', .)))
#        a     b         d
#1 group_1 new_1  8.792934
#2 group_2 new_2 10.277429
#3 group_3 new_3 11.084441
#4 group_5 new_5 10.429125
#5 group_6 new_6 10.506056
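
In more recent dplyr releases the scoped verbs (filter_all, filter_at, filter_if) are marked as superseded, and the same two filters can be written with if_all and if_any inside a plain filter. The sketch below assumes dplyr 1.0.4 or later is installed; the names res_all and res_any are only illustrative.

```r
library(dplyr)

set.seed(1234)
data_1 <- data.frame(
  a = paste('group', 1:6, sep = '_'),
  b = paste('new', 1:6, sep = '_'),
  d = rnorm(6, 10, 1),
  stringsAsFactors = FALSE
)

# Counterpart of all_vars(): keep rows where every column contains '1'
res_all <- data_1 %>%
  filter(if_all(everything(), ~ grepl('1', .x)))

# Counterpart of any_vars(): keep rows where at least one column contains '1'
res_any <- data_1 %>%
  filter(if_any(everything(), ~ grepl('1', .x)))
```

Both objects should hold the same rows as the two filter_all calls above.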

2.
This one is simpler. It is solved with grepl applied to the pronoun '.', which stands for each selected column in turn. The pattern '1|3' matches values that contain 1 or 3.

data_1 %>%
  filter_at(vars(a, b), any_vars(grepl('1|3', .)))
#        a     b         d
#1 group_1 new_1  8.792934
#2 group_3 new_3 11.084441
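
To make the role of any_vars more concrete, here is a small sketch that writes the same condition out by hand. It also shows the stricter reading of "1 and 3", where both digits must appear in the same value. The object names (with_any_vars, by_hand, strict) are made up for this illustration.

```r
library(dplyr)

set.seed(1234)
data_1 <- data.frame(
  a = paste('group', 1:6, sep = '_'),
  b = paste('new', 1:6, sep = '_'),
  d = rnorm(6, 10, 1),
  stringsAsFactors = FALSE
)

# any_vars(): the predicate is evaluated once per selected column and
# the per-column results are combined with OR
with_any_vars <- data_1 %>%
  filter_at(vars(a, b), any_vars(grepl('1|3', .)))

# The same condition spelled out with plain filter()
by_hand <- data_1 %>%
  filter(grepl('1|3', a) | grepl('1|3', b))

# Strict reading: a single value must contain both 1 and 3
strict <- data_1 %>%
  filter_at(vars(a, b), any_vars(grepl('1', .) & grepl('3', .)))
```

In this data set no value contains both digits, so strict has no rows. The "or" pattern '1|3' is therefore the reading that returns anything useful.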

3.
Now the condition is a compound logical expression.

data_1 %>%
  filter_at(vars(a, b, d), 
            all_vars(grepl('1|3', a) & grepl('1|3', b) & d > 10))
#        a     b        d
#1 group_3 new_3 11.08444
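
Since the condition above names a, b and d explicitly instead of using the pronoun '.', it can also be written with a plain filter, which may be easier to read. This is only an alternative sketch, and the name plain is illustrative.

```r
library(dplyr)

set.seed(1234)
data_1 <- data.frame(
  a = paste('group', 1:6, sep = '_'),
  b = paste('new', 1:6, sep = '_'),
  d = rnorm(6, 10, 1),
  stringsAsFactors = FALSE
)

# Conditions separated by commas in filter() are combined with AND
plain <- data_1 %>%
  filter(grepl('1|3', a), grepl('1|3', b), d > 10)
```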

4.
Finally, filter_if. The same all_vars versus any_vars ambiguity applies here. The results happen to be identical, because a and b carry the same suffix in every row.

data_1 %>%
   filter_if(~ is.character(.), all_vars(grepl('1', .)))
#        a     b        d
#1 group_1 new_1 8.792934

data_1 %>%
   filter_if(~ is.character(.), any_vars(grepl('1', .)))
#        a     b        d
#1 group_1 new_1 8.792934
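
The question mentions both 1 and 3, so here is a sketch of the same filter_if calls with the pattern '1|3' (1 or 3), the pattern already used for filter_at. Passing is.character directly instead of a formula is just a shorter spelling. The names any_13 and all_13 are illustrative.

```r
library(dplyr)

set.seed(1234)
data_1 <- data.frame(
  a = paste('group', 1:6, sep = '_'),
  b = paste('new', 1:6, sep = '_'),
  d = rnorm(6, 10, 1),
  stringsAsFactors = FALSE
)

# Only the character columns (a and b) are tested; d is ignored
any_13 <- data_1 %>%
  filter_if(is.character, any_vars(grepl('1|3', .)))

all_13 <- data_1 %>%
  filter_if(is.character, all_vars(grepl('1|3', .)))
```

Both calls keep the rows for group_1 and group_3, again because a and b share the same suffix in every row.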
  • Great. Thank you, Rui.
