Filtering a data.frame based on multiple variables

Asked

Viewed 79 times

2

I have a data.frame with 9 variables from X1 to X9. The values of these variables are double. I need to filter the lines of this data.frame that contain (Xi,Xj,Xk) == (5.2,6.3,7.1) for example.

I was able to solve individually with the code below. But I could not write something that would allow me to perform this task by changing only the values sought. How these values can occur in any combination of variables from X1 to X9 has gotten very complicated.

vars <- c("X1","X5","X9")
cond <- c(5.2,6.3,7.1)
lfbase %>%
  filter(
    .data[[vars[[1]]]] == cond[[1]],
    .data[[vars[[2]]]] == cond[[2]],
    .data[[vars[[3]]]] == cond[[3]]
  )
  • 2

    The example is not reproducible. Can you please, edit the question with the departure of dput(lfbase) or, if the base is too large, dput(head(lfbase, 20))? And in what format are the values searched for? In a list, in a data frame? In each line the variables of X1 to X9 must have the three values or only a few?

1 answer

2


After clarification of the question in the comments:

I need to filter the lines where these measurements occur together in any of the X1 variables.... X9

The solution remains based on the command filter_at of dplyr with the argument any_vars(). But instead of the operator | OR, you apply the filters sequentially:

library(dplyr)

df1 <- lfbase %>% filter_at(vars(starts_with("X")), any_vars(. == 5.2)) %>%
                  filter_at(vars(starts_with("X")), any_vars(. == 6.3)) %>% 
                  filter_at(vars(starts_with("X")), any_vars(. == 7.1)) 

By your question, it seems that these are all variables of your dataframe ("I have a data.frame with 9 variables from X1 to X9"). If this is the case, you can simplify the above code with the filter_all:

df1 <- lfbase %>% filter_all(any_vars(. == 5.2))
              %>% filter_all(any_vars(. == 6.3))
              %>% filter_all(any_vars(. == 7.1))

If any lines containing any of the above values were to be filtered, it could be used:

library(dplyr)

df1 <- lfbase %>% filter_at(vars(starts_with("X")), any_vars(. == 5.2 | . == 6.3 | . == 7.1))

or

df1 <- lfbase %>% filter_all(any_vars(. == 5.2 | . == 6.3 | . == 7.1))
  • Thanks for the contributions. The solutions worked for the "OR" condition, that is, using "|". When I change for "&" nothing returns.... <0 lines> (or Row.Names of length 0). "OR" does not solve my problem. I need to filter the lines where these measurements occur together, that is, find the lines where these measurements occur together in any of the variables X1.... X9.

  • Okay, I edited the answer to solve your problem.

  • thanks Socrates. I’ll try.

  • Excellent. Thank you very much. It worked.

  • I’m glad it worked out. If this answer solved your problem and there is no doubt left, please mark it as correct/accept by clicking on the " " that is next to it, which also marks your question as solved: https://pt.meta.stackoverflow.com/questions/1078/como-e-por-que-aceitar-uma-answer

Browser other questions tagged

You are not signed in. Login or sign up in order to post.