Expanding on the answer given by @Rafaelcunha: if you want to filter one data.frame based on another data.frame, you can do it this way.
# reproducible example data
set.seed(1)
d1 <- data.frame(col1 = sample(LETTERS, 20, replace = TRUE), col2 = runif(20), stringsAsFactors = FALSE)
d2 <- data.frame(col1 = sample(LETTERS, 20, replace = TRUE), col3 = sample(1:10, 20, replace = TRUE), stringsAsFactors = FALSE)
Base solution
Filter d1 based on the vector produced by your filter on d2:
# base R: keep the rows of d1 whose col1 is among the col1 values of d2 with col3 < 5
res_base <- d1[d1$col1 %in% unique(d2[d2$col3 < 5, "col1"]), ]
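To make the logic of the line above concrete, here is a minimal sketch with small hand-made data frames. The names d1_demo, d2_demo and keys and their values are hypothetical, chosen only so the result can be checked by hand. Note that "E" passes the condition in d2_demo but does not exist in d1_demo, so it is simply ignored.

```r
# Hypothetical small data frames (not from the original answer)
d1_demo <- data.frame(col1 = c("A", "B", "C", "D"),
                      col2 = c(0.1, 0.2, 0.3, 0.4),
                      stringsAsFactors = FALSE)
d2_demo <- data.frame(col1 = c("A", "C", "C", "E"),
                      col3 = c(2, 7, 1, 3),
                      stringsAsFactors = FALSE)

# Step 1: key values from d2 whose col3 passes the condition (col3 < 5)
keys <- unique(d2_demo[d2_demo$col3 < 5, "col1"])

# Step 2: keep the rows of d1 whose col1 is one of those keys;
# %in% keeps d1's row order and returns each d1 row at most once
res_demo <- d1_demo[d1_demo$col1 %in% keys, ]
res_demo
```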
Base solution with a join (merge)
If your data set is large, you might want to use merge to filter d1 through an inner join, i.e. a row is returned only when d1 and d2 share the same value in a key column (here col1):
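# unique() keeps each d1 row at most once even if a col1 value repeats in d2
res_base_m <- merge(d1, unique(d2[d2$col3 < 5, "col1", drop = FALSE]), by = "col1")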
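One caveat worth checking with an inner join: if a key value appears more than once in the filtered d2, merge returns the matching d1 row once per occurrence, whereas the %in% filter keeps each d1 row at most once. A minimal sketch with hypothetical data (d1_demo, d2_demo, dup and no_dup are illustrative names) shows the difference and how de-duplicating the keys first avoids it.

```r
# Hypothetical data: "C" appears twice in d2_demo, both times with col3 < 5
d1_demo <- data.frame(col1 = c("A", "B", "C"), col2 = c(0.1, 0.2, 0.3),
                      stringsAsFactors = FALSE)
d2_demo <- data.frame(col1 = c("A", "C", "C"), col3 = c(2, 1, 4),
                      stringsAsFactors = FALSE)

# Plain inner join: every matching row of d2 produces an output row,
# so the d1 row for "C" appears twice
dup <- merge(d1_demo, d2_demo[d2_demo$col3 < 5, "col1", drop = FALSE], by = "col1")
nrow(dup)  # 3

# De-duplicating the keys first gives one row per matching row of d1
no_dup <- merge(d1_demo, unique(d2_demo[d2_demo$col3 < 5, "col1", drop = FALSE]), by = "col1")
nrow(no_dup)  # 2
```

Also note that merge sorts the result by the key by default (sort = TRUE), so its row order can differ from the %in% version.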
Other solutions
dplyr
dplyr is a data manipulation package that follows the syntax conventions of the tidyverse. The first approach simply filters the values in d2 and then uses them to filter d1 with the verb filter; dplyr is quite intuitive in this regard. We can also use inner_join from the same package, which is recommended if you are doing this with a large data set.
# load the package
library(dplyr)
res_dplyr <- filter(d1, col1 %in% pull(filter(d2, col3 < 5), col1))
# distinct() keeps each d1 row at most once even if a col1 value repeats in d2
res_dplyr_m <- inner_join(d1, distinct(select(filter(d2, col3 < 5), col1)), by = "col1")
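dplyr also provides semi_join(), a "filtering join" that keeps the rows of x having a match in y, returns only x's columns, and never duplicates rows of x. It is a swapped-in alternative, not part of the original answer; the sketch below assumes a reasonably recent dplyr and uses hypothetical data (d1_demo, d2_demo, res_semi).

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical data: "C" appears twice in d2_demo
d1_demo <- data.frame(col1 = c("A", "B", "C"), col2 = c(0.1, 0.2, 0.3),
                      stringsAsFactors = FALSE)
d2_demo <- data.frame(col1 = c("A", "C", "C"), col3 = c(2, 1, 4),
                      stringsAsFactors = FALSE)

# Keep rows of d1_demo whose col1 matches a row of the filtered d2_demo;
# no de-duplication step is needed, and only d1_demo's columns are returned
res_semi <- semi_join(d1_demo, filter(d2_demo, col3 < 5), by = "col1")
res_semi
```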
data.table
The data.table package is known for its concise syntax and computational efficiency. It is quite fast and follows the syntax DT[i, j, by], where i selects rows, j selects or computes on columns, and by defines the grouping. In the first example, we filter d2 on the desired values of col3 and then use the resulting col1 vector from d2 in the i argument of d1. The second example is a join performed by data.table, of the form X[Y, on = key, nomatch = 0L], where key names the join column(s) and nomatch = 0L drops rows of Y that have no match in X (an inner join).
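To see the DT[i, j, by] form described above in isolation, here is a minimal sketch on a hypothetical table dt (not part of the original answer): i keeps the rows with col3 < 5, j counts those rows (.N) and sums col3, and by does this separately for each value of col1.

```r
suppressPackageStartupMessages(library(data.table))

# Hypothetical example table
dt <- data.table(col1 = c("A", "A", "B", "C"), col3 = c(2, 4, 7, 1))

# i: rows with col3 < 5 | j: row count and sum of col3 | by: per col1 value
res_ijb <- dt[col3 < 5, .(n = .N, total = sum(col3)), by = col1]
res_ijb
```

Groups appear in order of first appearance, so the result has one row for "A" and one for "C"; "B" is dropped by i before grouping.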
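# load the package
library(data.table)
# convert d1 and d2 to data.tables by reference (they are modified in place)
setDT(d1)
setDT(d2)
res_dt <- d1[col1 %in% d2[col3 < 5, col1], ]
# unique() keeps each d1 row at most once even if a col1 value repeats in d2
res_dt_m <- d1[unique(d2[col3 < 5, .(col1)]), on = "col1", nomatch = 0L]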
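For character keys, data.table also offers the %chin% operator, documented as a character-only counterpart of %in% that is typically faster. The sketch below is an alternative to the first example, not the original answer's code; it uses hypothetical data (d1_demo, d2_demo, res_chin, res_join) and also checks that the de-duplicated join returns the same rows.

```r
suppressPackageStartupMessages(library(data.table))

# Hypothetical data: "C" appears twice in d2_demo
d1_demo <- data.table(col1 = c("A", "B", "C"), col2 = c(0.1, 0.2, 0.3))
d2_demo <- data.table(col1 = c("A", "C", "C"), col3 = c(2, 1, 4))

# Filter in i with %chin% (col1 is character in both tables)
res_chin <- d1_demo[col1 %chin% d2_demo[col3 < 5, col1]]

# Join version: unique() on the key table keeps each d1_demo row at most once
res_join <- d1_demo[unique(d2_demo[col3 < 5, .(col1)]), on = "col1", nomatch = 0L]
```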
I don’t quite understand. You want to filter one data.frame based on the values of another data.frame, is that it? – JdeMello
Possible duplicate of Search for values in one data.frame and add to another (R) – neves