Filter data frame according to indexes (lines) stored in a vector


Hello, I have a data frame where I store tourist information. Example in the image: [image: data from my data frame]

and I have a vector like this: 1 1 3 4 11 12 13 14 16 29 30 41 6 7 8 9 10 5 15 17 27

In this case the vector is dynamic, so its contents change.

What I'm trying to do is generate a new data.frame with the rows corresponding to the numbers in my vector.

In short, I want to filter my main data.frame according to the vector, which holds the row numbers that should appear in the filtered data.frame.

For example: if my vector is [2 4 6], my second data.frame will be formed by rows 2, 4 and 6 of the main data.frame (the one in the image).

Any tips? Thanks in advance.

2 answers

5


If I understand correctly, you want to select the rows of the data frame holding the tourist spot information, using the row numbers in the dynamic vector, correct?

Then let df be your data frame:

df <- data.frame(X = rnorm(100), Y = sample(LETTERS, 100, replace = TRUE))

Your dynamic vector:

linhas <- sample(1:100, 15)

To select only those rows, you can do:

df[linhas, ]
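
To connect this to the [2 4 6] case from the question, here is a minimal sketch. The data frame and its column names (id, spot) are hypothetical stand-ins for the real tourist data. It also shows what happens when the vector contains a repeated number, as the vector in the question does (1 appears twice).

```r
# Hypothetical stand-in for the tourist data frame (10 rows)
df <- data.frame(id = 1:10, spot = paste0("spot_", 1:10))

# Vector holding the row numbers to keep
linhas <- c(2, 4, 6)

# Rows 2, 4 and 6, all columns
filtrado <- df[linhas, ]

# A repeated row number returns that row more than once
com_repeticao <- df[c(1, 1, 3), ]

# Drop repeated row numbers first if each row should appear only once
sem_repeticao <- df[unique(c(1, 1, 3)), ]
```

Whether repeated rows are wanted depends on what the vector represents, so unique() is only needed if each row should appear once.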
  • What I meant was, for example: if my vector is [2 4 6], my second data.frame will be formed by rows 2, 4 and 6 of the main data.frame (the one in the image).

  • Then the code I put in my answer does what you want.

2

Expanding on the answer given by @Rafaelcunha: if you want to filter one data.frame based on another data.frame, you can do it this way.

set.seed(1)
d1 <- data.frame(col1 = sample(LETTERS, 20, TRUE), col2 = runif(20), stringsAsFactors = FALSE)
d2 <- data.frame(col1 = sample(LETTERS, 20, TRUE), col3 = sample(c(1:10), 20, TRUE), stringsAsFactors = FALSE)

Base solution

Filter d1 based on the vector resulting from your filter on d2:

# base
res_base <- d1[d1$col1 %in% unique(d2[d2$col3 < 5, "col1"]), ]

Base solution with a join (merge)

If your database is large, you might want to use merge to filter d1 using the concept of an inner join (i.e. a row is kept only when d1 and d2 share a value in the key column):

res_base_m  <- merge(d1, d2[d2$col3 < 5, ][1], by = "col1")
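
One caveat: an inner join is not always identical to the %in% filter. If the filtered d2 contains the same key more than once, merge returns the matching d1 row once per occurrence. The sketch below uses small hypothetical data (not the sampled d1 and d2 above) to show the difference and one way to avoid it by removing duplicated keys first.

```r
# Small hypothetical data: key "A" appears twice in d2
d1 <- data.frame(col1 = c("A", "B", "C", "D"),
                 col2 = c(0.1, 0.2, 0.3, 0.4),
                 stringsAsFactors = FALSE)
d2 <- data.frame(col1 = c("A", "A", "C", "E"),
                 col3 = c(1, 2, 3, 9),
                 stringsAsFactors = FALSE)

# Keys that pass the filter on d2, kept as a one-column data frame
keys <- d2[d2$col3 < 5, "col1", drop = FALSE]

# %in% keeps each matching d1 row exactly once
res_in <- d1[d1$col1 %in% keys$col1, ]

# merge repeats the "A" row because "A" occurs twice in keys
res_merge <- merge(d1, keys, by = "col1")

# Removing duplicated keys first gives one row per match again
res_merge_unique <- merge(d1, unique(keys), by = "col1")
```

Note that merge sorts the result by the key column by default, so the row order can also differ from the %in% version.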

Other solutions

dplyr

dplyr is a data manipulation package that follows the tidyverse syntax conventions. The first approach filters d2 for the desired values and uses the resulting col1 vector to filter d1 with the verb filter. We can also use inner_join from the same package, which is recommended when working with a large database.

# load the package
library(dplyr)
res_dplyr <- filter(d1, col1 %in% pull(filter(d2, col3 < 5), col1))

res_dplyr_m <- inner_join(d1, select(filter(d2, col3 < 5), col1), by = "col1")
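
As a possible alternative, dplyr also provides semi_join, a filtering join that keeps the rows of the first table that have a match in the second without ever repeating them. The sketch below uses small hypothetical data (not the sampled d1 and d2) and assumes the goal is purely to filter d1.

```r
library(dplyr)

# Small hypothetical data: key "A" appears twice in d2
d1 <- data.frame(col1 = c("A", "B", "C", "D"),
                 col2 = c(0.1, 0.2, 0.3, 0.4),
                 stringsAsFactors = FALSE)
d2 <- data.frame(col1 = c("A", "A", "C", "E"),
                 col3 = c(1, 2, 3, 9),
                 stringsAsFactors = FALSE)

# semi_join keeps each matching row of d1 once, with d1's columns only
res_semi <- semi_join(d1, filter(d2, col3 < 5), by = "col1")

# inner_join repeats the "A" row, once per matching row of d2
res_inner <- inner_join(d1, select(filter(d2, col3 < 5), col1), by = "col1")
```

With duplicated keys in d2, semi_join gives the same rows as the %in% filter, while inner_join returns one row per match.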

data.table

The data.table package is known for its succinct syntax and computational efficiency. It is quite fast and follows the syntax DT[i, j, by], where i selects rows, j selects columns and by defines the grouping. In the first example, we filter d2 for the desired values of col3 and then use the resulting col1 vector of d2 to filter in the i of d1. The second example is a join performed by data.table: X[Y, on = _key_, nomatch = 0L].

# load the package
library(data.table)
setDT(d1)
setDT(d2)

res_dt <- d1[col1 %in% d2[col3 < 5, col1], ]

res_dt_m <- d1[d2[col3 < 5, .(col1)], on = "col1", nomatch = 0L]
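
For the original question (selecting rows by number), the i argument of data.table also accepts a vector of row numbers directly. The sketch below uses hypothetical data and column names. It also shows the join with duplicated keys removed first, assuming each d1 row should appear only once.

```r
library(data.table)

# Hypothetical stand-in for the tourist data (10 rows)
dt <- data.table(id = 1:10, spot = paste0("spot_", 1:10))

# Row numbers go straight into i; no trailing comma is needed
linhas <- c(2, 4, 6)
res_linhas <- dt[linhas]

# Small hypothetical tables for the join: key "A" appears twice in d2
d1 <- data.table(col1 = c("A", "B", "C", "D"), col2 = c(0.1, 0.2, 0.3, 0.4))
d2 <- data.table(col1 = c("A", "A", "C", "E"), col3 = c(1, 2, 3, 9))

# unique() on the key table keeps the join from repeating rows of d1
res_join <- d1[unique(d2[col3 < 5, .(col1)]), on = "col1", nomatch = 0L]
```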
