Suppose I have the following data frame:
set.seed(100)
base <- expand.grid(grupo = c("a", "b", "c", "d"), score = runif(100))
I want to select the rows with the lowest scores in each group, where the number of rows to keep per group is given by the table below:
qtds <- data.frame(grupo = levels(base$grupo), qtd = c(1, 2, 3, 4))
qtds
grupo qtd
1 a 1
2 b 2
3 c 3
4 d 4
That is, I want to select the row with the lowest score in group a, the two rows with the lowest scores in group b, and so on.
At the moment, I'm doing it like this:
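library(dplyr)  # provides %>%, filter() and row_number()
novaBase <- data.frame()
for (i in levels(base$grupo)) {
  # keep the qtd lowest-scoring rows of group i and append them
  novaBase <- rbind(novaBase,
                    base %>%
                      filter(grupo == i) %>%
                      filter(row_number(score) <= qtds$qtd[qtds$grupo == i])
  )
}
novaBase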
grupo score
1 a 0.0003950703
2 b 0.0003950703
3 b 0.0039051792
4 c 0.0003950703
5 c 0.0221628349
6 c 0.0039051792
7 d 0.0269371939
8 d 0.0003950703
9 d 0.0221628349
10 d 0.0039051792
This works, but it seems very inefficient to me, and the code is hard to follow. Does anyone know a better way?
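One direction that might avoid the loop, sketched here under the assumption that dplyr is available: join qtds onto base so each row carries its group's quota, then filter within each group. The names base, qtds and novaBase come from the question; converting grupo to character on both sides is an extra step added here only so the join keys have the same type.

```r
library(dplyr)

set.seed(100)
base <- expand.grid(grupo = c("a", "b", "c", "d"), score = runif(100))
qtds <- data.frame(grupo = levels(base$grupo), qtd = c(1, 2, 3, 4))

# Assumption: make the join key plain character in both tables so that
# inner_join() does not have to reconcile factor and character columns.
base_chr <- base %>% mutate(grupo = as.character(grupo))
qtds_chr <- qtds %>% mutate(grupo = as.character(grupo))

novaBase <- base_chr %>%
  inner_join(qtds_chr, by = "grupo") %>%   # attach each group's quota (qtd)
  group_by(grupo) %>%
  filter(row_number(score) <= qtd) %>%     # keep the qtd lowest scores per group
  ungroup() %>%
  arrange(grupo, score) %>%                # sort for readability
  select(-qtd)

novaBase
```

This should select the same rows as the loop above, though the row order may differ because of the final arrange().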
Cool!! I hadn’t thought of that!
– Daniel Falbel