I find the following a more concise way to do what you need:
library(purrr) # for the map function
library(tidyr) # for the unnest function
library(dplyr) # for the as_data_frame function
map(lista, ~map(.x, ~.x[1:10])) %>%
as_data_frame() %>%
unnest()
The result is this:
# A tibble: 30 × 2
     num chr
   <int> <chr>
 1     1 a
 2     2 b
 3     3 c
 4     4 d
 5     5 e
 6     6 f
 7     7 g
 8     8 h
 9     9 i
10    10 j
# ... with 20 more rows
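For anyone who wants to run this end to end, here is a minimal self-contained sketch. The input `lista` is not shown in this answer, so the three-element list below is an assumption chosen only to match the 30 × 2 output above. The sketch also uses `as_tibble()` and an explicit `cols` argument to `unnest()`, which is how recent tibble and tidyr releases expect this to be written.

```r
library(purrr)  # map
library(tibble) # as_tibble
library(tidyr)  # unnest
library(dplyr)  # %>%

# Assumed input: a list with two elements, each holding three vectors of length 20.
lista <- list(
  num = list(1:20, 1:20, 1:20),
  chr = list(letters[1:20], letters[1:20], letters[1:20])
)

resultado <- lista %>%
  map(~ map(.x, ~ .x[1:10])) %>%  # keep the first 10 values of every inner vector
  as_tibble() %>%                 # two list columns, three rows
  unnest(cols = c(num, chr))      # expand both list columns in parallel

dim(resultado) # 30 rows, 2 columns
```

Note that `.x[1:10]` pads with `NA` when a vector has fewer than 10 values; `head(.x, 10)` would be an alternative if that padding is not wanted.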
Another way, which I also find neat, is:
lista %>%
as_data_frame() %>%
mutate(chr = map(chr, ~.x[1:10])) %>%
unnest()
List columns, that is, data.frame columns that are themselves lists, are increasingly widely used and have been popularized by Hadley Wickham. See R for Data Science.
In the list-columns example I only modified the chr column, but you could modify all the columns using:
lista %>%
as_data_frame() %>%
mutate_all(funs(map(., ~.x[1:10]))) %>%
unnest()
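In current dplyr releases `funs()` and `mutate_all()` have been superseded by `across()`, and `as_data_frame()` by `as_tibble()`. A rough equivalent of the snippet above under those APIs could look like the sketch below. The `lista` input is again an assumed example, not the one from the question, and plain anonymous functions are used instead of nested `~` lambdas purely for readability.

```r
library(purrr)  # map
library(tibble) # as_tibble
library(tidyr)  # unnest
library(dplyr)  # mutate, across, everything

# Assumed input with the same shape as in the question: a list of lists of vectors.
lista <- list(
  num = list(1:20, 1:20, 1:20),
  chr = list(letters[1:20], letters[1:20], letters[1:20])
)

resultado_all <- lista %>%
  as_tibble() %>%
  # Apply the same truncation to every list column.
  mutate(across(everything(), function(col) map(col, function(v) v[1:10]))) %>%
  unnest(cols = everything())

dim(resultado_all) # 30 rows, 2 columns
```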
Complementing Tomás's benchmark
> lista <- list(
+ num = lapply(1:10, function(x) sample(1:100, 20)),
+ chr = lapply(1:10, function(x) sample(letters, 20))
+ )
> microbenchmark(
+ solucao_tomas = {as.data.frame(sapply(lapply(lista, pegar_elem, 1:10), unlist))},
+ solucao_daniel = {unnest(as_data_frame(map(lista, ~map(.x, ~.x[1:10]))))}
+ )
Unit: microseconds
           expr      min       lq      mean   median       uq      max neval
  solucao_tomas  419.026  439.375  466.7568  454.947  476.889  695.780   100
 solucao_daniel 2456.108 2559.625 2745.8009 2680.130 2836.733 4466.647   100
> lista <- list(
+ num = lapply(1:1000, function(x) sample(1:100, 20)),
+ chr = lapply(1:1000, function(x) sample(letters, 20))
+ )
> microbenchmark(
+ solucao_tomas = {as.data.frame(sapply(lapply(lista, pegar_elem, 1:10), unlist))},
+ solucao_daniel = {unnest(as_data_frame(map(lista, ~map(.x, ~.x[1:10]))))}
+ )
Unit: milliseconds
           expr       min       lq     mean   median       uq      max neval
  solucao_tomas 13.559905 14.15854 14.64829 14.56517 14.83060 16.89264   100
 solucao_daniel  9.871144 10.27053 11.07952 10.80652 11.29402 19.82793   100
> lista <- list(
+ num = lapply(1:10000, function(x) sample(1:100, 20)),
+ chr = lapply(1:10000, function(x) sample(letters, 20))
+ )
> microbenchmark(
+ solucao_tomas = {as.data.frame(sapply(lapply(lista, pegar_elem, 1:10), unlist))},
+ solucao_daniel = {unnest(as_data_frame(map(lista, ~map(.x, ~.x[1:10]))))}
+ )
Unit: milliseconds
           expr       min        lq     mean    median       uq      max neval
  solucao_tomas 156.63202 171.06855 195.3683 180.86325 227.1462 271.7314   100
 solucao_daniel  80.93934  91.22597 100.5079  96.73947 104.7544 154.6254   100
In other words, when the list is small, Tomás's solution using for is more efficient, but the timings there are measured in microseconds: the medians are roughly 455 µs versus 2,680 µs, a gap of about 2 ms (and efficiency matters little when the objects are small). As the objects grow, the solution using purrr, dplyr and tidyr becomes more efficient. With lists of 10,000 elements it is roughly 2x faster (median of about 97 ms versus 181 ms). So this solution is efficient where it counts, that is, when the objects get large.
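To rerun the comparison you need `pegar_elem`, which comes from Tomás's answer and is not reproduced here. The version below is only a guess at its shape (a `for` loop that subsets each inner vector), so treat it as a hypothetical stand-in. The sketch checks that both approaches return the same number of rows and only runs the timing if the microbenchmark package is installed.

```r
library(purrr)  # map
library(tibble) # as_tibble
library(tidyr)  # unnest

# Hypothetical stand-in for Tomás's helper: subset every element of x with idx.
pegar_elem <- function(x, idx) {
  saida <- vector("list", length(x))
  for (i in seq_along(x)) {
    saida[[i]] <- x[[i]][idx]
  }
  saida
}

set.seed(1)
lista <- list(
  num = lapply(1:10, function(x) sample(1:100, 20)),
  chr = lapply(1:10, function(x) sample(letters, 20))
)

# Note: sapply() builds a character matrix here, so num ends up as text.
res_tomas <- as.data.frame(sapply(lapply(lista, pegar_elem, 1:10), unlist))
res_daniel <- unnest(as_tibble(map(lista, ~ map(.x, ~ .x[1:10]))), cols = c(num, chr))

# Timing is optional and depends on the microbenchmark package being available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    solucao_tomas = as.data.frame(sapply(lapply(lista, pegar_elem, 1:10), unlist)),
    solucao_daniel = unnest(as_tibble(map(lista, ~ map(.x, ~ .x[1:10]))), cols = c(num, chr)),
    times = 20
  ))
}
```

Absolute timings will differ by machine and package version, so only the relative trend as the list grows is worth comparing against the numbers above.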
I don't understand why you would answer your own question. Wouldn't it be better to put the code from this link in your original post? Imagine someone answers your question and that answer gets votes. It would rank above your answer and strip your question of its context. That is unhelpful for anyone who searches SO in the future.
– Marcus Nunes
This is an option that Stack Overflow provides. I understand your concern, but my answer is one possible answer to the question. If anyone has another solution, they can post it below.
– Tomás Barcellos