What are list-columns of a data.frame?

Asked 6 years, 6 months ago

The tidyverse encourages the use of list-columns in data frames. But, after all:

what are list-columns?
on what occasions are they commonly used?
can they be created with base R, or only as tibbles?

For example,

data.frame(idade = 1:5, nome = letters[1:5], lista = lapply(1:5, rnorm))

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
  arguments imply differing number of rows: 1, 2, 3, 4, 5

tibble::tibble(idade = 1:5, nome = letters[1:5], lista = lapply(1:5, rnorm))
# A tibble: 5 x 3
  idade nome  lista    
  <int> <chr> <list>   
1     1 a     <dbl [1]>
2     2 b     <dbl [2]>
3     3 c     <dbl [3]>
4     4 d     <dbl [4]>
5     5 e     <dbl [5]>

1 answer

by Daniel Falbel • **12,504** points · Answer 1 · 2018-12-26T14:02:00+00:00

List columns, or list-columns, are data frame columns that are lists rather than atomic vectors, so each cell can hold an object of any type or length. They are useful in many situations when working with the tidyverse, mainly as intermediate structures.

They can also be used in base R, but you have to wrap the list in the function I() to keep data.frame() from throwing an error. Example:

data.frame(idade = 1:5, nome = letters[1:5], lista = I(lapply(1:5, rnorm)))

  idade nome        lista
1     1    a 0.178046....
2     2    b 0.407768....
3     3    c -0.84749....
4     4    d -0.44864....
5     5    e 1.229863....
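As a complement to the I() approach above, here is a minimal base R sketch of another way to get a list-column: create the data frame first and assign the list afterwards. The object name df is only illustrative, and the actual numbers vary because rnorm() is random.

```r
# Base R only; no packages are assumed here.
df <- data.frame(idade = 1:5, nome = letters[1:5])

# Assigning a list whose length equals nrow(df) creates a list-column.
df$lista <- lapply(1:5, rnorm)

is.list(df$lista)   # the column is a list, not an atomic vector
lengths(df$lista)   # number of elements stored in each cell
df$lista[[3]]       # the numeric vector stored in row 3
```

Unlike the I() version, the column created this way does not carry the "AsIs" class.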

A good illustration of list-columns is calling, inside mutate(), a vectorized function that returns more than one value per element. For example:

library(tidyverse)

df <- tribble(
  ~x1,
  "a,b,c", 
  "d,e,f,g"
) 

df %>% 
  mutate(x2 = stringr::str_split(x1, ","))
#> # A tibble: 2 x 2
#>   x1      x2       
#>   <chr>   <list>   
#> 1 a,b,c   <chr [3]>
#> 2 d,e,f,g <chr [4]>

Next, it is common to flatten the data frame with tidyr's unnest() function, naming the list-column to expand:

df %>% 
  mutate(x2 = stringr::str_split(x1, ",")) %>% 
  unnest(x2)
#> # A tibble: 7 x 2
#>   x1      x2   
#>   <chr>   <chr>
#> 1 a,b,c   a    
#> 2 a,b,c   b    
#> 3 a,b,c   c    
#> 4 d,e,f,g d    
#> 5 d,e,f,g e    
#> 6 d,e,f,g f    
#> # ... with 1 more row
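Unnesting is not the only option. As a rough sketch, you can also summarise each cell of the list-column in place with purrr's typed map functions. The column names n and ultimo and the object res are hypothetical, chosen only for this illustration.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

df <- tribble(
  ~x1,
  "a,b,c",
  "d,e,f,g"
)

res <- df %>%
  mutate(
    x2 = str_split(x1, ","),                  # list-column of character vectors
    n = map_int(x2, length),                  # number of pieces in each cell
    ultimo = map_chr(x2, ~ .x[length(.x)])    # last piece of each cell
  )

res
```

This keeps one row per original observation, which is often what you want when the list-column is only an intermediate step.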

There are many other interesting use cases. Another example I like is the one produced by the rsample package:

library(tidyverse)
library(rsample)

vfold_cv(mtcars, v = 5) %>% 
  mutate(
    modelos = map(splits, ~lm(mpg ~ ., data = analysis(.x))),
    mse = map2_dbl(modelos, splits, ~mean((assessment(.y)$mpg - predict(.x, assessment(.y)))^2))
    )

#  5-fold cross-validation 
# A tibble: 5 x 4
  splits         id    modelos    mse
* <list>         <chr> <list>   <dbl>
1 <split [25/7]> Fold1 <S3: lm> 40.4 
2 <split [25/7]> Fold2 <S3: lm>  5.99
3 <split [26/6]> Fold3 <S3: lm>  9.11
4 <split [26/6]> Fold4 <S3: lm> 11.6 
5 <split [26/6]> Fold5 <S3: lm> 21.3

In the example above we fit a model for each cross-validation fold and then compute the mean squared error on the observations held out of that fold.
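The same idea of keeping data and fitted models in list-columns also works without rsample. The sketch below is only an illustration under my own assumptions: it splits mtcars by cyl, fits mpg ~ wt in each group, and extracts the slope. The names por_cyl, dados, modelo, inclinacao and n_obs are hypothetical.

```r
library(dplyr)
library(purrr)
library(tibble)

# One row per value of cyl; `dados` is a list-column of data frames.
por_cyl <- tibble(
  cyl = sort(unique(mtcars$cyl)),
  dados = map(cyl, ~ mtcars[mtcars$cyl == .x, ])
)

resultado <- por_cyl %>%
  mutate(
    modelo = map(dados, ~ lm(mpg ~ wt, data = .x)),     # list-column of lm fits
    inclinacao = map_dbl(modelo, ~ coef(.x)[["wt"]]),   # slope of wt per group
    n_obs = map_int(dados, nrow)                        # rows used in each fit
  )

resultado
```

Each row carries its own data, its own model, and the summaries derived from them, so nothing has to be tracked in separate lists.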