Manipulating list-columns

4

I have a tibble called my, which contains a list-column named data:

library(tidyverse)

dataset<-data.frame(matrix(rnorm(6*30,1000,100),ncol=6))
cluster<-kmeans(dataset,centers=3)
dataset$kmeans<-as.factor(cluster[['cluster']])

my<-dataset%>%
  group_by(kmeans)%>%
  nest()

rm(dataset,cluster)

I have the following questions about manipulating list-columns:

  • How can I apply nest() and, at the same time, keep the grouping variables inside the nested tibbles? Something similar to base::split().

  • How do I place the elements of this list-column in the globalenv()? Something similar to base::list2env().

  • How can I apply a mutate and keep the results in the same tibble? Consider that each tibble has 6 columns. With mutate_if(is.numeric, sum), I would like these tibbles to have 12 columns. I've tried a few things, but they return a new list-column, which I don't want.

I can do all of these with base R functions, but I am interested in manipulating list-columns while staying within the tidyverse framework.

  • The first one has already been answered by the example itself, hasn't it? With group_by() + nest()...

1 answer

3


The manipulation of list-columns within the tidyverse works the same way the tidyverse proposes to manipulate lists in general, that is, with purrr. The difference is that the manipulation takes place inside a data.frame and therefore follows the tidyverse rules for manipulating data frames, for example by staying inside a mutate().

With this general consideration in mind, let's go to the questions.

Keeping variables in nest()

I see two possible interpretations of the question. In the first, the grouping variable is expected to remain in the tibble returned by nest(), and the answer is already in the question itself. In the second, the grouping variable is expected to appear inside each nested tibble; this can be solved by adding the variable to each nested tibble with map2().

my %>% 
  mutate(data2 = map2(data, kmeans, ~mutate(.x, var = .y)))
# A tibble: 3 x 3
  kmeans data              data2            
  <fct>  <list>            <list>           
1 1      <tibble [14 x 6]> <tibble [14 x 7]>
2 3      <tibble [10 x 6]> <tibble [10 x 7]>
3 2      <tibble [6 x 6]>  <tibble [6 x 7]> 
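As a self-contained illustration of the second interpretation, the sketch below applies the same map2() pattern to a small made-up tibble (toy, with hypothetical columns kmeans and X1, not the data from the question). It also shows dplyr::group_split(), which is not part of the answer above but, assuming a dplyr version that provides it, returns a plain list of tibbles that keep the grouping column, much like base::split().

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data standing in for the clustered dataset
toy <- tibble(
  kmeans = factor(c(1, 1, 2)),
  X1     = c(10, 20, 30)
)

# Same pattern as above: copy the group key into each nested tibble
nested <- toy %>%
  group_by(kmeans) %>%
  nest() %>%
  mutate(data2 = map2(data, kmeans, ~ mutate(.x, var = .y)))

# Alternative (assumption: group_split() is available in your dplyr):
# a list of tibbles that still contain the grouping column
pieces <- toy %>% group_split(kmeans)
```

Here nested$data2 holds tibbles with the extra var column, while pieces is an ordinary list rather than a list-column, so it only fits if you no longer need the one-row-per-group layout.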

Bringing the list-column into the .GlobalEnv

First of all, if you really intend to stay within the tidyverse framework, you should not do this; the information should be kept in the tibble. With that caveat, the operation can be done with list2env(), as mentioned in the question, keeping in mind that the list must be named for this to work. So we would have:

ls()
[1] "cluster" "dataset" "my" 
# Add names to the elements of the list
my$data <- set_names(my$data, paste0("tabela", seq_along(my$data)))
list2env(my$data, .GlobalEnv)
<environment: R_GlobalEnv>
ls()
[1] "cluster" "dataset" "my"      "tabela1" "tabela2" "tabela3"
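If the goal is only to pull the nested tibbles out of the list-column, one variation worth considering is to target a dedicated environment instead of .GlobalEnv, so that nothing in the workspace is overwritten by accident. The sketch below is a toy under stated assumptions: tables and env are hypothetical names, and the two small data frames stand in for my$data.

```r
library(purrr)

# Hypothetical stand-in for my$data: an unnamed list of data frames
tables <- list(data.frame(a = 1), data.frame(a = 2:3))

# list2env() needs names, so add them first
tables <- set_names(tables, paste0("tabela", seq_along(tables)))

# Copy the elements into a fresh environment rather than .GlobalEnv
env <- new.env()
list2env(tables, envir = env)

sort(ls(env))
```

The objects can then be read back with get("tabela1", envir = env) or env$tabela1.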

Mutate

Finally, mutate() can be applied to a list-column as usual, but to apply the operation to each element of the data list-column (which is what is wanted here), you need to wrap it in a map() inside the mutate().

my %>% 
  mutate(soma = map(data, ~mutate_if(.x, is.numeric, sum)),
         final = map2(data, soma, bind_cols)) %>% 
  select(kmeans, final)
# A tibble: 3 x 2
  kmeans final             
  <fct>  <list>            
1 1      <tibble [14 x 12]>
2 3      <tibble [10 x 12]>
3 2      <tibble [6 x 12]> 

Note that it was enough to wrap your code in a formula inside map() for it to work. To produce the result expected in the question, I bound the two data.frames into a single one.

Not every list-column operation needs to result in another list-column. To get an atomic column instead, use one of the map_*() variants.
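A possible variation, assuming a dplyr version that provides across() (1.0.0 or later), is to create the summed columns next to the originals in one step and overwrite the data list-column in place, which avoids both the helper column and bind_cols(). The tibble toy and the "_sum" suffix below are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data: two numeric columns per nested tibble
toy <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(10, 20, 30)) %>%
  group_by(g) %>%
  nest()

# .names adds new columns instead of overwriting x and y,
# so each nested tibble goes from 2 to 4 columns
res <- toy %>%
  mutate(data = map(
    data,
    ~ mutate(.x, across(where(is.numeric), sum, .names = "{.col}_sum"))
  ))

names(res$data[[1]])
```

Because the result is assigned back to data, no additional list-column is created; with the six numeric columns from the question this pattern would give twelve columns per tibble.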

my %>% 
  mutate(tamanho = map_dbl(data, nrow))
# A tibble: 3 x 3
  kmeans data              tamanho
  <fct>  <list>              <dbl>
1 1      <tibble [14 x 6]>      14
2 3      <tibble [10 x 6]>      10
3 2      <tibble [6 x 6]>        6
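To make the choice of variant concrete, here is a minimal sketch on made-up data (toy, n_rows and mean_x are hypothetical names). Each typed map_*() function expects one value of the matching type per element, so map_int() suits nrow(), which returns an integer, and map_dbl() suits a mean.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data nested by group
toy <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3)) %>%
  group_by(g) %>%
  nest()

res <- toy %>%
  mutate(
    n_rows = map_int(data, nrow),         # integer column
    mean_x = map_dbl(data, ~ mean(.x$x))  # double column
  )
```

Both new columns are ordinary atomic vectors, so they can be used directly in filter(), arrange() or summarise().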
