`broom::tidy` does not produce output for some functions

I have the data.frame:

library(tidyverse)
library(broom)

set.seed(1)

dataset<-as_tibble(matrix(runif(6*30,20,100),ncol=6))
cluster<-kmeans(dataset,3)
dataset$kmeans<-as.factor(cluster[['cluster']])

And I do this analysis:

res1<-dataset%>%
  group_by(kmeans)%>%
  do(reg=
   lm(V1~V2+V3+V4+V5+V6,data=.))%>%
   tidy(.,reg)

This works as expected. But when I try to add the Shapiro-Wilk test, broom::tidy does not produce a result:

shap<-function(x){
  lapply(x,shapiro.test)
} # function created to apply the test to each column of a data.frame

res2<-dataset%>%
  group_by(kmeans)%>%
  do(shapiro=
         shap(.[c(1:6)]))%>%
  tidy(.,shapiro)

Error: No Tidy method recognized for this list.

What is going wrong?

2 answers

3


The tidy function from the broom package is a generic function that implements a method for each type of test or model. To tidy a Shapiro-Wilk test, it therefore expects an input of class htest. Your function returns the output of lapply, which is a plain list of htest objects, so the object passed to tidy no longer has the htest class.
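The class difference can be checked directly. The following is a minimal sketch on made-up vectors (the names `single` and `many` are only for illustration):

```r
library(broom)

set.seed(1)
x <- runif(10, 20, 100)
y <- runif(10, 20, 100)

# A single test result carries the class that tidy() dispatches on
single <- shapiro.test(x)
class(single)                 # "htest"

# lapply() wraps the results in a plain list; only the elements are htest
many <- lapply(list(a = x, b = y), shapiro.test)
class(many)                   # "list"
class(many[["a"]])            # "htest"

# tidy() works on the htest object itself and returns a one-row tibble
tidied <- tidy(single)
```

Calling tidy on each element of the list, rather than on the list itself, is what the approach below relies on.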

One way to solve this is to keep the test results in a list-column and then apply tidy to each test.

dataset %>% 
  group_by(kmeans) %>% 
  nest() %>% 
  mutate(shap = map(data, ~map(.x, shapiro.test)),
         tidy = map(shap, ~map_df(.x, tidy))) %>% 
  unnest(tidy)

# A tibble: 18 x 4
   kmeans statistic p.value method                     
   <fct>      <dbl>   <dbl> <chr>                      
 1 3          0.956  0.731  Shapiro-Wilk normality test
 2 3          0.943  0.534  Shapiro-Wilk normality test
 3 3          0.807  0.0113 Shapiro-Wilk normality test
 4 3          0.931  0.389  Shapiro-Wilk normality test
 5 3          0.964  0.844  Shapiro-Wilk normality test
 6 3          0.945  0.561  Shapiro-Wilk normality test
 7 2          0.903  0.198  Shapiro-Wilk normality test
 8 2          0.905  0.211  Shapiro-Wilk normality test
 9 2          0.838  0.0298 Shapiro-Wilk normality test
10 2          0.954  0.689  Shapiro-Wilk normality test
11 2          0.923  0.341  Shapiro-Wilk normality test
12 2          0.951  0.653  Shapiro-Wilk normality test
13 1          0.777  0.0240 Shapiro-Wilk normality test
14 1          0.888  0.264  Shapiro-Wilk normality test
15 1          0.942  0.659  Shapiro-Wilk normality test
16 1          0.868  0.178  Shapiro-Wilk normality test
17 1          0.914  0.424  Shapiro-Wilk normality test
18 1          0.854  0.135  Shapiro-Wilk normality test

The problem ended up being a little more complex because it requires splitting the data into groups and applying the test to each variable within each group. That is why we need nest() and then unnest().
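The output above does not say which variable each row belongs to. One possible variation, sketched here under assumptions, keeps the column name by passing `.id` to `map_dfr`. The toy data frame `toy` and its fixed grouping column `g` are hypothetical stand-ins for the question's dataset and kmeans column, chosen so that every group is guaranteed to have enough rows for the test.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1)
# Hypothetical toy data: six numeric columns V1..V6 plus a fixed grouping
# column g (used instead of kmeans so that each group has exactly 10 rows)
toy <- as.data.frame(matrix(runif(6 * 30, 20, 100), ncol = 6))
toy$g <- factor(rep(c("a", "b", "c"), each = 10))

res <- toy %>%
  group_by(g) %>%
  nest() %>%
  # For each group, test every column and record the column name in "variable"
  mutate(tidy = map(data, function(d) {
    map_dfr(d, function(col) tidy(shapiro.test(col)), .id = "variable")
  })) %>%
  select(-data) %>%
  unnest(tidy)

res
```

The result should have one row per group and variable, with the statistic and p-value kept as numeric columns.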

1

The problem is that shap returns a list, because it uses lapply. Flatten the output of shap with unlist:

shap <- function(x){
  lapply(x,shapiro.test) %>% unlist
  }

Producing:

res2<-dataset%>%
  group_by(kmeans)%>%
  do(shapiro=
       shap(.[c(1:6)]))%>%
  tidy(.,shapiro)

> res2
# A tibble: 72 x 3
# Groups:   kmeans [3]
   kmeans names          x                          
   <fct>  <chr>          <chr>                      
 1 1      V1.statistic.W 0.776846950517062          
 2 1      V1.p.value     0.0239963439526711         
 3 1      V1.method      Shapiro-Wilk normality test
 4 1      V1.data.name   X[[i]]                     
 5 1      V2.statistic.W 0.887908776882885          
 6 1      V2.p.value     0.263931722353138          
 7 1      V2.method      Shapiro-Wilk normality test
 8 1      V2.data.name   X[[i]]                     
 9 1      V3.statistic.W 0.942285213321747          
10 1      V3.p.value     0.659382055356446          
# ... with 62 more rows

All code:

library(tidyverse)
library(broom)

set.seed(1)

dataset<-as_tibble(matrix(runif(6*30,20,100),ncol=6))
cluster<-kmeans(dataset,3)
dataset$kmeans<-as.factor(cluster[['cluster']])

shap <- function(x){
  lapply(x,shapiro.test) %>% unlist
  }

res2<-dataset%>%
  group_by(kmeans)%>%
  do(shapiro=
       shap(.[c(1:6)]))%>%
  tidy(.,shapiro)
  • Is there no way to simplify the output using broom::tidy? I ask because I have to use several functions to tidy up the data. In the regression analysis, the output comes out perfectly.

  • What would simplify the output in this case?

  • For example, the string "Shapiro-Wilk normality test" is stored alongside the numeric values, which automatically coerces them to character. In the regression analysis, the data comes out clean.

  • If you do not want "method" and "data.name" to be returned in the output of shap(), you can remove them inside shap():

    shap <- function(x){
      foo <- lapply(x,shapiro.test) %>%
        unlist

      foo <- foo[-grep(x = names(foo), pattern = "method|data\\.name")]

      return(foo)
    }

    In res2, the vector x will still be of type character, but you can convert it to numeric with mutate().
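A minimal sketch of that last step, assuming a small made-up data frame `toy` and using `tibble::enframe` (not part of the original answer) to build the names/x table instead of tidy:

```r
library(dplyr)
library(tibble)

set.seed(1)
# Hypothetical toy data with three numeric columns V1..V3
toy <- as.data.frame(matrix(runif(3 * 10, 20, 100), ncol = 3))

shap <- function(x) {
  foo <- unlist(lapply(x, shapiro.test))
  # Drop the text entries; what remains is still a character vector
  foo[-grep(x = names(foo), pattern = "method|data\\.name")]
}

res <- enframe(shap(toy), name = "names", value = "x") %>%
  mutate(x = as.numeric(x))   # convert the character values back to numeric

res
```

Because the values pass through character on the way, they may lose a little precision compared with reading `statistic` and `p.value` directly from each test object.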
