`broom::tidy` does not produce output for some functions

Question

Asked 6 years, 6 months ago

Viewed 72 times

2

I have this data frame:

library(tidyverse)
library(broom)

set.seed(1)

dataset<-as_tibble(matrix(runif(6*30,20,100),ncol=6))
cluster<-kmeans(dataset,3)
dataset$kmeans<-as.factor(cluster[['cluster']])

I run this analysis on it:

res1<-dataset%>%
  group_by(kmeans)%>%
  do(reg=
   lm(V1~V2+V3+V4+V5+V6,data=.))%>%
   tidy(.,reg)

This works as expected. But when I try to add the Shapiro-Wilk test, `broom::tidy` does not produce a result:

shap<-function(x){
  lapply(x,shapiro.test)
} # function created to apply the test to each column of a data.frame

res2<-dataset%>%
  group_by(kmeans)%>%
  do(shapiro=
         shap(.[c(1:6)]))%>%
  tidy(.,shapiro)

Error: No Tidy method recognized for this list.

What is going wrong?

2 answers

3

`tidy` from the broom package is a generic function with a separate method for each type of test or model. To tidy a Shapiro-Wilk test, it therefore expects an input of class `htest`. Your function wraps the test results in a plain list, so `tidy` no longer sees the `htest` class and finds no method for it.
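To make the dispatch point concrete, here is a minimal sketch using a made-up numeric vector `x` (an assumption for illustration, not part of the question's data). It compares the class of a single Shapiro-Wilk result with the class of the list that `lapply` returns.

```r
library(broom)

# Hypothetical sample, used only to illustrate class-based dispatch
x <- c(23.1, 47.5, 38.2, 91.0, 64.3, 55.8, 72.4, 30.9)

single <- shapiro.test(x)
class(single)            # "htest": tidy() has a method for this class
tidy(single)             # one-row tibble with the test results

wrapped <- lapply(list(V1 = x, V2 = rev(x)), shapiro.test)
class(wrapped)           # "list": no htest class at the top level
class(wrapped[["V1"]])   # "htest": each element can still be tidied

# Tidying element by element works where tidy(wrapped) does not
lapply(wrapped, tidy)
```

The class that matters is the one on the object handed to `tidy`, not the class of the elements inside it.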

One way to solve this is to keep the test results in a list-column and then apply `tidy` to each test individually.

dataset %>% 
  group_by(kmeans) %>% 
  nest() %>% 
  mutate(shap = map(data, ~map(.x, shapiro.test)),
         tidy = map(shap, ~map_df(.x, tidy))) %>% 
  unnest(tidy)

# A tibble: 18 x 4
   kmeans statistic p.value method                     
   <fct>      <dbl>   <dbl> <chr>                      
 1 3          0.956  0.731  Shapiro-Wilk normality test
 2 3          0.943  0.534  Shapiro-Wilk normality test
 3 3          0.807  0.0113 Shapiro-Wilk normality test
 4 3          0.931  0.389  Shapiro-Wilk normality test
 5 3          0.964  0.844  Shapiro-Wilk normality test
 6 3          0.945  0.561  Shapiro-Wilk normality test
 7 2          0.903  0.198  Shapiro-Wilk normality test
 8 2          0.905  0.211  Shapiro-Wilk normality test
 9 2          0.838  0.0298 Shapiro-Wilk normality test
10 2          0.954  0.689  Shapiro-Wilk normality test
11 2          0.923  0.341  Shapiro-Wilk normality test
12 2          0.951  0.653  Shapiro-Wilk normality test
13 1          0.777  0.0240 Shapiro-Wilk normality test
14 1          0.888  0.264  Shapiro-Wilk normality test
15 1          0.942  0.659  Shapiro-Wilk normality test
16 1          0.868  0.178  Shapiro-Wilk normality test
17 1          0.914  0.424  Shapiro-Wilk normality test
18 1          0.854  0.135  Shapiro-Wilk normality test

The problem is a little more complex than it first looks, because the data has to be split into groups and the test applied to each variable within each group. That is why we need `nest()` first and `unnest()` afterwards.
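The output above does not say which variable each row belongs to. As a hedged variation on the same nest/unnest idea (not the answerer's code), the sketch below passes `.id = "variable"` to `map_df` so the column name is kept, and drops the intermediate list-columns with `select()` before unnesting. The column name `variable` and the object names `m` and `res` are arbitrary choices made for this example.

```r
library(tidyverse)
library(broom)

set.seed(1)

# Same setup as in the question; column names are set explicitly
# so the result does not depend on tibble's name-repair defaults
m <- matrix(runif(6 * 30, 20, 100), ncol = 6)
colnames(m) <- paste0("V", 1:6)
dataset <- as_tibble(m)
dataset$kmeans <- as.factor(kmeans(m, 3)$cluster)

res <- dataset %>%
  group_by(kmeans) %>%
  nest() %>%
  # For each group, test every column and store its name in `variable`
  mutate(tidy = map(data, function(df) {
    map_df(df, function(col) tidy(shapiro.test(col)), .id = "variable")
  })) %>%
  select(kmeans, tidy) %>%
  unnest(tidy) %>%
  ungroup()

res
```

Each row of `res` then carries both the cluster and the variable it was computed from, which makes the table easier to filter or reshape later.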

1

Both Hadley Wickham and Jenny Bryan have lectures that can help.

– Tomás Barcellos

2019/01/14 at 16:32

by JdeMello • **844** points · Answer 1 · 2019-01-14T15:12:23+00:00

The problem is that `shap` returns a list via `lapply`. Flatten the output of `shap` with `unlist`:

shap <- function(x){
  lapply(x,shapiro.test) %>% unlist
  }

Producing:

res2<-dataset%>%
  group_by(kmeans)%>%
  do(shapiro=
       shap(.[c(1:6)]))%>%
  tidy(.,shapiro)

> res2
# A tibble: 72 x 3
# Groups:   kmeans [3]
   kmeans names          x                          
   <fct>  <chr>          <chr>                      
 1 1      V1.statistic.W 0.776846950517062          
 2 1      V1.p.value     0.0239963439526711         
 3 1      V1.method      Shapiro-Wilk normality test
 4 1      V1.data.name   X[[i]]                     
 5 1      V2.statistic.W 0.887908776882885          
 6 1      V2.p.value     0.263931722353138          
 7 1      V2.method      Shapiro-Wilk normality test
 8 1      V2.data.name   X[[i]]                     
 9 1      V3.statistic.W 0.942285213321747          
10 1      V3.p.value     0.659382055356446          
# ... with 62 more rows
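
Note that the `x` column in this output has type `<chr>`: `unlist` flattens each `htest` object together with its text fields, so the statistics are coerced to character strings. Below is a minimal sketch of that behaviour, using a made-up vector (an assumption for illustration), along with one way to get numbers back.

```r
# Hypothetical sample, used only to show what unlist() does to an htest
x <- c(23.1, 47.5, 38.2, 91.0, 64.3, 55.8, 72.4, 30.9)

flat <- unlist(lapply(list(V1 = x), shapiro.test))

names(flat)   # "V1.statistic.W" "V1.p.value" "V1.method" "V1.data.name"
class(flat)   # "character": numbers and text were merged into one vector

# Convert the numeric entries back before using them in calculations
p_value <- as.numeric(flat[["V1.p.value"]])
p_value < 0.05
```

If the statistics are needed as numbers downstream, this conversion step has to be applied to the relevant rows of the result.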

All code:

library(tidyverse)
library(broom)

set.seed(1)

dataset<-as_tibble(matrix(runif(6*30,20,100),ncol=6))
cluster<-kmeans(dataset,3)
dataset$kmeans<-as.factor(cluster[['cluster']])

shap <- function(x){
  lapply(x,shapiro.test) %>% unlist
  }

res2<-dataset%>%
  group_by(kmeans)%>%
  do(shapiro=
       shap(.[c(1:6)]))%>%
  tidy(.,shapiro)