Problems with sum() function inside summarise using plyr and tidyverse!

Hello, good afternoon!

I'm having trouble using the tidyverse to stack my data, summarise it, and spread it back out, along the lines of base %>% pivot_longer(cols = ..., names_to = ..., values_to = ...) %>% group_by(...) %>% summarise(soma = sum(value)) %>% spread(variable, soma). This pipeline has always worked when the function inside summarise() was mean() or sd(). With sum() it does not work: it does not sum, it just repeats the value.

To check, I used reshape2::dcast(plyr::ddply(reshape2::melt())), which is the approach I have always been in the habit of using, and I also compared the result against sum() applied to the values of a specific subset().

I would like help with how to do this with the tidyverse, preferably without those messages of the form "summarise() ... (override with .groups argument)".

The data and the code follow:

base <- data.frame(
  expand.grid(FAT1 = 1:4, FAT2 = 1:2, FAT3 = 1:2, AVA = 1:6, REP = 1:3),
  VAR1 = runif(3 * 96, 0, 1),
  VAR2 = runif(3 * 96, -1.5, 0),
  VAR3 = runif(3 * 96, 0, 2.5)
)

library(reshape2)
library(plyr)
library(tidyverse)

mbase <- base %>%
  pivot_longer(cols = all_of(c("VAR1", "VAR2", "VAR3")), names_to = "variable", values_to = "value")




dcast(ddply(mbase, .(FAT1,FAT2,FAT3,AVA,variable), summarise, soma=sum(value)),FAT1*FAT2*FAT3*AVA~variable,fun.aggregate = sum,value.var ="soma" )
# (FAT1=1,FAT2=1,FAT3=1,AVA=1)$VAR1 = 1.704

sum(subset(base,FAT1==1 & FAT2==1 & FAT3==1 & AVA==1)[,"VAR1"])
# = 1.704



base %>%
  pivot_longer(cols = all_of(c("VAR1", "VAR2", "VAR3")), names_to = "variable", values_to = "value") %>%
  group_by(FAT1, FAT2, FAT3, AVA, variable) %>%
  summarise(soma = sum(value)) %>%
  spread(variable, soma)
  • Can you give an example of what you want the final result to look like? From what I understand your code is right: the result of the tidyverse pipeline is the same as the one from dcast, although the printed output does not look exactly the same.

  • As you saw, even I couldn't answer that: one moment it was working, then it stopped, so thoroughly that I had to post this question, and now it works again :'/ . R things that make us ashamed. Anyway, I will test Rui Barradas's solutions to suppress the messages!

1 answer

Both examples below give a distinct sum in each row rather than a repeated value. The first is the code from the question, with only ungroup() added at the end so that identical() returns TRUE when its result is compared with that of the second, much simpler, method.

As for the second question, how to avoid the messages from summarise(), the answer is in the documentation, help('summarise'):

When .groups is not specified, you either get "drop_last" when all the results are size 1, or "keep" if the size varies. In addition, a message informs you of that choice, unless the option "dplyr.summarise.inform" is set to FALSE.

op <- options(dplyr.summarise.inform = FALSE)

mbase <- base %>% 
  pivot_longer(
    cols = all_of(c("VAR1", "VAR2", "VAR3")),
    names_to = "variable",
    values_to = "value"
  ) %>% 
  group_by(FAT1, FAT2, FAT3, AVA, variable) %>% 
  summarise(soma = sum(value)) %>% 
  spread(variable, soma) %>%
  ungroup()


mbase2 <- base %>% 
  group_by(FAT1, FAT2, FAT3, AVA) %>% 
  summarise_at(vars(starts_with("VAR")), sum) %>%
  ungroup()


identical(mbase, mbase2)
#[1] TRUE
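
As an alternative to changing the global option, the message can also be avoided per call through the .groups argument that the message itself mentions. The following is a minimal sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0. The set.seed() call and the names res_long, res_wide, manual and first are additions for illustration and are not part of the code above; pivot_wider() and across() are used here in place of spread() and summarise_at().

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed, only so the example is reproducible
base <- data.frame(
  expand.grid(FAT1 = 1:4, FAT2 = 1:2, FAT3 = 1:2, AVA = 1:6, REP = 1:3),
  VAR1 = runif(3 * 96, 0, 1),
  VAR2 = runif(3 * 96, -1.5, 0),
  VAR3 = runif(3 * 96, 0, 2.5)
)

# Long route: stack, sum per group, unstack.
# .groups = "drop" returns an ungrouped tibble and suppresses the message.
res_long <- base %>%
  pivot_longer(cols = all_of(c("VAR1", "VAR2", "VAR3")),
               names_to = "variable", values_to = "value") %>%
  group_by(FAT1, FAT2, FAT3, AVA, variable) %>%
  dplyr::summarise(soma = sum(value), .groups = "drop") %>%
  pivot_wider(names_from = variable, values_from = soma)

# Wide route: sum the VAR columns directly with across().
res_wide <- base %>%
  group_by(FAT1, FAT2, FAT3, AVA) %>%
  dplyr::summarise(across(starts_with("VAR"), sum), .groups = "drop")

# Same check as in the question: compare one cell with a manual subset sum.
manual <- sum(subset(base, FAT1 == 1 & FAT2 == 1 & FAT3 == 1 & AVA == 1)$VAR1)
first  <- subset(res_wide, FAT1 == 1 & FAT2 == 1 & FAT3 == 1 & AVA == 1)$VAR1
all.equal(manual, first)

# Compare the two routes with each other.
all.equal(as.data.frame(res_long), as.data.frame(res_wide))
```

Writing dplyr::summarise with the explicit namespace is a deliberate choice in this sketch: it keeps the call independent of whichever package attached a function named summarise last.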
  • I'll test the code to suppress the messages (I don't know why it wasn't working!). Anyway, thank you! I think it must be some package I'm loading after the tidyverse; I'll load them all before it. Where do you use op after that command???

  • @Jeankarlos The variable op stands for "old options"; the previous settings can be restored afterwards with options(op).
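
To illustrate the last two comments: options() invisibly returns the previous values of the options it changes, which is why saving them in op makes it possible to restore them later. The sketch below also shows what happens when plyr's summarise is applied to grouped data. That plyr masking dplyr was the cause of the original problem is only an assumption here, in line with the suspicion voiced above; the toy data frame df and the names res and res_plyr are hypothetical.

```r
library(dplyr)

df <- data.frame(g = c("a", "a", "b"), value = c(1, 2, 10))  # hypothetical toy data

# options() returns the old values of the options it sets, invisibly.
op <- options(dplyr.summarise.inform = FALSE)

# Explicit namespace: always dplyr's summarise, whatever was attached last.
res <- df %>%
  group_by(g) %>%
  dplyr::summarise(soma = sum(value))

# Restore whatever setting was in place before.
options(op)

# For contrast (only if plyr is installed): plyr's summarise does not know
# about dplyr groups, so it collapses everything into one overall total.
if (requireNamespace("plyr", quietly = TRUE)) {
  res_plyr <- plyr::summarise(group_by(df, g), soma = sum(value))
}
```

If plyr is attached after dplyr, a bare summarise() resolves to plyr's version, so either attaching plyr first or writing dplyr::summarise explicitly avoids the ambiguity.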
