R, How to calculate the mean of a variable x for each group of other variables

Question

R, How to calculate the mean of a variable x for each group of other variables

Asked 4 years ago

Viewed 41 times

0

I wanted to make a table more or less like this: A column with the nominal variables and their groups and another column with the average of a variable x for each group of each variable.

With dplyr I can(know) do this but only if I do one group_by() at a time.

Example: Let’s say in the mtcars dataset my variable x is mpg while my qualitative variables are vs,am, gear,carb.

mtcars %>% group_by(vs) %>% summarise(mean(mpg))

mtcars %>% group_by(am) %>% summarise(mean(mpg))

mtcars %>% group_by(gear) %>% summarise(mean(mpg))

mtcars %>% group_by(carb) %>% summarise(mean(mpg))

With this I have the average mpg for each group in each variable but what I’m looking for is a way to, whether by dplyr or another package do these four things at once, so that with this single output I get a table through something like a knit::kable()

1

Welcome to Sopt. To have your question properly answered, provide a minimum, complete and verifiable example. Check out this Meta post on how to ask a playable question in R.

– Carlos Eduardo Lagosta

2021/07/22 at 00:03

2 answers

1

Reformat to long format and then group and calculate the averages all at once.

library(dplyr)
library(tidyr)

mtcars %>%
  select(mpg, vs, am, gear, carb) %>%
  pivot_longer(-mpg) %>%
  group_by(name, value) %>%
  summarise(mean_mpg = mean(mpg))

Browser other questions tagged r dplyr summary

You are not signed in. Login or sign up in order to post.

by Carlos Eduardo Lagosta • **5,497** points · Answer 1 · 2021-07-22T20:33:11+00:00

The question is with the tag dplyr and Rui Barradas already provided a great answer. But as the question cites "either by dplyr or another package", here are two alternatives, to get registered:

R base

The function aggregate calculates summaries per group. As in the case of using dplyr, the data must be in long format:

# Empilha as variáveis selecionadas e adiciona valores de mpg
dados <- stack(mtcars, select = c("vs", "am", "gear", "carb"))
dados$mpg <- mtcars$mpg

aggregate(mpg ~ ind + values, dados, mean)
#>     ind values      mpg
#> 1    vs      0 16.61667
#> 2    am      0 17.14737
#> 3    vs      1 24.55714
#> 4    am      1 24.39231
#> 5  carb      1 25.34286
#> 6  carb      2 22.40000
#> 7  gear      3 16.10667
#> 8  carb      3 16.30000
#> 9  gear      4 24.53333
#> 10 carb      4 15.79000
#> 11 gear      5 21.38000
#> 12 carb      6 19.70000
#> 13 carb      8 15.00000

Alternatively, using the package reshape2 to change the data format:

aggregate(mpg ~ variable + value,
  data = reshape2::melt(mtcars, "mpg", c("vs", "am", "gear", "carb")),
  FUN = mean)

data table.

Just as dplyr, operates with "divide/apply" paradigm. It has the functions melt and dcast (equivalent to those of the package reshape2) to change the data format.

library(data.table)

dados <- melt(as.data.table(mtcars), "mpg", c("vs", "am", "gear", "carb"))

dados[, mean(mpg), .(variable, value)]