Weighted Average in R

3

My data set has the columns c1 (year), c2 (state), c3 (weight), and c4 (value). I'd like to compute the average of c4 weighted by c3, per state (c2) and per year (c1).

 dados <- data.frame(c1 = c(rep(1996:2003, 4)), c2 = c(rep('rs', 8), rep('sc', 8)),
                     c3 = 1:32, c4 = 32:1)
 dados
      c1 c2 c3 c4
 1  1996 rs  1 32
 2  1997 rs  2 31
 3  1998 rs  3 30
 4  1999 rs  4 29
 5  2000 rs  5 28
 6  2001 rs  6 27
 7  2002 rs  7 26
 8  2003 rs  8 25
 9  1996 sc  9 24
 10 1997 sc 10 23
 11 1998 sc 11 22
 12 1999 sc 12 21
 13 2000 sc 13 20
 14 2001 sc 14 19
 15 2002 sc 15 18
 16 2003 sc 16 17
 17 1996 rs 17 16
 18 1997 rs 18 15
 19 1998 rs 19 14
 20 1999 rs 20 13
 21 2000 rs 21 12
 22 2001 rs 22 11
 23 2002 rs 23 10
 24 2003 rs 24  9
 25 1996 sc 25  8
 26 1997 sc 26  7
 27 1998 sc 27  6
 28 1999 sc 28  5
 29 2000 sc 29  4
 30 2001 sc 30  3
 31 2002 sc 31  2
 32 2003 sc 32  1

2 answers

4


I'm not sure I understood the question correctly. Here's a solution that calculates the average of c4 weighted by c3, by year and state. Is that what you need?

library(tidyverse)
dados %>% 
  group_by(c1, c2) %>% 
  summarise(media = weighted.mean(c4, c3))

# # A tibble: 16 x 3
# # Groups:   c1 [?]
# c1 c2    media
# <int> <fct> <dbl>
# 1  1996 rs    16.9 
# 2  1996 sc    12.2 
# 3  1997 rs    16.6 
# 4  1997 sc    11.4 
# 5  1998 rs    16.2 
# 6  1998 sc    10.6 
# 7  1999 rs    15.7 
# 8  1999 sc     9.8 
# 9  2000 rs    15.1 
# 10  2000 sc     8.95
# 11  2001 rs    14.4 
# 12  2001 sc     8.09
# 13  2002 rs    13.7 
# 14  2002 sc     7.22
# 15  2003 rs    13   
# 16  2003 sc     6.33
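
As a sanity check, the first value in the table can be reproduced by hand in base R. This is only a minimal sketch: it assumes the `dados` data frame from the question, and the names `grupo` and `manual` are illustrative, not part of the original code.

```r
# Rebuild the example data from the question.
dados <- data.frame(c1 = rep(1996:2003, 4),
                    c2 = rep(c(rep('rs', 8), rep('sc', 8)), 2),
                    c3 = 1:32, c4 = 32:1)

# Keep only the rows for state 'rs' in 1996 (rows 1 and 17).
grupo <- dados[dados$c1 == 1996 & dados$c2 == 'rs', ]

# Weighted mean by definition: sum(value * weight) / sum(weight).
manual <- sum(grupo$c4 * grupo$c3) / sum(grupo$c3)

manual   # (1*32 + 17*16) / (1 + 17) = 16.88889

# Compare with the built-in function used above.
all.equal(manual, weighted.mean(grupo$c4, grupo$c3))   # TRUE
```

The result rounds to the 16.9 shown for 1996 / rs in the output above.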
  • That’s right! Thank you Daniel.

3

Here are two ways to calculate averages for groups of c1 and c2.

Base R.
The function aggregate is ideal for this. It's simple and solves the problem in one line of code.

aggregate(c4 ~ c1 + c2, dados, mean, na.rm = TRUE)
#     c1 c2 c4
#1  1996 rs 24
#2  1997 rs 23
#3  1998 rs 22
#4  1999 rs 21
#5  2000 rs 20
#6  2001 rs 19
#7  2002 rs 18
#8  2003 rs 17
#9  1996 sc 16
#10 1997 sc 15
#11 1998 sc 14
#12 1999 sc 13
#13 2000 sc 12
#14 2001 sc 11
#15 2002 sc 10
#16 2003 sc  9
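
The aggregate call above returns a simple (unweighted) group mean. If the weighted version asked for in the question is needed in base R, one possible approach is to split the data by group and apply weighted.mean to each piece. This is a sketch under the assumption that `dados` is the data frame from the question; `grupos` and `medias` are illustrative names.

```r
# Rebuild the example data from the question.
dados <- data.frame(c1 = rep(1996:2003, 4),
                    c2 = rep(c(rep('rs', 8), rep('sc', 8)), 2),
                    c3 = 1:32, c4 = 32:1)

# One data frame per year/state combination, named like "1996.rs".
grupos <- split(dados, list(dados$c1, dados$c2))

# Mean of c4 weighted by c3 within each group.
medias <- sapply(grupos, function(d) weighted.mean(d$c4, d$c3))

round(medias[c("1996.rs", "1996.sc")], 1)
# 1996.rs 1996.sc
#    16.9    12.2
```

The result is a named numeric vector with one element per group rather than a data frame, which may or may not be convenient depending on what comes next.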

The dplyr package.

The dplyr package is now a standard in R, at least for the growing number of tidyverse users.

library(dplyr)

dados %>%
  group_by(c1, c2) %>%
  mutate(n = n(),
         média = mean(c4, na.rm = TRUE))
## A tibble: 32 x 6
## Groups:   c1, c2 [16]
#      c1 c2       c3    c4     n média
#   <int> <fct> <int> <int> <int> <dbl>
# 1  1996 rs        1    32     2    24
# 2  1997 rs        2    31     2    23
# 3  1998 rs        3    30     2    22
# 4  1999 rs        4    29     2    21
# 5  2000 rs        5    28     2    20
# 6  2001 rs        6    27     2    19
# 7  2002 rs        7    26     2    18
# 8  2003 rs        8    25     2    17
# 9  1996 sc        9    24     2    16
#10  1997 sc       10    23     2    15
## ... with 22 more rows
  • Great! Thank you Rui Barradas.
