How to calculate average between lines

Asked

Viewed 56 times

3

I have a dataset of information with years on the line. I need to do an average every two years, as the example below, where the first four columns are the existing data and the last two, which is what we want to calculate. I tried with aggregate and apply, but I was unsuccessful.

My data is following:

structure(list(
  Cod_ARES = c(1, 1, 1, 1, 1, 1, 4, 4, 4, 4, 4, 4), 
  ano = c(2015, 2016, 2017, 2018, 2019, 2020, 2015, 2016, 2017, 2018, 2019, 2020), 
  AG001 = c(1, 2, 4, 1, 5, 4, 4, 3, 2, 5, 2, 5), 
  AG002 = c(3, 2, 2, 4, 5, 1, 5, 1, 1, 1, 2, 3), 
  AG001M = c(NA, 1.5, 3, 2.5, 3, 4.5, 4, 3.5, 2.5, 3.5, 3.5, 3.5), 
  AG002M = c(NA, 2.5, 2, 3, 4.5, 3, 3, 3, 1, 1, 1.5, 2.5)), 
row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))

1 answer

7


This is a moving average problem. The goal is to calculate the arithmetic mean of the last two periods. The package zoo of R is able to accomplish this task easily with the function rollmean:

library(tidyverse)
library(zoo)
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric

dados <- dados[, 1:4]

dados %>%
  mutate(AG001M = rollmean(AG001, k = 2, fill = NA, align = "right"),
         AG002M = rollmean(AG002, k = 2, fill = NA, align = "right"))
#> # A tibble: 12 x 6
#>    Cod_ARES   ano AG001 AG002 AG001M AG002M
#>       <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
#>  1        1  2015     1     3   NA     NA  
#>  2        1  2016     2     2    1.5    2.5
#>  3        1  2017     4     2    3      2  
#>  4        1  2018     1     4    2.5    3  
#>  5        1  2019     5     5    3      4.5
#>  6        1  2020     4     1    4.5    3  
#>  7        4  2015     4     5    4      3  
#>  8        4  2016     3     1    3.5    3  
#>  9        4  2017     2     1    2.5    1  
#> 10        4  2018     5     1    3.5    1  
#> 11        4  2019     2     2    3.5    1.5
#> 12        4  2020     5     3    3.5    2.5

Created on 2021-03-22 by the reprex package (v1.0.0)

Notice what I did above:

  • I uploaded the necessary packages for the analysis;

  • I selected only the first 4 columns of the data set provided;

  • with the function mutate, created the columns again AG001M and AG002M, which are the moving averages of two periods for the variables AG001 and AG002 respectively. The arguments of the function rollmean are, in order,

    • the number of periods (k = 2);
    • the completion of means which cannot be calculated with NA (fill = NA);
    • the offset of the calculated averages to the last line of the data frame (align = right);
  • 2

    Marcus, it worked great! I didn’t know the zoo package. Thank you very much! Congratulations on your dedication to those who are learning...

  • 1

    Thank you! It’s very good to know that my response has helped you in some way. So consider vote and accept the answer, so that in the future other people who experience the same problem have a reference to solve it.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.