Moving average in R

3

I need to compute a simple 7-day moving average in R. I’m using the rollmean function from the zoo package, but the values it returns look incorrect.

Example:

library(zoo)

teste <- sample(1:50)
mean <- rollmean(teste, 7, align = "right")
teste <- cbind(teste, mean)

Result:

  teste     mean
  42     27.42857
  21     22.85714
  11     25.57143
  48     29.85714
  33     34.85714
  29     36.28571
  8      34.57143
  10     29.28571
  40     25.85714
  41     26.85714

The last value returned is 26.85714, but it should be 25.57143, which is the average of the 7 previous days (40, 10, 8, 29, 33, 48, 11). What’s going on?

Note: I know that sample generates random values, so running this code will not reproduce the values shown in the example.

  • 3

    You can use set.seed() to make random data generation reproducible.

  • 1

    You should use set.seed(). Please read the good-practice guide for R questions. Examples need to be reproducible in order to be analyzed, so it is important to set the seed when you use random data. It is very simple to use: just pass any number, for example set.seed(123). For this question it was not crucial, but other questions may go unanswered simply because they are not reproducible.

3 answers

5


The values are correct. The problem is how R behaves with two vectors of different lengths. The moving average needs 7 values to be computed, so the first average only appears at the 7th point and the first 6 positions get no value. The vector mean therefore has 6 fewer elements than the vector teste.
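To see where the 6 missing values go, here is a minimal base-R sketch that computes the same kind of window mean by hand. The names x, k, ends and roll are chosen only for this illustration, and this is not how zoo implements rollmean:

```r
set.seed(1)          # any seed; it only makes the example reproducible
x <- sample(1:50)
k <- 7

# A window of k values first fits at position k, so there are
# length(x) - k + 1 complete windows.
ends <- k:length(x)
roll <- sapply(ends, function(i) mean(x[(i - k + 1):i]))

length(x)     # 50
length(roll)  # 44, i.e. 6 fewer than x
```

Each element of roll belongs to the position stored in ends, so roll[1] describes day 7, not day 1.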

For vectors of different lengths, R's default behavior is to recycle the shorter vector until it matches the length of the longer one.

# R's recycling behavior

x <- 1:2
y <- 1:3

x + y
#> Warning in x + y: longer object length is not a multiple of shorter object
#> length
#> [1] 2 4 4

# sums 1 + 1, 2 + 2 and 1 + 3, because the shorter vector is recycled

So the last average in your table is not the average for the last observation, but an earlier value that was recycled. If you look at the table you built, you will see that the last 6 averages repeat the first 6. To fix it, pad the first 6 positions of the averages with NA (or drop those rows), because no average is computed there.

library(zoo)
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
set.seed(3)

#original
teste <- sample(1:50)
mean <- rollmean(teste, 7, align = "right")
teste_tabela <- cbind(teste, mean)
#> Warning in cbind(teste, mean): number of rows of result is not a multiple of
#> vector length (arg 2)

mean(teste_tabela[44:50, 1]) == mean(teste_tabela[50,2])
#> [1] FALSE

# corrected: pad the first 6 positions with NA
teste_tabela <- cbind(teste, c(rep(NA, 6), mean))

mean(teste_tabela[44:50, 1]) == mean(teste_tabela[50,2]) 
#> [1] TRUE

Created on 2020-05-13 by the reprex package (v0.3.0)

3

When you calculate a moving average, the result is shorter than the original data:

library(zoo)

set.seed(42)
teste <- sample(1:50)
mean <- rollmean(teste, 7, align = "right")

> length(teste)
[1] 50
> length(mean)
[1] 44

When the two vectors are joined, the shorter one (the moving averages) is recycled, i.e. it wraps around to the beginning to fill the "missing" positions:

> uniao <- cbind(teste, mean)
Warning message:
In cbind(teste, mean) :
  number of rows of result is not a multiple of vector length (arg 2)
> head(uniao)
    teste     mean
[1,]    46 33.85714
[2,]    50 28.14286
[3,]    14 25.00000
[4,]    40 27.14286
[5,]    30 24.14286
[6,]    24 25.71429
> tail(uniao)
      teste     mean
[45,]    20 33.85714
[46,]    39 28.14286
[47,]     9 25.00000
[48,]     2 27.14286
[49,]    43 24.14286
[50,]     8 25.71429
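The wrap-around is easier to see on a tiny example. This is only a sketch with made-up vectors a and b, not the data from the question:

```r
a <- 1:5
b <- c(10, 20)       # shorter vector, made up for this illustration

# cbind recycles b to 5 rows; it warns because 5 is not a multiple of 2,
# so the warning is suppressed here to keep the output short.
tab <- suppressWarnings(cbind(a, b))
tab[, "b"]
#> [1] 10 20 10 20 10
```

The fifth row gets the first element of b again, which is the same effect that puts the first 6 averages next to rows 45 to 50 above.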

When joining the two, you need to account for the difference in length and add NAs according to the alignment:

uniao <- cbind(teste,
              c(rep(NA, length(teste)-length(mean)), mean)) # NAs at the start because align = "right" was used

With align = "left", you can instead change the length of mean directly; NAs are then introduced at the end:

teste <- sample(1:50)
mean <- rollmean(teste, 7, align = "left")

length(mean) <- length(teste)

> tail(cbind(teste, mean), 10)
      teste     mean
[41,]    30 27.28571
[42,]    23 23.28571
[43,]    17 26.42857
[44,]    44 28.57143
[45,]    50       NA
[46,]     6       NA
[47,]    21       NA
[48,]     2       NA
[49,]    45       NA
[50,]    32       NA
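The length(mean) <- length(teste) step relies on a base-R rule: extending a vector with length<- fills the new positions with NA. A minimal sketch, with a made-up vector m standing in for the averages:

```r
m <- c(2.5, 3.5, 4.5)   # stand-in for a short vector of moving averages
length(m) <- 5          # extending a vector pads the new positions with NA
m
#> [1] 2.5 3.5 4.5  NA  NA
```

Because the padding always goes at the end, this shortcut only lines up with left-aligned averages; for right-aligned ones the NAs have to be added at the front, as in the c(rep(NA, ...), mean) form shown earlier.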

3

In addition to the answers already given, (1) and (2), my answer is simpler (though it does not explain the cause of the error in the question) because it uses:

  1. The function rollmeanr, which right-aligns the averages by default;
  2. The argument fill = NA, so that the result is not shorter than the input vector;
  3. Together, the two points above avoid the error, which is why this answer does not explain it.

The code goes like this:

set.seed(1234)
x <- sample(1:50)
m <- zoo::rollmeanr(x, k = 7, fill = NA)

teste_tabela <- cbind(x, m)
mean(teste_tabela[44:50, 1]) == teste_tabela[50,2] 
#[1] TRUE
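If zoo is not available, a similar right-aligned, NA-padded result can be sketched in base R with stats::filter. This is a different technique from rollmeanr (a one-sided convolution filter with equal weights), and the names k and m_base are chosen only for this illustration:

```r
set.seed(1234)
x <- sample(1:50)
k <- 7

# One-sided (sides = 1) convolution filter with equal weights 1/k:
# each value is the mean of the current and the k - 1 previous observations.
m_base <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))

length(m_base) == length(x)                    # same length as the input
all(is.na(m_base[1:(k - 1)]))                  # first 6 positions are NA
isTRUE(all.equal(m_base[50], mean(x[44:50])))  # last value matches the window mean
```

The last check uses all.equal instead of == because the two means are floating-point numbers computed in different ways and may differ in the last bits.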
