How to compute a moving sum in R?

5

I have a vector 1:50 and need to compute a moving sum (analogous to a moving average). For example, with a window of 5 observations, the new vector would be c(sum(1:5), sum(2:6), sum(3:7), ..., sum(45:49), sum(46:50)).
The aggregate function has the example aggregate(presidents, nfrequency = 1, FUN = weighted.mean, w = c(1, 1, 0.5, 1)), which is as close as I got to a solution without using a for loop.

  • 4

    With the package zoo, try rollsum(presidents, k = 5).

2 answers

6


I know two good packages for this: zoo (as Rui mentioned in the comments) and RcppRoll.

> zoo::rollsum(1:20, k = 5)
 [1] 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90
> RcppRoll::roll_sum(1:20, n = 5)
 [1] 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90

In terms of performance, RcppRoll is much faster:

> bench::mark(
+   zoo::rollsum(1:50, k = 5),
+   RcppRoll::roll_sum(1:50, n = 5)
+ )
# A tibble: 2 x 14
  expression     min     mean  median    max `itr/sec` mem_alloc  n_gc n_itr total_time result memory time  gc   
  <chr>      <bch:t> <bch:tm> <bch:t> <bch:>     <dbl> <bch:byt> <dbl> <int>   <bch:tm> <list> <list> <lis> <lis>
1 zoo::roll… 909.4µs   3.45ms  1.71ms 40.3ms      290.   18.91KB     0   155      535ms <int … <Rpro… <bch… <tib…
2 RcppRoll:…  40.5µs 150.75µs 89.49µs 14.6ms     6634.    3.34KB     0  3316      500ms <dbl … <Rpro… <bch… <tib…
  • 1

    Good! In fact, zoo loses even to base R in performance.

4

There are a few ways to calculate a moving sum in R:

Base R

diff(c(0, cumsum(1:10)), 5)
# 15 20 25 30 35 40
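One way to see why this works: after prepending 0, element i + 1 of the cumulative sum holds sum(x[1:i]), so a lagged difference of n subtracts the total before the window from the total at its end. The sketch below checks that identity against the explicit window sums from the question; the names cs, by_diff, starts and by_window are illustrative, not from the answer.

```r
# Worked check of the cumulative-sum identity on a small input.
x <- 1:10
n <- 5

cs <- c(0, cumsum(x))          # cs[i + 1] holds sum(x[1:i]); cs[1] is 0
by_diff <- diff(cs, lag = n)   # cs[i + n] - cs[i] equals sum(x[i:(i + n - 1)])

# Naive definition from the question: sum every window explicitly.
starts <- seq_len(length(x) - n + 1)
by_window <- vapply(
  starts,
  function(i) as.numeric(sum(x[i:(i + n - 1)])),
  numeric(1)
)

all.equal(by_diff, by_window)
```

The cumulative-sum version touches each element once regardless of n, while the explicit version does n additions per window.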

This proposal can be generalized as a function:

soma_movel <- function(x, n) {
  diff(c(0, cumsum(x)), n)
}
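As a usage sketch, here is soma_movel applied to the 1:50 vector from the question (the function is repeated so the block runs on its own). The last lines illustrate a caveat that the answer does not discuss and that follows from how cumsum() behaves: a single NA is carried into every later cumulative total, so windows that do not contain the NA also come out as NA.

```r
soma_movel <- function(x, n) {
  diff(c(0, cumsum(x)), n)
}

res <- soma_movel(1:50, 5)
length(res)   # one value per window: 50 - 5 + 1
res[1]        # same as sum(1:5)
res[46]       # same as sum(46:50)

# Caveat: cumsum() propagates NA forward, so later windows such as
# (4, 5, 6) and (5, 6, 7) are NA here even though they contain no NA.
soma_movel(c(1, 2, NA, 4, 5, 6, 7), 3)
```

If the input may contain missing values, a package function with explicit NA handling is probably the safer choice.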

Zoo

The zoo package, as mentioned in the comments, has a function for this, but it does not perform very well:

zoo::rollsum(1:10, 5)
# 15 20 25 30 35 40

Comparison

set.seed(123)
vetor <- rnorm(1e5) # 100 thousand numbers

# Do the functions return equal values?
all.equal(zoo::rollsum(vetor, 5), soma_movel(vetor, 5))
# [1] TRUE
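The check uses all.equal rather than identical, which matters for non-integer input: the cumulative-sum approach accumulates floating-point rounding over the whole vector, so its results may differ in the last bits from sums computed window by window. A minimal base R sketch of that comparison, assuming a direct computation via embed() and rowSums() as the reference (the name direto is hypothetical):

```r
set.seed(123)
vetor <- rnorm(1e5)

soma_movel <- function(x, n) diff(c(0, cumsum(x)), n)

# embed() lays the windows out as matrix rows, so rowSums() adds each one directly
direto <- rowSums(embed(vetor, 5))

all.equal(soma_movel(vetor, 5), direto)   # comparison with a numeric tolerance
max(abs(soma_movel(vetor, 5) - direto))   # size of the accumulated rounding difference
```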

Finally, a performance comparison of the proposed solutions shows that, although the base R solution is about 80 times faster than zoo, it is still about 5 times slower than the RcppRoll solution presented by Daniel (comparing medians).

microbenchmark::microbenchmark(
  zoo = zoo::rollsum(vetor, 5),
  base = soma_movel(vetor, 5), 
  cpp = RcppRoll::roll_sum(vetor, n = 5),
  times = 30
)
Unit: microseconds
 expr        min         lq       mean     median         uq        max neval cld
  zoo 200659.545 204218.475 208418.887 206276.601 209928.673 255552.267    30   b
 base   2229.273   2536.157   3379.694   2633.918   2755.286   7725.985    30  a 
  cpp    452.116    514.725   6966.097    558.089    577.333 188068.577    30  a 
