Cumulative count of group occurrences on dates

Asked

Viewed 292 times

7

I have a data set similar to the one below. It has a column with dates and another with occurrences of groups on these dates.

         data grupo
1  2019-01-01     a
2  2019-01-01     a
3  2019-01-01     a
4  2019-01-01     a
5  2019-01-02     b
6  2019-01-02     b
7  2019-01-02     a
8  2019-01-03     a
9  2019-01-03     a
10 2019-01-03     a
11 2019-01-04     a
12 2019-01-04     b
13 2019-01-04     b
14 2019-01-05     a
15 2019-01-05     a
16 2019-01-05     a
17 2019-01-05     b
18 2019-01-06     b
19 2019-01-06     a
20 2019-01-06     a
21 2019-01-07     b
22 2019-01-07     b
23 2019-01-07     a
24 2019-01-08     b
25 2019-01-08     a
26 2019-01-08     a
27 2019-01-09     a
28 2019-01-09     a
29 2019-01-09     b
30 2019-01-10     a

I want to calculate the cumulative sum of the occurrences of the groups in relation to the dates present. For example, in 2019-01-01, the group a occurred 4 times. In 2019-01-02, a occurred once, which gives a cumulative occurrence equal to 5. And so on, for each group and each date.

Dates are in sequence, but some are missing. Groups are not present on all dates. That being said, the answer I seek for the above set is as follows:

      data grupo acumulada
2019-01-01     a         4
2019-01-01     b         0
2019-01-02     a         5
2019-01-02     b         2
2019-01-03     a         8
2019-01-03     b         2
2019-01-04     a         9
2019-01-04     b         4
2019-01-05     a        12
2019-01-05     b         5
2019-01-06     a        14
2019-01-06     b         6
2019-01-07     a        15
2019-01-07     b         8
2019-01-08     a        17
2019-01-08     b         9
2019-01-09     a        19
2019-01-09     b        10
2019-01-10     a        20
2019-01-10     b        10

Below are the sample data to facilitate the life of those who try to solve my problem.

structure(list(data = structure(c(17897, 17897, 17897, 17897, 
                                       17898, 17898, 17898, 17899, 17899, 17899, 17900, 17900, 17900, 
                                       17901, 17901, 17901, 17901, 17902, 17902, 17902, 17903, 17903, 
                                       17903, 17904, 17904, 17904, 17905, 17905, 17905, 17906), class = "Date"), 
                    grupo = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 
                                        1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 
                                        1L, 1L, 1L, 2L, 1L), .Label = c("a", "b"), class = "factor")), 
                                        class = "data.frame", row.names = c(NA, -30L))

2 answers

8


Within the there is the function tidyr::complete() to accomplish the desired.

The steps to this are:

  1. Count the occurrence of each group on each date with count().

  2. Complete with 0 (zero) cases where there are no observations

  3. Make the accumulated sum of this count for each group.

This can be done with the following code:

library(tidyverse)

dados %>% 
  count(data, grupo) %>% 
  complete(data, grupo, fill = list(n = 0)) %>% 
  group_by(grupo) %>% 
  mutate(n = cumsum(n))
  • 1

    I didn’t know the function complete. Thank you.

4

Using data.table:

library(magrittr)
library(data.table)

setDT(dados)

dados <- dados[, .(N = .N), by = c("data", "grupo")] %>% 
  dcast(data ~ grupo, value.var = "N", fill = 0) %>% 
  melt(., id.vars = "data", measure.vars = c("a", "b"), variable.name = "grupo", value.name = "N", ) %>% 
  setorderv(., c("data", "grupo")) %>% 
  .[, acumulada := cumsum(N), by = c("grupo")]

Upshot:

 > dados[]
          data grupo N acumulada
 1: 2019-01-01     a 4         4
 2: 2019-01-01     b 0         0
 3: 2019-01-02     a 1         5
 4: 2019-01-02     b 2         2
 5: 2019-01-03     a 3         8
 6: 2019-01-03     b 0         2
 7: 2019-01-04     a 1         9
 8: 2019-01-04     b 2         4
 9: 2019-01-05     a 3        12
10: 2019-01-05     b 1         5
11: 2019-01-06     a 2        14
12: 2019-01-06     b 1         6
13: 2019-01-07     a 1        15
14: 2019-01-07     b 2         8
15: 2019-01-08     a 2        17
16: 2019-01-08     b 1         9
17: 2019-01-09     a 2        19
18: 2019-01-09     b 1        10
19: 2019-01-10     a 1        20
20: 2019-01-10     b 0        10

Browser other questions tagged

You are not signed in. Login or sign up in order to post.