R: generating time series from start and end dates of service periods


I’m starting to study time series analysis. I have some data sets structured as follows: each row represents one customer service, with the start date of the service (earliest date 2012-01-01), the end date (latest date 2017-05-01), and the neighborhood where the activity took place.

> mse_df
# A tibble: 484 × 3
       mse_in    mse_fim           Bairro
       <date>     <date>           <fctr>
1  2015-11-03 2016-08-11         Pachecos
2  2013-03-18 2014-10-02       Bela Vista
3  2012-08-08 2015-09-24          Brejaru
4  2014-02-24 2014-12-17            Madri
5  2015-03-30 2015-04-29  Jardim Eldorado
6  2012-07-30 2013-09-19          Brejaru
7  2016-05-24 2017-05-19      Frei Damiao
8  2012-08-13 2015-02-09 Ponte do Imaruim
9  2012-08-08 2014-07-23 Ponte do Imaruim
10 2012-07-30 2012-10-10     Caminho Novo
# ... with 474 more rows

I would like to run a time series analysis of the service with the data divided into 12 periods per year, and then a second analysis, also with 12 periods per year, for each location. However, I don’t know how to generate the time series data frames needed to start the analysis.

  • I don’t understand what kind of analysis is being proposed here. Could you edit the question or leave a comment to be a little more specific? For example, what is the purpose of this analysis?

  • Hello Marcus, sorry for the lack of clarity. I want to check whether there are trends or seasonality in the service, and forecast the frequency for the coming months, both overall and per neighborhood.

  • I’m unable to prepare the data frame for the time series analyses I intend to perform.

Answering here, since I don’t have enough reputation to comment ^_^.

I don’t fully understand what you mean, but here is how I would create a data frame with the daily time series of service starts, service ends, and the balance between the two:


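library(tidyverse)  # tribble, dplyr verbs, tidyr::replace_na
library(magrittr)   # provides the %<>% assignment pipe used below

df <- tribble(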
~mse_in,    ~mse_fim,         ~Bairro,
'2015-11-03', '2016-08-11', 'Pachecos',
'2013-03-18', '2014-10-02', 'Bela Vista',
'2012-08-08', '2015-09-24', 'Brejaru',
'2014-02-24', '2014-12-17', 'Madri',
'2015-03-30', '2015-04-29', 'Jardim Eldorado',
'2012-07-30', '2013-09-19', 'Brejaru',
'2016-05-24', '2017-05-19', 'Frei Damiao',
'2012-08-13', '2015-02-09', 'Ponte do Imaruim',
'2012-08-08', '2014-07-23', 'Ponte do Imaruim',
'2012-07-30', '2012-10-10', 'Caminho Novo')

df %<>% 
  mutate(mse_in = lubridate::ymd(mse_in),
         mse_fim = lubridate::ymd(mse_fim))

número_pedidos_in <- 
  df %>% 
  group_by(mse_in) %>% 
  summarise(.in = n())

número_pedidos_out <- 
  df %>% 
  group_by(mse_fim) %>% 
  summarise(.out = n())

dias_com_movimento <- número_pedidos_in %>% 
  full_join(número_pedidos_out, by = c('mse_in' = 'mse_fim')) %>%
  rename(data = mse_in) %>% 
  # funs() is deprecated in recent dplyr; across() fills the NAs left by the full join
  mutate(across(where(is.numeric), ~ coalesce(.x, 0L))) %>% 
  mutate(saldo = .in - .out)

range_dias <- range(dias_com_movimento$data)

full_df <-
  tibble(data = seq(as.Date(range_dias[[1]]),
                    range_dias[[2]], 'days')) %>%
  left_join(dias_com_movimento, by = 'data') %>%
  replace_na(list(.in = 0, .out = 0, saldo = 0))

The final data frame looks like this:

# A tibble: 1,755 × 4
         data   .in  .out saldo
       <date> <dbl> <dbl> <dbl>
1  2012-07-30     2     0     2
2  2012-07-31     0     0     0
3  2012-08-01     0     0     0
4  2012-08-02     0     0     0
5  2012-08-03     0     0     0
6  2012-08-04     0     0     0
7  2012-08-05     0     0     0
8  2012-08-06     0     0     0
9  2012-08-07     0     0     0
10 2012-08-08     2     0     2
# ... with 1,745 more rows

Is that what you’re after?

  • Excellent, but I think the OP wanted something per month, not per day. It seems to me that he wants a data frame with two columns: the first column is the month, and the second is the number of services performed in that month. In addition, he would apparently also like n other data frames similar to this one, with the same two columns but separated by neighborhood.

  • Hello Guilherme, what I need is just as Marcus described: I need to check the number of customers served per month, and then per month and location. However, in my original df the mse_in column holds the start of the service and mse_fim the end of the service for each client, and I have to count every month of the entire service period, for each client. Splitting by day is fine, because I can group afterwards; the problem was getting a data frame with just two columns, one with dates and another with the volume on each date. I appreciate the help.
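
Following up on the comments above, here is a minimal sketch of one way to count each service in every month of its period, both overall and per neighborhood. It assumes a service counts as active in any calendar month it touches, even for a single day. The names servicos_por_mes, mensal_geral, mensal_bairro, n_servicos and serie_geral are made up for this illustration, and only the ten sample rows from the question are used, so this is a starting point rather than a definitive solution:

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Ten sample rows copied from the question (the real data has 484 rows)
mse_df <- tribble(
  ~mse_in,      ~mse_fim,     ~Bairro,
  '2015-11-03', '2016-08-11', 'Pachecos',
  '2013-03-18', '2014-10-02', 'Bela Vista',
  '2012-08-08', '2015-09-24', 'Brejaru',
  '2014-02-24', '2014-12-17', 'Madri',
  '2015-03-30', '2015-04-29', 'Jardim Eldorado',
  '2012-07-30', '2013-09-19', 'Brejaru',
  '2016-05-24', '2017-05-19', 'Frei Damiao',
  '2012-08-13', '2015-02-09', 'Ponte do Imaruim',
  '2012-08-08', '2014-07-23', 'Ponte do Imaruim',
  '2012-07-30', '2012-10-10', 'Caminho Novo'
) %>%
  mutate(mse_in = ymd(mse_in), mse_fim = ymd(mse_fim))

# Expand each service into one row per calendar month it touches.
# Assumption: a month counts if the service was active on any day of it.
servicos_por_mes <- mse_df %>%
  rowwise() %>%
  mutate(mes = list(seq(floor_date(mse_in, 'month'),
                        floor_date(mse_fim, 'month'),
                        by = 'month'))) %>%
  ungroup() %>%
  unnest(cols = mes)

# Overall: services active per month, with empty months filled with 0
mensal_geral <- servicos_por_mes %>%
  count(mes, name = 'n_servicos') %>%
  complete(mes = seq(min(mes), max(mes), by = 'month'),
           fill = list(n_servicos = 0L)) %>%
  arrange(mes)

# Per neighborhood: same idea, one row per Bairro and month
mensal_bairro <- servicos_por_mes %>%
  count(Bairro, mes, name = 'n_servicos') %>%
  complete(Bairro, mes = seq(min(mes), max(mes), by = 'month'),
           fill = list(n_servicos = 0L)) %>%
  arrange(Bairro, mes)

# Monthly ts object (12 periods per year) for the overall series
serie_geral <- ts(mensal_geral$n_servicos,
                  start = c(year(min(mensal_geral$mes)),
                            month(min(mensal_geral$mes))),
                  frequency = 12)
```

Filling the empty months with zero matters because ts() assumes regularly spaced observations; a month missing from the table would silently shift every later value. For a single neighborhood, the same ts() call could be applied to the rows of mensal_bairro filtered to that Bairro.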

