How to create Time Series using Start and End in R?


2

I’m trying to build a time series from a six-month sample of daily data, as follows:

compras = ts(dados_dia$QTDE_COMPRAS, start = c(2018,7), end = c(2019,1),
  frequency = 90)

But plot(compras) shows a time series that is inconsistent with the data period:

[plot image]

I also tried creating the time series without end, and the result was this:

compras = ts(dados_dia$QTDE_COMPRAS, start = c(2018,7), frequency = 90)

[plot image]

It is still inconsistent. What should I do to get a time series that is consistent with the period of my data sample?

Here is the dput of the sample:

dput(head(dados_dia, 50))
structure(list(DATA = structure(c(17731, 17732, 17733, 17734, 
17735, 17736, 17737, 17738, 17739, 17740, 17741, 17742, 17743, 
17744, 17745, 17746, 17747, 17748, 17749, 17750, 17751, 17752, 
17753, 17754, 17755, 17756, 17757, 17758, 17759, 17760, 17761, 
17762, 17763, 17764, 17765, 17766, 17767, 17768, 17769, 17770, 
17771, 17772, 17773, 17774, 17775, 17776, 17777, 17778, 17779, 
17780), class = "Date"), QTDE_COMPRAS = c(1831L, 1635L, 996L, 
889L, 2236L, 2145L, 2023L, 2036L, 1808L, 1056L, 951L, 2421L, 
2001L, 2011L, 1762L, 1364L, 865L, 778L, 2106L, 1816L, 1867L, 
1633L, 1501L, 892L, 736L, 2138L, 1971L, 1805L, 1814L, 1584L, 
874L, 756L, 2299L, 1855L, 2177L, 2096L, 1860L, 1032L, 917L, 2677L, 
2491L, 2444L, 2237L, 1933L, 1049L, 1035L, 2461L, 1929L, 1866L, 
1661L), VALOR_TOTAL = c(57652.18, 48584.93, 27914.92, 26742.56, 
72034.74, 67761.02, 62360.6, 61706.18, 51745.49, 27613.62, 26160.76, 
73334.99, 61721.56, 67054.88, 56929.74, 42995.77, 25133.76, 25312.76, 
72688.48, 62524.33, 62615.25, 55792.27, 47404.18, 26459.83, 23442.8, 
73834.73, 66589.4, 60754.27, 60277.49, 50185.86, 25684.23, 23432.76, 
78387.5, 62461.74, 72587.6, 66310.5, 56826.63, 29198.85, 27247.53, 
85714.93, 77316.9, 73900.85, 65110.36, 54674.84, 30347.08, 31843.1, 
81943.46, 63862.88, 60691.42, 49446.46)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -50L))

2 answers

3


The function ts() has 4 main arguments.

The first is the data that will be turned into a time series. The second and third are the start and end periods of the series. Note that in the example from the question, the proposed end period does not correspond to the date of the last observation in the data.frame.

The fourth argument, frequency, is the number of subperiods in each period. R uses this argument to interpret the second number passed in start and end.

Note that in the example from the question, R treated the second element of start (7) as a position within a period that contains 90 subperiods. So the series started close to 2018.07, since (7 - 1) / 90 is approximately 0.067.

Setting frequency to 365 tells R that each period (the first number in start) has 365 subperiods, that is, days.
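To make the arithmetic concrete, here is a small sketch of how ts() turns the pair in start into a position on the time axis. The vector x is a placeholder (an assumption, not the question's data); only the time index matters here.

```r
# Placeholder values; only the time index is of interest
x <- 1:10

# start = c(period, position) is read relative to frequency
s90  <- ts(x, start = c(2018, 7),   frequency = 90)
s365 <- ts(x, start = c(2018, 200), frequency = 365)

# First time stamp of each series
start_90  <- as.numeric(time(s90)[1])   # 2018 + (7 - 1) / 90
start_365 <- as.numeric(time(s365)[1])  # 2018 + (200 - 1) / 365

print(start_90)
print(start_365)
```

With frequency = 90 the first observation sits early in 2018, while with frequency = 365 and position 200 it sits a little past the middle of the year, which is where a series starting in July belongs.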

One option is to create a function that returns the input ts() expects for start in a daily series (as in the example).

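# Returns c(year, day of year), the form ts() expects for start/end in a daily series
data_ts_anual <- function(data) {
  ano <- lubridate::year(data)
  # Day of the year: days elapsed since December 31 of the previous year
  numero <- as.numeric(data - (lubridate::floor_date(data, "year") - 1))
  c(ano, numero)
}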

And then use it in the call to ts():

compras <- ts(dados_dia$QTDE_COMPRAS,
              start = data_ts_anual(dados_dia$DATA[1]),
              end = data_ts_anual(dados_dia$DATA[nrow(dados_dia)]),
              frequency = 365)
plot(compras)

[plot image]
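As a quick sanity check on what such a helper should return, the sketch below computes the same (year, day of year) pair with base R only. The function name daily_ts_start is hypothetical and is not part of the answer; it relies on format() with "%Y" and "%j".

```r
# Hypothetical helper (not from the answer): base R version of the same idea
daily_ts_start <- function(date) {
  c(as.integer(format(date, "%Y")),   # calendar year
    as.integer(format(date, "%j")))   # day of the year, 1 to 366
}

# First DATA value in the dput from the question (days since 1970-01-01)
first_date <- as.Date(17731, origin = "1970-01-01")
first_start <- daily_ts_start(first_date)

print(first_date)
print(first_start)
```

If the pair printed here matches what data_ts_anual() returns for the same date, the start passed to ts() points at the right day.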

It is also possible to omit the end of the series (which helps avoid errors like the one above) and let R infer it from the start, the length of the vector, and the frequency. That gives:

compras2 <- ts(dados_dia$QTDE_COMPRAS,
               start = data_ts_anual(dados_dia$DATA[1]),
               frequency = 365)
plot(compras2)

[plot image]
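A minimal sketch of that inference, assuming a placeholder vector of 50 values (the size of the dput sample) rather than the real column: when end is omitted, end() reports the position R derived on its own.

```r
# Placeholder for the 50 observations in the sample
x <- 1:50

# Only start and frequency are given; R derives the end from the length of x
s <- ts(x, start = c(2018, 200), frequency = 365)

inferred_end <- end(s)
print(inferred_end)
```

The second element of the result is the day of the year of the last observation, so it can be compared directly with the last date in the data.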

1

Using the package zoo with the data you have made available:

library(zoo)

plot(zoo(dados_dia$QTDE_COMPRAS, seq(from = as.Date("2018-07-19"), to = as.Date("2018-09-06"), by = 1)))

Note that with this function you can give the full date, year-month-day. With ts you pass a vector of at most two numbers, so I find zoo easier to use.

The chart looks like this; I hope it’s what you expect:

[plot image]
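The date index passed to zoo should have one date per observation. The sketch below, in base R only, checks that the sequence used above lines up with the 50 dates in the dput; the variable names are illustrative.

```r
# Index built the same way as in the zoo() call above
dates <- seq(from = as.Date("2018-07-19"), to = as.Date("2018-09-06"), by = 1)
n_dates <- length(dates)

# DATA column from the dput: days 17731 to 17780 since 1970-01-01
dput_dates <- as.Date(17731:17780, origin = "1970-01-01")

# TRUE when both indexes cover exactly the same days
same_index <- all(dates == dput_dates)

print(n_dates)
print(same_index)
```

Since the data frame already has a Date column, passing dados_dia$DATA as the index may be a simpler alternative to rebuilding the sequence by hand; that is a suggestion, not something the answer does.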

With ts, you must change start and frequency so that the series is treated as daily (as the data you provided suggest).

The first day provided is 19/07/2018, the 200th day of the year, so start = c(2018, 200). According to the documentation of ts, for daily data you can use 7 as the frequency ("The value of argument frequency is used when the series is sampled an integral number of times in each unit time interval. For example, one could use a value of 7 for frequency when the data are sampled daily, and the natural time period is a week").

ts(dados_dia$QTDE_COMPRAS, start = c(2018,200), frequency = 7)
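One thing to watch with this call: R computes the first time stamp as start[1] + (start[2] - 1) / frequency, so position 200 with frequency = 7 does not land in 2018. The sketch below checks this on the first 21 values of QTDE_COMPRAS from the dput and then, as an assumption rather than the answer's method, indexes the series by week number so that decompose() has whole weekly cycles to work with.

```r
# First 21 values of QTDE_COMPRAS from the dput in the question (three weeks)
qtde <- c(1831, 1635, 996, 889, 2236, 2145, 2023,
          2036, 1808, 1056, 951, 2421, 2001, 2011,
          1762, 1364, 865, 778, 2106, 1816, 1867)

# The call from the answer: position 200 within cycles of length 7
s1 <- ts(qtde, start = c(2018, 200), frequency = 7)
start_time <- tsp(s1)[1]   # 2018 + (200 - 1) / 7
print(start_time)

# Assumption: count weeks from 1 instead of using the calendar year
s2 <- ts(qtde, start = c(1, 1), frequency = 7)

# Weekly seasonality; needs at least two full cycles of 7 days
dec <- decompose(s2)
print(dec$figure)          # one seasonal effect per day of the cycle
```

With this indexing the time axis is in weeks rather than calendar dates, so it suits decomposition more than a dated plot.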
  • Hello. I tested your suggestion, but with zoo I can’t use decompose; I believe that is because, as you said, ts works with a vector... I think I’d better stick with ts, because the next steps use functions from the same package. I think I’m getting the frequency or the period wrong.

  • I’m having a hard time understanding how Time Series works.

  • I just updated the answer to use the function ts.

  • Hello! I tried your suggestion here, but it didn’t work. Can you show me a practical example?

  • What didn’t work?

  • When I tested your suggestion, the plot showed a time series from 2050 to 2070.

  • Your question was about the plot (as I understood it), so I used zoo, which returns a coherent plot. Once you bring up decomposing the series, it becomes a different problem. Your data is daily, so you could consider frequency = 365.25, but you don’t have a complete cycle, hence the error when running decompose(ts(..., frequency = 365.25)). The help page for ts suggests frequency = 7 for daily data, which works for the decomposition. So I suggest creating the plot with zoo and decomposing with ts(..., frequency = 7).

  • I get it. I’ll test your suggestion.
