Difference between dates located in different rows and columns

Asked

Viewed 81 times

0

In R, I have the following data:

entrada      saida        id   ordem
2013-01-15   2013-05-20   1    1     
2015-03-13   2015-09-12   1    2      
2016-01-12   2016-04-11   1    3 

I want to add a column with the difference (days) between the output (1) and the input (2) and so on, considering the id and the order of the records. Follow an example:

entrada      saida        id   ordem  diferenca
2013-01-15   2013-05-20   1    1      
2015-03-13   2015-09-12   1    2      662
2016-01-12   2016-04-11   1    3      122
2013-01-01   2013-01-30   2    1      
2014-06-18   2015-09-02   2    2      504
2016-02-26   2017-02-11   2    3      177
2018-03-04   2018-12-16   2    4      386
  • It is not a duplicate. I want to calculate the difference of dates located in different columns and rows.

  • 1

    So the question has to be clearer. How is there a difference of zero days? What are the dates that enter into the calculation of each column value diferenca?

  • Rui, thanks for the tip. I edited the example, and took out the zero. The difference is between the date of departure (e.g. 2013-05-20) and the next entry (e.g. 2015-03-13), and so on.

2 answers

2

The following function makes the calculation the question asks, with the differences calculated for each id.

Note that date columns have to be class "Date".

fun <- function(DF){
  f <- function(X){
    c(0, X[[1]][-1] - X[[2]][-nrow(X)])
  }
  unname(unlist(by(DF, DF[['id']], f)))
}

fun(df1)
#[1]   0 662 122   0 504 177 386

all.equal(df1$diferenca, fun(df1))
#[1] TRUE

Dice.

df1 <- read.table(text = "
entrada      saida        id   ordem  diferenca
2013-01-15   2013-05-20   1    1      0
2015-03-13   2015-09-12   1    2      662
2016-01-12   2016-04-11   1    3      122
2013-01-01   2013-01-30   2    1      0
2014-06-18   2015-09-02   2    2      504
2016-02-26   2017-02-11   2    3      177
2018-03-04   2018-12-16   2    4      386
", header = TRUE)

df1[1:2] <- lapply(df1[1:2], as.Date)
  • forgive my ignorance, but where do I put the dates?

  • @Marlirocha Dates? You mean the value of fun(df1)? If yes, you can do df1$diferenca <- fun(df1).

  • No. I’m having trouble using this function. What I put in DF, X, the input and output dates.. @Noisy

  • @Marlirocha No DF and in the X puts nothing. Just call the function as it is in the example. As long as the dates are the 1st and 2nd columns, fun(df1) calculates everything by herself.

1


I prefer the Rui Barradas solution because it only needs the base package and follows the functional principle of R, but here is an answer with data.table:

library(data.table)

setDT(df1)
# Converte para data.table. Ou carregue seus dados usando `fread`

df1 <- df1[order(id, ordem)]
# Garante que seus dados estejam ordenados

df1[, `:=`(entrada = as.IDate(entrada), saida = as.IDate(saida))]
# Converte para datas (usando o formato providenciado pelo data.table)

df1[, diferenca := entrada - shift(saida, 1), by = id]

> df1
      entrada      saida id ordem diferenca
1: 2013-01-15 2013-05-20  1     1        NA
2: 2015-03-13 2015-09-12  1     2       662
3: 2016-01-12 2016-04-11  1     3       122
4: 2013-01-01 2013-01-30  2     1        NA
5: 2014-06-18 2015-09-02  2     2       504
6: 2016-02-26 2017-02-11  2     3       177
7: 2018-03-04 2018-12-16  2     4       386

You can replace the Nas with 0, but it is good practice to keep the distinction (in this case, 0 corresponds to when the date of output of a line is the same as the entry of the next line, while NA corresponds to the first line of each id).

  • Thank you, @Carloseduardolagosta!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.