How to create a data frame with the difference between the first and last dates of each category of a categorical variable in R




I have the following data set as an example, along with the result I expect:


The values in the new data frame are in days: for each category, the difference between its last date and its first date.

2 answers


You can do what you want with the base R function aggregate.

# Example data: one date per row, labelled by group
Grupo <- c("A", "A", "A", "B", "B", "C", "C")
Data <- c("01/02/2017", "15/02/2017", "20/03/2017", "18/02/2017", "01/03/2017", "15/02/2017", "20/02/2017")
dados <- data.frame(Grupo, Data)

# Parse the day/month/year strings into Date objects
dados$Data <- as.Date(dados$Data, "%d/%m/%Y")

# For each group, subtract the first date from the last one
result <- aggregate(Data ~ Grupo, dados, function(d) d[length(d)] - d[1])
#  Grupo Data
#1     A  47
#2     B  11
#3     C   5
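
The expression d[length(d)] - d[1] subtracts the first row of each group from the last row, so it gives the span in days only when the dates are already in ascending order within each group. Below is a minimal sketch of a variant that does not depend on row order. It rebuilds the same example data, deliberately shuffled, so that it runs on its own. Wrapping the result in as.numeric is a choice made here to get a plain number of days; it is not part of the original answer.

```r
# Same example data as above, with the rows deliberately out of date order
Grupo <- c("A", "A", "A", "B", "B", "C", "C")
Data <- c("20/03/2017", "01/02/2017", "15/02/2017", "01/03/2017",
          "18/02/2017", "20/02/2017", "15/02/2017")
dados <- data.frame(Grupo, Data)
dados$Data <- as.Date(dados$Data, "%d/%m/%Y")

# Latest date minus earliest date per group, as a plain number of days
result <- aggregate(Data ~ Grupo, dados,
                    function(d) as.numeric(max(d) - min(d)))

# Spans in days: A = 47, B = 11, C = 5
print(result)
```

Using max and min instead of positional indexing avoids having to sort dados first.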
  • A question: my data also has a process number variable before the group. How would it work in that case?

  • @Fvasquez It is better to edit the question with the new data. Please include the output of dput(dados) or, if the data set is very large, of dput(head(dados, 20)).
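
Since the actual data with the process number was not posted, the following is only a sketch of how an extra grouping variable could be handled with aggregate. The column name Processo and all the values in dados2 are hypothetical, made up for illustration. The idea is to add the second variable to the right-hand side of the formula.

```r
# Hypothetical data: `Processo` and these values are assumptions,
# not taken from the question
dados2 <- data.frame(
  Processo = c(1, 1, 1, 2, 2, 2, 2),
  Grupo    = c("A", "A", "B", "A", "A", "B", "B"),
  Data     = as.Date(c("01/02/2017", "15/02/2017", "18/02/2017",
                       "20/03/2017", "25/03/2017", "01/03/2017",
                       "04/03/2017"),
                     "%d/%m/%Y")
)

# One row per (Processo, Grupo) combination, with the span in days
result2 <- aggregate(Data ~ Processo + Grupo, dados2,
                     function(d) as.numeric(max(d) - min(d)))
print(result2)
```

A combination with a single date, such as Processo 1 with Grupo "B" here, gets a span of 0 days.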


Another way to do this is by using the dplyr package:


library(dplyr)

# Build the example data row by row and parse the dates
dados <- tribble(
  ~Grupo, ~Data,
  "A", "01/02/2017",
  "A", "15/02/2017",
  "A", "20/03/2017",
  "B", "18/02/2017",
  "B", "01/03/2017",
  "C", "15/02/2017",
  "C", "20/02/2017"
) %>%
  mutate(Data = as.Date(Data, format = "%d/%m/%Y"))

# For each group, the number of days between the earliest and latest date
result <- dados %>%
  group_by(Grupo) %>%
  summarise(Data = as.integer(max(Data) - min(Data)))
