How to create a data frame in R with the difference between the first and last dates of each category of a categorical variable

Question

Asked 7 years, 3 months ago

I have the following data set as an example, along with the result I expect:

The values in the new data frame are in days: for each category, the difference between its last date and its first date.

Welcome to Stack Overflow! Take a look at how to improve your next questions.

– Tomás Barcellos

2018/04/17 at 17:37

2 answers

by Rui Barradas • **15,422** points · Answer 1 · 2018-04-17T05:19:59+00:00

You can do what you want with the base R function aggregate.

Grupo <- c("A", "A", "A", "B", "B", "C", "C")
Data <- c("01/02/2017", "15/02/2017", "20/03/2017", "18/02/2017", "01/03/2017", "15/02/2017", "20/02/2017")
dados <- data.frame(Grupo, Data)

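# Convert the character column to Date (day/month/year)
dados$Data <- as.Date(dados$Data, "%d/%m/%Y")

# Last date minus first date within each group, in days
# (assumes the rows of each group are already in chronological order)
result <- aggregate(Data ~ Grupo, dados, function(d) d[length(d)] - d[1])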
result
#  Grupo Data
#1     A  47 
#2     B  11 
#3     C   5
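
The call above takes the last row minus the first row of each group, so it relies on the dates already being in chronological order within each group. If that cannot be guaranteed, one possible variant (a sketch, not part of the original answer) is to take the span between the earliest and latest dates instead. The names dados_shuffled and result_range are hypothetical and used only for illustration:

```r
# Same example data as above, with the rows deliberately shuffled within each group
dados_shuffled <- data.frame(
  Grupo = c("A", "A", "A", "B", "B", "C", "C"),
  Data  = c("20/03/2017", "01/02/2017", "15/02/2017",
            "01/03/2017", "18/02/2017",
            "20/02/2017", "15/02/2017")
)
dados_shuffled$Data <- as.Date(dados_shuffled$Data, "%d/%m/%Y")

# diff(range(d)) is max(d) - min(d), so the row order inside a group does not matter
result_range <- aggregate(Data ~ Grupo, dados_shuffled,
                          function(d) as.integer(diff(range(d))))
result_range
```

With this data the differences should again be 47, 11 and 5 days for groups A, B and C, whereas the positional version would depend on the order of the rows.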

by user2332849 • **246** points · Answer 2 · 2020-03-01T01:24:08+00:00

Another way to do this is by using the dplyr package:

library(dplyr)

dados <- tribble(
  ~Grupo, ~Data,
  "A", "01/02/2017", 
  "A", "15/02/2017", 
  "A", "20/03/2017", 
  "B", "18/02/2017", 
  "B", "01/03/2017", 
  "C", "15/02/2017", 
  "C", "20/02/2017"
) %>%
  mutate(Data = as.Date(Data, format = "%d/%m/%Y"))

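# For each group, latest date minus earliest date, as a whole number of days
result <- dados %>%
  group_by(Grupo) %>%
  summarise(Data = as.integer(max(Data) - min(Data))) %>%
  as.data.frame()
result
#  Grupo Data
#1     A   47
#2     B   11
#3     C    5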