How to create a data frame with the difference between the last and first dates of each category in R

-1

I have the following dataset as an example, along with the result I expect:

[Image: example data and expected result]


The values in the new data frame are in days: for each category, the difference between its last date and its first date.

2 answers

4

You can do what you want with the base R function aggregate.

Grupo <- c("A", "A", "A", "B", "B", "C", "C")
Data <- c("01/02/2017", "15/02/2017", "20/03/2017", "18/02/2017", "01/03/2017", "15/02/2017", "20/02/2017")
dados <- data.frame(Grupo, Data)

dados$Data <- as.Date(dados$Data, "%d/%m/%Y")

result <- aggregate(Data ~ Grupo, dados, function(d) d[length(d)] - d[1])
result
#  Grupo Data
#1     A  47 
#2     B  11 
#3     C   5
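
Note that d[length(d)] - d[1] subtracts the first row of each group from the last row, so it relies on the dates already being in chronological order within each group. Below is a minimal sketch of an order-independent variant, assuming the same Grupo and Data columns as above. The shuffled row order and the Dias column name are illustrative assumptions, not part of the original answer.

```r
# Same example data, deliberately out of chronological order
Grupo <- c("A", "B", "A", "C", "A", "B", "C")
Data <- c("20/03/2017", "01/03/2017", "01/02/2017", "20/02/2017",
          "15/02/2017", "18/02/2017", "15/02/2017")
dados <- data.frame(Grupo, Data)
dados$Data <- as.Date(dados$Data, "%d/%m/%Y")

# range() returns c(min, max); diff() gives the span between them;
# as.numeric() drops the difftime class and leaves a plain number of days
result <- aggregate(Data ~ Grupo, dados, function(d) as.numeric(diff(range(d))))
names(result)[2] <- "Dias"  # hypothetical column name for the day count
result
```

Sorting the data by Grupo and Data before calling aggregate would be another way to make the original one-liner safe.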
  • One question: my data also has a process number variable before these columns. How would it be done in that case?

  • @Fvasquez It is better to edit the question to include the new data. Please post the output of dput(dados) or, if the dataset is very large, of dput(head(dados, 20)).
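
As a rough illustration of what the comment above asks for, dput prints R code that recreates an object, so other users can paste it and get the exact same data frame. The small data frame here is a made-up stand-in for the asker's real data.

```r
# Hypothetical stand-in for the asker's data
dados <- data.frame(
  Grupo = c("A", "A", "B"),
  Data = as.Date(c("2017-02-01", "2017-02-15", "2017-02-18"))
)

# Print a text representation of (at most) the first 20 rows;
# this printed text is what gets pasted into the question
dput(head(dados, 20))

# Anyone can rebuild the object by evaluating that text
rebuilt <- eval(parse(text = capture.output(dput(head(dados, 20)))))
```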

0

Another way to do this is by using the dplyr package:

library(dplyr)

dados <- tribble(
  ~Grupo, ~Data,
  "A", "01/02/2017", 
  "A", "15/02/2017", 
  "A", "20/03/2017", 
  "B", "18/02/2017", 
  "B", "01/03/2017", 
  "C", "15/02/2017", 
  "C", "20/02/2017"
) %>%
  mutate(Data = as.Date(Data, format = "%d/%m/%Y"))

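# For each group, subtract the earliest date from the latest one
# and store the difference as a whole number of days
result <- dados %>%
  group_by(Grupo) %>%
  summarise(Data = as.integer(max(Data) - min(Data))) %>%
  as.data.frame()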
