How to calculate the difference between two dates in a column and group by category to generate a new database in R software

Here is an example of the original dataset and the new one I want to generate:

[image of the original table and the desired result table]

1 answer

First, a note: it is always best to ask questions with a reproducible example. In your case, you should have provided the data as a data.frame, which I ended up having to type in myself ;-). To learn how to ask a question with a reproducible example, read this help page: How to create a Minimal, Complete, and Verifiable example

In the first part I’m simply creating a data.frame like the one you provided in the image.

## Creating the example as a data.frame
dados <- data.frame(
  Processo = c(201701, 201701, 201702, 201702, 201702, 201703, 201703, 201704, 201704, 201704),
  Grupo = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'A', 'A'),
  Data = c('01/02/2017', '15/02/2017', '20/03/2017', '18/04/2017', '01/07/2017', '15/02/2017', '20/02/2017', '01/03/2017', NA, '05/06/2017')
)

Something important to know about R is that when it reads a dataset containing dates, it initially treats those dates as strings. You need to convert the strings to R's Date class so that you can add and subtract dates:

## Converting to Date
dados$Data <- as.Date(dados$Data, format = '%d/%m/%Y')

Note the format argument, which tells R how the day, month and year are represented. I used an uppercase Y because the year is written with 4 digits.
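To make the effect of the conversion concrete, here is a minimal sketch in base R. The variable names (text_date, first, second, same_day, gap, bad) and the standalone values are illustrative assumptions, not part of the original data.

```r
# Illustrative values only; these names are not from the question's data
text_date <- "01/02/2017"
class(text_date)                  # "character": still just a string

# %d = day of month, %m = month number, %Y = four-digit year
first  <- as.Date(text_date, format = "%d/%m/%Y")
second <- as.Date("15/02/2017", format = "%d/%m/%Y")
class(first)                      # "Date"

# Lowercase %y expects a two-digit year instead
same_day <- as.Date("01/02/17", format = "%d/%m/%y")
first == same_day                 # TRUE

# Subtracting two Dates gives a difftime; as.numeric() extracts the day count
gap <- second - first
as.numeric(gap)                   # 14

# A string that does not match the format becomes NA instead of raising an error
bad <- as.Date("2017-02-01", format = "%d/%m/%Y")
is.na(bad)                        # TRUE
```

Because a mismatched format silently produces NA, it is worth checking for unexpected NA values right after the conversion.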

Finally, use dplyr to group the rows and compute the difference between the latest and earliest date in each group. Note the na.rm = TRUE option, which makes max() and min() ignore the NA values.

## Loading the dplyr package
library(dplyr)

## Grouping and computing the difference between the dates with dplyr
dados %>%
  group_by(Processo, Grupo) %>%
  arrange(desc(Data)) %>%
  summarise(Total_Dias = max(Data, na.rm = TRUE) - min(Data, na.rm = TRUE))

The result is exactly the final table you posted:

# A tibble: 4 x 3
# Groups:   Processo [?]
  Processo Grupo Total_Dias
     <dbl> <fct> <time>    
1  201701. A     14        
2  201702. B     103       
3  201703. C     5         
4  201704. A     96 
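The Total_Dias column above is a difftime (shown as <time>). If plain numbers are more convenient for the new dataset, one possible variation is sketched below. It rebuilds the same data so that it runs on its own; the names resultado and sem_na_rm are illustrative, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same values as the example above, rebuilt so this block is self-contained
dados <- data.frame(
  Processo = c(201701, 201701, 201702, 201702, 201702,
               201703, 201703, 201704, 201704, 201704),
  Grupo = c("A", "A", "B", "B", "B", "C", "C", "A", "A", "A"),
  Data = as.Date(c("01/02/2017", "15/02/2017", "20/03/2017", "18/04/2017",
                   "01/07/2017", "15/02/2017", "20/02/2017", "01/03/2017",
                   NA, "05/06/2017"), format = "%d/%m/%Y")
)

# Without na.rm = TRUE, a single missing date makes the whole result NA
sem_na_rm <- max(dados$Data[dados$Processo == 201704])
is.na(sem_na_rm)                  # TRUE

resultado <- dados %>%
  group_by(Processo, Grupo) %>%
  summarise(
    # difftime() with explicit units, then as.numeric() for a plain number
    Total_Dias = as.numeric(difftime(max(Data, na.rm = TRUE),
                                     min(Data, na.rm = TRUE),
                                     units = "days")),
    .groups = "drop"              # assumption: dplyr >= 1.0.0
  )

resultado$Total_Dias              # 14 103 5 96
```

Fixing units = "days" keeps the numbers comparable across groups, since a bare difftime can otherwise pick its units automatically in some contexts.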
