How to calculate the difference between dates in R?


dataset <- structure(list(PLACA = structure(c(5L, 5L, 5L, 4L, 1L, 2L, 3L, 
7L, 6L, 8L), .Label = c("DSF9652", "EFR9618", "EQW6597", "ERB1522", 
"EWM3539", "LOC1949", "LQQ5554", "OQT5917"), class = "factor"), 
    COD_REV = c(113195L, 113196L, 113197L, 113303L, 80719L, 80720L, 
    80722L, 113318L, 80788L, 113386L), DATA = structure(1:10, .Label = c("2016-01-14 12:13:00.000", 
    "2016-01-18 18:48:00.000", "2016-01-18 19:00:00.000", "2016-01-25 11:46:00.000", 
    "2016-01-25 19:20:00.000", "2016-01-25 19:28:00.000", "2016-01-25 19:33:00.000", 
    "2016-01-25 20:56:00.000", "2016-01-26 21:28:00.000", "2016-01-27 13:50:00.000"
    ), class = "factor"), KM_ATUAL = c(52100L, 52100L, 52100L, 
    110676L, 62300L, 31144L, 165022L, 41021L, 155646L, 55030L
    ), KM_MEDIA = c(0L, 42L, 40L, 20L, 17L, 18L, 120L, 100L, 
    10L, 38L)), .Names = c("PLACA", "COD_REV", "DATA", "KM_ATUAL", 
"KM_MEDIA"), row.names = c(NA, -10L), class = "data.frame")

I have the dataset above and would like to group it by license plate (PLACA) to see how many visits the same client has made. Then I need to calculate the difference between the dates and between the KM_ATUAL values of consecutive visits, so that I can compare the result with the KM_MEDIA field (average km per day) and see how far apart these values are. I am not able to calculate the difference between the dates. This was my attempt so far:

library(tidyverse)
# Load the dataset
dataset <- read_csv2("dados_atuais.csv")

dataset_revisao_km <- dataset %>%
  # keep only the important columns
  select(CPF, PLACA, COD_REV, DATA, KM_ATUAL) %>%
  arrange(DATA) %>%
  group_by(PLACA) %>%
  mutate(ORDEM_REVISAO = row_number()) %>%
  # keep only plates with more than one visit
  filter(n() > 1) %>%
  mutate(DIFERENCA_KM = KM_ATUAL - lag(KM_ATUAL)) %>%
  # drop the first visit of each plate
  filter(ORDEM_REVISAO > 1)
  • Welcome to Stack Overflow Brasil! Unfortunately, this question cannot be reproduced by anyone trying to answer it. Please take a look at this link to see how to ask a reproducible question in R. That way, people who want to help will be able to do so in the best possible way.

  • Thanks for the tips, Marcus. I read the article but did not quite understand what I should change. I have a short piece of code, a sample of the dataset, and the problem I am not able to solve. Is something missing?

  • The data is missing. An image of it does not help us reproduce your problem. Run the command dput(head(dataset, 20)) (where dataset is the data set with only the 5 important columns) and paste the output into the original question. That greatly reduces the work for anyone trying to help you, since nobody will have to type in the data by hand to reproduce your problem.

  • I updated it. Is this what you meant?

  • No, not like that. Paste only the output of Marcus Nunes' command, not a link to an image. Images do not help at all.

  • I already edited the question, @Noisy. It’s all right now.


1 answer


R needs date columns to be stored as proper date-time values before it can do any calculations with them. One of the best ways to do this is with the lubridate package:

library(lubridate)

dataset$DATA <- ymd_hms(dataset$DATA)

Note that I simply replaced the DATA column with its ymd_hms equivalent (year-month-day hour-minute-second), which matches the format in the original dataset. From there, all that was left was to calculate the difference in days between visits of the same plate, using the difftime function:

library(dplyr)  # provides %>%, group_by, filter, mutate and lag

dataset %>%
  group_by(PLACA) %>%
  filter(n() > 1) %>%
  mutate(DiferencaDias = difftime(DATA, lag(DATA), units = "days")) %>%
  na.omit()
# A tibble: 2 x 6
# Groups:   PLACA [1]
  PLACA   COD_REV DATA                KM_ATUAL KM_MEDIA DiferencaDias
  <fct>     <int> <dttm>                 <int>    <int> <time>
1 EWM3539  113196 2016-01-18 18:48:00    52100       42 4.27430555555556
2 EWM3539  113197 2016-01-18 19:00:00    52100       40 0.00833333333333333

Note that in the dataset provided only the plate EWM3539 appears more than once. It appears 3 times, but there is no difference in days to report for its first visit, because there is no earlier date to compare it with. That row ends up with NA, so we remove it with na.omit.
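As a possible next step toward the comparison described in the question, the same grouped pipeline can also compute the distance driven between visits and set it against what KM_MEDIA would predict. This is only a sketch: it assumes KM_MEDIA is an average in km per day, and the column names DiferencaKm and KmEsperado are made up for illustration. The sample holds just the three EWM3539 rows from the question.

```r
library(dplyr)
library(lubridate)

# The three visits of plate EWM3539, copied from the question's dataset
visitas <- data.frame(
  PLACA    = c("EWM3539", "EWM3539", "EWM3539"),
  DATA     = c("2016-01-14 12:13:00.000",
               "2016-01-18 18:48:00.000",
               "2016-01-18 19:00:00.000"),
  KM_ATUAL = c(52100L, 52100L, 52100L),
  KM_MEDIA = c(0L, 42L, 40L),
  stringsAsFactors = FALSE
)

resultado <- visitas %>%
  mutate(DATA = ymd_hms(DATA)) %>%
  arrange(PLACA, DATA) %>%
  group_by(PLACA) %>%
  mutate(
    # days elapsed since the previous visit of the same plate
    DiferencaDias = as.numeric(difftime(DATA, lag(DATA), units = "days")),
    # km actually driven since the previous visit
    DiferencaKm = KM_ATUAL - lag(KM_ATUAL),
    # km expected if KM_MEDIA is km per day (assumption)
    KmEsperado = KM_MEDIA * DiferencaDias
  ) %>%
  # the first visit of each plate has nothing to compare against
  filter(!is.na(DiferencaDias)) %>%
  ungroup()

print(resultado)
```

Converting the difftime to a plain number with as.numeric keeps the multiplication by KM_MEDIA free of unit surprises.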

  • Thank you, Marcus! Sorry for the messy way I asked the question. I am a beginner both in R and here on Stack Overflow. I am now trying to multiply the KM_MEDIA column by the difference in days, because I need to compare that value with KM_ATUAL, but I am not sure how to do these operations.

  • You're welcome. It's great to know that my answer helped you in some way. Please consider voting for and accepting the answer so that, in the future, other people who run into the same problem have a reference for solving it. If you have any other doubts, just post a new question.

  • Marcus, in a new extract the date comes in the following format: 27/02/2018 15:32:00 (with the day first). I tried changing dataset$DATA <- ymd_hms(dataset$DATA) to dataset$DATA <- dmy_hms(dataset$DATA), but it didn't work. What should I do?

  • Use dmy_hms, because now the date is in the format day_month_year_hour_minute_second.

  • I tried, but it returns: Warning message: All formats failed to parse. No formats found.
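For what it's worth, dmy_hms does parse a string in exactly the format quoted above, so the warning suggests that the column's actual contents differ from it in some way (seconds missing, for example, or a different separator). The sketch below uses lubridate's parse_date_time, which accepts several candidate orders. The sample vector is invented for illustration, and the real cause in the new extract is not known here.

```r
library(lubridate)

# Hypothetical sample: the first value matches the format quoted in the comment,
# the second lacks seconds (one possible reason for dmy_hms to fail)
datas <- c("27/02/2018 15:32:00", "28/02/2018 09:05")

# Look at the raw strings first to see what the column really contains
print(head(datas))

# dmy_hms handles the quoted format directly
completa <- dmy_hms(datas[1])
print(completa)

# parse_date_time is given two candidate orders, with and without seconds
convertidas <- parse_date_time(datas, orders = c("dmy HMS", "dmy HM"))
print(convertidas)
```

Printing a few raw values of dataset$DATA before converting is usually the quickest way to tell which order of components the column really uses.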
