Problems with dates in R

Asked

Viewed 155 times

0

I am again having date problems in R. I have the dataset below and would like to calculate the difference of dates between revisions. It happens that I am not succeeding, sometimes returns some values that I do not know where it comes from, sometimes the dataset appears empty. I wonder if someone could help me?

structure(list(CPF = c(4.71e+13, 4.71e+13, 222146826, 10491445890, 
40557199816, 10799846821, 5702868838, 47011947868, 8.77e+13, 
3.9e+11, 2511422050, 45250618049, 26690683087, 50286986949, 83178880020, 
460414003, 52466078015, 2.31e+12, 10702601004, 18057438053), 
    PLACA = c("GJD-6942", "GJD-6942", "EVI-7442", "FSI-2944", 
    "FHY-4766", "FGK-0375", "EAA-3772", "FLE-1840", "EYH-8581", 
    "IVX2436", "IQL1145", "IIR3216", "INX8756", "ITI6522", "IRR7035", 
    "ISA8749", "IQB5774", "IUS4704", "ITL2283", "IOG3399"), COD_REV = c(563558L, 
    563559L, 563562L, 563567L, 563745L, 563961L, 564074L, 564184L, 
    564185L, 20313L, 20333L, 20338L, 20623L, 20627L, 20631L, 
    21107L, 21118L, 21121L, 21275L, 21284L), DATA = c("07/02/2018 15:32", 
    "07/02/2018 15:32", "07/02/2018 15:33", "07/02/2018 15:35", 
    "07/02/2018 16:56", "08/02/2018 08:21", "08/02/2018 09:26", 
    "08/02/2018 10:32", "08/02/2018 10:34", "11/04/2016 20:20", 
    "11/04/2016 20:55", "11/04/2016 21:05", "13/04/2016 20:41", 
    "13/04/2016 20:47", "13/04/2016 20:55", "16/04/2016 15:11", 
    "16/04/2016 15:39", "16/04/2016 15:45", "18/04/2016 20:57", 
    "18/04/2016 21:04"), KM_ATUAL = c(46250L, 46250L, 75425L, 
    70632L, 78257L, 176305L, 293761L, 54996L, 175200L, 32925L, 
    73251L, 151000L, 144500L, 57700L, 158000L, 52000L, 97200L, 
    47350L, 108000L, 129324L), KM_MEDIA = c(15000L, 15000L, 14000L, 
    23000L, 18500L, 35000L, 30L, 13000L, 32000L, 80L, 60L, 60L, 
    40L, 80L, 100L, 60L, 60L, 200L, 60L, 60L), TIPO_MEDIA = c("anl", 
    "anl", "anl", "anl", "anl", "anl", "dia", "dia", "anl", "NI", 
    "NI", "NI", "NI", "NI", "NI", "NI", "NI", "NI", "NI", "NI"
    ), KM_MED_DIA = c("41,0958904", "41,0958904", "38,3561644", 
    "63,0136986", "50,6849315", "95,890411", "30", "13000", "87,6712329", 
    "NI", "NI", "NI", "NI", "NI", "NI", "NI", "NI", "NI", "NI", 
    "NI"), ESTILO_DIR = c("etd", "etd", "cdd", "etd", "cdd", 
    "etd", "cdd", "cdd", "etd", "NI", "NI", "NI", "NI", "NI", 
    "NI", "NI", "NI", "NI", "NI", "NI"), CAMBIO = c("atc", "atc", 
    "mnl", "atc", "mnl", "mnl", "mnl", "mnl", "atc", "NI", "NI", 
    "NI", "NI", "NI", "NI", "NI", "NI", "NI", "NI", "NI"), LOJA = c(9L, 
    9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 144L, 144L, 144L, 144L, 144L, 
    144L, 144L, 144L, 144L, 144L, 144L)), .Names = c("CPF", "PLACA", 
"COD_REV", "DATA", "KM_ATUAL", "KM_MEDIA", "TIPO_MEDIA", "KM_MED_DIA", 
"ESTILO_DIR", "CAMBIO", "LOJA"), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

Code used:

# Carregando os pacotes
library(tidyverse)
library(taRifx)
library(lubridate)

# Carregando os datasets
revisao <- read_csv2("rev_seg.csv")
colnames(revisao)
#revisao$DATA <- dmy_hms(revisao$DATA)

dif_km <- 
  revisao %>%
  select(CPF, PLACA, COD_REV, DATA, KM_ATUAL, KM_MEDIA, TIPO_MEDIA, KM_MED_DIA) %>% 
  arrange(DATA) %>% 
  mutate(KM_MED_DIA = as.numeric(KM_MED_DIA), CPF = as.character(CPF)) %>% 
  filter(KM_MED_DIA != "NULL") %>% 

  group_by(CPF,PLACA) %>%
  mutate(ORDEM_REV = row_number()) %>%
  filter(n() > 1) %>%
  mutate(DIF_KM = KM_ATUAL - lag(KM_ATUAL)) %>%
  mutate(DIF_DATA_DIAS = difftime(DATA, lag(DATA) , units = c("days"))) %>% 
  mutate(DIF_DATA_MIN = DIF_DATA_DIAS * 1440) %>%
  mutate(KM_RODADA_ESPERADA = KM_MED_DIA * DIF_DATA_DIAS) %>% 
  na.omit() %>%
  filter(ORDEM_REV > 1)
  • In order to calculate the difference in dates between revisions we need to know which column is the revision column. And by the way, can you give an example of the code you tried, please?

  • Hello Rui, So the revision column is COD_REV. The same board may have made many revisions. Each person points to a KM_MEDIA when making a revision. It turns out that this data is subjective. However it is also collected the car KM_ATUAL so that in the case of customers who have already come more than once, I would like to calculate the difference of dates between revisions, multiply by KM_MED_DIA (which is what he said he walks by day) and then compare this to the diff of KM_ATUAL (which is how much it actually walked). Turns out I’m not making that difference between the dates.

  • The code that I wrote is under the dataset, is that I do not know very well to say the information here. Thank you!

  • @Diegopeliciari, welcome to Stackoverflow! Take a look at how to improve your question so that we can help

  • @Tomásbarcellos I took a look there, but I believe that all the elements raised are in the post, as code with trial, data sampling and the problem in question. What I could improve?

  • You have joined many problems in the question. Isolate only behavior other than expected. You have problems with the supplied code that has nothing to do with the question.

  • the different behavior is that of the date.I am not able to subtract the dates..

  • @Diegopeliciari really isn’t clear what your problem is. You need to better delimit and provide a reproducible example. You want to calculate the difference between the dates of the revisions but have no example of a car with more than one overhaul.

Show 3 more comments

1 answer

1


Diego, I believe the code below does what you asked. I added a date conversion that you were not doing mutate(DATA = dmy_hm(DATA)) and also replacing the commas with dots mutate(KM_MED_DIA = gsub("\\,","\\.",KM_MED_DIA)). With the database sent by you only one car appears more than once and the two revisions are with the same date and time. Off that your test to filter the NAs was incorrect (filter(KM_MED_DIA != "NULL"), to test cells that have NA the correct one is to use the function is.na(x)

dif_km <- 
  revisao %>%
  select(CPF, PLACA, COD_REV, DATA, KM_ATUAL, KM_MEDIA, TIPO_MEDIA, KM_MED_DIA) %>% 
  arrange(DATA) %>% 
  mutate(KM_MED_DIA = gsub("\\,","\\.",KM_MED_DIA)) %>%
  mutate(KM_MED_DIA = as.numeric(KM_MED_DIA), CPF = as.character(CPF)) %>%
  filter(is.na(KM_MED_DIA)==F) %>%
  group_by(CPF, PLACA) %>%
  mutate(ORDEM_REV = row_number()) %>%
  filter(n()>1) %>%
  mutate(DATA = dmy_hm(DATA)) %>%
  mutate(DIF_KM = KM_ATUAL - lag(KM_ATUAL)) %>%
  mutate(DIF_DATA_DIAS = difftime(DATA, lag(DATA) , units = c("days"))) %>% 
  mutate(DIF_DATA_MIN = DIF_DATA_DIAS * 1440) %>%
  mutate(KM_RODADA_ESPERADA = KM_MED_DIA * DIF_DATA_DIAS) %>% 
  na.omit() %>%
  filter(ORDEM_REV > 1)
  • Flavio, thanks first for the help. But if you look at the output of the code, the difference in dates appears negative in some cases. I don’t think he’s sorting by date before subtracting, why it will be?

  • Diego, for the database informed by you there is no way there is a difference between negative dates, because only one card is repeated. I even made a few changes to the base to see if what you said occurs, but it doesn’t. If you can pass the database you are working with it would be more useful.

  • I got Flavio here. The problem is that it is necessary to convert the date to "date" before arranging. Thanks for the help!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.