Dealing with dates of heterogeneous formats in R

Asked

Viewed 101 times

3

I have 236 files in . csv that have all the same columns. My goal is to join them all into one data frame only.

However, each of them has 4 columns with date and time values. The problem is in the columns referring to date and time. Some dates are in format YYYY-MM-DD HH:MM:SS, as 2019-10-08 10:15:00. Other dates are in format DD/MM/YYYY HH:MM, as 01/08/2001 21:56.

For the first case, I would use the function ymd_hms package lubridate to create a Posixct object:

library(lubridate)

ymd_hms("2019-10-08 10:15:00")
[1] "2019-10-08 10:15:00 UTC"

For the second case, I would use the function dmy_hm of the same package to get the result I wish:

dmy_hm("01/08/2001 21:56")
[1] "2001-08-01 21:56:00 UTC"

But how to precede in the case where I have the two date formats, still as characters, in the same vector? In other words, how to convert the character array x, given below

x <- c("2019-10-08 10:15:00", "01/08/2001 21:56")

on a vector of type Posixct?

4 answers

5


You can use the parameter tryFormats of function as.POSIXct along with some sort of function apply, then it tests for each value which of the provided formats is appropriate.

library(magrittr)
x <- c("2019-10-08 10:15:00", "01/08/2001 21:56")
lapply(x, as.POSIXct, tryFormats = c("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M")) %>% 
  do.call(c,.)
#> [1] "2019-10-08 10:15:00 -03" "2001-08-01 21:56:00 -03"

library(tidyverse)
map(x, as.POSIXct, tryFormats = c("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M")) %>% 
     reduce(c)
#> [1] "2019-10-08 10:15:00 -03" "2001-08-01 21:56:00 -03"

Created on 2019-12-13 by the reprex package (v0.3.0)

3

This function solves the problem for vectors where all elements are in one of the two question formats. The function can easily be made more general if necessary.

as_POSIXct_especial <- function(x){
  sp <- strsplit(x, '[ -/]')
  res <- lapply(seq_along(sp), function(i){
    k <- which(nchar(sp[[i]]) == 4)
    if(k == 1){
      lubridate::ymd_hms(x[i])
    }else if(k == 3){
      lubridate::dmy_hm(x[i])
    }else{
      NA
    }
  })
  tz <- substring(res[[1]], first = nchar(x[[1]]) - 2)
  as.POSIXct(unlist(res), origin = "1970-01-01", tz = tz)
}

x <- c("2019-10-08 10:15:00", "01/08/2001 21:56")
as_POSIXct_especial(x)
#[1] "2019-10-08 10:15:00" "2001-08-01 21:56:00"

2

You can check the date format using the substr and transform function to the right format, the code below solves the problem for vector x:

x <- c("2019-10-08 10:15:00", "01/08/2001 21:56")
x <- ifelse(substr(x,3,3) == "/", dmy_hm(x), ymd_hms(x))

1

An even simpler solution is to use the function parse_date_time package lubridate.

library(lubridate)

y <- parse_date_time(x = x, orders = c("y m d H M S", "d m y H M"))

y

#[1] "2019-10-08 10:15:00 UTC" "2001-08-20 01:21:56 UTC"

Which is class object POSIXct:

class(y)

[1] "POSIXct" "POSIXt"

Details about the use of the function parse_date_time meet here.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.