Select the most recent date of each group in R

I have the following data:

data;code
18/02/2020;C106
05/04/2018;C107
11/09/2016;C107
16/02/2019;C109
11/03/2020;C110
04/03/2020;C114
18/02/2020;C114
06/02/2020;C121

I would like to select the latest date for each code, giving:

data;code
18/02/2020;C106
05/04/2018;C107
16/02/2019;C109
11/03/2020;C110
04/03/2020;C114
06/02/2020;C121

I tried to use

tapply(data$data, data$code, max)

but got this error:

Error in Summary.factor(7L, na.rm = FALSE) : 
  ‘max’ not meaningful for factors

1 answer

For max and min to work, you first need to convert the character/factor dates to a date-time class such as POSIXct (max is not defined for factors, and on plain strings it would compare alphabetically rather than chronologically):

dados <- read.table(text = "
  data;code
  18/02/2020;C106
  05/04/2018;C107
  11/09/2016;C107
  16/02/2019;C109
  11/03/2020;C110
  04/03/2020;C114
  18/02/2020;C114
  06/02/2020;C121",
  sep = ';', header = TRUE)

dados$data <- as.POSIXct(dados$data, format = '%d/%m/%Y')

> max(dados$data)
[1] "2020-03-11 -03"

To select the latest date for each code, you can use aggregate:

> aggregate(data ~ code, dados, max)
  code       data
1 C106 2020-02-18
2 C107 2018-04-05
3 C109 2019-02-16
4 C110 2020-03-11
5 C114 2020-03-04
6 C121 2020-02-06
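
If the time of day and time zone are not needed, a variant of the same idea is to use the Date class instead of POSIXct and then format the result back to dd/mm/yyyy. This is only a sketch: the name latest is made up for illustration, and stringsAsFactors = FALSE is set explicitly on the assumption that you may be on an R version older than 4.0, where strings are read as factors by default.

```r
# Same sample data as above, read as plain character columns
dados <- read.table(text = "data;code
18/02/2020;C106
05/04/2018;C107
11/09/2016;C107
16/02/2019;C109
11/03/2020;C110
04/03/2020;C114
18/02/2020;C114
06/02/2020;C121", sep = ";", header = TRUE, stringsAsFactors = FALSE)

# Date carries no time-of-day or time zone component
dados$data <- as.Date(dados$data, format = "%d/%m/%Y")

# Latest date per code ('latest' is a hypothetical name)
latest <- aggregate(data ~ code, data = dados, FUN = max)

# Convert back to the original dd/mm/yyyy text for display
latest$data <- format(latest$data, "%d/%m/%Y")
latest
```

Note that the last step turns the column back into character, so do any further date comparisons before formatting.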
