Consider the following situation:
I have a dataset with two variables. The first contains duplicate values (e.g. CPF xxx.xxx.xxx-xx appears 14 times, another CPF xxx.xxx.xxx-xx appears 18 times, and so on). The second holds the dates on which the event occurred (e.g. 2017-01-18, 2017-01-19...), one for each CPF occurrence.
I use the following function to remove duplicate cases:
new <- dataset[!duplicated(dataset[c("CPFs")]), ]
This removes the duplicate rows, keeping the first occurrence of each CPF.
My goal: to remove duplicates in CPFs while keeping, in the other variable (date), the newest (or oldest) date attached to each CPF. That is, a sort order has to be established when the function is executed.
So if the dates (2018-01-20; 2017-02-22) are attached to a CPF and I want the oldest, the date kept for it would be: 2017-02-22.
A fictitious dput to help with the answer:
dataset=structure(list(CPFs = c(1234, 2345, 1234, 2345, 1234, 2345, 1234,
2345), date = c(1998, 1997, 1993, 1992, 1998, 1998, 1992, 1989
)), class = "data.frame", row.names = c(NA, -8L))
Desired result:
CPF date
1234 1992
2345 1989
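Because `duplicated()` marks every occurrence after the first, one way to get this result is to sort the rows before filtering. The following is a minimal base R sketch under that assumption; the names `sorted`, `oldest` and `newest` are illustrative and not part of the original data.

```r
dataset <- structure(list(
  CPFs = c(1234, 2345, 1234, 2345, 1234, 2345, 1234, 2345),
  date = c(1998, 1997, 1993, 1992, 1998, 1998, 1992, 1989)
), class = "data.frame", row.names = c(NA, -8L))

# Sort by CPF, then by date ascending, so the oldest date comes first within each CPF
sorted <- dataset[order(dataset$CPFs, dataset$date), ]

# Keep the first row per CPF (the oldest date)
oldest <- sorted[!duplicated(sorted$CPFs), ]
rownames(oldest) <- NULL

# Keep the last row per CPF instead (the newest date)
newest <- sorted[!duplicated(sorted$CPFs, fromLast = TRUE), ]
rownames(newest) <- NULL

oldest
```

With the sample data, `oldest` holds 1992 for CPF 1234 and 1989 for CPF 2345. The same ordering should also work when `date` is a `Date` column, since `order()` sorts dates chronologically.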
In fact, the arrange function from dplyr solves it. Thanks, @David. – neves
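The comment does not show the code that was used, so the following is only a sketch of how `arrange` might be combined with `distinct` from dplyr to the same effect. It assumes dplyr is installed; `oldest` and `newest` are illustrative names.

```r
library(dplyr)

dataset <- structure(list(
  CPFs = c(1234, 2345, 1234, 2345, 1234, 2345, 1234, 2345),
  date = c(1998, 1997, 1993, 1992, 1998, 1998, 1992, 1989)
), class = "data.frame", row.names = c(NA, -8L))

# Oldest date per CPF: sort ascending, then keep the first row of each CPF
oldest <- dataset %>%
  arrange(CPFs, date) %>%
  distinct(CPFs, .keep_all = TRUE)

# Newest date per CPF: sort with the date descending instead
newest <- dataset %>%
  arrange(CPFs, desc(date)) %>%
  distinct(CPFs, .keep_all = TRUE)

oldest
```

`distinct(..., .keep_all = TRUE)` keeps the first row it meets for each CPF, so the preceding `arrange` call decides which date survives.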