Filter a data frame by a column's values

Question

Asked 6 years, 1 month ago

I have a data frame in which one column holds a user ID. Some IDs appear two or more times; for each repeated ID, the first row is the most recent one and the following rows are older.

I would like a way to create another data frame that keeps only the first row of each repeated ID, together with the rows whose ID is already unique. All I managed to do was filter the distinct ones with this command:

dados2 <- dados[!duplicated(dados$ID), ]

Welcome to Stack Overflow! Unfortunately, this question cannot be reproduced by anyone trying to answer it. Please take a look at this link and see how to ask a reproducible question in R, so that people who wish to help you can do so in the best possible way.

– Marcus Nunes

2019/06/10 at 01:16

I don't understand what the values that are already unique are; can you explain that better? Also, can you please edit the question with the output of dput(dados) or, if the data set is too large, dput(head(dados, 20))?

– Rui Barradas

2019/06/10 at 13:24

1 answer

by Robert • **887** points · Answer 1 · 2019-06-11T17:25:45+00:00

If you are sure that the first row of each repeated ID is the most recent one, and that is the row you want to keep, your code already does that: it keeps the first row of each repeated ID.
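To see why, it may help to look at what duplicated() returns on a small vector. This is only a minimal sketch; the vector `ids` is made up for illustration and is not the asker's data.

```r
# Hypothetical IDs, for illustration only
ids <- c(4, 4, 7, 10, 10, 10)

# duplicated() marks every occurrence after the first one as TRUE
dup <- duplicated(ids)
print(dup)
# [1] FALSE  TRUE FALSE FALSE  TRUE  TRUE

# Negating it keeps only the first occurrence of each value
first_only <- ids[!dup]
print(first_only)
# [1]  4  7 10
```

Used as a row index on a data frame, the same logical vector selects the first row of each ID and drops the later ones.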

You can check this with the following data and an alternative procedure:

dados<-structure(list(ID = c(4, 4, 4, 7, 7, 7, 10, 10, 10, 15, 15, 
                                15, 20, 20, 20, 25, 25, 25, 30, 30, 30, 35, 35, 35, 40, 40, 40
), COM = c(102.7408349, 46.42860925, 46.42860925, 193.9867874, 
           77.78158526, 77.78158526, 259.2226911, 142.9585464, 142.9585464, 
           338.2513753, 201.6268249, 201.6268249, 540.8096753, 230.0649675, 
           230.0649675, 621.6945295, 243.5781577, 356.2446836, 678.4896365, 
           303.6745224, 532.1778946, 731.7253377, 317.1877126, 621.6366503, 
           794.4532011, 353.1853056, 688.7228286)), class = "data.frame", row.names = 1:27)

dados$seq <- seq_len(nrow(dados))  # add a row-sequence column
# for each unique ID, pick the most recent row (the smallest sequence number)
dados2 <- dados[sapply(unique(dados$ID), function(z) min(dados$seq[dados$ID == z])), ]
dados3 <- dados[!duplicated(dados$ID), ]
all.equal(dados2, dados3)
#[1] TRUE
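
The check above relies on the most recent row already coming first within each ID. If that ordering were not guaranteed, one possible approach would be to sort before dropping duplicates. The sketch below assumes a timestamp column, here called `when`, which does not exist in the original data; the data frame `df` and its values are likewise hypothetical.

```r
# Hypothetical data: 'when' is an assumed timestamp column,
# not part of the data shown in the question or answer
df <- data.frame(
  ID   = c(4, 4, 7, 7, 10),
  COM  = c(46.4, 102.7, 193.9, 77.7, 259.2),
  when = as.Date(c("2019-01-01", "2019-03-01", "2019-02-01",
                   "2019-01-15", "2019-02-20"))
)

# Sort by ID, then by date descending, so the most recent row
# of each ID comes first
ord <- df[order(df$ID, -as.numeric(df$when)), ]

# Keep the first (now most recent) row of each ID
latest <- ord[!duplicated(ord$ID), ]
print(latest)
```

After the sort, the same !duplicated() idiom from the question applies unchanged; only the ordering step is new.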