How can I filter the first occurrences of a certain variable in my R data frame?

I am working with data for all the UFs (states) of Brazil, for example:

date <- c("03/06/2020", "03/06/2020", "05/06/2020", "06/06/2020", "07/06/2020")
uf <- c("RJ", "SP", "RJ", "SP", "RJ")
confirmed <- c("0", "1", "1", "2", "3")
df <- data.frame(date, uf, confirmed)

How can I filter the first occurrences of a certain variable in my data frame in R?

  • Does "filter" mean you want to keep only the first occurrences, or remove them? And which of the three columns is the "certain variable"? The question is very unclear and, because of this, already has two votes to close.

  • It means I want to keep only the first occurrences. The "certain variable" is confirmed, since it would make little sense to want the first occurrences of the UFs themselves. Think of it as confirmed cases of a disease in each state over time: I want to find the date of the first occurrence in each state. Thank you.

2 answers

The dplyr package has a very useful function for this: first(). It returns the first element of a vector.

Combining it with group_by() we can come up with a solution to the problem:

library(dplyr)

df %>% 
  group_by(uf) %>% 
  summarise(data_inicio = first(date))
#> # A tibble: 2 x 2
#>   uf    data_inicio
#>   <fct> <fct>      
#> 1 RJ    03/06/2020 
#> 2 SP    03/06/2020 

Since first() returns the first value within each group, we can also use it inside filter() to keep only the matching rows.

df %>% 
  group_by(uf) %>% 
  filter(date == first(date))
#> # A tibble: 2 x 3
#> # Groups:   uf [2]
#>   date       uf    confirmed
#>   <fct>      <fct> <fct>    
#> 1 03/06/2020 RJ    0        
#> 2 03/06/2020 SP    1 
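For comparison, the same "first row of each uf" idea can be sketched in base R without dplyr, using duplicated(). Like first(), this assumes the rows are already in chronological order; stringsAsFactors = FALSE is set here only so the result does not depend on the R version.

```r
# Same example data as in the question
date <- c("03/06/2020", "03/06/2020", "05/06/2020", "06/06/2020", "07/06/2020")
uf <- c("RJ", "SP", "RJ", "SP", "RJ")
confirmed <- c("0", "1", "1", "2", "3")
df <- data.frame(date, uf, confirmed, stringsAsFactors = FALSE)

# duplicated() is FALSE the first time each uf appears, so negating it
# keeps only the first row of every uf, in the current row order
first_rows <- df[!duplicated(df$uf), ]
first_rows  # rows 1 (RJ) and 2 (SP)
```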


This dplyr solution obtains, for each uf, the date of the first confirmed case.
First, date is converted to class "Date" and confirmed to class "numeric".

library(dplyr)

df <- df %>%
  mutate(date = as.Date(date, "%d/%m/%Y"),
         confirmed = as.numeric(as.character(confirmed))) 
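The as.character() step matters when confirmed is a factor, which is what data.frame() produced for strings by default before R 4.0.0 (hence the <fct> columns in the first answer's output). A small illustration of what goes wrong without it:

```r
# A factor stores integer codes for its sorted levels, not the original text
f <- factor(c("0", "1", "3"))
codes  <- as.numeric(f)                # level codes: 1 2 3
values <- as.numeric(as.character(f))  # actual values: 0 1 3
```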

Now we build a data set with the first occurrence of confirmed cases in each uf.

prim_conf <- df %>%
  filter(confirmed > 0) %>%
  group_by(uf) %>%
  summarise(date = first(date)) 

prim_conf
## A tibble: 2 x 2
#  uf    date      
#  <chr> <date>    
#1 RJ    2020-06-05
#2 SP    2020-06-03
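Note that first(date) relies on the rows already being sorted by date within each uf. If that is not guaranteed, min(date) returns the earliest date regardless of row order. A sketch on a small hypothetical data frame (df2) whose rows are deliberately out of order:

```r
library(dplyr)

# Hypothetical data with rows out of chronological order
df2 <- data.frame(
  date = as.Date(c("2020-06-07", "2020-06-05", "2020-06-03", "2020-06-06")),
  uf = c("RJ", "RJ", "SP", "SP"),
  confirmed = c(3, 1, 1, 2)
)

prim_conf2 <- df2 %>%
  filter(confirmed > 0) %>%
  group_by(uf) %>%
  summarise(date = min(date))  # earliest date per uf, whatever the row order

prim_conf2
```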

If you also want the number of confirmed cases on those dates, a join with the original data set will fetch these values.

prim_conf %>%
  ungroup() %>%
  left_join(df, by = c("date", "uf"))
## A tibble: 2 x 3
#  uf    date       confirmed
#  <chr> <date>         <dbl>
#1 RJ    2020-06-05         1
#2 SP    2020-06-03         1

Or, more simply, without ungrouping (summarise() already drops the single grouping level):

prim_conf %>% left_join(df, by = c("date", "uf"))
## A tibble: 2 x 3
#  uf    date       confirmed
#  <chr> <date>         <dbl>
#1 RJ    2020-06-05         1
#2 SP    2020-06-03         1
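The summarise-then-join steps can also be collapsed into a single grouped filter(): within each uf, keep the rows whose date equals the earliest date with confirmed > 0. A sketch, assuming the same conversions as above; if a uf had several rows on that earliest date, all of them would be kept.

```r
library(dplyr)

# Example data from the question, with the same type conversions as above
date <- c("03/06/2020", "03/06/2020", "05/06/2020", "06/06/2020", "07/06/2020")
uf <- c("RJ", "SP", "RJ", "SP", "RJ")
confirmed <- c("0", "1", "1", "2", "3")
df <- data.frame(date, uf, confirmed) %>%
  mutate(date = as.Date(date, "%d/%m/%Y"),
         confirmed = as.numeric(as.character(confirmed)))

first_cases <- df %>%
  filter(confirmed > 0) %>%
  group_by(uf) %>%
  filter(date == min(date)) %>%  # earliest confirmed row(s) per uf
  ungroup() %>%
  arrange(uf)

first_cases
```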
