Compare line values in a data frame - R


I have a data frame with more than 1 million rows and I need to count the number of occurrences of each value of a variable. The count has to go into a new "Qtde" column as a running value, starting at 1 for the first occurrence and going up to N, the total number of occurrences of that value. It is a little complicated to explain, so here is a link to part of the data frame so that you can see what I mean: https://docs.google.com/spreadsheets/d/15TjkKfFm7o0PPqvOWJPMGCkZCKV1O0JmKpxnvWwRv7g/edit?usp=sharing. Note that I cannot summarize the data frame. That is, I do not want one row per distinct value of "key_find" with the number of times it appears; I want to keep the original number of rows and add the running count, from 1 up to the total found, in the new "Qtde" column.

This is the code I wrote, but it’s not working:

Count_Qualific <- Count_Qualific %>% 
mutate(qtde = for (i in seq_along(key_find)) {
  if_else(key_find[i] = key_find[i - 1], 1 + qtde[i], qtde[i])
})

Output:

Error: unexpected '=' in:
"  mutate(qtde = for (i in seq_along(key_find)) {
    if_else(key_find[i] ="
>   })
Error: unexpected '}' in "  }"
> 

Example table

1 answer



This can be solved with ave. No for loops or extra packages are required, and ave is reasonably fast.

qtde <- with(Count_Qualific, ave(key_find, key_find, FUN = seq_along))
Count_Qualific$qtde <- as.integer(qtde)
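
To see what this produces, here is a minimal sketch on a made-up data frame. The values are hypothetical and only the column name key_find comes from the question; it assumes key_find is a character column, which is why the result of ave comes back as character and is converted with as.integer.

```r
# Hypothetical toy data; only the column name key_find is taken from the question.
Count_Qualific <- data.frame(
  key_find = c("a", "a", "b", "a", "c", "b"),
  stringsAsFactors = FALSE
)

# For each distinct key, number its occurrences 1, 2, ..., N in row order.
# ave() returns a vector as long as its input, so no rows are dropped or reordered.
qtde <- with(Count_Qualific, ave(key_find, key_find, FUN = seq_along))

# key_find is character here, so ave() returns character; convert to integer.
Count_Qualific$qtde <- as.integer(qtde)

print(Count_Qualific)
```

On this toy input the new column is 1, 2, 1, 3, 1, 2: the third "a" gets 3 and the second "b" gets 2, and the data frame still has its original six rows.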

A dplyr solution would be the following.

library(dplyr)

Count_Qualific %>%
  group_by(key_find) %>%
  mutate(qtde = seq_along(key_find))

Although it was not requested, here is a data.table solution.

library(data.table)

# .N is the number of rows in the current group, so this numbers each group's rows 1..N
setDT(Count_Qualific)[, qtde := seq_len(.N), by = key_find]
  • We overcomplicate things... Thank you very much for the help, Rui Barradas! Problem solved successfully!
