You don't need anything complicated for this: count the occurrences of each level with table, then remove the rows whose level occurs no more often than the limit (here, 2 or fewer times). For example:
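tb <- table(dataset$fatores)  # number of occurrences of each level
# TRUE for rows whose level occurs more than 2 times (the rows to keep)
rem <- !(dataset$fatores %in% names(tb[tb <= 2]))
dataset[rem, ]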
# fatores V2 V3
# 2 5 -0.01619026 0.36458196
# 4 11 0.82122120 -0.11234621
# 5 3 0.59390132 0.88110773
# 6 11 0.91897737 0.39810588
# 7 12 0.78213630 -0.61202639
# 8 8 0.07456498 0.34111969
# 9 8 -1.98935170 -1.12936310
# 11 3 -0.05612874 1.98039990
# 12 3 -0.15579551 -0.36722148
# 14 5 -0.47815006 0.56971963
# 18 12 0.38767161 0.68973936
# 19 5 -0.05380504 0.02800216
# 21 12 -0.41499456 0.18879230
# 22 3 -0.39428995 -1.80495863
# 23 8 -0.05931340 1.46555486
# 26 5 -0.16452360 0.47550953
# 28 5 0.69696338 0.61072635
# 29 11 0.55666320 -0.93409763
# 30 5 -0.68875569 -1.25363340
In this case, all rows whose factor level is one of c(1, 2, 4, 6, 7, 9, 10) were removed.
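To make the counting step easier to check by hand, here is a minimal sketch on a small made-up data frame. The name toy and its values are hypothetical and not taken from the question; only the table-based filter is the same as above.

```r
# Hypothetical toy data: 1 occurs twice, 3 occurs three times, 5 occurs once
toy <- data.frame(fatores = c(1, 3, 1, 3, 5, 3), V2 = 1:6)

tb <- table(toy$fatores)        # occurrences per level
rare <- names(tb[tb <= 2])      # levels that occur 2 or fewer times
kept <- toy[!(toy$fatores %in% rare), ]
kept                            # only the rows with fatores == 3 remain
```

Changing the 2 in tb <= 2 changes the limit; the rest stays the same.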
You can apply the same logic in other ways. Using sapply to create a vector with the counts, and then filtering by it:
# For each row, count how many rows share its value; keep counts above 2
rem <- sapply(seq_len(nrow(dataset)), function(i) {
  sum(dataset$fatores[i] == dataset$fatores)
}) > 2
dataset[rem, ]
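The row-by-row comparison does one full scan of the column per row. As a possible alternative (my suggestion, not part of the original answer), base R's ave() can compute the same per-row count in one grouped pass. The toy data frame below is hypothetical.

```r
# Hypothetical toy data, used only for illustration
toy <- data.frame(fatores = c(1, 3, 1, 3, 5, 3), V2 = 1:6)

# ave() applies FUN within each group of equal fatores values and
# returns the result aligned with the original rows
n_occ <- ave(seq_along(toy$fatores), toy$fatores, FUN = length)
keep <- n_occ > 2
toy[keep, ]
```

Here n_occ holds, for every row, how many rows share that row's value, so it plays the same role as the vector built with sapply.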
Or using dplyr, counting row by row how many times that factor occurs and using this as the filter criterion:
library(dplyr)
dataset %>% rowwise() %>% filter(sum(fatores == .$fatores) > 2)
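A grouped filter is another way to write this in dplyr. The sketch below is an alternative to the rowwise version, not a replacement the original answer proposed; it assumes dplyr is installed and uses a hypothetical toy data frame.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- data.frame(fatores = c(1, 3, 1, 3, 5, 3), V2 = 1:6)

kept <- toy %>%
  group_by(fatores) %>%  # one group per level
  filter(n() > 2) %>%    # keep only groups with more than 2 rows
  ungroup()
kept
```

n() returns the size of the current group, so the condition is evaluated once per level rather than once per row.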
A tip: when creating random variables that are not meant to represent numbers, it is better to use letters, which makes the results easier to interpret. In your case, it could be letters[1:12].
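The same filter works unchanged when the column holds text labels, because it only compares level names. A minimal sketch, assuming a factor column with made-up labels (the data frame toy and the label "ABC" are hypothetical):

```r
# Hypothetical labelled data: "MLP" occurs 3 times, "KFC" twice, "ABC" once
toy <- data.frame(
  fatores = factor(c("MLP", "KFC", "MLP", "ABC", "MLP", "KFC")),
  V2 = 1:6
)

tb <- table(toy$fatores)
kept <- toy[!(toy$fatores %in% names(tb[tb <= 2])), ]

levels(kept$fatores)      # the removed levels are still listed here
kept <- droplevels(kept)  # drop levels that no longer occur in the data
levels(kept$fatores)
```

The droplevels() call is optional; it only matters if later steps, such as table or plotting, should not show the removed levels with a count of zero.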
They are in fact factors; I should have been clearer. These factors are names, for example "MLP", "KFC", etc. In that case the suggested approach does not work.
– Wagner Jorge
@Wagnerjorge If they are factors with text, how do you want to check whether one is "less than or equal to 2"? That makes no sense. Do you mean that it occurs two or fewer times?
– Molx
To explain better: I want to check whether the number of occurrences of a factor is less than 2. In the example above, "1" occurs twice, so I remove its rows. It could just as well be 5: if it occurred twice, its rows would be removed as well.
– Wagner Jorge
@Wagnerjorge Done, I rewrote the answer according to your explanation.
– Molx