How to remove unused categories (levels) from a data frame

Suppose I have the following data frame:

df <- data.frame(categorias=c("A","B","C","D","E"),
                 valores=1:5,
                 stringsAsFactors=TRUE)  # needed in R >= 4.0 so categorias is a factor

When I take a subset of that data frame, the levels of the categories I removed are still there.

subdf <- subset(df, valores <= 3)
levels(subdf$categorias)
[1] "A" "B" "C" "D" "E"

1 answer

You can use the droplevels function:

subdf <- droplevels(subset(df, valores <= 3))

Result:

levels(subdf$categorias)
[1] "A" "B" "C"
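
If only one column matters, the same idea can be applied to that factor alone. This is a minimal sketch, assuming R 4.0 or later (where data.frame() no longer converts strings to factors by default, hence the explicit stringsAsFactors = TRUE); re-applying factor() is shown as an alternative that has the same effect here:

```r
df <- data.frame(categorias = c("A", "B", "C", "D", "E"),
                 valores = 1:5,
                 stringsAsFactors = TRUE)
subdf <- subset(df, valores <= 3)

# Drop unused levels from a single factor column only
subdf$categorias <- droplevels(subdf$categorias)
levels(subdf$categorias)

# Re-creating the factor also discards levels that no longer occur
refactored <- factor(subset(df, valores <= 3)$categorias)
levels(refactored)
```

Both calls to levels() should list only "A", "B" and "C".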

The advantage is that it works on more than one factor variable at the same time. For example, if your data.frame were:

df <- data.frame(categorias=c("A","B","C","D","E"),
                 categorias2 = c("F", "G", "H", "I", "J"),
                 valores=1:5,
                 valores2=rnorm(5),
                 stringsAsFactors=TRUE)  # needed in R >= 4.0 so both columns are factors

If you only take the subset, both categorias and categorias2 would keep the extra levels. With subdf <- droplevels(subset(df, valores <= 3)) this is solved for all factor columns at once.
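
To check this, one option is to count the levels of every factor column before and after droplevels. The sketch below assumes R 4.0 or later (so stringsAsFactors = TRUE is set explicitly); the names only_subset, dropped and is_fac, and the seed value, are just illustrative:

```r
set.seed(1)  # arbitrary seed, only so rnorm() is reproducible
df <- data.frame(categorias  = c("A", "B", "C", "D", "E"),
                 categorias2 = c("F", "G", "H", "I", "J"),
                 valores     = 1:5,
                 valores2    = rnorm(5),
                 stringsAsFactors = TRUE)

only_subset <- subset(df, valores <= 3)
dropped     <- droplevels(only_subset)

# Count levels per factor column before and after droplevels
is_fac <- sapply(df, is.factor)
sapply(only_subset[is_fac], nlevels)  # both factor columns still have 5 levels
sapply(dropped[is_fac], nlevels)      # both factor columns now have 3 levels
```

The numeric columns valores and valores2 are left untouched, since droplevels only acts on factor columns.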
