How to categorize values in a Data Frame in R?

2

I have a data frame with a column holding the social vulnerability index (IVS) of each county, ranging from 0 to 1. I need to group these values into categories, as in the example below:

# As it is                            # As I need it to be
Município   IVS                       Município    IVS 
A           0.488                     A            Alta
B           0.253                     B            Baixa
C           0.158                     C            Muito Baixa
D           0.685                     D            Muito alta 

How should I proceed?

3 answers

6

Since you did not provide the category ranges, I chose arbitrary ones for this answer. The recode function from the car package solves the problem:

library(car)

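dataset <- data.frame(Município = LETTERS[1:4], IVS = c(0.488, 0.253, 0.158, 0.685))

# Adjacent ranges share their end points so that no value falls in a gap
# (e.g. 0.205); when a value matches two ranges, recode uses the first one.
dataset$novavariavel <- recode(dataset$IVS,
'0:0.2="Muito Baixa";
0.2:0.35="Baixa";
0.35:0.5="Alta";
0.5:1="Muito Alta"')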

#> dataset
#  Município   IVS novavariavel
#1         A 0.488         Alta
#2         B 0.253        Baixa
#3         C 0.158  Muito Baixa
#4         D 0.685   Muito Alta
  • dataset$novavariavel is the new column you want to create, with the desired categories.

  • You can set the category ranges at your discretion. Whenever a category label is text, it must be wrapped in double quotes ("), as in the example.

5

Another option is the findInterval function, which I believe is better than cut in this case.

Given a vector of labels, niveis, and a vector of cut points, limites, the following solves the problem.

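# Data from the question; ivs is the data frame used below
ivs <- data.frame(Município = LETTERS[1:4], IVS = c(0.488, 0.253, 0.158, 0.685))

niveis <- c("Muito Baixa", "Baixa", "Alta", "Muito Alta")
limites <- c(0, 0.2, 0.4, 0.6, 1)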

i <- findInterval(ivs$IVS, limites)
i
#[1] 3 2 1 4

niveis[i]
#[1] "Alta"        "Baixa"       "Muito Baixa" "Muito Alta"

So a single line of code is enough. Here I use two, so that the original data frame is left untouched.

novo <- data.frame(Município = ivs$Município)
novo$IVS <- niveis[findInterval(ivs$IVS, limites)]

novo
#  Município         IVS
#1         A        Alta
#2         B       Baixa
#3         C Muito Baixa
#4         D  Muito Alta
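
One detail worth checking with this approach is what happens at the boundaries. The sketch below reuses niveis and limites from this answer; the vector x is hypothetical and was chosen only to sit exactly on the cut points. By default findInterval treats every interval as closed on the left and open on the right, so a value equal to the last limit (1) ends up in a fifth interval that has no label. It also shows what should be the equivalent call to cut, for comparison.

```r
niveis  <- c("Muito Baixa", "Baixa", "Alta", "Muito Alta")
limites <- c(0, 0.2, 0.4, 0.6, 1)

# Hypothetical values placed exactly on the cut points
x <- c(0, 0.2, 0.6, 1)

# Default behaviour: intervals are [a, b), so x == 1 gets index 5
# and niveis[5] does not exist (the label comes out as NA)
findInterval(x, limites)
niveis[findInterval(x, limites)]

# rightmost.closed = TRUE turns the last interval into [0.6, 1]
i <- findInterval(x, limites, rightmost.closed = TRUE)
niveis[i]

# cut() with the same left-closed intervals; it returns a factor
# instead of a character vector
f <- cut(x, breaks = limites, labels = niveis,
         right = FALSE, include.lowest = TRUE)
f
```

If the index can actually reach 1 in your data, rightmost.closed = TRUE is probably what you want; otherwise the original call is enough.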

5


If you want to stick to the base package, you can use indexing and multiple comparisons:

set.seed(123)
dados <- data.frame(
  Municipio = LETTERS[1:6],
  IVS = runif(6)
)

dados$IVScat[dados$IVS < .33] <- 'baixo'
dados$IVScat[dados$IVS >= .33 & dados$IVS < .66] <- 'medio'
dados$IVScat[dados$IVS >= .66] <- 'alto'

> dados
  Municipio       IVS IVScat
1         A 0.2875775  baixo
2         B 0.7883051   alto
3         C 0.4089769  medio
4         D 0.8830174   alto
5         E 0.9404673   alto
6         F 0.0455565  baixo
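
As a variation on this answer, and not part of it, the same thresholds can be passed to cut to get an ordered factor, which is handy if you later need to sort or filter by category. This is a minimal sketch assuming the same dados data frame and the same cut points (.33 and .66); the -Inf and Inf end points are my own addition so that no value is left out.

```r
set.seed(123)
dados <- data.frame(
  Municipio = LETTERS[1:6],
  IVS = runif(6)
)

# Same rules as the comparisons above: [-Inf, .33), [.33, .66), [.66, Inf)
dados$IVScat <- cut(dados$IVS,
                    breaks = c(-Inf, .33, .66, Inf),
                    labels = c("baixo", "medio", "alto"),
                    right = FALSE,
                    ordered_result = TRUE)

# Count of counties per category
table(dados$IVScat)

# Because the factor is ordered, comparisons between levels work
dados[dados$IVScat >= "medio", ]
```

The trade-off is that IVScat becomes a factor rather than a character column; use as.character(dados$IVScat) if plain text is needed.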
