Using if in R on a data frame with a string column

I have a data frame with two columns (UF and Municipality). I want to create a new column named regiao (region) based on the UF column. I tried what is shown below, but it doesn't work. I'm a beginner in R and would like to understand my mistake. Can you help me?

if populacao$UF == ("AM", "AP", "AC", "RR", "PA", "RO", "TO") {
  
  populacao$regiao <- "Norte"
  
} else if populacao$UF == c("MA", "PI", "CE", "RN", "PE", "PB", "SE", "AL", "BA") {
  
  populacao$regiao <- "Nordeste"
  
} else if populacao$UF == c("MT", "MS", "GO") {
  
  populacao$regiao <- "Centro-Oeste"
  
} else if populacao$UF == c("SP", "RJ", "ES", "MG"){
  
  populacao$regiao <- "Sudeste"
  
} else if populacao$UF == c("PR", "SC", "RS"){
  
  populacao$regiao <- "Sul"
  
} else {
  
  populacao$regiao <- "ERRO"
}

2 answers


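The syntax error occurs because R requires the condition of if and else if to be enclosed in parentheses. In addition, the first vector is missing c(), and the comparison should use %in% rather than ==.

Syntactically, the corrected code would be: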

if (populacao$UF %in% c("AM", "AP", "AC", "RR", "PA", "RO", "TO")) {
  
  populacao$regiao <- "Norte"
  
} else if (populacao$UF %in% c("MA", "PI", "CE", "RN", "PE", "PB", "SE", "AL", "BA")) {
  
  populacao$regiao <- "Nordeste"
  
} else if (populacao$UF %in% c("MT", "MS", "GO")) {
  
  populacao$regiao <- "Centro-Oeste"
  
} else if (populacao$UF %in% c("SP", "RJ", "ES", "MG")) {
  
  populacao$regiao <- "Sudeste"
  
} else if (populacao$UF %in% c("PR", "SC", "RS")) {
  
  populacao$regiao <- "Sul"
  
}
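A quick check, using a hypothetical two-row data frame (the sample values are assumptions, not the asker's data), shows that the condition has one value per row, which if() cannot use:

```r
# Hypothetical two-row sample; the real data has one row per municipality
populacao <- data.frame(UF = c("AM", "SP"))

# %in% returns one TRUE/FALSE per row, not a single value
cond <- populacao$UF %in% c("AM", "AP", "AC", "RR", "PA", "RO", "TO")
print(cond)  # [1]  TRUE FALSE

# if() needs a condition of length 1: R >= 4.2.0 raises an error,
# older versions only warn and use the first element
res <- tryCatch(
  if (cond) "Norte" else "other",
  warning = function(w) conditionMessage(w),
  error = function(e) conditionMessage(e)
)
print(res)
```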

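Note, however, that even with the parentheses, if is not vectorized: it expects a single TRUE or FALSE, while populacao$UF %in% c(...) returns one value per row (R 4.2.0 and later stop with "the condition has length > 1"), and populacao$regiao <- "Norte" would overwrite the entire column. So I used a different function, ifelse, which works like Excel's IF function (SE in the Portuguese version), applied element by element: ifelse(TEST, VALUE IF TRUE, VALUE IF FALSE). I also used the dplyr package for its mutate function, which creates or modifies columns, but it is not required.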
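A minimal sketch of that element-by-element behavior, on a hypothetical vector of UF codes ("Outra" is just an illustrative placeholder for the "no" branch):

```r
# Hypothetical sample of UF codes
uf <- c("AM", "SP", "PA")

# ifelse() evaluates the test for each element, like filling
# Excel's IF (SE) formula down a column
regiao <- ifelse(uf %in% c("AM", "AP", "AC", "RR", "PA", "RO", "TO"),
                 "Norte", "Outra")
print(regiao)  # [1] "Norte" "Outra" "Norte"
```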

Code:

library(dplyr)

populacao <- data.frame(UF = c("AM", "AP", "AC", "RR", "PA", "RO", "TO", "MA", "PI", "CE", "RN", 
                               "PE", "PB", "SE", "AL", "BA", "MT", "MS", "GO", "SP", "RJ", "ES", 
                               "MG", "PR", "SC", "RS"),
                        regiao = NA)

# Each ifelse() keeps the value already in regiao when its test is FALSE,
# so later steps do not overwrite the regions assigned by earlier ones
populacao <- mutate(populacao,
       regiao = ifelse(UF %in% c("AM", "AP", "AC", "RR", "PA", "RO", "TO"), "Norte", regiao),
       regiao = ifelse(UF %in% c("MA", "PI", "CE", "RN", "PE", "PB", "SE", "AL", "BA"), "Nordeste", regiao),
       regiao = ifelse(UF %in% c("MT", "MS", "GO"), "Centro-Oeste", regiao),
       regiao = ifelse(UF %in% c("SP", "RJ", "ES", "MG"), "Sudeste", regiao),
       regiao = ifelse(UF %in% c("PR", "SC", "RS"), "Sul", regiao))

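Note: I used the %in% operator instead of ==, because you are checking whether each UF belongs to a set of several values. == compares the two vectors position by position, recycling the shorter one, so it does not test membership.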
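The difference is easy to see on a small hypothetical example (norte here is a shortened list, only for illustration). When the lengths are not multiples of each other, == also emits a "longer object length is not a multiple of shorter object length" warning.

```r
uf <- c("AM", "SP", "PA", "RJ")
norte <- c("AM", "PA")  # hypothetical, shortened list

# == compares position by position, recycling norte:
# "AM"=="AM", "SP"=="PA", "PA"=="AM", "RJ"=="PA"
print(uf == norte)    # [1]  TRUE FALSE FALSE FALSE

# %in% asks, for each element of uf, whether it appears anywhere in norte
print(uf %in% norte)  # [1]  TRUE FALSE  TRUE FALSE
```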

  • Thank you very much. You helped me a lot.

  • The c is missing after the first %in%.


Here are two ways without if or ifelse, one in base R and the other with the dplyr package.

First, to make the code more readable, vectors with the UF codes of each region are created.

Sudeste <- c("SP", "RJ", "ES", "MG")
Sul <- c("PR", "SC", "RS")
Centro_Oeste <- c("MT", "MS", "GO")
Norte <- c("AM", "AP", "AC", "RR", "PA", "RO", "TO")
Nordeste <- c("MA", "PI", "CE", "RN", "PE", "PB", "SE", "AL", "BA")

Base R

After creating the new column filled with NA, the values are assigned through logical indexing: each %in% test selects the rows of one region.

populacao$regiao <- NA_character_
populacao$regiao[populacao$UF %in% Sudeste] <- "Sudeste"
populacao$regiao[populacao$UF %in% Sul] <- "Sul"
populacao$regiao[populacao$UF %in% Centro_Oeste] <- "Centro-Oeste"
populacao$regiao[populacao$UF %in% Norte] <- "Norte"
populacao$regiao[populacao$UF %in% Nordeste] <- "Nordeste"
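To check the base R approach end to end, here is a self-contained run on a few hypothetical rows (the sample data is an assumption). It also exposes a gap in the vectors: DF (Distrito Federal), which belongs to the Centro-Oeste region, is not in any of them, so it keeps NA; adding "DF" to Centro_Oeste would cover it.

```r
# Region vectors, as in this answer
Sudeste <- c("SP", "RJ", "ES", "MG")
Sul <- c("PR", "SC", "RS")
Centro_Oeste <- c("MT", "MS", "GO")
Norte <- c("AM", "AP", "AC", "RR", "PA", "RO", "TO")
Nordeste <- c("MA", "PI", "CE", "RN", "PE", "PB", "SE", "AL", "BA")

# Hypothetical sample rows; "DF" is not in any vector above
populacao <- data.frame(UF = c("SP", "BA", "RS", "DF"))

# Start with NA, then fill each region through a logical index
populacao$regiao <- NA_character_
populacao$regiao[populacao$UF %in% Sudeste] <- "Sudeste"
populacao$regiao[populacao$UF %in% Sul] <- "Sul"
populacao$regiao[populacao$UF %in% Centro_Oeste] <- "Centro-Oeste"
populacao$regiao[populacao$UF %in% Norte] <- "Norte"
populacao$regiao[populacao$UF %in% Nordeste] <- "Nordeste"

print(populacao)
# Rows that matched no vector are easy to spot
print(populacao$UF[is.na(populacao$regiao)])
```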

Package dplyr

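This solution pipes the data frame into mutate and uses case_when, which checks the conditions in order and assigns the value of the first one that is TRUE; the final TRUE ~ NA_character_ catches any code not listed.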

library(dplyr)

populacao <- populacao %>%
  mutate(
    regiao = case_when(
      UF %in% Sudeste ~ "Sudeste",
      UF %in% Sul ~ "Sul",
      UF %in% Centro_Oeste ~ "Centro-Oeste",
      UF %in% Norte ~ "Norte",
      UF %in% Nordeste ~ "Nordeste",
      TRUE ~ NA_character_
    )
  )
