How to stratify (group) the observations of a variable into more than one category in R?

-2

I am working with a table of bank branches by address in Brazil that has 20,580 observations and 15 variables. My goal is to create a new variable that groups the observations of the variable "uf" into regions, but as you can see below there are no variables with the numerical codes of the municipalities or states. I made two different attempts to form the groups, but both were unsuccessful.

The table presents the following variables (I changed the original names):

 [1] "cnpj"         "seq_cnpj"     "dv_cnpj"      "instituicao"  "segmento"     "cod_com_area"
 [7] "nome_agencia" "endereco"     "complemento"  "bairro"       "cep"          "municipio"   
[13] "uf"           "ddd"          "fone"

On the first attempt, I tried the "Rename" command, but the output reported that the operation is only possible for numeric objects; see below:

The file (data frame) is named "agencias".

Attempt 1: create region groups from the categorical observations of the variable "uf"

agencias$regiao <- row.names(agencias$uf)<-c("norte" = "RO" | "AC" | "AM" | "RR" | "PA" | "AP" | "TO",
                                             "nordeste" = "MA" | "PI" | "CE" | "RN" | "PB" | "PE" | "AL" 
                                             | "SE" | "BA",
                                             "sudeste" = "MG" | "ES" | "RJ" | "SP",
                                             "sul" = "PR" | "SC" | "RS",
                                             "centro_oeste" = "MS" | "MT" | "GO" | "DF")

Error in "RO" | "AC" : operations are possible only for numeric, logical or complex types

On the second attempt, I created a numeric variable for the states by converting the variable "uf":

agencias$uf <- as.factor(agencias$uf)
agencias$num_uf <- as.numeric(agencias$uf)

Since I could not set the numbering of each state according to the IBGE codes during the conversion, the numbering followed R's own order. I therefore used the numbers assigned by R to form the regional groups. The command ran without errors, but when I check the newly created column "regiao", NA ("Not Available") appears in place of the region names.

Attempt 2: create region groups from the numeric observations of the variable "num_uf"

attach(agencias)

agencias$regiao[num_uf==21 & num_uf==1 & num_uf==3 & num_uf==22 & num_uf==14 & num_uf==4 & num_uf==27] <- "norte"

agencias$regiao[num_uf==10 & num_uf==17 & num_uf==6 & num_uf==20 & num_uf==15 & num_uf==16 & num_uf==2 & num_uf==25 & num_uf==5] <- "nordeste"

agencias$regiao[num_uf==11 & num_uf==8 & num_uf==19 & num_uf==26] <- "sudeste"

agencias$regiao[num_uf==18 & num_uf==24 & num_uf==23] <- "sul"

agencias$regiao[num_uf==12 & num_uf==13 & num_uf==9 & num_uf==7] <- "centro_oeste"

detach(agencias)

Can anyone help me with this point? Is there any way to stratify this categorical variable?

NOTE: I downloaded the table from the website of the Central Bank of Brazil via the link "agencies", available at

https://www.bcb.gov.br/estabilidadefinanceira/agenciasconsorcio

Thank you.

2 answers

0

Thank you for your contribution.

I managed to resolve the matter in another way. First I converted the variable to numeric, and then I recoded the values with the "recode" command. But your suggestion is more direct and advanced, and I will definitely use it. Thank you!

Converting the factor variable to numeric

# as.factor() is a no-op if uf is already a factor, and avoids NAs if it is still character
agencias$num_uf <- as.numeric(as.factor(agencias$uf))

Displaying the numbering of the UFs

table(agencias$uf)
table(agencias$num_uf)
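
As a side note on where these numbers come from: as.numeric() applied to a factor returns the position of each value among the factor's levels, and the levels are sorted alphabetically by default. A minimal sketch with made-up values (uf_exemplo is a hypothetical name, not part of the original data):

```r
# Toy factor with three state abbreviations (illustrative values only)
uf_exemplo <- factor(c("SP", "AC", "RO", "AC"))

# Levels are sorted alphabetically by default
niveis <- levels(uf_exemplo)

# Each value is replaced by the position of its level
codigos <- as.numeric(uf_exemplo)

print(niveis)
print(codigos)
```

This is why the codes used here do not match the IBGE codes: they only reflect the alphabetical order of the abbreviations present in the data.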

Creating regional categories from the numeric observations of a variable

library(dplyr)  # recode() comes from dplyr

agencias$regiao <- recode(agencias$num_uf,
                          `21` = "norte", `1` = "norte", `3` = "norte", `22` = "norte",
                          `14` = "norte", `4` = "norte", `27` = "norte",
                          `10` = "nordeste", `17` = "nordeste", `6` = "nordeste",
                          `20` = "nordeste", `15` = "nordeste", `16` = "nordeste",
                          `2` = "nordeste", `25` = "nordeste", `5` = "nordeste",
                          `11` = "sudeste", `8` = "sudeste", `19` = "sudeste",
                          `26` = "sudeste",
                          `18` = "sul", `24` = "sul", `23` = "sul",
                          `12` = "centro_oeste", `13` = "centro_oeste",
                          `9` = "centro_oeste", `7` = "centro_oeste")
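
A possible alternative that skips the numeric codes altogether is to index a named lookup vector with the state abbreviations. This is only a sketch in base R, assuming the state groups listed in the question; the names uf_para_regiao and agencias_exemplo and the toy data are made up for illustration:

```r
# Named lookup vector: state abbreviation -> region (groups taken from the question)
uf_para_regiao <- c(
  RO = "norte", AC = "norte", AM = "norte", RR = "norte",
  PA = "norte", AP = "norte", TO = "norte",
  MA = "nordeste", PI = "nordeste", CE = "nordeste", RN = "nordeste",
  PB = "nordeste", PE = "nordeste", AL = "nordeste", SE = "nordeste",
  BA = "nordeste",
  MG = "sudeste", ES = "sudeste", RJ = "sudeste", SP = "sudeste",
  PR = "sul", SC = "sul", RS = "sul",
  MS = "centro_oeste", MT = "centro_oeste", GO = "centro_oeste",
  DF = "centro_oeste"
)

# Toy stand-in for the real table (hypothetical data)
agencias_exemplo <- data.frame(uf = c("SP", "AM", "BA", "RS", "DF"),
                               stringsAsFactors = FALSE)

# as.character() makes the lookup work by name even if uf is a factor;
# unname() drops the state abbreviations that would otherwise label the result
agencias_exemplo$regiao <- unname(uf_para_regiao[as.character(agencias_exemplo$uf)])

print(agencias_exemplo)
```

Any abbreviation missing from the lookup vector would come out as NA, which makes typos in the data easy to spot.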

-1

If I understand correctly, you want the region variable of your dataframe to have five categories, which are the five regions of Brazil, right?

I’ve done something like this before using a list, a for loop, and ifelse:

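library(dplyr)  # provides %>% and mutate()

lista <- list(Norte = c("AM", "RO", "AC", "Outras UFs"), 
              Nordeste = c("PE", "PB", "Outras UFs"), 
              `Centro-Oeste` = c("UFs"),  # backticks are required because of the hyphen
              Sudeste = c("UFs"),
              Sul = c("UFs"))

agencia <- agencia %>% mutate(Regiao = 0)
for (i in seq_along(lista)) {  # visit every region, not only the last one
  agencia$Regiao <- ifelse(test = (agencia$uf %in% lista[[i]]), 
                           yes = names(lista[i]),
                           no = agencia$Regiao)  # otherwise keep the current value
}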

The mutate step can be replaced with agencia$Regiao <- 0 if you prefer. For each region, the code checks which rows have a UF contained in that region's vector and assigns the region's name where that is true. Where it is false, it leaves the value that is already there. Once every vector lists all of its states, every row of the new variable ends up with the name of the region that corresponds to the state in the uf column.

This has worked for me when the input (uf, in your case) is numeric. I believe strings are not a problem as long as uf is also a string, which seems to be the case.
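
To make the idea concrete, here is a self-contained sketch of the same loop in base R, run on a toy data frame. The state groups are the ones listed in the question; the names regioes and agencia_exemplo and the data are hypothetical, and the column starts as NA instead of 0 so that unmatched rows are easy to find:

```r
# Region -> states, using the groups given in the question
regioes <- list(
  norte        = c("RO", "AC", "AM", "RR", "PA", "AP", "TO"),
  nordeste     = c("MA", "PI", "CE", "RN", "PB", "PE", "AL", "SE", "BA"),
  sudeste      = c("MG", "ES", "RJ", "SP"),
  sul          = c("PR", "SC", "RS"),
  centro_oeste = c("MS", "MT", "GO", "DF")
)

# Toy stand-in for the real table (hypothetical data)
agencia_exemplo <- data.frame(uf = c("RJ", "PA", "CE", "SC", "GO", "RJ"),
                              stringsAsFactors = FALSE)

# Start with NA so rows that match no region remain visible
agencia_exemplo$regiao <- NA_character_

for (nome in names(regioes)) {
  # TRUE for the rows whose state belongs to the current region
  pertence <- agencia_exemplo$uf %in% regioes[[nome]]
  agencia_exemplo$regiao[pertence] <- nome
}

print(table(agencia_exemplo$regiao, useNA = "ifany"))
```

The membership test with %in% is what the attempts in the question were missing: a single value can never equal several codes at once, so conditions joined with & select no rows.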

  • Hi Emerson, thank you for your contribution. I managed to resolve the issue differently. First I converted the variable to numeric, and then I recoded the values with the "recode" command. But your suggestion is more direct and advanced, and I will definitely use it. Thank you!
