I am working with a table of bank branches by address in Brazil that has 20,580 observations and 15 variables. My goal is to create a new variable that assigns each observation to a region based on its value of the variable "uf" (the state abbreviation). As you can see below, the table has no variables with the numerical codes of the municipalities or states. I made two different attempts to form the groups, but both were unsuccessful.
The table presents the following variables (I changed the original names):
[1] "cnpj" "seq_cnpj" "dv_cnpj" "instituicao" "segmento" "cod_com_area"
[7] "nome_agencia" "endereco" "complemento" "bairro" "cep" "municipio"
[13] "uf" "ddd" "fone"
On the first attempt, I tried the "Rename" command, but the output reported that the operation is only possible for numeric objects; see below.
The data frame is named "agencias" (agencies).
Attempt 1: create region groups from the categorical values of the variable "uf"
agencias$regiao <- row.names(agencias$uf)<-c("norte" = "RO" | "AC" | "AM" | "RR" | "PA" | "AP" | "TO",
"nordeste" = "MA" | "PI" | "CE" | "RN" | "PB" | "PE" | "AL"
| "SE" | "BA",
"sudeste" = "MG" | "ES" | "RJ" | "SP",
"sul" = "PR" | "SC" | "RS",
"centro_oeste" = "MS" | "MT" | "GO" | "DF")
Error in "RO" | "AC" : operations are possible only for numeric, logical or complex types
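For context, the error arises because `|` is a logical operator and cannot combine character strings such as "RO" and "AC". One possible way to express the same state-to-region mapping is a named lookup vector. The following is only a minimal sketch in base R: the small `agencias` data frame is a toy stand-in for the real table, and the names `regioes` and `uf_para_regiao` are hypothetical helpers introduced for illustration.

```r
# Toy stand-in for the real table (illustrative values, not the BCB data)
agencias <- data.frame(uf = c("SP", "AM", "BA", "RS", "DF", "XX"),
                       stringsAsFactors = FALSE)

# Region -> states, using the same grouping as in the attempt above
regioes <- list(
  norte        = c("RO", "AC", "AM", "RR", "PA", "AP", "TO"),
  nordeste     = c("MA", "PI", "CE", "RN", "PB", "PE", "AL", "SE", "BA"),
  sudeste      = c("MG", "ES", "RJ", "SP"),
  sul          = c("PR", "SC", "RS"),
  centro_oeste = c("MS", "MT", "GO", "DF")
)

# Invert the list into a named vector: names are states, values are regions
uf_para_regiao <- setNames(rep(names(regioes), lengths(regioes)),
                           unlist(regioes, use.names = FALSE))

# Look up the region of each row; codes missing from the lookup become NA
agencias$regiao <- unname(uf_para_regiao[as.character(agencias$uf)])
```

Because the lookup works directly on the state abbreviations, no numeric codes are needed, and `as.character()` lets it work whether "uf" is stored as character or as a factor.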
On the second attempt, I created a numeric variable for the states by transforming the variable "uf":
agencias$uf <- as.factor(agencias$uf)
agencias$num_uf <- as.numeric(agencias$uf)
Since I could not set the numbering of each state according to the IBGE codes in this transformation, the numbers were assigned in R's own order. I therefore used the numbers assigned by R to form the regional groups. The commands ran without errors, but when I checked the newly created column "regiao", it contained "NA" ("Not Available") in place of the region names.
Attempt 2: create region groups from the numerical values of the variable "num_uf"
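As a side note on that numbering: `as.numeric()` applied to a factor returns the position of each value among the factor's levels, and by default the levels are sorted alphabetically, so the codes have no relation to IBGE codes. A small sketch with made-up values (not the real table) illustrates this:

```r
# Illustrative factor with three distinct states
uf <- factor(c("SP", "AC", "RJ", "AC"))

levels(uf)      # sorted levels: "AC" "RJ" "SP"
codigos <- as.numeric(uf)
codigos         # positions in levels(uf): 3 1 2 1

# The original labels can be recovered from the codes
levels(uf)[codigos]
```

The codes therefore depend on which states are present in the data, which makes them fragile as a basis for grouping.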
attach(agencias)
agencias$regiao[num_uf==21 & num_uf==1 & num_uf==3 & num_uf==22 & num_uf==14 & num_uf==4 & num_uf==27] <- "norte"
agencias$regiao[num_uf==10 & num_uf==17 & num_uf==6 & num_uf==20 & num_uf==15 & num_uf==16 & num_uf==2 & num_uf==25 & num_uf==5] <- "nordeste"
agencias$regiao[num_uf==11 & num_uf==8 & num_uf==19 & num_uf==26] <- "sudeste"
agencias$regiao[num_uf==18 & num_uf==24 & num_uf==23] <- "sul"
agencias$regiao[num_uf==12 & num_uf==13 & num_uf==9 & num_uf==7] <- "centro_oeste"
detach(agencias)
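A likely reason for the NA values is that the conditions above are joined with `&`, which requires a single row to equal every listed code at once. No row can satisfy that, so no region is ever assigned. Set membership with `%in%` (or chaining the comparisons with `|`) expresses the intended "any of these" test. The sketch below assumes a toy `agencias` data frame in place of the real one and tests the state abbreviations directly, so neither `attach()` nor the numeric codes are needed.

```r
# Toy stand-in for the real table (illustrative values, not the BCB data)
agencias <- data.frame(uf = c("SP", "AM", "BA", "RS", "DF"),
                       stringsAsFactors = FALSE)

# Start with an empty character column, then fill it region by region
agencias$regiao <- NA_character_
agencias$regiao[agencias$uf %in% c("RO", "AC", "AM", "RR", "PA", "AP", "TO")] <- "norte"
agencias$regiao[agencias$uf %in% c("MA", "PI", "CE", "RN", "PB", "PE", "AL", "SE", "BA")] <- "nordeste"
agencias$regiao[agencias$uf %in% c("MG", "ES", "RJ", "SP")] <- "sudeste"
agencias$regiao[agencias$uf %in% c("PR", "SC", "RS")] <- "sul"
agencias$regiao[agencias$uf %in% c("MS", "MT", "GO", "DF")] <- "centro_oeste"

# Quick check of the resulting groups
table(agencias$regiao, useNA = "ifany")
```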
Can anyone help me with this point? Is there any way to stratify this categorical variable?
NOTE: I downloaded the table from the website of the Central Bank of Brazil, under the link "agencies", available at
https://www.bcb.gov.br/estabilidadefinanceira/agenciasconsorcio
Thank you.
Hi Emerson, thank you for your contribution. I managed to resolve the issue in a different way: first I converted the variable to numeric, and then I recoded the values with the "recode" command. But your suggestion is more direct and advanced, so I will definitely use it. Thank you!
– Rafael