Separate data from a variable


I have a variable, Município, whose values look like Açucena - MG. How do I separate the municipality from the state in R?

library(sidrar)
Tab1612SojaRend <- get_sidra(1612, variable = 112, period = c("last" = 22), geo = "City",
                             classific = 'c81', geo.filter = list("Region" = 3),
                             category = list(2713))
head(Tab1612SojaRend)
 $`Abadia dos Dourados - MG`
 Município (Código)                Município Ano (Código)  Ano
 2             3100104 Abadia dos Dourados - MG         1996 1996
 3             3100104 Abadia dos Dourados - MG         1997 1997
 4             3100104 Abadia dos Dourados - MG         1998 1998
 5             3100104 Abadia dos Dourados - MG         1999 1999
 6             3100104 Abadia dos Dourados - MG         2000 2000
 7             3100104 Abadia dos Dourados - MG         2001 2001
 8             3100104 Abadia dos Dourados - MG         2002 2002
 9             3100104 Abadia dos Dourados - MG         2003 2003
 10            3100104 Abadia dos Dourados - MG         2004 2004
 11            3100104 Abadia dos Dourados - MG         2005 2005
 12            3100104 Abadia dos Dourados - MG         2006 2006
 13            3100104 Abadia dos Dourados - MG         2007 2007
 14            3100104 Abadia dos Dourados - MG         2008 2008
 15            3100104 Abadia dos Dourados - MG         2009 2009...
  • Could you post your code?

  • strsplit('Açucena - MG', ' - ')

  • My comment above only works if class(Municipio) is "character". If it is "factor", you need strsplit(as.character(Municipio), ' - ') instead.

  • Thanks, Rui, but splitting on '-' is a problem because some municipality names contain '-'. Isn't there a function like Excel's RIGHT and LEFT, so I could just take the last 2 characters?

  • Albertt, I didn't post the code because it's too long.

  • Note that the separator is not '-' but ' - ', with a space before and after it. So it should work even when a city name contains a '-' without spaces around it.
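A base-R sketch combining the two suggestions from the comments: split on ' - ' (spaces included) with strsplit(), converting a factor to character first, and an Excel RIGHT-style substr() call for the state. The sample values are hypothetical (the third is a made-up name with hyphens but no surrounding spaces), and the RIGHT-style line assumes every state code is exactly two letters.

```r
# Hypothetical sample values; the third is a made-up name with hyphens but no spaces
municipio <- factor(c("Açucena - MG", "Abadia dos Dourados - MG", "Abadia-dos-Dourados - MG"))

# strsplit() needs character input, so convert the factor first
x <- as.character(municipio)

# Split on " - " (spaces included) as a literal string, not a regular expression
partes <- strsplit(x, " - ", fixed = TRUE)

# City: every piece except the last, rejoined in case a name itself contains " - "
cidade <- vapply(partes, function(p) paste(p[-length(p)], collapse = " - "), character(1))

# State: the last piece after the final " - "
estado <- vapply(partes, function(p) p[length(p)], character(1))

# Excel-style RIGHT(x, 2): the last two characters, assuming a two-letter state code
estado_right <- substr(x, nchar(x) - 1, nchar(x))

print(cidade)
print(estado)
```

Because the split uses ' - ' rather than '-', hyphens inside a name are left alone; only the spaced separator before the state is used.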


2 answers



The str_sub function from the stringr package extracts part of a string by character position. For example, Açucena - MG has 12 characters:

Açucena - MG
123456789012

(only the units digit of each position is shown)

To separate the city from the state, take the Município string and drop its last 5 characters: a space, the hyphen, another space, and the two-letter state code. Since the length of the string varies from city to city, I wrote a function to automate this:

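library(stringr)

separarCidade <- function(x){
  n <- nchar(x)                  # length of each "City - UF" string
  cidade <- str_sub(x, 1, n - 5) # keep everything except the final " - UF" (5 characters)
  return(cidade)
}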

head(separarCidade(unique(Tab1612SojaRend$Município)), 10)
 [1] "Abadia dos Dourados" "Abaeté"              "Abre Campo"         
 [4] "Acaiaca"             "Açucena"             "Água Boa"           
 [7] "Água Comprida"       "Aguanil"             "Águas Formosas"     
[10] "Águas Vermelhas"

I used head and unique above only to show the function working on several different city names. For your data, run separarCidade(Tab1612SojaRend$Município) on the full column, which returns the city name for every row, repeats included.

separarCidade counts the characters in each municipality string and, from that count n, keeps the substring from the first character up to character n - 5.

  • I'll test it. Thank you, Marcus Nunes, for the help!
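The same positional idea can also return the state and attach both pieces as new columns. The sketch below uses base R's substr() instead of str_sub() so it needs no extra package; the data frame is a hypothetical stand-in for Tab1612SojaRend, and separarCidadeEstado is a hypothetical helper name. It assumes every value ends in " - " plus a two-letter state code.

```r
# Hypothetical stand-in for Tab1612SojaRend; only the municipality column matters here
tab <- data.frame(
  Municipio = c("Abadia dos Dourados - MG", "Abadia dos Dourados - MG", "Açucena - MG"),
  Ano = c(1996, 1997, 1996),
  stringsAsFactors = FALSE
)

# Hypothetical helper: same idea as separarCidade, but also returns the state
separarCidadeEstado <- function(x) {
  x <- as.character(x)   # works for factor columns too
  n <- nchar(x)          # e.g. 12 for "Açucena - MG"
  data.frame(
    cidade = substr(x, 1, n - 5),  # characters 1 to n - 5 (drops " - MG")
    estado = substr(x, n - 1, n),  # last two characters
    stringsAsFactors = FALSE
  )
}

# One output row per input row, so cbind() lines the new columns up with the table
tab <- cbind(tab, separarCidadeEstado(tab$Municipio))
print(tab)
```

With stringr loaded, str_sub(x, 1, -6) and str_sub(x, -2) give the same two pieces without calling nchar(), since negative positions count from the end of the string.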


This function separates the city and the state even if the city name contains a hyphen '-', and returns a list with the cities and the states.
The test vector includes a fictitious city name with hyphens where the real name has none.

s <- c('Abadia-dos-Dourados - MG', 'Açucena - MG')

separar <- function(x, sep = "-"){
  # City: drop the last separator and everything after it, then trim spaces
  separador <- paste0(sep, "[^-]+$")
  cidade <- sub(separador, "", x)
  cidade <- trimws(cidade)
  # State: keep only what follows the last separator, then trim spaces
  separador <- paste0("^.*", sep, "([^-]+$)")
  estado <- sub(separador, '\\1', x)
  estado <- trimws(estado)
  list(cidade = cidade, estado = estado)
}

separar(s)
#$cidade
#[1] "Abadia-dos-Dourados" "Açucena"            
#
#$estado
#[1] "MG" "MG"
  • I'll test it. Thanks, Rui, for the help!
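A variation on the regex answer: a single regular expression with two capture groups, applied with base R's regexec() and regmatches() instead of two sub() calls. This is a swapped-in technique, not the answer's own method. It assumes each value ends in " - " followed by a two-letter upper-case state code; separarRegex is a hypothetical helper name, and values that do not match (like the made-up "Sem estado") yield NA instead of being returned unchanged.

```r
s <- c("Abadia-dos-Dourados - MG", "Açucena - MG", "Sem estado")

# Hypothetical helper; assumes the value ends in " - " plus a two-letter upper-case code
separarRegex <- function(x) {
  # Greedy (.*) backtracks to the last " - ", so hyphens inside the name are kept
  m <- regmatches(x, regexec("^(.*) - ([[:upper:]]{2})$", x))
  # Each element of m is c(full match, city, state), or character(0) when there is no match
  pega <- function(i) {
    vapply(m, function(g) if (length(g) == 3) g[i] else NA_character_, character(1))
  }
  list(cidade = pega(2), estado = pega(3))
}

r <- separarRegex(s)
print(r)
```

Returning NA for non-matching values makes malformed entries easy to spot with is.na(r$estado).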
