How to transform a written sequence into a numerical sequence? (R)

I'm having trouble manipulating a dataset from the TSE. The relevant part of the code is below:

library(tidyverse)
locais_vot_SP <- read_delim("https://raw.githubusercontent.com/camilagonc/votacao_secao/master/locais_vot_SP.csv",
                        locale = locale(encoding = "ISO-8859-1"),
                        delim = ",",
                        col_names = FALSE) %>%
              filter(X4 == "VINHEDO")

names(locais_vot_SP) <- c("num_zona", 
                      "nome_local",
                      "endereco",
                      "nome_municipio",
                      "secoes",
                      "secoes_esp")

As you can see, the secoes column is not tidy: several section numbers are packed into the same cell.

secoes
196ª; 207ª; 221ª; 231ª;
197ª; 211ª; 230ª; 249ª;

With the following code, I started to fix the problem:

locais_vot_SP <- locais_vot_SP %>%
  # drop the ordinal marker "ª", stray ";" and the "Da " prefix
  mutate(secoes = gsub("ª", "", secoes),
         secoes_esp = gsub("ª", "", secoes_esp),
         secoes_esp = gsub(";", "", secoes_esp),
         secoes = gsub("Da ", "", secoes)) %>%
  # one row per ";"-separated entry
  separate_rows(secoes, sep = ";") %>%
  # trim whitespace and drop the empty piece left by a trailing ";"
  mutate(secoes = trimws(secoes)) %>%
  filter(is.na(secoes) | secoes != "")

So I got the following:

secoes
32 à 38
100
121

What remains to be solved are the cells that contain a range of the form x à y. How can I get the following result?

secoes
32
33
34
35
36
37
38
...

1 answer

Any string of the form x non_number y, where x and y are integers, can be turned into the sequence x:y as follows.

x <- "32 à 38"
y <- unlist(strsplit(x, "[^[:digit:]]+"))
y <- as.integer(y)
Reduce(`:`, y)
#[1] 32 33 34 35 36 37 38
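One caveat worth checking: if an entry starts with a non-digit (for example a leading space, as in " 100"), strsplit returns an empty first piece, as.integer turns it into NA, and `:` then fails. A possible workaround, sketched here with a hypothetical helper name extract_range that is not part of the original answer, is to extract the digit runs instead of splitting on the non-digits:

```r
# Hypothetical variant (name and approach are assumptions, not from the answer):
# pull out the runs of digits rather than splitting on everything else.
extract_range <- function(x) {
  # gregexpr() locates every run of digits; regmatches() returns them as strings
  nums <- as.integer(regmatches(x, gregexpr("[0-9]+", x))[[1]])
  # one number is returned as is; two numbers are expanded into from:to
  Reduce(`:`, nums)
}

extract_range("32 à 38")
extract_range(" 100")
```

With this variant a leading or trailing space no longer matters, so the entries do not need to be trimmed first.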

This can be easily put into a function.

camila <- function(x){
    y <- unlist(strsplit(x, "[^[:digit:]]+"))
    y <- as.integer(y)
    Reduce(`:`, y)
}

camila("32 à 38")
#[1] 32 33 34 35 36 37 38

(Of course you should choose another name for the function.)
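To apply this to a whole column, one option is to build a list column and then unnest it. The following is only a sketch on toy data: the tibble toy, its values, and the function name expand_secoes are made up for illustration, and the entries are assumed to be already trimmed of surrounding whitespace.

```r
library(dplyr)
library(tidyr)

# Same logic as the function above, under a hypothetical name
expand_secoes <- function(x) {
  y <- as.integer(unlist(strsplit(x, "[^[:digit:]]+")))
  Reduce(`:`, y)
}

# Toy data standing in for the real table (values invented for illustration)
toy <- tibble(nome_local = c("A", "B", "C"),
              secoes = c("32 à 38", "100", "121"))

result <- toy %>%
  # each cell becomes an integer vector: a full range or a single number
  mutate(secoes = lapply(secoes, expand_secoes)) %>%
  # one row per section number; the other columns are repeated
  unnest(secoes)

result
```

Single numbers such as "100" pass through unchanged, because Reduce returns a length-one input as is.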
