How to split part of a column into another column with data.table?

I have a data.table with a column that holds the municipality code (the first 6 characters) followed by the municipality name (the remaining characters).

I would like to separate them, using data.table.

With a data.frame, I would do it like this:

library(stringr)  # provides str_sub
pop_mun_total$cod_mun <- str_sub(pop_mun_total$mun, start = 1, end = 6)

This also works with a data.table, but is there a data.table-specific way to do it?

Example data.table:

pop_mun_total <- data.table(mun=c("110001 Alta Floresta D'Oeste", "110037 Alto Alegre dos Parecis","110040 Alto Paraíso", "110034 Alvorada D'Oeste", "110002 Ariquemes","110045 Buritis", "110003 Cabixi", "110060 Cacaulândia", "110004 Cacoal","110070 Campo Novo de Rondônia", "110080 Candeias do Jamari","110090 Castanheiras", "110005 Cerejeiras", "110092 Chupinguaia","110006 Colorado do Oeste", "110007 Corumbiara", "110008 Costa Marques","110094 Cujubim", "110009 Espigão D'Oeste", "110100 Governador Jorge Teixeira"))

Desired result:

cod_mun    mun
110001     Alta Floresta D'Oeste
110037     Alto Alegre dos Parecis    
...        ...

1 answer

The following solution uses tstrsplit, which combines transpose and strsplit. Before splitting the column in two, replace the first space with "_", since that character does not occur in any municipality code or name.
Because a column named mun already exists, it is overwritten in place and the new column cod_mun is added at the end.

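# Replace only the first space: sub, unlike gsub, stops after the first match
pop_mun_total[, mun := sub(" ", "_", mun)]
# Split on "_": the first piece becomes cod_mun, the second overwrites mun
pop_mun_total[, c("cod_mun", "mun") := tstrsplit(mun, "_", fixed = TRUE)]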

head(pop_mun_total)
#                       mun cod_mun
#1:   Alta Floresta D'Oeste  110001
#2: Alto Alegre dos Parecis  110037
#3:            Alto Paraíso  110040
#4:        Alvorada D'Oeste  110034
#5:               Ariquemes  110002
#6:                 Buritis  110045
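
To illustrate why tstrsplit is described as a combination of transpose and strsplit, here is a minimal sketch on two sample strings. The variable names (x, by_row, by_col, direct) are only illustrative and do not come from the question.

```r
library(data.table)

# Two sample values, already in the "code_name" form produced by sub()
x <- c("110001_Alta Floresta D'Oeste", "110002_Ariquemes")

# strsplit returns one list element per input string (pieces grouped by row)
by_row <- strsplit(x, "_", fixed = TRUE)

# transpose regroups the pieces so there is one list element per column
by_col <- transpose(by_row)

# tstrsplit performs both steps in a single call
direct <- tstrsplit(x, "_", fixed = TRUE)

# With default arguments the two results should be the same
identical(by_col, direct)
```

Each element of the resulting list is a full column, which is why it can be assigned directly to c("cod_mun", "mun") with :=.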

Both steps can be combined into a single statement:

pop_mun_total[, c("cod_mun", "mun") := tstrsplit(sub(" ", "_", mun), "_", fixed = TRUE)]

To get the columns in the order shown in the question, either of the two statements below can be used.

pop_mun_total[, 2:1]
pop_mun_total[, c("cod_mun", "mun")]
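
Note that both statements above return a reordered copy and leave pop_mun_total itself unchanged. If the table should be reordered in place, data.table's setcolorder is one option. A minimal sketch on a two-row sample (the name dt is only illustrative):

```r
library(data.table)

# Small sample in the same format as the question's data
dt <- data.table(mun = c("110001 Alta Floresta D'Oeste", "110002 Ariquemes"))

# Same one-line split as in the answer
dt[, c("cod_mun", "mun") := tstrsplit(sub(" ", "_", mun), "_", fixed = TRUE)]

# Reorder the columns by reference, so no copy is made and no reassignment is needed
setcolorder(dt, c("cod_mun", "mun"))

names(dt)
```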
  • Instead of replacing the first space with "_", you can split on a regular expression that matches a space preceded by a digit: tstrsplit(mun, "(?<=\\d) ", perl = TRUE)
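
    A minimal sketch of that regular-expression variant on a two-row sample (the name dt is only illustrative). It assumes, as in the example data, that the only space following a digit is the one between code and name:

```r
library(data.table)

dt <- data.table(mun = c("110001 Alta Floresta D'Oeste",
                         "110037 Alto Alegre dos Parecis"))

# (?<=\\d) is a lookbehind: split on a space only when a digit precedes it.
# Lookbehind needs perl = TRUE, which tstrsplit passes on to strsplit.
dt[, c("cod_mun", "mun") := tstrsplit(mun, "(?<=\\d) ", perl = TRUE)]

dt
```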
