Change/abbreviate words in a data set

Is there a function in R that can change one or more words in my dataset?

For example, replace São Paulo with SP.

1 answer

A simple way to create abbreviations is the abbreviate function from the base package. Because the words can contain accents, it is used together with iconv (also from the base package) to deal with encoding.

Reproducible example:

df_1 <- data.frame(
  estados = c("São Paulo", "Minas Gerais", "Santa Catarina", "Maranhão"), 
  regiao = c("Sudeste", "Sudeste", "Sul", "Nordeste")
)

Suppose you want to abbreviate each name to two letters:

abbreviate(names.arg = iconv(x = df_1$estados), minlength = 2)

Or add the abbreviations as a new column while keeping the original data frame:

library(dplyr)
library(stringr)

df_1 %>% 
  mutate(acronimos = str_to_upper(abbreviate(iconv(estados), 2)))

#             estados   regiao acronimos
#    1      São Paulo  Sudeste        SP
#    2   Minas Gerais  Sudeste        MG
#    3 Santa Catarina      Sul        SC
#    4       Maranhão Nordeste        MR
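Note that abbreviate builds the abbreviation mechanically, so the result is not always the official acronym (Maranhão is officially MA, not MR). When exact replacements matter, a named vector used as a lookup table is one option. A minimal sketch in base R; the vector name siglas_oficiais and the column name sigla are illustrative choices, not part of the original answer:

```r
df_1 <- data.frame(
  estados = c("São Paulo", "Minas Gerais", "Santa Catarina", "Maranhão"),
  regiao = c("Sudeste", "Sudeste", "Sul", "Nordeste")
)

# Hypothetical lookup table: names are the original words, values the replacements
siglas_oficiais <- c(
  "São Paulo" = "SP",
  "Minas Gerais" = "MG",
  "Santa Catarina" = "SC",
  "Maranhão" = "MA"
)

# as.character() guards against factor columns (the default in R < 4.0)
df_1$sigla <- unname(siglas_oficiais[as.character(df_1$estados)])
```

Any value that is missing from the lookup table comes back as NA, which makes unmapped names easy to spot.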

The warning:

Warning message:
In abbreviate(iconv(estados), 2) :
  abbreviate used with non-ASCII chars

only flags an encoding issue: the input contains non-ASCII characters. It does not affect the result in your case. If you want to understand more about this, read here.
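If the warning is a nuisance, one option is to remove the accents before calling abbreviate, since iconv called without from and to appears to leave them in place here (the warning still fires). A minimal sketch using base R's chartr; the accent list is an assumption that covers only common lower-case Portuguese characters and would need extending for other input:

```r
estados <- c("São Paulo", "Minas Gerais", "Santa Catarina", "Maranhão")

# Map each accented character to its plain counterpart
# (characters are paired by position in the two strings)
sem_acento <- chartr("ãáâàéêíóôõúç", "aaaaeeiooouc", estados)

# The input is now plain ASCII, so abbreviate() has no
# non-ASCII characters to warn about
siglas <- toupper(abbreviate(sem_acento, minlength = 2))
```

The original column is left untouched; only the temporary copy used for abbreviating loses its accents.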