How to remove text from a string in a data frame column in R?

Hello!

I have the following dataframe:

Município (Código)                    Município    Valor
2            1100015 Alta Floresta D'Oeste - RO   408765
3            1100023             Ariquemes - RO   477322
4            1100031                Cabixi - RO   126630
5            1100049                Cacoal - RO   463570
6            1100056            Cerejeiras - RO    96654
7            1100064     Colorado do Oeste - RO   266464

I need to remove the UF (state abbreviation) from the Município column, so that my data frame looks like this:

Município (Código)               Município       Valor
2            1100015 Alta Floresta D'Oeste      408765
3            1100023             Ariquemes      477322
4            1100031                Cabixi      126630
5            1100049                Cacoal      463570
6            1100056            Cerejeiras       96654
7            1100064     Colorado do Oeste      266464

How do I get this result?

  • Hello Ingled, check out the tips on how to ask a good question. It seems you have several goals in your question. Separate them into topics, and put the example of the data frame you need at the end of the question.

  • Hello Ingled, you must edit the question, not create a new one.

1 answer

You can make use of the following regular expression:

gsub("\\s\\-\\s\\S\\S", "", data$Município)

Here data is the name of your data frame. (If you make your data available through dput(head(seu.data.frame, 20)), with seu.data.frame replaced by the name of your data frame, I can post the full answer.)

In gsub, the first argument is the pattern to look for:

  • \\s matches a whitespace character (such as a space)
  • \\- matches a dash (hyphen)
  • \\S matches any non-whitespace character.

So basically it looks for "space" + "dash" + "space" + "character" + "character" and replaces the match with the second argument, which is the empty string (""). The third argument is your data.
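One caveat worth noting: the pattern above is not anchored, so it matches any "space, dash, space, two characters" sequence, wherever it appears in the string. The sketch below compares it with a stricter variant that only removes two uppercase letters at the very end of the string. The stricter pattern and the name "Foo - Bar - RO" are illustrative assumptions, not part of the original answer or the real data.

```r
# Pattern from the answer: matches " - XX" anywhere in the string.
loose  <- "\\s\\-\\s\\S\\S"
# Assumed stricter variant: " - " followed by exactly two uppercase
# letters, and only at the end of the string ($).
strict <- "\\s-\\s[[:upper:]]{2}$"

# Made-up name containing " - " in the middle (hypothetical, for illustration).
x <- "Foo - Bar - RO"

loose_result  <- gsub(loose, "", x)   # also removes " - Ba" from the middle
strict_result <- sub(strict, "", x)   # removes only the trailing " - RO"

print(loose_result)   # "Foor"
print(strict_result)  # "Foo - Bar"
```

For the names shown in the question both patterns give the same result; the anchored version only matters if a name itself contains " - ".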

Example:

> gsub("\\s\\-\\s\\S\\S", "", "Ariquemes - RO")
[1] "Ariquemes"
> gsub("\\s\\-\\s\\S\\S", "", "Curitiba - PR")
[1] "Curitiba"
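
Note that gsub returns a new character vector and does not change the data frame by itself, so the result has to be assigned back to the column. A minimal sketch, assuming a small stand-in data frame built by hand from the first three rows of the question (the real data would come from the asker's own source):

```r
# Stand-in for the data frame in the question (first three rows only).
# check.names = FALSE keeps the column names exactly as written.
data <- data.frame(
  "Município (Código)" = c("1100015", "1100023", "1100031"),
  "Município" = c("Alta Floresta D'Oeste - RO", "Ariquemes - RO", "Cabixi - RO"),
  "Valor" = c(408765, 477322, 126630),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Remove " - UF" from every value and write the result back to the column.
data[["Município"]] <- gsub("\\s\\-\\s\\S\\S", "", data[["Município"]])

print(data)
```

The other columns are left untouched; only Município is overwritten with the cleaned names.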
  • You can get my data frame with table <- get_sidra(api = "/t/3939/p/2017/v/all/N6/all/c79/2670"), from the sidrar package.
