Remove everything after the second occurrence of a comma


I have a df with addresses and want to remove everything after the second comma.

exemplo <- c("Rua Pajé, 30, apto 44", "Av. Brasil,55, blocoB")

What I’ve already tried:

gsub(",[^,]+,(.*)", "", exemplo)
[1] "Rua Pajé"   "Av. Brasil"

But what I want is:

"Rua Pajé, 30", "Av. Brasil,55"
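The attempt removes too much because its pattern starts matching at the first comma, not the second. One way to confirm this is to extract what the pattern matches with base R's regexpr() and regmatches(). This is a diagnostic sketch, not part of either answer:

```r
exemplo <- c("Rua Pajé, 30, apto 44", "Av. Brasil,55, blocoB")

# regexpr() finds the first match of the attempted pattern in each string;
# regmatches() extracts that matched text, i.e. exactly what gsub() deletes
m <- regexpr(",[^,]+,(.*)", exemplo)
regmatches(exemplo, m)
# c(", 30, apto 44", ",55, blocoB")
```

Everything from the first comma onward is matched, so only the street name survives.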

2 answers


Here you go:

sub("(^[^,]*,[^,]*),.*$", "\\1", exemplo)
#[1] "Rua Pajé, 30"  "Av. Brasil,55"

Explanation:

  1. [^,] matches any character except a comma. A circumflex as the first character inside the square brackets negates the class that follows.
  2. [^,]* matches any character other than a comma, repeated zero or more times.
  3. ^[^,]* does the same, anchored at the beginning of the string.
  4. (^[^,]*,[^,]*) is the pattern ^[^,]* followed by a comma and then by [^,]*, both explained above. It is enclosed in () to form a capture group.
  5. (^[^,]*,[^,]*),.*$ is the group defined above, followed by the second comma and any number of any characters up to the end of the string.
  6. In the replacement, "\\1" keeps only the text matched by the group, so everything from the second comma onward is dropped.
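Because sub() returns a string unchanged when the pattern does not match, the same call should also be safe on addresses with fewer than two commas. A quick check on some made-up addresses (hypothetical data, not from the question):

```r
# Hypothetical addresses: no comma, one comma, and three commas
enderecos <- c("Rua Pajé, 30, apto 44", "Av. Brasil,55, blocoB",
               "Rua Sem Numero", "Rua A, 10", "Rua B, 5, casa 2, fundos")

# Strings with fewer than two commas do not match and come back as they were;
# with three or more commas, [^,]* cannot cross a comma, so the cut
# still happens at the second one
res <- sub("(^[^,]*,[^,]*),.*$", "\\1", enderecos)
res
# c("Rua Pajé, 30", "Av. Brasil,55", "Rua Sem Numero", "Rua A, 10", "Rua B, 5")
```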



An alternative is:

exemplo <- c("Rua Pajé, 30, apto 44", "Av. Brasil,55, blocoB")

gsub("^([^,]+,[^,]+),.*$", "\\1", exemplo)

I use the anchors ^ and $, which match the beginning and the end of the string, respectively.

Then I use [^,]+ (one or more characters other than a comma), followed by a comma, followed by one or more characters other than a comma. I put all of this in parentheses to form a capture group.

Then we have the second comma, followed by .* (zero or more characters), until the end of the string ($).
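To inspect what the parentheses actually capture, base R's regexec() records the full match together with each capture group. A small sketch, reusing the pattern from this answer:

```r
exemplo <- c("Rua Pajé, 30, apto 44", "Av. Brasil,55, blocoB")

# regexec() returns, for each string, the full match followed by the groups;
# regmatches() turns that into a list of c(full_match, group_1) vectors
m <- regexec("^([^,]+,[^,]+),.*$", exemplo)
grupos <- regmatches(exemplo, m)

# Element 2 of each vector is capture group 1, the text "\\1" refers to
sapply(grupos, `[`, 2)
# c("Rua Pajé, 30", "Av. Brasil,55")
```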

In the replacement argument I use \\1, which refers to the text captured by the parentheses (in this case, everything before the second comma). Since this is the first pair of parentheses, it is capture group 1, hence the number 1 in "\\1". And since that group contains everything before the second comma, \\1 is exactly the part you want to keep. The result is:

[1] "Rua Pajé, 30"  "Av. Brasil,55"

Watch it run on Ideone.com.
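The two answers differ only in [^,]* versus [^,]+, which gives the same result on the question's data but matters when a field between commas is empty. A comparison on made-up strings (hypothetical inputs, not from the question):

```r
# Hypothetical inputs with an empty field before a comma
testes <- c("a,,b,c", ",x,y")

# First answer: [^,]* accepts empty fields, so both strings are still trimmed
r1 <- sub("(^[^,]*,[^,]*),.*$", "\\1", testes)
r1
# c("a,", ",x")

# Second answer: [^,]+ requires at least one non-comma character per field,
# so neither string matches and both are returned unchanged
r2 <- gsub("^([^,]+,[^,]+),.*$", "\\1", testes)
r2
# c("a,,b,c", ",x,y")
```

Which behavior is preferable depends on whether empty fields can occur in the real addresses.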
