How to extract a specific string snippet

Question

Suppose I have this URL:

/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW

I only want the part that starts with a1G. Does anyone know how to get just that piece?

2 answers

You can do this with the stringr package and regular expressions.

In your case, I would do it like this:

s <- "/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW"
stringr::str_extract(s, "a1G\\S+\\s*")
[1] "a1G57000003DE4QEAW"

This code also works when s is a vector, so you can apply it to a data.frame column like this:

df$extrair <- stringr::str_extract(df$url, "a1G\\S+\\s*")

If you don't have the stringr package installed, install it first with install.packages("stringr").
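As a self-contained sketch of the data.frame case: the df below and its second URL are hypothetical sample data, and the pattern is the same one used above. str_extract() returns NA for rows where the pattern is not found, so the new column keeps exactly one value per row.

```r
library(stringr)

# Hypothetical sample data: the second URL has no "a1G" piece.
df <- data.frame(
  url = c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW",
          "/ac/rio-branco/no-code-here"),
  stringsAsFactors = FALSE
)

# One result per row; NA where the pattern does not match.
df$extrair <- str_extract(df$url, "a1G\\S+\\s*")
print(df$extrair)
```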

Thank you so much for your help!!!

– Felipe Amaral Rodrigues

2016/02/29 at 17:52

by Molx • **2,659** points · Answer 1 · 2016-02-27T01:31:20+00:00

Extracting part of a string using only base R is rather tedious, but possible. I chose a simpler regular expression than Daniel's, since you weren't very specific about the format. It would look like this:

> s <- "/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW"
> regmatches(s, gregexpr("a1G.+", s))
[[1]]
[1] "a1G57000003DE4QEAW"

Note that the result is a list with one element per string in s, each element holding every match of the regular expression in that string. If you want a plain character vector as output instead, you can use unlist:

> s <- c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW", "abcsda1G000")
> regmatches(s, gregexpr("a1G.+", s))
[[1]]
[1] "a1G57000003DE4QEAW"

[[2]]
[1] "a1G000"

> unlist(regmatches(s, gregexpr("a1G.+", s)))
[1] "a1G57000003DE4QEAW" "a1G000"
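One caveat about unlist: strings with no match contribute nothing, so the result can be shorter than s and no longer lines up with the input (for example, with the rows of a data.frame). A minimal base-R sketch that keeps one value per string, assuming only the first match in each string is wanted; the third string is a hypothetical input with no match:

```r
s <- c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW",
       "abcsda1G000",
       "no match here")

# regexpr() locates the first match in each string (-1 when there is none).
m <- regexpr("a1G.+", s)

# Start with NA everywhere, then fill in the strings that matched;
# regmatches() returns the matched text only for those strings, in order.
out <- rep(NA_character_, length(s))
out[m != -1] <- regmatches(s, m)
print(out)
```

regexpr() reports only the first match per string, which is usually what you want for an ID like this; gregexpr() remains the right choice when a string can contain several matches.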