How to extract a specific string snippet

Suppose I have extracted this URL:

/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW

I only want the part that starts with a1G. Does anyone know how to get just that piece?

2 answers

2


You can do this with the stringr package and regular expressions.

In your case, I would do it like this:

s <- "/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW"
stringr::str_extract(s, "a1G\\S+\\s*")
[1] "a1G57000003DE4QEAW"

This code also works when s is a vector, so it can be applied to a data.frame column as follows:

df$extrair <- stringr::str_extract(df$url, "a1G\\S+\\s*")

Note that if you don’t have the stringr package installed, you will need to install it first with install.packages("stringr").
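To make the data.frame case concrete, here is a minimal sketch. The data frame and its second row are made up for illustration (only the first URL comes from the question), and the column names df$url and df$extrair follow the line above. It also shows that str_extract returns NA for a row with no match, so the new column stays aligned with the rows.

```r
library(stringr)

# Hypothetical data: the second URL has no "a1G" piece.
df <- data.frame(
  url = c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW",
          "/ac/rio-branco/no-id-here"),
  stringsAsFactors = FALSE
)

# One result per row; rows without a match get NA.
df$extrair <- str_extract(df$url, "a1G\\S+\\s*")
df$extrair
# [1] "a1G57000003DE4QEAW" NA
```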

1

Extracting part of a string using only base R is rather tedious, but possible. I chose a simpler regular expression than Daniel’s, since you weren’t very specific about the format. It would look like this:

> s <- "/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW"
> regmatches(s, gregexpr("a1G.+", s))
[[1]]
[1] "a1G57000003DE4QEAW"

Note that the result is a list with one element for each string in the vector s, and each element holds all matches of the regular expression in that string. If you want a single vector as output, you can use unlist:

> s <- c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW", "abcsda1G000")
> regmatches(s, gregexpr("a1G.+", s))
[[1]]
[1] "a1G57000003DE4QEAW"

[[2]]
[1] "a1G000"

> unlist(regmatches(s, gregexpr("a1G.+", s)))
[1] "a1G57000003DE4QEAW" "a1G000"
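One caveat with unlist: a string that has no match contributes an empty element, so it disappears from the output and the result can be shorter than s. If you need one value per input (for a data.frame column, say), a possible base-only sketch is to use regexpr, which finds just the first match in each string, and fill the gaps with NA. The vector urls and its middle element are made up for illustration.

```r
# Hypothetical input: the second element has no "a1G" piece.
urls <- c("/ac/rio-branco/xpto-xyz-1-0-16-5-abcd-a1G57000003DE4QEAW",
          "/ac/rio-branco/no-id-here",
          "abcsda1G000")

# regexpr() returns the position of the first match, or -1 when there is none.
m <- regexpr("a1G.+", urls)

# Start from all NA, then fill in only the positions that matched.
# regmatches() with a regexpr() result returns the matched strings only.
ids <- rep(NA_character_, length(urls))
ids[m != -1] <- regmatches(urls, m)
ids
# [1] "a1G57000003DE4QEAW" NA                   "a1G000"
```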
