R - match and replace string

I have this vector:

n <- c("alberto queiroz souza (alberto, q-s.)", 
       "alberto queiroz souza (alberto, queiroz souza)", 
       "alberto queiroz souza (alberto, q. s.)", 
       "alberto queiroz souza (alberto, q c)", 
       "bernardo josé silva (bernardo, j. s.)", 
       "bernardo josé silva (bernardo, j. silva)", 
       "josé césar pereira (josé, c. p.)", 
       "josé césar pereira (josé, c. pereira)")

For each element, I would like to separate the name from the part in parentheses.

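library(stringr)

# Split each element at " (" into a two-column matrix
n <- str_split_fixed(as.character(n), " \\(", 2)

# Then try to strip the trailing ")" (this returns a list)
n <- c(strsplit(as.character(n), "\\)$"))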

I don't know how to do this split properly and then turn the pieces into another vector without duplicated elements.

The result would be a vector like this:

result <- c("alberto queiroz souza",
            "bernardo josé silva",
            "josé césar pereira",
            "alberto, q-s.",
            "alberto, queiroz souza",
            "alberto, q. s.")  # ......
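For reference, here is a minimal sketch of how the `str_split_fixed()` attempt could be finished. The matrix it returns already holds the name in column 1 and the parenthesised text (still ending in ")") in column 2, so it only needs the trailing parenthesis removed and the two columns combined with `unique()`. It assumes every element has exactly one " (...)" part at the end; the names `parts` and `inside` are illustrative only.

```r
library(stringr)

n <- c("alberto queiroz souza (alberto, q-s.)",
       "alberto queiroz souza (alberto, queiroz souza)",
       "alberto queiroz souza (alberto, q. s.)",
       "alberto queiroz souza (alberto, q c)",
       "bernardo josé silva (bernardo, j. s.)",
       "bernardo josé silva (bernardo, j. silva)",
       "josé césar pereira (josé, c. p.)",
       "josé césar pereira (josé, c. pereira)")

# Split at the first " (": column 1 is the name,
# column 2 is the text inside the parentheses plus the trailing ")"
parts <- str_split_fixed(n, " \\(", 2)

# Remove the trailing ")" from column 2
inside <- sub("\\)$", "", parts[, 2])

# Names first, then the parenthesised forms, with duplicates removed
result <- unique(c(parts[, 1], inside))
result
```

Because `unique()` keeps the first occurrence of each value, the three names come first, followed by the eight parenthesised forms in their original order.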

2 answers

Try something like this:

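# Split each string at " (" (fixed = TRUE, so "(" is taken literally);
# sapply() returns a 2 x 8 matrix, one column per original string
ns <- sapply(n, function(nx)
  unlist(strsplit(nx, " (", fixed = TRUE))
)
ns <- t(unique(ns))                    # transpose: one row per original string
row.names(ns) <- NULL                  # drop the row names taken from n
res <- apply(ns, 2, unique)            # unique values per column (a list, since lengths differ)
res[[2]] <- gsub("\\)", "", res[[2]])  # remove the closing parenthesis
unlist(res)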

[1] "alberto queiroz souza"  "bernardo josé silva"    "josé césar pereira"    
 [4] "alberto, q-s."          "alberto, queiroz souza" "alberto, q. s."        
 [7] "alberto, q c"           "bernardo, j. s."        "bernardo, j. silva"    
[10] "josé, c. p."            "josé, c. pereira"
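The `sapply()`/`t()` steps can also be collapsed by calling `strsplit()` once on the whole vector and binding the pieces row-wise. This is a sketch of that variant, not part of the original answer. It assumes every string contains exactly one " (", so each split yields two pieces (otherwise `rbind()` would recycle the shorter rows); `parts` is an illustrative name.

```r
n <- c("alberto queiroz souza (alberto, q-s.)",
       "alberto queiroz souza (alberto, queiroz souza)",
       "alberto queiroz souza (alberto, q. s.)",
       "alberto queiroz souza (alberto, q c)",
       "bernardo josé silva (bernardo, j. s.)",
       "bernardo josé silva (bernardo, j. silva)",
       "josé césar pereira (josé, c. p.)",
       "josé césar pereira (josé, c. pereira)")

# strsplit() returns a list of length-2 character vectors;
# rbind them into an 8 x 2 matrix (one row per original string)
parts <- do.call(rbind, strsplit(n, " (", fixed = TRUE))

# Column 1: names; column 2: parenthesised text without its ")"
result <- unique(c(parts[, 1], sub(")", "", parts[, 2], fixed = TRUE)))
result
```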

I’d do it this way:

library(magrittr)
library(stringr)

# Everything up to and including " (": the full name
antes_parenteses <- n %>%
  str_extract(".+\\(") %>%
  str_replace_all(fixed(" ("), "")

# Everything between the parentheses: the short form
parenteses <- n %>%
  str_extract("\\(.+\\)") %>%
  str_replace_all(fixed("("), "") %>%
  str_replace_all(fixed(")"), "")

resultado <- c(antes_parenteses, parenteses) %>% unique()

Instead of splitting the strings, I use regular expressions to extract each part directly.

> resultado
[1] "alberto queiroz souza"  "bernardo josé silva"    "josé césar pereira"     "alberto, q-s."         
[5] "alberto, queiroz souza" "alberto, q. s."         "alberto, q c"           "bernardo, j. s."       
[9] "bernardo, j. silva"     "josé, c. p."            "josé, c. pereira"
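The same regex idea also works in base R, without stringr or magrittr, by using `sub()` with two capture groups: `\\1` returns the text before " (" and `\\2` returns the text between the parentheses. This is a minimal sketch that assumes each string ends with a single parenthesised part. `sub()` returns non-matching strings unchanged. The names `padrao`, `antes` and `dentro` are illustrative.

```r
n <- c("alberto queiroz souza (alberto, q-s.)",
       "alberto queiroz souza (alberto, queiroz souza)",
       "alberto queiroz souza (alberto, q. s.)",
       "alberto queiroz souza (alberto, q c)",
       "bernardo josé silva (bernardo, j. s.)",
       "bernardo josé silva (bernardo, j. silva)",
       "josé césar pereira (josé, c. p.)",
       "josé césar pereira (josé, c. pereira)")

# Group 1: everything before " ("; group 2: everything inside the final "(...)"
padrao <- "^(.*) \\((.*)\\)$"

antes  <- sub(padrao, "\\1", n)   # full names
dentro <- sub(padrao, "\\2", n)   # short forms, without the parentheses

resultado <- unique(c(antes, dentro))
resultado
```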
