R - match and add string

n <- c("alberto queiroz souza","bernardo josé silva","josé césar pereira","alberto, q-s.","alberto, queiroz souza","alberto, q. s.","alberto, q c", "bernardo, j. s.", "bernardo, j. silva", "josé, c. p.", "josé, c. pereira")

I need to find every element of the vector n in df:

df <- data.frame(Titulo.1 = c("ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.) - ATUA NA EMPRESA.","B. J SILVA (BERNARDO, J. SILVA)", "JOSÉ CÉSAR PEREIRA (JOSÉ, C. P.)", "LENILTON FRAGOSO (FRAGOZO, LENILTON)","ALKMIM, MARCIO"),
                  Titulo.2 = c("BERNARDO JOSÉ SILVA (BERNARDO, J. S.)","ALBERTO QUEIROZ SOUZA (ALBERTO, QUEIROZ SOUZA)","JOSÉ CÉSAR PEREIRA (JOSÉ, C. PEREIRA)","LENILTON FRAGOSO (FRAGOZO, LENILTON)","ALKMIM, MARCIO"),
                  Titulo.3 = c("LENILTON FRAGOSO (FRAGOZO, L)","BERNARDO JOSÉ SILVA (BERNARDO, J. S.) - ATUA NA EMPRESA","ALBERTO QUEIROZ SOUZA (ALBERTO, Q. S.)","JOSÉ CÉSAR PEREIRA (J. C. P.)","ALKMIM, MARCIO"),
                  Titulo.4 = c("JOSÉ CÉSAR PEREIRA (JOSÉ, CÉZAR PEREIRA)","LENILTON FRAGOSO (FRAGOZO, LENILTON) - ATUA NA FIOCRUZ","ALKMIM, MARCIO","ALBERTO (ALBERTO, Q C)","BERNARDO JOSÉ SILVA (B, J. S.)"),
                  Titulo.5 = c("BERNARDO JOSÉ SILVA (BERNARDO, JS)","JOSÉ CÉSAR PEREIRA (JOSÉ, C. PEREIRA) - ATUA NA FIOCRUZ","LENILTON FRAGOSO (FRAGOZO, L.)","ALKMIM, MARCIO","ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.)"),
                 stringsAsFactors = FALSE)

When a match is found, I should append "- acts in the company" ("- ATUA NA EMPRESA" in the data) to it, giving "josé, c. p. - acts in the company", for example.

If the matched entry in df already contains "- acts in the company", nothing needs to be added.

I’m trying to match first with something like this:

for (x in n) {
  result <- sapply(df, gsub, pattern = x, ...)
  # or
  result <- sapply(df, str_replace, pattern = x, ...)
}

but it’s hard.
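
One possible source of trouble in the loop above is that gsub and str_replace treat the pattern as a regular expression by default, so the "." in entries such as "alberto, q. s." matches any character. A minimal sketch of literal, case-insensitive matching follows, assuming that a plain substring test is what is wanted; `titles` and `needles` are hypothetical stand-ins for one column of df and for n.

```r
# Hypothetical miniature of the data: `titles` stands in for one column of df,
# `needles` for the vector n.
titles <- c("ALBERTO QUEIROZ SOUZA (ALBERTO, Q. S.)",
            "LENILTON FRAGOSO (FRAGOZO, L.)",
            "ALBERTO (ALBERTO, Q C)")
needles <- c("alberto, q. s.", "alberto, q c")

# fixed = TRUE makes "." and other metacharacters match literally.
# Lower-casing the titles gives a case-insensitive comparison.
hits <- sapply(needles, function(x) grepl(x, tolower(titles), fixed = TRUE))

# A title counts as found when at least one needle occurs in it.
found <- rowSums(hits) > 0
found
```

`hits` is a logical matrix with one row per title and one column per needle, so the same object also tells you which element of n matched where.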

  • Fernando, I don't understand the logic of your data.frame. You have several columns with repeated values. Do you want to do this in all columns? Can a name appear more than once in a column? Are you sure you want to keep each name in a different format?

  • In df, each column is an article title with its respective authors. Only one of these authors (in each column) is identified as "acts in the company", but the same author also appears in other titles (columns) without that identification.

  • So I need to check whether the author appears in other titles and, when I find them, check whether the "- acts in the company" identification is present. If it is not, I should add "- acts in the company" after the name. Each column has only one author marked this way, but the same author can also appear in the other columns (with or without the identification).

  • Note that "ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.) - ATUA NA EMPRESA." carries the " - acts in the company" identification only in the column Titulo.1. In the other columns "ALBERTO QUEIROZ SOUZA (ALBERTO, Q. S.)" appears without it. I need to add " - acts in the company" to every "ALBERTO QUEIROZ SOUZA (ALBERTO, Q )" entry that I find.

  • I think a nice regex would help you.
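
As a rough illustration of the regex idea from the comments, the sketch below splits an entry into the name before the parentheses, the citation form inside them, and a flag for the marker. It assumes every cell follows the "NAME (CITATION) - marker" shape seen in df; the vector `x` and the variable names are hypothetical.

```r
# Hypothetical sample entries in the same shape as the cells of df.
x <- c("ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.) - ATUA NA EMPRESA.",
       "LENILTON FRAGOSO (FRAGOZO, L.)",
       "ALKMIM, MARCIO")

# Name: everything before the first " (" (the whole string when there is none).
full_name <- sub(" \\(.*$", "", x)

# Citation form: the text inside the parentheses, NA when there are none.
m <- regmatches(x, regexec("\\(([^)]*)\\)", x))
citation <- sapply(m, function(p) if (length(p) >= 2) p[2] else NA_character_)

# Is the marker already present? fixed = TRUE avoids regex interpretation.
has_marker <- grepl("ATUA NA EMPRESA", x, fixed = TRUE)

data.frame(full_name, citation, has_marker, stringsAsFactors = FALSE)
```

Lower-casing `full_name` and `citation` would then allow a direct comparison against n with `%in%`.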

1 answer

The code below does the following for each entry in each column: it extracts the name (the text before " ("), looks it up in the vector n, checks whether a matched entry already contains "ATUA NA EMPRESA", and appends that text if it does not. Because only the name before the parenthesis is compared, entries such as "B. J SILVA (BERNARDO, J. SILVA)" are not matched. As already mentioned in the comments, you will need to clean your data to get better results.

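textm <- "ATUA NA EMPRESA"
ndf <- as.data.frame(lapply(df, function(nc) {  # nc is one column, e.g. df[, 1]
  nct <- nc
  # lower-cased name that precedes " (" in each entry
  ncm <- sapply(nc, function(nx)
    tolower(unlist(strsplit(nx, " (", fixed = TRUE))[1]))
  enc <- ncm %in% n             # entries whose name is in n
  emp <- grepl(textm, nc[enc])  # of those, which already have the marker
  # append the marker only where it is missing
  nct[enc] <- ifelse(emp, nc[enc], paste(nc[enc], " - ", textm, ".", sep = ""))
  nct
}), stringsAsFactors = FALSE)
ndf[, 1]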

[1] "ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.) - ATUA NA EMPRESA."
[2] "B. J SILVA (BERNARDO, J. SILVA)"                         
[3] "JOSÉ CÉSAR PEREIRA (JOSÉ, C. P.) - ATUA NA EMPRESA."     
[4] "LENILTON FRAGOSO (FRAGOZO, LENILTON)"                    
[5] "ALKMIM, MARCIO"   
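
The output shows that "B. J SILVA (BERNARDO, J. SILVA)" is left alone, since only the name before the parentheses is looked up in n. If the citation form inside the parentheses should count as well (an assumption about the intended result, not something the question states outright), a variant along these lines might work. `add_marker`, `cells`, and `known` are hypothetical names, and the sample values are ASCII-only so that tolower behaves the same in any locale.

```r
# Hypothetical helper: appends the marker to entries whose name OR citation
# form is listed in `known` and that do not carry the marker yet.
add_marker <- function(cells, known, marker = "ATUA NA EMPRESA") {
  low <- tolower(cells)
  full_name <- sub(" \\(.*$", "", low)
  # Text inside the parentheses, or "" when an entry has none.
  citation <- ifelse(grepl("(", low, fixed = TRUE),
                     sub("^[^(]*\\(([^)]*)\\).*$", "\\1", low), "")
  is_known <- full_name %in% known | citation %in% known
  has_marker <- grepl(marker, cells, fixed = TRUE)
  todo <- is_known & !has_marker
  cells[todo] <- paste0(cells[todo], " - ", marker)
  cells
}

known <- c("bernardo jose silva", "bernardo, j. silva")
cells <- c("B. J SILVA (BERNARDO, J. SILVA)",
           "BERNARDO JOSE SILVA (B, J. S.) - ATUA NA EMPRESA",
           "ALKMIM, MARCIO")
out <- add_marker(cells, known)
out
```

Applied to the real data this would be something like `df[] <- lapply(df, add_marker, known = n)`. Running it a second time changes nothing, because entries that already contain the marker are skipped. With accented names, tolower depends on the session locale, so a UTF-8 locale is assumed.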
