n <- c("alberto queiroz souza","bernardo josé silva","josé césar pereira","alberto, q-s.","alberto, queiroz souza","alberto, q. s.","alberto, q c", "bernardo, j. s.", "bernardo, j. silva", "josé, c. p.", "josé, c. pereira")
I need to find every element of the vector n in df:
df <- data.frame(Titulo.1 = c("ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.) - ATUA NA EMPRESA.","B. J SILVA (BERNARDO, J. SILVA)", "JOSÉ CÉSAR PEREIRA (JOSÉ, C. P.)", "LENILTON FRAGOSO (FRAGOZO, LENILTON)","ALKMIM, MARCIO"),
Titulo.2 = c("BERNARDO JOSÉ SILVA (BERNARDO, J. S.)","ALBERTO QUEIROZ SOUZA (ALBERTO, QUEIROZ SOUZA)","JOSÉ CÉSAR PEREIRA (JOSÉ, C. PEREIRA)","LENILTON FRAGOSO (FRAGOZO, LENILTON)","ALKMIM, MARCIO"),
Titulo.3 = c("LENILTON FRAGOSO (FRAGOZO, L)","BERNARDO JOSÉ SILVA (BERNARDO, J. S.) - ATUA NA EMPRESA","ALBERTO QUEIROZ SOUZA (ALBERTO, Q. S.)","JOSÉ CÉSAR PEREIRA (J. C. P.)","ALKMIM, MARCIO"),
Titulo.4 = c("JOSÉ CÉSAR PEREIRA (JOSÉ, CÉZAR PEREIRA)","LENILTON FRAGOSO (FRAGOZO, LENILTON) - ATUA NA FIOCRUZ","ALKMIM, MARCIO","ALBERTO (ALBERTO, Q C)","BERNARDO JOSÉ SILVA (B, J. S.)"),
Titulo.5 = c("BERNARDO JOSÉ SILVA (BERNARDO, JS)","JOSÉ CÉSAR PEREIRA (JOSÉ, C. PEREIRA) - ATUA NA FIOCRUZ","LENILTON FRAGOSO (FRAGOZO, L.)","ALKMIM, MARCIO","ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.)"),
stringsAsFactors = FALSE)
When an element is found, I should append "- acts in the company" ("- ATUA NA EMPRESA" in the data), giving "josé, c. p. - acts in the company", for example.
However, if the matching entry in df already carries "- acts in the company", nothing needs to be added.
I’m trying to match first with something like this:
for (x in n) {
  result <- sapply(df, gsub, pattern = x, ...)
  # or
  result <- sapply(df, str_replace, pattern = x, ...)
}
but it’s hard.
Fernando, I don’t understand the logic of your data.frame. Several columns contain repeated values, and you want to apply this to all columns. Can a name appear more than once in a column? Are you sure you want to keep each name in its own format?
– Molx
In df, each column is an article title with its respective authors. Only one of these authors (per column) appears with the identification "acts in the company", but the same author also appears in other titles (columns) without that identification.
– Fernando
So I need to check whether that author appears in other titles and, when I find him, check whether the identification "- acts in the company" is there. If not, I should put "- acts in the company" after his name. Each column has only one author with the identification, but he can also appear in other columns (with or without it).
– Fernando
See: "ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.) - ATUA NA EMPRESA." appears with the identification "- ATUA NA EMPRESA" ("acts in the company") only in the column Titulo.1. In the other columns "ALBERTO QUEIROZ SOUZA (ALBERTO, Q. S.)" appears, but without the identification. I need to put " - acts in the company" in every "ALBERTO QUEIROZ SOUZA (ALBERTO, Q )" that I find.
– Fernando
I think a nice regex would help you.
– André Muta
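A minimal sketch of the matching idea discussed above, in base R and not taken from the original post. It assumes that a literal, case-insensitive substring match is enough, so the names are upper-cased and searched with `fixed = TRUE` rather than used as regex patterns (several of them contain `.`). The helper `add_suffix` and the `sample_*` objects are hypothetical. The sample keeps only a few ASCII-only cells from df, because how `toupper` handles accented letters such as "é" may depend on the locale.

```r
# Hypothetical helper: append `suffix` to every cell of `df` that contains
# one of the names in `n` and does not already contain the suffix.
add_suffix <- function(df, n, suffix = "- ATUA NA EMPRESA") {
  keys <- toupper(n)  # titles are upper-case, names in n are lower-case
  df[] <- lapply(df, function(col) {
    upper <- toupper(col)
    # TRUE where at least one name occurs literally in the cell
    found <- Reduce(`|`, lapply(keys, grepl, x = upper, fixed = TRUE))
    # TRUE where the cell already carries the suffix
    tagged <- grepl(toupper(suffix), upper, fixed = TRUE)
    todo <- found & !tagged
    col[todo] <- paste(col[todo], suffix)
    col
  })
  df
}

# Reduced sample built from cells of the df in the question (assumption:
# character columns, hence stringsAsFactors = FALSE).
sample_n <- c("alberto queiroz souza", "alberto, q. s.", "alberto, q c")
sample_df <- data.frame(
  Titulo.1 = c("ALBERTO QUEIROZ SOUZA (ALBERTO, Q-S.) - ATUA NA EMPRESA.",
               "ALKMIM, MARCIO"),
  Titulo.3 = c("ALBERTO QUEIROZ SOUZA (ALBERTO, Q. S.)",
               "LENILTON FRAGOSO (FRAGOZO, L)"),
  stringsAsFactors = FALSE
)

result <- add_suffix(sample_df, sample_n)
print(result)
```

In this sketch only the cell in Titulo.3 that mentions Alberto without the identification is changed. A cell that carries some other annotation would still receive the suffix, since only the company suffix is checked for. Whether that is the desired behaviour is not settled in the thread.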