Suppose the following data frame:
ref<-data.frame(autores=c("AZEVEDO, L. S.; NASCIMENTO, E. F.; CANDEIAS, A. L. B.",
"BERGER, R.; SILVA, J. A. A.; FERREIRA, R. L. C.; CANDEIAS, A. L. B.; RUBILAR, R.",
"AZEVEDO, L. S.; CANDEIAS, ANA LÚCIA BEZERRA",
"SILVA, JADSON FREIRE; MIRANDA, RODRIGO QUEIROGA; CANDEIAS, ANA LÚCIA BEZERRA",
"OLIVEIRA, CLAUDIANNE BRAINER DE SOUZA; CANDEIAS, ANA LÚCIA BEZERRA; TAVARES JUNIOR, J. R.",
"SANTOS, AMANDA PEREIRA; SILVA, EDER BATISTA DA; CANDEIAS, ANA LÚCIA BEZERRA; COSTA, MARIA APARECIDA TENÓRIO DA",
"SILVA, JADSON FREIRE; MIRANDA, RODRIGO QUEIROGA; CANDEIAS, ANA LÚCIA BEZERRA",
"SILVA, JADSON FREIRE; PAZ, YENÊ MEDEIROS; LIMA-SILVA, PEDRO PAULO; PEREIRA, JOÃO ANTÔNIO DOS SANTOS; CANDEIAS, ANA LÚCIA BEZERRA",
"ALEXANDRE, FERNANDO DA SILVA; CANDEIAS, ANA LÚCIA BEZERRA; GOMES, DANIEL DANTAS MOREIRA"))
autores
1 AZEVEDO, L. S.; NASCIMENTO, E. F.; CANDEIAS, A. L. B.
2 BERGER, R.; SILVA, J. A. A.; FERREIRA, R. L. C.; CANDEIAS, A. L. B.; RUBILAR, R.
3 AZEVEDO, L. S.; CANDEIAS, ANA LÚCIA BEZERRA
4 SILVA, JADSON FREIRE; MIRANDA, RODRIGO QUEIROGA; CANDEIAS, ANA LÚCIA BEZERRA
5 OLIVEIRA, CLAUDIANNE BRAINER DE SOUZA; CANDEIAS, ANA LÚCIA BEZERRA; TAVARES JUNIOR, J. R.
6 SANTOS, AMANDA PEREIRA; SILVA, EDER BATISTA DA; CANDEIAS, ANA LÚCIA BEZERRA; COSTA, MARIA APARECIDA TENÓRIO DA
7 SILVA, JADSON FREIRE; MIRANDA, RODRIGO QUEIROGA; CANDEIAS, ANA LÚCIA BEZERRA
8 SILVA, JADSON FREIRE; PAZ, YENÊ MEDEIROS; LIMA-SILVA, PEDRO PAULO; PEREIRA, JOÃO ANTÔNIO DOS SANTOS; CANDEIAS, ANA LÚCIA BEZERRA
9 ALEXANDRE, FERNANDO DA SILVA; CANDEIAS, ANA LÚCIA BEZERRA; GOMES, DANIEL DANTAS MOREIRA
There is a repeated value: "SILVA, JADSON FREIRE; MIRANDA, RODRIGO QUEIROGA; CANDEIAS, ANA LÚCIA BEZERRA"
I can identify it with duplicated():
duplicated(ref)
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
I can find the position of the duplicated value with which():
which(duplicated(ref))
[1] 7
But what I really want is to return a data frame containing only the repeated value.
The file in Excel: references
I import the file
df<-rio::import("coautoria.artigos.original.xlsx")
Since it is a data frame with multiple columns, I leave the column index empty to keep all columns:
df2<-df[duplicated(df$artigo), ]
I try to organize the repeated articles by sorting on the "artigo" column. But the result does not bring back only the repeated articles.
library(dplyr)  # provides %>% and arrange()
df2 %>%
  arrange(artigo)
Some repeated articles appear, but others do not.
It should return only the repeated ones, shouldn't it?
An example right at the beginning of the data frame: the first article that appears ("THE PRODUCTION OF THE TOURIST AREA VIA ACCUMULATION...") is repeated. The same article is listed under two names in the "teacher" column: "Itamar" and "Edvania".
It should then appear twice, one row below the other, right? One row for the teacher "Edvania" and another for the teacher "Itamar". Or am I wrong?
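A likely explanation for the missing rows, sketched on invented data: duplicated() marks only the second and later occurrences of a value, so the first copy of each repeated article never reaches df2. Combining a forward scan with a backward scan (fromLast = TRUE) flags every copy. The column name professor and all values below are assumptions for illustration, not taken from the Excel file.

```r
# Hypothetical data: "artigo" mirrors the question; "professor" and the values are invented.
df <- data.frame(
  artigo    = c("A", "B", "A", "C", "B"),
  professor = c("Itamar", "Itamar", "Edvania", "Edvania", "Edvania"),
  stringsAsFactors = FALSE
)

# duplicated() flags only the second and later occurrences of each value.
later_copies <- df[duplicated(df$artigo), ]

# A forward scan OR a backward scan flags every copy of a repeated value.
is_repeated <- duplicated(df$artigo) | duplicated(df$artigo, fromLast = TRUE)
all_copies <- df[is_repeated, ]

# Sort so that copies of the same article sit one below the other.
all_copies <- all_copies[order(all_copies$artigo), ]
print(all_copies)
```

With this approach each repeated article shows up once per copy, so the rows for the two teachers end up next to each other after sorting.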
repetido <- ref[duplicated(ref), , drop = FALSE]. It is necessary to use drop = FALSE to maintain the data frame structure. – Rui Barradas
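To see what drop = FALSE changes, here is a minimal sketch on an invented single-column data frame (ref_small is a hypothetical name, chosen so it does not overwrite ref from the question). When a data frame has only one column, row subsetting without drop = FALSE collapses the result to a plain vector.

```r
# Invented single-column data frame with one repeated value.
ref_small <- data.frame(autores = c("A; B", "C; D", "A; B"),
                        stringsAsFactors = FALSE)

# Without drop = FALSE the one-column result collapses to a character vector.
without_drop <- ref_small[duplicated(ref_small), ]

# With drop = FALSE the result stays a data frame with one row and one column.
with_drop <- ref_small[duplicated(ref_small), , drop = FALSE]

print(class(without_drop))
print(class(with_drop))
```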
I'm sorry, Rui, but I don't get it. The square brackets take a "row" and a "column" reference, right? In this case, "duplicated(ref)" would be the row index of "ref[ ]" and ", ," would refer to "all columns"? What works, works! I tested it here, but I wanted to understand. – itamar
Yes, the , , refers to all columns. Giving only a row index is the same as saying "this row, regardless of column", that is, all columns. The same happens when you give only column indices: you are referring to all rows. – Rui Barradas
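The indexing rule described above can be tried on a small invented data frame (d and its columns x and y are hypothetical names used only for this sketch):

```r
# Invented two-column data frame.
d <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# Row index only: row 2, all columns.
row_two <- d[2, ]

# Column index only: all rows; a single column collapses to a vector.
col_y <- d[, "y"]

# Same selection, kept as a data frame by drop = FALSE.
col_y_df <- d[, "y", drop = FALSE]

# Logical row index with an empty column slot, as in ref[duplicated(ref), , drop = FALSE].
rows_flag <- d[c(TRUE, FALSE, TRUE), , drop = FALSE]
```

The empty slot between the two commas is the column index; drop = FALSE is a separate, named argument that follows it.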