Comparing values within a data frame

Hello, I have a dataset with about 50,000 observations, structured as follows (the values are only illustrative):

nome<-c("joão","pedro", "joãoo")
identificador<-c(123456,124578,123456)
valor<-c(2145,350,23)
dados=data.frame(nome,identificador,valor)

I would like to identify individuals with the same identifier and create a new variable as follows:

nome=c("joão","pedro", "joãoo","maria","mariaa","carla","felipe","vitor","pedro","vitorr")
identificador=c(123456,124578,123456,000,000,123,156,2222,3232,2222)
valor=c(2145,350,23,32,12,32,1,2,54,4)
validor=c(1,0,1,2,2,0,0,3,0,3)
dados=data.frame(nome,identificador,valor,validor)

I wrote the following to identify the equal identifiers, but I could not manage to build this variable.

x<-dados$identificador
length(x)
i=1
k=1
validor=0
validor[1:50000]=0
for(i in 1:50000){
  for(j  in 1:50000){
    if(x[j]==x[i] & i!= j ){
      validor[j]=k
    }
  }
}

I would like to create a function that produces the validor variable as shown. I hope I have been clear, and I thank you very much for your help.

  • Where is dados? I think you should have assigned the data frame to that variable: dados = data.frame(...). Here = works the same as <-, only it is one character shorter, so I like to use it. What result do you expect?

  • Edited. I would like to create the variable "validor", identifying the pairs, or sets, of equal identifiers with an algorithm.

  • Can you explain how this sequence is formed? validor = c(1, 0, 1, 2, 2, 0, 0, 3, 0, 3). It seems to me that you have created a vector with the order in which the numbers repeat. E.g.: 123456 is the first to repeat, 000 is the second to repeat, and finally 2222 is the third to repeat, so they receive 1, 2 and 3 respectively. The others do not repeat, so they receive 0.

  • Exactly that.

1 answer

I think this is very close to what you want. The difference is that the group numbers given to equal identifiers will not follow the order 1, 2, 3, ...

library(uniqueAtomMat)
library(tuple)
identificador<-c(123456,124578,123456,000,000,123,156,2222,3232,2222)
validor<-grpDuplicated(identificador) # groups equal identifiers into the same category
validor[match(orphan(validor),validor)]<-0  # assigns zero to the orphan (non-repeated) identifiers
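
If the groups need to be numbered exactly in the order in which the repeated identifiers first appear (1, 2, 3, ...), a base R sketch along these lines may work without extra packages. It is only one possible approach, not a tested solution for the full 50,000 rows; the function name marcar_repetidos and the helper variable repetido are made up for illustration.

```r
# Hypothetical helper (name invented for this sketch): returns 0 for identifiers
# that occur only once, and 1, 2, 3, ... for repeated identifiers, numbered by
# the order in which each repeated identifier first appears.
marcar_repetidos <- function(identificador) {
  # TRUE for every element whose value occurs more than once; the second call
  # with fromLast = TRUE is needed so the first occurrence is flagged as well.
  repetido <- duplicated(identificador) | duplicated(identificador, fromLast = TRUE)

  # Start with zeros, then number only the repeated values.
  validor <- integer(length(identificador))
  validor[repetido] <- match(identificador[repetido],
                             unique(identificador[repetido]))
  validor
}

# Example identifiers from the question (R parses 000 as the number 0).
identificador <- c(123456, 124578, 123456, 000, 000, 123, 156, 2222, 3232, 2222)
validor <- marcar_repetidos(identificador)
validor
# [1] 1 0 1 2 2 0 0 3 0 3
```

Because duplicated(), unique() and match() are vectorized, this avoids the nested loop over all pairs of rows. With the data frame from the question it could be used as dados$validor <- marcar_repetidos(dados$identificador).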
