0
I have constantly uploaded a data frame with more than 6 million records and 40 variables (columns). From these records, I need to select
only one record per patient and that this record is as significant as possible. This significance is based on the variables: "Paciente"
, "ResultadodoExame"
and "ResultadoComplementar"
, following a priority criterion (1, 2, 3 and 4) for sorting. The lower the number (order - before data start) the higher the priority.
For example: the priority of variable data "Resultados"
is in the following order:
1-Pos, 2-Neg, 3-Inter and 9-Empty
and the variable "ResultadoComplementar"
is in order:
1-DN1, 2-DN2, 3-DN3, 4-DN4 and 9-Empty.
From this variable, I need to create a filter labeling with "0" or "1", being "1" for the significant record.
I need to perform other tasks, which depends on the entire dataset, so it is important to have the filter (variable).
The data frame below has the simulation of the data. need to create a routine that manages the Filter ("0" and "1") field, remembering that "1" is always the significant)
####Data frame
a=c("Matheus Fulano da Silva","Matheus Fulano da Silva","Fernandes Fulano da Silva","Fernandes Fulano da Silva","Fernandes Fulano da Silva","Fernandes Fulano da Silva","Manuel Fulano da Silva","Manuel Fulano da Silva","Manuel Fulano da Silva","Manuel Fulano da Silva","Manuel Fulano da Silva", "Carlos Fulano da Silva","Carlos Fulano da Silva","Carlos Fulano da Silva","Carlos Fulano da Silva","Joao Fulano da Silva","Joao Fulano da Silva","Joao Fulano da Silva","Joao Fulano da Silva")
b=c("9-Vazio","3-Inter","2-Neg","1-Pos","1-Pos","3-Inter","1-Pos","1-Pos","9-Vazio","2-Neg","3-Inter","1-Pos","2-Neg","9-Vazio","1-Pos","2-Neg","3-Inter","1-Pos","2-Neg")
c=c("9-Vazio","9-Vazio","9-Vazio","2-DN2","1-DN1","9-Vazio","1-DN1","1-DN1","9-Vazio","9-Vazio","9-Vazio","9-Vazio","9-Vazio","9-Vazio","1-DN1","2-DN2","3-DN3","9-Vazio","9-Vazio")
#meu data frame
d=data.frame(a,b,c)
names(d)<-c("Paciente","ResultadoExame","ResultadoComplementar")
# preciso criar uma função para gerar o filtro
d=c("0","1","0","0","1","0","1","0","0","0","0","0","0","0","1","0","0","1","0")
#Objetivo final
d=data.frame(a,b,c,d)
names(d)<-c("Paciente","ResultadoExame","ResultadoComplementar","filtro")
From now on, thank you very much.
Hug
Note that you actually want to create an indicator, to later apply a filter. As this future indicator depends on 3 variables, do they have the same weight? The sum of the 3 can be a good indicator of significance?
– bbiasi
Hello Mouradev. Initially I thought about this indicator (sum), but it didn’t work because there are cases I can have in the same patient (two lines of results) that the total of the sum will be the same. For example, a 2-Neg in the first line of the patient, and in the second line of patient 3-In...
– Ronaldo Jesus