1
I have the following basis of dice that contains enrollments of an institution that is developing a research, in the future to make a mapping of students I need an id that identifies each student, where with this id I can identify the student the following year, since I have data from 2009 to 2018.
To create this id I am using the columns with the attributes of students that does not change regardless of the year, which in case are TP_COR_RACA
, TP_SEXO
, NU_ANO_NASCIMENTO
, NU_MES_NASCIMENTO
and NU_DIA_NASCIMENTO
, the idea is that from these attributes I create a
id that repeats whenever that combination is the same.
I tried to do it this way:
def getKey(row):
# retorna um str com os valores das colunas que são imutaveis independente do ano
# que será usado como chave para gerar o Id.
key = f"{row['TP_COR_RACA']}{row['TP_SEXO']}{row['NU_ANO_NASCIMENTO']}\
{row['NU_MES_NASCIMENTO']}{row['NU_DIA_NASCIMENTO']}"
return key
def getDicId(df):
# retorna um dicionário em que os index são as chaves, sem repetir.
dic_id = {}
for index, row in df.iterrows():
key = getKey(row)
dic_id[key] = 'null'
# cria um id para cada chave do dicionario.
idx = 1
for i in dic_id:
dic_id[i] = idx
idx += 1
return dic_id
def getId(df):
df.loc[df_concat.TP_SEXO == 2, 'TP_SEXO'] = 0
dic_id = getDicId(df)
df['Id'] = 'null'
# Obtem uma chave de cada matricula e atribui sue id usando o
# dicionario que tem o index igual sua chave.
for index, row in df.iterrows():
key = getKey(row)
df.loc[index, 'Id'] = dic_id[key]
return df
But it didn’t work very well, the same id was generated for different students even using these attributes as key contrary to what I expected, the result can be checked here, if anyone knows where the mistake is or how else I can do it I appreciate it.
Instead of creating a
ID
it might be more convenient to use theCPF
of the student, since, by its nature, it is non-transferable and unalterable. This way, you can access the student’s data, many years after the same has left the educational institution.– Solkarped
It would be a good but does not have the CPF in the database
– Renato Lopo
Important you [Dit] your question by providing a [mcve] problem that makes it clear what went wrong in the solution attempt. To better enjoy the site, understand and avoid closures and negativities worth reading the Stack Overflow Survival Guide in English.
– Bacco
@Bacco gave an improved question
– Renato Lopo