I have a CSV with enrollment data for a study I'm doing. I generated an id for each enrollment and want to check whether any id repeats within the same year.
I have the following code:
import pandas as pd

# Returns True if idx appears more than once in the given year.
def repetido(idx, ano, df):
    cont = 0
    for index, row in df.iterrows():
        if idx == row['Id'] and ano == row['NU_ANO_CENSO']:
            cont += 1
    if cont > 1:
        return True
    else:
        return False

# Prints a list of the ids that repeat, if there are any.
def contIdRepetido():
    df = pd.read_csv('../dados/dados_padronizados_matriculas_januaria_2009_2018_com_id.csv')
    repetidos = []
    for index, row in df.iterrows():
        if repetido(row['Id'], row['NU_ANO_CENSO'], df):
            repetidos.append(row['Id'])
    print(f'Id repetidos: {repetidos}')
But this approach takes too long to run. Does anyone know a more efficient way to do it?
The CSV can be found here.
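The slowness comes from calling `iterrows()` inside another `iterrows()` loop, so the work grows with the square of the number of rows. Below is a minimal sketch of a vectorized alternative, assuming the column names `Id` and `NU_ANO_CENSO` from the code above. `DataFrame.duplicated` with `keep=False` flags every row whose (id, year) pair occurs more than once, and a `groupby` on the same two columns gives the counts. The function names `repeated_ids` and `repeat_counts` and the sample data are made up for illustration.

```python
import pandas as pd


def repeated_ids(df, id_col="Id", year_col="NU_ANO_CENSO"):
    """Return the ids that occur more than once within the same year."""
    # keep=False marks every row of a repeated (id, year) pair,
    # not only the second and later occurrences.
    mask = df.duplicated(subset=[id_col, year_col], keep=False)
    return df.loc[mask, id_col].unique().tolist()


def repeat_counts(df, id_col="Id", year_col="NU_ANO_CENSO"):
    """Return how many times each repeated (year, id) pair occurs."""
    counts = df.groupby([year_col, id_col]).size()
    return counts[counts > 1]


# Hypothetical sample data, not taken from the real CSV.
sample = pd.DataFrame({
    "Id": [1, 2, 1, 3, 2],
    "NU_ANO_CENSO": [2009, 2009, 2009, 2010, 2010],
})

print(f"Id repetidos: {repeated_ids(sample)}")
print(repeat_counts(sample))
```

In the sample, id 1 appears twice in 2009 and is reported, while id 2 appears once in 2009 and once in 2010 and is not. Unlike the loop version, `repeated_ids` lists each repeated id only once. On the real file, the DataFrame returned by `pd.read_csv` would be passed in place of `sample`.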
I did it in a slightly different way to return the values greater than one, but this was the method with the best performance. Thank you!
– Renato Lopo
Yes, there are several ways to return these values. I just wanted to show you that it was possible using pandas functions. I'm glad it helped you! Hugs!!
– lmonferrari