0
In two files .csv
, how can I know whether or not a particular user is in the files ?
These files have a different number of columns, but they have in common some fields (columns), for example:
NOME, CNS, CPF, PIS, SEXO, NOME DA MAE, NOME DO PAI ...
The problem is that there are null fields in these columns. Therefore, there is the problem of knowing if the user exists or not in the databases. In short I have two systems, X and Y, how to know if a certain user belongs to the two systems ?
import csv
arquivo = ['cidade_social.csv' , 'gestantes_prenatal.csv']
f1 = open(arquivo[0], 'r').readlines()
f2 = open(arquivo[1], 'r').readlines()
fNome = open('saida.csv', 'a')
for _ in range(2):
for row in f1:
if row not in f2:
fNome.write(row)
f1, f2 = f2, f1
I tried to that, but I think this is more for reading lines and not comparing fields.
I believe you are looking to mess with data science. It gives a research on the Pandas. Eh a library (powerful mt) specific for analysis and reading tables, columns.. I believe you will find the solution with this.
– Gui Reis
In this case, you would use pandas to create two tables (one for each file), from this you would stir as needed: if you want to take the blank gaps (null or n/a), look for some word or number inside the file... anyway. Da a researched about
– Gui Reis
I tried this: import pandas as pd dataset = pd.read_csv('citade_social.csv') print(dataset.head()) print(dataset['NAME']) print(dataset.SEX) print(dataset[['NAME','SEX','CNS']]) print(dataset.describe() Tranquil, I can do this for the other csv, but then, how to compare them and write in a new dataframe, the corresponding?
– Hudson Souza
It’s like you take the values of a column and read them.. if Voce wants to see if it has repeated name between two tables (probably in the same column - ex: name and name (duplicated names) the easiest way is to create a new dataframe with two columns, a column will receive the names of the two files (column merge into one) and also put the respective indises (where they are located in the table), thus vc uses: df["column"]. value_counts() (where: df is your created dataframe and "column" eh the column with all names) and it will display all names with a num (one of the times it appears)
– Gui Reis
I could tell?
– Gui Reis