Find equal values between 2 columns of . csv and refresh another column X

Asked

Viewed 1,160 times

1

Hello,

I have two files . csv as below:

.csv1

POS,ID
28000167,.
28000263,.
28000484,.
28000711,.
28000728,.
28000885,.
28089922,.
28089927,.
28090173,.
28090325,.
28090326,.
28090331,.
28090415,.
28090467,.
28096247,.
28096264,.
28096284,.

.csv2

POS,ID
28000167,rs75043266
28000263,rs151210258
28000484,rs7175266
28000627,.
28000711,.
28000728,rs140405700
28000885,.
28089732,.
28089847,.
28089876,.
28089898,.
28089922,rs12592271
28089927,rs113937352
28090008,.
28090173,rs12592307
28090325,rs187389297
28090326,rs74005248
28090331,rs113905655

I would like to update the Row[1] values of the . csv1 file with the Row[1] values of the . csv2 file if the Row[0] of . csv1 is present in Row[0] of . csv2.

In this case my file . csv1 would become as:

.csv1

POS,ID
28000167,rs75043266

and so on for all the other iterations he finds...

What I have so far of code is not much since I didn’t get any iteration...

import csv

csv1 = open("arquivo1.csv")
reader1 = csv.reader(csv1, dialect='excel-tab')
csv2 = open("arquivo2.csv")
reader2 = csv.reader(csv2, dialect='excel-tab')

next(reader1, None)
for row1 in reader1:
    next(reader2, None)
    for row2 in reader2:

Any help would be welcome! Thank you.

1 answer

5


Do it this way:

  1. Read the data of both files in a dictionary. The best way to do this is, in my opinion, by using the DictReader.
  2. Process each dictionary item created from CSV2, and add/update the value over the value of the same "key" (POS column) in CSV1.
  3. Rewrite the CSV1 file.

Code example:

import csv
from collections import OrderedDict

# Lê ambos os arquivos e armazena em um dicionário mapeando a chave (POS) ao conteúdo (ID)
csv1 = OrderedDict()
csv2 = OrderedDict()

with open('csv1.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        csv1[row['POS']] = row['ID']

with open('csv2.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        csv2[row['POS']] = row['ID']

# Processa cada item de CSV2 e atualiza o conteúdo em CSV1
for p, i in csv2.items():
    if p in csv1: # Remova essa verificação se quiser adicionar os itens de CSV2 inexistentes em CSV1
        csv1[p] = i

# Regrava o arquivo CSV1
with open('csv1.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['POS', 'ID'])
    writer.writeheader()
    for p, i in csv1.items():
        writer.writerow({'POS': p, 'ID': i})

If you are using the fantastic library Pandas, a solution that I prefer and that is much more succinct and easy is this:

import pandas as pd

# Lê ambos os arquivos e armazena em tabelas do Pandas
csv1 = pd.read_csv('csv1.csv')
csv2 = pd.read_csv('csv2.csv')

# Filtra os elementos de CSV2 cujo ID seja diferente de '.' e cuja POS exista em CSV1
query = csv2[(csv2['ID'] != '.') & (csv2['POS'].isin(csv1['POS']))]

# Copia apenas a coluna ID da view filtrada de CSV2 para a tabela CSV1
csv1['ID'] = query['ID']

# Regrava o arquivo CSV1
csv1.to_csv('csv1.csv', index=False)
  • Excellent! I understood both methods perfectly! I already know Pandas because I used it in a course of data analysis of Coursera... However I thought it was more used to generate graphics and Plots and I forgot this part of "basic" edition of it! I’ll take a closer look at it, since most of my data analysis files are in csv!!!! Very good! Thank you!!

  • Not at all. Dude, Pandas exists exactly to help with data analysis. I strongly advise its use. Good luck!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.