I need to find the duplicates inside a CSV and count them

I am trying to count how many times each pair of values in columns A and B of a CSV is repeated, that is, how many rows have the same A and the same B, as in the example below:

input:

[email protected] | [email protected]

[email protected] | [email protected]

[email protected] | [email protected]

[email protected] | [email protected]

[email protected] | [email protected]

[email protected] | [email protected]

output:

[email protected] | [email protected] | 2

[email protected] | [email protected] | 2

[email protected] | [email protected] | 1

[email protected] | [email protected] | 1

    import csv
    import collections

    with open(r"grafo.csv") as f:
        csv_data = csv.reader(f, delimiter=",")
        count = collections.Counter()

        # Count how many times each value of the first column appears
        for row in csv_data:
            address = row[0]
            count[address] += 1

        for address, nb in count.items():
            if nb > 1:
                print('{} is a duplicate address, seen {} times'.format(address, nb))
            else:
                print('{} is a unique address'.format(address))

I found the code above online, but it only counts the repetitions in a single column.

After the CSV is processed, I want it to generate another CSV with a column C that shows the number of repetitions.

1 answer

Fabio, using part of the code you posted, it is possible to do it as follows:

import collections

count = collections.Counter()

# Read the file and count how many times each full line is repeated
with open("grafo.csv") as f:
    for row in f:
        # Strip the trailing "\n" line break
        count[row.rstrip("\n")] += 1

# Create a file with the new repetition-count column
with open("new_grafo.csv", "w") as f:
    # If you need a different delimiter, change it on the line below
    for row, qtd in count.items():
        f.write(f"{row},{qtd}\n")

# Print the new file to the console
with open("new_grafo.csv") as f:
    for row in f:
        print(row.rstrip("\n"))

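This example writes the new column with a comma as the delimiter, but you can change that part to use a pipe or a semicolon, as needed. Note that each whole line is used as the counting key, so two rows only count as duplicates if they are identical character for character, including spaces.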
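
If the delimiter needs to be configurable, one possible variation is to let the csv module split the columns and count the (A, B) pair instead of the raw line. This is only a sketch under a few assumptions: the function name count_pairs and its parameters are hypothetical, only the first two columns are compared, surrounding spaces are ignored, and (A, B) is treated as a different pair from (B, A).

```python
import collections
import csv


def count_pairs(src_path, dst_path, delimiter=","):
    """Count identical (A, B) rows in src_path and write A, B, count to dst_path."""
    count = collections.Counter()

    with open(src_path, newline="") as src:
        for row in csv.reader(src, delimiter=delimiter):
            if len(row) < 2:
                continue  # skip blank or incomplete lines
            # Assumption: spaces around the values are not significant
            count[(row[0].strip(), row[1].strip())] += 1

    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter=delimiter)
        # A Counter keeps keys in first-seen order, so the output follows the input
        for (col_a, col_b), qty in count.items():
            writer.writerow([col_a, col_b, qty])

    return count
```

For a pipe-separated file like the one in the question, the call would be count_pairs("grafo.csv", "new_grafo.csv", delimiter="|"). Using csv.reader and csv.writer rather than manual string handling also takes care of quoted fields that contain the delimiter.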


If you want to check the result, you can run it here:

https://repl.it/repls/CurlyWholeFan
