How do I count the records in a CSV file in Python?

Viewed 3,451 times

0

I’m a beginner in Python and I’m having a hard time.

I have a CSV file delimited by ";" with 6 columns. I’d like to count how many records have the sex "Feminino". Can you tell me how to do it, or where I should start studying?

Below is the code I have so far.

import csv
with open('teste.csv', 'r') as arquivo:
    delimitador = csv.Sniffer().sniff(arquivo.read(1024), delimiters=";")
    arquivo.seek(0)
    leitor = csv.reader(arquivo, delimitador)
    dados = list(leitor)
for x in dados:
    print(x)

I converted the data into a list, but I couldn’t think of a way to check for "Feminino" in each row.

Any tips for a beginner are also welcome.

3 answers

1


To count the rows (lists) that contain "Feminino", you can do this:

import csv

l_feminino = []
with open('teste.csv', 'r') as arquivo:
    delimitador = csv.Sniffer().sniff(arquivo.read(1024), delimiters=";")
    arquivo.seek(0)
    leitor = csv.reader(arquivo, delimitador)
    dados = list(leitor)
for x in dados:
    if 'Feminino' in x:  # if any field of the row is 'Feminino'
        l_feminino.append(x)  # store the record

print(l_feminino)  # all records that contain "Feminino"
print(len(l_feminino))  # how many records contain "Feminino"

Honestly, if this is all you need, you don’t even have to import a module. You can just do:

with open('teste.csv') as arquivo:
    lines = arquivo.read().splitlines()  # one string per line; split() would also break on spaces
cols = lines.pop(0).split(';')  # the first line holds the column names
dados = [i.split(';') for i in lines]
dados_f = [i for i in dados if "Feminino" in i]
print(cols)  # columns
print(dados)  # all records
print(dados_f)  # female records
print(len(dados_f))  # number of female records
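
Both versions above test whether "Feminino" appears in any field of the row. If only the sex column should be checked, one option is csv.DictReader, which reads each row as a dictionary keyed by the header. The following is a minimal sketch, assuming the file has a header line and that the column is named "Sexo"; the sample data and the helper name contar_por_sexo are hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for teste.csv; the column names are assumptions.
AMOSTRA = """Nome;Sexo;Idade
Ana;Feminino;31
Bruno;Masculino;45
Carla;Feminino;27
"""

def contar_por_sexo(arquivo, valor="Feminino", coluna="Sexo"):
    """Count the rows whose `coluna` field is exactly `valor`."""
    leitor = csv.DictReader(arquivo, delimiter=";")
    return sum(1 for registro in leitor if registro[coluna] == valor)

print(contar_por_sexo(io.StringIO(AMOSTRA)))  # 2
```

With the real file, the same function could be called inside a with open('teste.csv', newline='') as arquivo: block, passing arquivo instead of the StringIO object.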
  • Thanks for the answer. For now, just the check for "Feminino" inside the for loop is enough for me; it’s a simple algorithm. Thanks!

  • You’re welcome, @Matheusmacedo. See how and why to accept an answer: https://pt.meta.stackoverflow.com/questions/1078/como-e-por-que-aceitar-uma-reply

0

No need to iterate over CSV lines. You can get what you need by using pandas:

import pandas as pd

df = pd.read_csv('teste.csv', sep=';')  # the file is delimited by ";", not ","

print(len(df[df['Sexo'] == "Feminino"].index))

Explaining:

This returns a boolean Series, with True for each row that meets the condition and False for the others:

df['Sexo'] == "Feminino"

When that Series is used to index the df itself, it filters the DataFrame down to the rows that meet the condition:

df[df['Sexo'] == "Feminino"]

Then len() gives you the number of rows.

Another way to achieve the same goal is to use shape:

print(df[df['Sexo'] == "Feminino"].shape[0])

shape is a tuple in which:

  • shape[0] returns the number of rows
  • shape[1] returns the number of columns
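
Because True counts as 1 when summed, the boolean Series can also be summed directly, and value_counts() gives the count of every distinct value at once. This is a small sketch of both, assuming a column named "Sexo"; the sample data below is hypothetical and stands in for teste.csv.

```python
import io

import pandas as pd

# Hypothetical sample standing in for teste.csv; the column names are assumptions.
AMOSTRA = """Nome;Sexo;Idade
Ana;Feminino;31
Bruno;Masculino;45
Carla;Feminino;27
"""

df = pd.read_csv(io.StringIO(AMOSTRA), sep=";")

mascara = df["Sexo"] == "Feminino"      # boolean Series, one entry per row
total_feminino = int(mascara.sum())     # each True counts as 1
contagens = df["Sexo"].value_counts()   # count of every distinct value

print(total_feminino)
print(contagens.to_dict())
```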

0

To count the lines of a file that contain "Feminino", you can do something like:

nlinhas = 0

with open('teste.csv') as arquivo:
    for linha in arquivo:
        if "Feminino" in linha:
            nlinhas += 1

print(nlinhas) 

Or simply:

with open('teste.csv') as arquivo:  # the with block closes the file afterwards
    nlinhas = sum(1 for linha in arquivo if "Feminino" in linha)
print(nlinhas)
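
Note that "Feminino" in linha is a substring test on the whole line, so it would also count a line where the text appears in another column. A possible variation, sketched below, splits each line on ";" and tallies only one column with collections.Counter. The column position (INDICE_SEXO), the header line, the sample data, and the helper name contar_valores are all assumptions for illustration.

```python
import io
from collections import Counter

# Hypothetical sample standing in for teste.csv; the layout is an assumption.
AMOSTRA = """Nome;Sexo;Idade
Ana;Feminino;31
Bruno;Masculino;45
Carla;Feminino;27
"""

INDICE_SEXO = 1  # assumed position of the sex column (0-based)

def contar_valores(arquivo, indice=INDICE_SEXO, sep=";"):
    """Tally the values found in one column, skipping the header line."""
    next(arquivo, None)  # discard the header, if there is one
    return Counter(
        linha.rstrip("\n").split(sep)[indice]
        for linha in arquivo
        if linha.strip()  # ignore blank lines
    )

contagens = contar_valores(io.StringIO(AMOSTRA))
print(contagens["Feminino"])  # 2
```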
