Manipulating a CSV file

Viewed 238 times

2

Hello, everyone. I need to work with a CSV file that has more than ten columns and two thousand lines, and I'm not sure how to get a single column. I found the program below on the internet.

separador = ','

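with open('owid-covid-data.csv', 'r') as txt_file:
    for line_number, content in enumerate(txt_file):
        # Line 0 is the header row, so skip it
        if line_number:
            # Naive split: this breaks if a field contains a quoted comma
            colunas = content.strip().split(separador)
            print(f"\nPaís: {colunas[2]}")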

2 answers

1

Python has a csv library that can help you. It is part of the standard library, so you do not need to install anything, only import it. With it, working with CSV files becomes very simple. To try it out, create a file to test with:

tt.csv (a silly little test file)

Nome, nota, media, aprovado
ederwander, 5, 3, nao
thomas, 10, 8, sim
maria, 9, 7, sim
luiz, 0, 4, nao
joao, 6, 5, nao
fernanda, 2, 6, sim

Now we can see how easy it is to grab the column you want. Note that the file uses the standard comma delimiter and has 4 columns. Columns are indexed from 0: the first column is 0, the second is 1, and so on.

Here is a simple example that takes only the last column, index 3:

import csv

with open('tt.csv', newline='') as csv_file:

    ler_csv = csv.reader(csv_file, delimiter=',')

    # Skip the header row
    next(ler_csv)

    # Each iteration yields one row as a list of strings
    for linha in ler_csv:
        # print(linha[0] + ', ' + linha[1] + ', ' + linha[2])
        print(linha[3])

The example above prints the fourth column (index 3). The leading space in each value comes from the blank after each comma in the file:

C:\Python33>python.exe testcsv.py
 nao
 sim
 sim
 nao
 nao
 sim

The code also contains a commented-out print that shows how to print the other columns. I think that's it; those are the basics.
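If you would rather pick the column by its header name than by index, a minimal sketch with csv.DictReader could look like the one below. The helper name read_column is made up for illustration, and the sample data is the tt.csv content held in memory so the snippet runs on its own.

```python
import csv
import io

# Same content as tt.csv, kept in memory so the sketch is self-contained.
SAMPLE = """Nome, nota, media, aprovado
ederwander, 5, 3, nao
thomas, 10, 8, sim
maria, 9, 7, sim
luiz, 0, 4, nao
joao, 6, 5, nao
fernanda, 2, 6, sim
"""


def read_column(csv_file, column_name):
    """Return every value of one named column as a list of strings.

    read_column is a hypothetical helper, not part of the csv module.
    """
    # skipinitialspace=True drops the blank that follows each comma,
    # in the header row as well as in the data rows.
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    return [row[column_name] for row in reader]


aprovado = read_column(io.StringIO(SAMPLE), "aprovado")
print(aprovado)
```

With a real file you would pass an open file object instead of the StringIO, for example one opened with open('tt.csv', newline=''), and use whatever name the header row of your file gives the column you want.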

0

pandas is a very useful Python library, especially for handling large files. It is not part of the standard library, so it has to be installed first (for example with pip). Example:

import pandas as pd

# Create a DataFrame to hold the file
# You can pass more parameters, such as the separator the file uses; the default is a comma
df = pd.read_csv('nome_arquivo.csv')

# You can access a column as an attribute of df; if the name has spaces or accents, use brackets
# value_counts() counts how many times each distinct value appears in the column
var = df.Nome_Coluna.value_counts()
var2 = df['Nome Coluna'].value_counts()

# You can search for a specific string in a column
palavra = df[df.Nome_Coluna.str.contains('palavra')]

# You can group the rows that matched 'palavra'
# size() = number of entries
# You can also plot charts with plot() (this requires matplotlib)
palavra.groupby('Outra Coluna').size().plot(kind='barh')

This link helped me a lot to use this library: https://towardsdatascience.com/analysis-of-boston-crime-incident-open-data-using-pandas-5ff2fd6e3254
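For the original question of pulling out a single column, a small sketch with pandas might look like this. It assumes pandas is installed, and it uses a shortened in-memory copy of the tt.csv sample (without the spaces after the commas) rather than a real file.

```python
import io

import pandas as pd

# Shortened sample in memory; with a real file, pass its path to read_csv.
SAMPLE = """Nome,nota,media,aprovado
ederwander,5,3,nao
thomas,10,8,sim
maria,9,7,sim
"""

# usecols asks read_csv to keep only the listed columns,
# which saves memory when the file has many columns.
df = pd.read_csv(io.StringIO(SAMPLE), usecols=["Nome"])

# Selecting by name gives a Series; tolist() turns it into a plain list.
nomes = df["Nome"].tolist()
print(nomes)
```

If you need several columns, list them all in usecols and select each one by name afterwards.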
