How to delete the first line of a CSV file in Python

1

I need to delete the first line of a CSV file. It is a header line that has no use. The script should then save the CSV file under the same name, without the deleted line. I searched the pandas documentation and Python's own and could not find how to do it. Can anyone help an enthusiast?

Example:

ACYPR556- VALORES UTILIZADOS P/ CÁLCULO DO ÍNDICE,,,ANO BASE - 017,EXERCÍCIO - 2018,VIGÊNCIA - 2019,
CODIGO,MUNICIPIO,V.A ANTERIOR,V.A ATUAL,RECEITA PROPRIA,POPULACAO,AREA(KM2)
00500-2,ACORIZAL,"37644152,27","60575938,58","812558,81",5269,850
01000-6,AGUA BOA,"710615986,43","735690297,77","14285458,94",24501,7410

3 answers

6

To save a DataFrame without the header, just do:

df.to_csv('arquivo.csv', header=False)

If you also want to leave out the index:

df.to_csv('arquivo.csv', header=False, index=False)

So, if you have a CSV like this:

id,nome
1,Anderson
2,Carlos
3,Woss

and you run:

import pandas as pd

df = pd.read_csv('data.csv')
df.to_csv('data.csv', header=False, index=False)

You will have:

1,Anderson
2,Carlos
3,Woss

Without pandas, you can iterate over the lines of the CSV file, skip the first one (the header), and write the rest to a new file, as Alexciuffa answered. However, his implementation stores the contents of the entire file in memory. If the CSV file is very large, that consumes machine resources unnecessarily and can even bring the machine to a halt. A better approach takes advantage of the fact that the file object returned by open is a lazy iterator, so only one line of the file is held in memory at a time:

with open('arquivo_com_cabeçalho.csv') as stream, \
     open('arquivo_sem_cabeçalho.csv', 'w') as output:
    next(stream)  # Skip the first line of the input file
    for line in stream:
        output.write(line)
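The question asks for the result to be saved under the same file name, which the snippet above does not do because it writes to a second file. A minimal sketch of one way to get there, assuming it is acceptable to write a temporary copy next to the original and then swap it in; the function name drop_first_line is hypothetical:

```python
import os
import tempfile


def drop_first_line(path):
    """Remove the first line of `path`, keeping the same file name."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final
    # rename does not have to cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # Binary mode copies the bytes untouched, so the file's
        # encoding does not need to be known.
        with os.fdopen(fd, "wb") as output, open(path, "rb") as stream:
            next(stream, None)  # skip the first line; no error on an empty file
            for line in stream:
                output.write(line)
        # Swap the trimmed copy in under the original name.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

Calling drop_first_line('arquivo.csv') would then rewrite that file in place, still holding only one line in memory at a time. Two caveats of this sketch: the rewritten file gets the temporary file's permissions, and a first record that contains a quoted line break would only be partly removed.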

0

From what I've seen in the documentation, you can use pandas' own read_csv to skip the line.

import pandas as pd

# skiprows takes a list with the (0-based) numbers of the lines to skip
df = pd.read_csv("meu.csv", skiprows=[0])

and to save it as CSV again, use to_csv:

df.to_csv("meu-novo.csv", index=False)
  • Do you know how to skip more than one line?

  • If you pass skiprows=20, it will skip the first 20 lines. If you pass a list, e.g. skiprows=[0, 2, 4, 10], it will skip the lines at indexes 0, 2, 4 and 10.
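The two forms of skiprows described in the comment above can be compared side by side. This is a minimal sketch, assuming a small made-up sample held in memory with io.StringIO instead of a real file; the sample text and variable names are hypothetical:

```python
import io

import pandas as pd

# Hypothetical sample: two junk lines, then the real header and three rows
raw = "title line\nnotes line\nid,nome\n1,Anderson\n2,Carlos\n3,Woss\n"

# An integer skips that many lines from the start of the file
df_first = pd.read_csv(io.StringIO(raw), skiprows=2)

# A list skips exactly those 0-based line numbers (here both junk
# lines and the line "2,Carlos")
df_listed = pd.read_csv(io.StringIO(raw), skiprows=[0, 1, 4])

print(df_first)
print(df_listed)
```

In both calls the first line that is not skipped ("id,nome") is taken as the header.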

0

With plain Python, you can read the file like this:

import csv

# Read a CSV that has a header
with open("arquivo.csv", newline="") as f:
    reader = csv.reader(f)
    data = list(reader)

# Separate the header from the data
header = data[0]
data = data[1:]

After performing operations on the data, we can write it back like this:

with open("arquivo_modificado.csv", 'w', newline="") as file:
    writer = csv.writer(file)
    writer.writerow(header)  # omit this line to drop the header
    writer.writerows(data)

As pointed out by @Anderson in the comments, there is no need to load all the data into memory, especially if there is a lot of it. We can open the file and read it line by line, changing each line and saving it to another CSV as we go.

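with open("arquivo.csv", newline="") as arquivo_leitura:
    reader = csv.reader(arquivo_leitura)
    cabecalho = next(reader, None)  # read the header separately

    with open("arquivo_modificado.csv", 'w', newline="") as arquivo_escrita:
        writer = csv.writer(arquivo_escrita)
        writer.writerow(cabecalho)  # omit this line to drop the header

        for linha in reader:
            # realiza_operacao is a placeholder for your own row transformation
            linha = realiza_operacao(linha)
            writer.writerow(linha)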
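The function realiza_operacao is left undefined in the code above. As an illustration only, here is a sketch with one hypothetical transformation (turning decimal commas into dots in the three value columns of the question's sample). It assumes the header should be dropped, as the question asks, and uses io.StringIO in place of real files:

```python
import csv
import io


def realiza_operacao(linha):
    # Hypothetical transformation: replace the decimal comma with a dot
    # in the three value columns (indexes 2 to 4)
    valores = [campo.replace(",", ".") for campo in linha[2:5]]
    return linha[:2] + valores + linha[5:]


# In-memory stand-ins for the input and output files
entrada = io.StringIO(
    'CODIGO,MUNICIPIO,V.A ANTERIOR,V.A ATUAL,RECEITA PROPRIA,POPULACAO,AREA(KM2)\n'
    '00500-2,ACORIZAL,"37644152,27","60575938,58","812558,81",5269,850\n'
)
saida = io.StringIO()

reader = csv.reader(entrada)
writer = csv.writer(saida, lineterminator="\n")

next(reader, None)  # discard the header instead of writing it back
for linha in reader:
    writer.writerow(realiza_operacao(linha))

print(saida.getvalue())
```

Because the converted values no longer contain commas, csv.writer writes them without quotes.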
  • Thank you!!! Perfect.

  • 2

    I recommend that you do the operation within the same context manager, without storing all lines of the file in memory. With this you will have a very considerable resource gain, especially if the file in question is very large.
