Python Pandas: rewriting a file read with pd.read_table() with its original comments

I have a tab-separated file whose first lines are comments marked with '#'. The file looks something like this:

#comment
#comment
#comment
#comment
#comment
Header1 Header2 Header3
a b c
d e f
g h i

I then use the code below to load it without the comments:

import pandas as pd
file_in = pd.read_table('arquivo.tsv', comment='#')

This gives me:

Header1 Header2 Header3
a b c
d e f
g h i

After that I make some changes to the Header1 column based on information from another file, and write file_in back to disk:

file_in.to_csv('arquivo.csv', index=False, sep='\t')

The problem is that I would like the comments to be kept as in the original, but the saved file now starts with the header row and no longer with the comments!

1 answer

The problem is that the comments are simply ignored when the file is read. Pandas does not represent comments internally, because they are specific to this storage format (i.e., CSV; if you saved the table in an SQL database, for example, there would be no "comments"). So the most you can do is ask the reading function to skip the lines marked with the comment character.
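To make this concrete, here is a minimal sketch, assuming an in-memory string (hypothetical sample data) in place of arquivo.tsv. It reads the table with comment='#' and writes it back to a string, to show that nothing of the comments survives the round trip:

```python
import io

import pandas as pd

# In-memory stand-in for the tab-separated file (hypothetical sample data).
raw = "#comment 1\n#comment 2\nHeader1\tHeader2\tHeader3\na\tb\tc\nd\te\tf\n"

# Fully commented lines are skipped, so the header is the first non-comment line.
df = pd.read_table(io.StringIO(raw), comment="#")

columns = list(df.columns)
n_rows = len(df)

# Writing the table back (to a string here) produces no comment lines at all.
round_trip = df.to_csv(index=False, sep="\t")
has_comments = "#" in round_trip
```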

If you want to keep the comments, I suggest reading them separately from the table (in a distinct code snippet), storing them in a list, and then writing them to the output file before writing the table.

Here’s an example of code:

import pandas as pd

commentChar = '#'

# First, read the comments from the original file
comments = []
with open('arquivo.tsv', 'r') as f:
    for line in f:
        if line.startswith(commentChar):
            comments.append(line)

# Now read the table, ignoring the comments
file_in = pd.read_table('arquivo.tsv', comment=commentChar)

# Open the destination file for writing, write the comments first
# and only then write the table (note that instead of a file name,
# to_csv receives the handle of the open file, already positioned where
# writing should start).
with open('arquivo.csv', 'w') as f:
    for comment in comments:
        f.write(comment)

    file_in.to_csv(f, index=False, sep='\t')
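If you would rather open the input file only once, a possible variant is sketched below. It is an illustration, not part of the code above: the helper names split_leading_comments, read_with_comments and write_with_comments are hypothetical. It also assumes the comments appear only as a block at the top of the file, so a line starting with '#' further down is not collected.

```python
import io

import pandas as pd


def split_leading_comments(text, comment_char="#"):
    """Split text into (leading comment lines, remaining text)."""
    lines = text.splitlines(keepends=True)
    n = 0
    # Count the comment lines at the very top of the file.
    while n < len(lines) and lines[n].startswith(comment_char):
        n += 1
    return lines[:n], "".join(lines[n:])


def read_with_comments(path, comment_char="#"):
    """Read the file once; return (comment lines, DataFrame)."""
    with open(path, "r") as f:
        comments, body = split_leading_comments(f.read(), comment_char)
    # Parse the rest of the text from memory instead of reopening the file.
    table = pd.read_table(io.StringIO(body), comment=comment_char)
    return comments, table


def write_with_comments(path, comments, table):
    """Write the saved comment lines first, then the table."""
    with open(path, "w") as f:
        f.writelines(comments)
        table.to_csv(f, index=False, sep="\t")
```

Any changes to the table go between the call to read_with_comments and the call to write_with_comments.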
  • I understand, Luiz Vieira! So in this case, I make all the changes I want to file_in between file_in = pd.read_table('arquivo.tsv', comment=commentChar) and the block
    with open('arquivo.csv', 'w') as f:
        for comment in comments:
            f.write(comment)

  • No, no. The line with open('arquivo.csv', 'w') opens the file for writing, so only after that line can you write anything (and, of course, only while still inside the with block).
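The ordering discussed in these comments can be sketched as follows. This is only an illustration under stated assumptions: temporary files stand in for arquivo.tsv and arquivo.csv, and upper-casing Header1 is a hypothetical change. The DataFrame can be modified anywhere after read_table and before to_csv, while every write to the output file has to happen inside the with block.

```python
import os
import tempfile

import pandas as pd

# Temporary stand-ins for arquivo.tsv and arquivo.csv (hypothetical sample data).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "arquivo.tsv")
dst = os.path.join(workdir, "arquivo.csv")
with open(src, "w") as f:
    f.write("#comment\nHeader1\tHeader2\na\tb\n")

# 1. Read the comments, then the table.
with open(src, "r") as f:
    comments = [line for line in f if line.startswith("#")]
file_in = pd.read_table(src, comment="#")

# 2. Change the DataFrame in memory (hypothetical change: upper-case Header1).
file_in["Header1"] = file_in["Header1"].str.upper()

# 3. Only inside this block is the output file open, so all writing happens here.
with open(dst, "w") as f:
    f.writelines(comments)
    file_in.to_csv(f, index=False, sep="\t")

# Read the result back to inspect it.
with open(dst, "r") as f:
    result = f.read()
```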
