Read a .dat file and assign names to its columns using pandas

Viewed 717 times

1

How do I assign names to columns using pandas? For example, consider a file arquivo.dat in the following format:

3.141592543 3.141592543 3.141592543 3.141592543

3.141592543 3.141592543 3.141592543 3.141592543

3.141592543 3.141592543 3.141592543 3.141592543

I would like to know how to get it into the following format:

col1        col2        col3        col4

3.141592543 3.141592543 3.141592543 3.141592543

3.141592543 3.141592543 3.141592543 3.141592543

3.141592543 3.141592543 3.141592543 3.141592543

2 answers

1

A pandas DataFrame, the data structure that binds named columns to the data itself, can be constructed from a two-dimensional sequence holding the data as one argument and the column names as another.

So, if you have a text file with the numbers in columns separated by spaces, as shown above, you can use a little Python to read the file, split each line at the spaces, and convert each number from a string (as read from the file) to a number (in this case, a float).

The syntax of the language allows all this to be done in a single expression - that is, it is possible to do:

import pandas as pd

dados = pd.DataFrame( 
    [[float(token) for token in line.split()]  
        for line in open("arquivo.dat") if line.strip()],
     columns = ["col1", "col2", "col3", "col4"]
)
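The single expression above leaves the file handle for the interpreter to close. A minimal sketch of the same idea using a with block is shown below. The sample file is written to a temporary directory only so the example is self-contained; the path and the sample contents are assumptions, not part of the question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file, created here only so the sketch runs on its own.
sample = (
    "3.141592543 3.141592543 3.141592543 3.141592543\n"
    "\n"
    "3.141592543 3.141592543 3.141592543 3.141592543\n"
)
path = os.path.join(tempfile.mkdtemp(), "arquivo.dat")
with open(path, "w") as handle:
    handle.write(sample)

# Same parsing as the one-liner: skip blank lines, split on whitespace,
# convert each token to float. The file is closed when the block ends.
with open(path) as handle:
    rows = [[float(token) for token in line.split()]
            for line in handle if line.strip()]

dados = pd.DataFrame(rows, columns=["col1", "col2", "col3", "col4"])
print(dados)
```

The blank line in the sample is dropped by the `if line.strip()` filter, so the resulting DataFrame has two rows and four named columns.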

0


Assuming you have already read the .dat file into a DataFrame and your goal is just to name the columns:

df
3.141592543 3.141592543 3.141592543 3.141592543
3.141592543 3.141592543 3.141592543 3.141592543
3.141592543 3.141592543 3.141592543 3.141592543

df.columns = ['col1', 'col2', 'col3', 'col4']
col1 col2 col3 col4
3.141592543 3.141592543 3.141592543 3.141592543
3.141592543 3.141592543 3.141592543 3.141592543
3.141592543 3.141592543 3.141592543 3.141592543
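As a small runnable sketch of that assignment, the DataFrame below is built in memory from made-up rows (an assumption standing in for the already-loaded file), so it starts with pandas' default integer column labels:

```python
import pandas as pd

# Stand-in for a DataFrame that was already loaded without column names.
df = pd.DataFrame([[3.141592543] * 4 for _ in range(3)])
print(list(df.columns))  # default integer labels: [0, 1, 2, 3]

# Assigning a list to df.columns renames all columns in place.
df.columns = ['col1', 'col2', 'col3', 'col4']
print(df)
```

The list must have exactly one name per column; assigning a list of a different length raises a ValueError.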

Assuming you haven't read the file yet and still need to parse the .dat file:

import pandas as pd
from io import StringIO

dat = """3.141592543 3.141592543 3.141592543 3.141592543
3.141592543 3.141592543 3.141592543 3.141592543
3.141592543 3.141592543 3.141592543 3.141592543"""

df = pd.read_csv(StringIO(dat), sep=r"\s+", header=None)

df.columns = ['col1', 'col2', 'col3', 'col4']

print(df)

read_csv reads a file from your file system (or a file-like object, such as the StringIO used above) and tries to turn it into a DataFrame.

sep is the separator (delimiter) between the columns of the file

\s+ is a regex that matches one or more consecutive whitespace characters, so runs of spaces or tabs count as a single separator.

header=None says that your .dat file has no header row (the column names are added afterwards, in your case)
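Reading and naming can also be done in one call, since read_csv accepts a names argument with the column labels. A minimal sketch, reusing an in-memory string in place of the real file (for a file on disk you would pass its path, e.g. "arquivo.dat", instead of the StringIO object):

```python
from io import StringIO

import pandas as pd

dat = """3.141592543 3.141592543 3.141592543 3.141592543
3.141592543 3.141592543 3.141592543 3.141592543
3.141592543 3.141592543 3.141592543 3.141592543"""

# names= supplies the column labels directly; header=None tells pandas
# that the first line of the data is a data row, not a header.
df = pd.read_csv(
    StringIO(dat),
    sep=r"\s+",
    header=None,
    names=['col1', 'col2', 'col3', 'col4'],
)

print(df)
```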

  • Thank you so much for the answers. Both solved my problem!!
