Separate data from a txt file

Viewed 2,778 times

0

I have a relatively large data file that I extracted from a time clock, but it comes in the following format:

00003000527005792012635570932000219305130720170713
00003000527005792012635570932000219305130720170713

I would like to separate this data into columns so that I can export to Excel:

00003000527005792 - clock serial number | 012635570932 - PIS number | 000219305 - NSR | 13 - day | 07 - month | 2017 - year | 07 - hour | 13 - minute

The result would be:

00003000527005792 012635570932 000219305 13 07 2017 07 13

So far I've been able to read the data using this code:

arquivo = open('DATA.txt', 'r')
for linha in arquivo:
    print(linha)
arquivo.close()

How can I apply slicing in this context? I can do it for a single line, but I don't know how to apply it to every line of the file.

  • print(line[1:17] + ';' + line[18:29] + ';' + line[30:38] + ';' + line[39:40] + ';' + line[41:42] + ';' + line[43:46] + ';' + line[47:48] + ';' + line[49:50] )

3 answers

2


Assuming every line always has the same format, number of digits, etc.

I don't think separating by spaces is a good idea, since the column names contain spaces, so in this case I'll separate with ";".

You can do it like this:

cols = ['numero de serie do relógio', 'Numero do PIS', 'NSR', 'Dia', 'Mês', 'Ano', 'Hora', 'Minuto']
novos_dados = ''
with open('DATA.txt') as f:
    for l in f:  # the slice below is applied to each line
        # l[48:50] rather than l[48:], so the trailing newline is not copied into the last field
        novos_dados += '{};{};{};{};{};{};{};{}\n'.format(l[:17], l[17:29], l[29:38], l[38:40], l[40:42], l[42:46], l[46:48], l[48:50])
content = '{}\n{}'.format(';'.join(cols), novos_dados)

# write content to a csv file, closing it when done
with open('novos_dados.csv', 'w') as out:
    out.write(content)

In principle the .csv will open in Excel by default, in which case you should choose ";" as the delimiter when importing the file novos_dados.csv
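The slice boundaries can also be derived from the field widths rather than typed by hand, which makes off-by-one mistakes less likely. This is only a minimal sketch: `split_fixed` and `convert` are hypothetical helper names, the widths are taken from the layout in the question, and the standard `csv` module does the writing.

```python
import csv
import io
from itertools import accumulate

# Field widths from the layout in the question:
# serial (17), PIS (12), NSR (9), day, month, year (4), hour, minute
WIDTHS = [17, 12, 9, 2, 2, 4, 2, 2]

def split_fixed(line, widths=WIDTHS):
    """Cut one fixed-width record into a list of string fields."""
    ends = list(accumulate(widths))   # 17, 29, 38, 40, 42, 46, 48, 50
    starts = [0] + ends[:-1]          # 0, 17, 29, ...
    return [line[s:e] for s, e in zip(starts, ends)]

def convert(lines, out):
    """Write every non-empty record to `out` as ';'-separated rows."""
    writer = csv.writer(out, delimiter=';', lineterminator='\n')
    for line in lines:
        line = line.strip()
        if line:                      # skip blank lines
            writer.writerow(split_fixed(line))

sample = ['00003000527005792012635570932000219305130720170713\n']
buffer = io.StringIO()
convert(sample, buffer)
print(buffer.getvalue(), end='')
# 00003000527005792;012635570932;000219305;13;07;2017;07;13
```

To work on real files, something like `with open('DATA.txt') as src, open('novos_dados.csv', 'w', newline='') as dst: convert(src, dst)` should do. The fields stay strings throughout, so leading zeros are kept.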

  • That's just what I needed, thanks buddy. I can now do the same for other applications as well. One question: when you separate the fields with ;, does Excel interpret them as columns?

  • And what is content?

  • 1

    Exactly, @WSS. A csv file is normally associated with Excel by default; when you open it, you will be asked to choose the delimiter used to split the columns, in this case ;. content is the variable that holds everything that will be saved to the file; you can do a print(content) to see it better

  • 1

    I tested it here. Now the code is clear. Thanks @Miguel.

0

import pandas as pd

# dtype=str keeps the leading zeros; without it the columns are parsed as integers
df = pd.read_fwf('stack.txt', widths=[17,12,9,2,2,4,2,2], header=None, index_col=None, dtype=str)

df.to_csv('stack2.csv', index=False, header=False)

0

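An alternative is to use a regex, like this:

import re

arquivo = 'DATA.txt'  # assumed path: the same file as in the question
padrao = re.compile(r'(?P<serial_number>\d{17})(?P<PIS>\d{12})(?P<NSR>\d{9})(?P<day>\d{2})(?P<month>\d{2})(?P<year>\d{4})(?P<hour>\d{2})(?P<minute>\d{2})')
with open(arquivo) as fp:
    # one dict per line; lines that do not match the layout are skipped
    data = [m.groupdict() for m in map(padrao.match, fp) if m]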
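The regex approach yields a list of dicts, which still has to be written out before Excel can open it. As a rough sketch of one way to do that, the named groups can serve as column headers for `csv.DictWriter`. `parse_lines` and `to_csv` are hypothetical helper names, and the ";" delimiter is an assumption carried over from the accepted answer.

```python
import csv
import io
import re

# Same layout as the pattern in this answer, compiled once.
PATTERN = re.compile(
    r'(?P<serial_number>\d{17})(?P<PIS>\d{12})(?P<NSR>\d{9})'
    r'(?P<day>\d{2})(?P<month>\d{2})(?P<year>\d{4})'
    r'(?P<hour>\d{2})(?P<minute>\d{2})'
)

def parse_lines(lines):
    """Return one dict per line that matches the layout; skip the rest."""
    return [m.groupdict() for m in map(PATTERN.match, lines) if m]

def to_csv(records, out):
    """Write the dicts to `out` with a header row, ';'-separated."""
    # Group names ordered by their position in the pattern.
    fields = sorted(PATTERN.groupindex, key=PATTERN.groupindex.get)
    writer = csv.DictWriter(out, fieldnames=fields, delimiter=';',
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)

# The second, blank line does not match and is skipped.
sample = ['00003000527005792012635570932000219305130720170713\n', '\n']
records = parse_lines(sample)
buffer = io.StringIO()
to_csv(records, buffer)
print(buffer.getvalue(), end='')
```

Passing a file opened with `open('novos_dados.csv', 'w', newline='')` instead of the `StringIO` buffer would write the result to disk.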
