Python/Pandas: Treatment of TXT

Asked

Viewed 85 times

0

The ERP I’m working on is generating a . txt in the following format:

ID:;1;Evento;Estado;Serial;Qtde
VM;LIBERADO;471;2
Data;id_m;id_c;Data inicial;Data final
01/01/2021  03:50:58;1623;280493;01/01/2021  03:50:50;01/01/2021  03:50:50
01/01/2021  14:27:19;3785;;01/01/2021  14:27:03;01/01/2021  14:27:03
VC;LIBERADO;471;3
Data;id_m;id_c;Data inicial;Data final
01/01/2021  03:51:15;1623;280493;01/01/2021  03:50:50;01/01/2021  03:50:50
02/01/2021  14:50:34;272474;3741;02/01/2021  14:47:53;02/01/2021  14:47:53
04/01/2021  03:46:17;280233;273461;04/01/2021  03:45:52;04/01/2021  03:45:52
VC;OCIOSO;471;1
Data;id_m;id_c;Data inicial;Data final
01/01/2021  10:27:25;1623;3714;01/01/2021  10:25:35;01/01/2021  10:25:35
ID:;2;Evento;Estado;Serial;Qtde
VC;OCIOSO;489;7
Data;id_m;id_c;Data inicial;Data final
01/01/2021  10:18:21;272889;;01/01/2021  10:18:14;01/01/2021  10:18:14
01/01/2021  13:44:40;283251;;01/01/2021  13:43:20;01/01/2021  13:43:20
02/01/2021  05:14:19;3199;;02/01/2021  05:14:12;02/01/2021  05:14:12
02/01/2021  16:47:04;2804;;02/01/2021  16:46:50;02/01/2021  16:46:50
03/01/2021  05:45:48;270;;03/01/2021  05:45:33;03/01/2021  05:45:33
03/01/2021  13:28:12;2591;;03/01/2021  13:28:01;03/01/2021  13:28:01
04/01/2021  05:38:42;2682;453;04/01/2021  05:38:34;04/01/2021  05:38:34
VM;LIBERADO;489;3
Data;id_m;id_c;Data inicial;Data final
04/01/2021  05:38:59;2682;453;04/01/2021  05:38:34;04/01/2021  05:38:34
04/01/2021  11:02:48;2006;3641;04/01/2021  10:58:38;04/01/2021  10:58:38
05/01/2021  16:13:38;2480;3275;05/01/2021  16:11:27;05/01/2021  16:11:27

We have great "Groups" of products, then some information about them.

In row 1, it is possible to observe the product group separations:

ID: 1 Event: VM State: Liberated Serial: 471 Qtof: 2 (The number of records that comes after)

And soon after comes the header with date, id_m, id_c, start date and end date.

I would like to treat this file so that the final result is as follows:

Data;id_m;id_c;Data inicial;Data final;ID;Evento;Estado;Serial;Qtde
01/01/2021  03:50:58;1623;280493;01/01/2021  03:50:50;01/01/2021  03:50:50;1;VM;LIBERADO;471;2
01/01/2021  14:27:19;3785;;01/01/2021  14:27:03;01/01/2021  14:27:031;VM;LIBERADO;471;2
01/01/2021  03:51:15;1623;280493;01/01/2021  03:50:50;01/01/2021  03:50:501;VC;LIBERADO;471;3
02/01/2021  14:50:34;272474;3741;02/01/2021  14:47:53;02/01/2021  14:47:531;VC;LIBERADO;471;3
04/01/2021  03:46:17;280233;273461;04/01/2021  03:45:52;04/01/2021  03:45:521;VC;LIBERADO;471;3
01/01/2021  10:27:25;1623;3714;01/01/2021  10:25:35;01/01/2021  10:25:35;1;VC;OCIOSO;471;1
01/01/2021  10:18:21;272889;;01/01/2021  10:18:14;01/01/2021  10:18:14;2;VC;OCIOSO;489;76
01/01/2021  13:44:40;283251;;01/01/2021  13:43:20;01/01/2021  13:43:20;2;VC;OCIOSO;489;7
02/01/2021  05:14:19;3199;;02/01/2021  05:14:12;02/01/2021  05:14:12;2;VC;OCIOSO;489;7
02/01/2021  16:47:04;2804;;02/01/2021  16:46:50;02/01/2021  16:46:50;2;VC;OCIOSO;489;7
03/01/2021  05:45:48;270;;03/01/2021  05:45:33;03/01/2021  05:45:33;2;VC;OCIOSO;489;7
03/01/2021  13:28:12;2591;;03/01/2021  13:28:01;03/01/2021  13:28:01;2;VC;OCIOSO;489;7
04/01/2021  05:38:42;2682;453;04/01/2021  05:38:34;04/01/2021  05:38:34;2;VC;OCIOSO;489;7
04/01/2021  05:38:59;2682;453;04/01/2021  05:38:34;04/01/2021  05:38:34;2;VM;LIBERADO;489;3
04/01/2021  11:02:48;2006;3641;04/01/2021  10:58:38;04/01/2021  10:58:38;2;VM;LIBERADO;489;3
05/01/2021  16:13:38;2480;3275;05/01/2021  16:11:27;05/01/2021  16:11:27;2;VM;LIBERADO;489;3

Where these totally non-standard headers become part of the columns, to facilitate my filters in excel.

Can anyone help me? I spent about 6 hours headbutting using python/pandas and did not have much success

1 answer

0


A possible solution using pandas

import pandas as pd

# carregando os dados e atribuindo nomes as colunas
colunas = ['Data','id_m','id_c','Data inicial','Data final','ID','Evento','Estado','Serial','Qtde']
dados = pd.read_csv('./tratamento.csv', sep = ';', names = colunas)

# removendo onde existe o texto data na coluna data
dados.drop(dados.loc[dados['Data'] == 'Data'].index, inplace = True)

# fazendo algumas substituições
dados.loc[dados['Data'] == 'ID:','ID'] = dados['id_m']
dados.loc[(dados['Data'] == 'VM') | (dados['Data'] == 'VC'),'Serial'] = dados['id_c']
dados.loc[(dados['Data'] == 'VM') | (dados['Data'] == 'VC'),'Estado'] = dados['id_m']
dados.loc[(dados['Data'] == 'VM') | (dados['Data'] == 'VC'),'Evento'] = dados['Data']
dados.loc[(dados['Data'] == 'VM') | (dados['Data'] == 'VC'),'Qtde'] = dados['Data inicial']

# aplicando o ffill somente nas colunas selecionadas
colunas = ['ID','Evento','Estado','Serial','Qtde']
dados.loc[:,colunas] = dados.loc[:,colunas].ffill()

# removendo as linhas
dados.drop(dados.loc[dados['Data'] == 'ID:'].index, inplace = True)
dados.drop(dados.loc[dados['Data'] == 'VM'].index, inplace = True)
dados.drop(dados.loc[dados['Data'] == 'VC'].index, inplace = True)

dados.reset_index(inplace=True, drop = True)

# mostrando o data frame
dados

# salvando o data frame em formato csv
dados.to_csv('tratamento2.csv')

Exit

          Data          id_m    id_c           Data inicial          Data final ID  Evento  Estado  Serial  Qtde
0   01/01/2021 03:50:58 1623    280493  01/01/2021 03:50:50 01/01/2021 03:50:50 1     VM    LIBERADO    471  2
1   01/01/2021 14:27:19 3785    NaN     01/01/2021 14:27:03 01/01/2021 14:27:03 1     VM    LIBERADO    471  2
2   01/01/2021 03:51:15 1623    280493  01/01/2021 03:50:50 01/01/2021 03:50:50 1     VC    LIBERADO    471  3
3   02/01/2021 14:50:34 272474  3741    02/01/2021 14:47:53 02/01/2021 14:47:53 1     VC    LIBERADO    471  3
4   04/01/2021 03:46:17 280233  273461  04/01/2021 03:45:52 04/01/2021 03:45:52 1     VC    LIBERADO    471  3
5   01/01/2021 10:27:25 1623    3714    01/01/2021 10:25:35 01/01/2021 10:25:35 1     VC      OCIOSO    471  1
6   01/01/2021 10:18:21 272889  NaN     01/01/2021 10:18:14 01/01/2021 10:18:14 2     VC      OCIOSO    489  7
7   01/01/2021 13:44:40 283251  NaN     01/01/2021 13:43:20 01/01/2021 13:43:20 2     VC      OCIOSO    489  7
8   02/01/2021 05:14:19 3199    NaN     02/01/2021 05:14:12 02/01/2021 05:14:12 2     VC      OCIOSO    489  7
9   02/01/2021 16:47:04 2804    NaN     02/01/2021 16:46:50 02/01/2021 16:46:50 2     VC      OCIOSO    489  7
10  03/01/2021 05:45:48 270     NaN     03/01/2021 05:45:33 03/01/2021 05:45:33 2     VC      OCIOSO    489  7
11  03/01/2021 13:28:12 2591    NaN     03/01/2021 13:28:01 03/01/2021 13:28:01 2     VC      OCIOSO    489  7
12  04/01/2021 05:38:42 2682    453     04/01/2021 05:38:34 04/01/2021 05:38:34 2     VC      OCIOSO    489  7
13  04/01/2021 05:38:59 2682    453     04/01/2021 05:38:34 04/01/2021 05:38:34 2     VM    LIBERADO    489  3
14  04/01/2021 11:02:48 2006    3641    04/01/2021 10:58:38 04/01/2021 10:58:38 2     VM    LIBERADO    489  3
15  05/01/2021 16:13:38 2480    3275    05/01/2021 16:11:27 05/01/2021 16:11:27 2     VM    LIBERADO    489  3

Browser other questions tagged

You are not signed in. Login or sign up in order to post.