How to extract specific data from a text file in Python?

I have a **text file of 49633 lines** with the following format:

 -e  Tue Mar 28 20:17:01 -03 2017 

              total       used       free     shared    buffers     cached
Mem:        239956     126484     113472       4904      10292      52280
-/+ buffers/cache:      63912     176044
Swap:       496636          0     496636 

 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 113460  10292  52308    0    0  1706    67  532  828 15 10 74  1  0

-e  Tue Mar 28 20:18:01 -03 2017 

              total       used       free     shared    buffers     cached
Mem:        239956     132808     107148       4904      10796      54872
-/+ buffers/cache:      67140     172816
Swap:       496636          0     496636 

 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 107656  10796  54872    0    0   654    29  219  353  6  4 90  0  0

-e  Tue Mar 28 20:19:01 -03 2017 

              total       used       free     shared    buffers     cached
Mem:        239956     132136     107820       4904      10824      54892
-/+ buffers/cache:      66420     173536
Swap:       496636          0     496636 

 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 107776  10824  54892    0    0   400    19  147  243  3  2 94  0  0

I would like to extract the id value (one of the cpu columns) for a given time interval. For example:

inicio=Mar 28 20:17:01

fim = Mar 28 20:19:01

would print:

data                  id           
Mar 28 20:17:01,      74
Mar 28 20:18:01,      90
Mar 28 20:19:01,      94

I’ve been trying, but I couldn’t write any code beyond this:

#!/usr/bin/env python


F = open("arquivo.txt", "r")

Could someone help?

  • You didn’t even manage to read the start and end dates?

  • @Anderson Carlos Woss: I only managed to open the file!

  • @Anderson Carlos Woss: I really don’t know!

  • The simplest way out I see is to use regular expressions. Read about the re module, which is part of Python’s standard library.
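
As a rough illustration of the regular-expression suggestion in the comment above, the sketch below pairs each "-e" date line with the vmstat row that follows it. The function name extract_idle and both patterns are assumptions made for this example, based only on the sample shown in the question; they do not come from the thread.

```python
import re

# Date line, e.g. " -e  Tue Mar 28 20:17:01 -03 2017": capture "Mar 28 20:17:01".
DATE_RE = re.compile(r"^\s*-e\s+\w{3}\s+(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})")
# vmstat data row: exactly 17 whitespace-separated integers; "id" is the 15th.
VMSTAT_RE = re.compile(r"^\s*(?:\d+\s+){14}(\d+)\s+\d+\s+\d+\s*$")


def extract_idle(lines):
    """Return a list of (date_text, idle_percent) pairs found in lines."""
    results = []
    current_date = None
    for line in lines:
        date_match = DATE_RE.match(line)
        if date_match:
            # Collapse runs of spaces (e.g. "Mar  1") into single spaces.
            current_date = " ".join(date_match.group(1).split())
            continue
        row_match = VMSTAT_RE.match(line)
        if row_match and current_date is not None:
            results.append((current_date, int(row_match.group(1))))
            current_date = None
    return results


sample = """ -e  Tue Mar 28 20:17:01 -03 2017

              total       used       free     shared    buffers     cached
Mem:        239956     126484     113472       4904      10292      52280
-/+ buffers/cache:      63912     176044
Swap:       496636          0     496636

 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 113460  10292  52308    0    0  1706    67  532  828 15 10 74  1  0
"""

for date_text, idle in extract_idle(sample.splitlines()):
    print("%s,      %d" % (date_text, idle))
```

Filtering by a start and end date would still need the captured text to be parsed into a date, for example with datetime.strptime.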

1 answer

I’m still learning Python, but see if that’s what you need:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
Saida = ""
print("data                  id")
with open('arquivo.txt', 'r') as arq:
    for linha in arq:
        Array = linha.split()
        # Date line, e.g. "-e  Tue Mar 28 20:17:01 -03 2017" (7 fields)
        if len(Array) == 7 and Array[0] == '-e':
            Saida += Array[2] + ' ' + Array[3] + ' ' + Array[4]
        # vmstat row (17 fields): "id" is field 14; skip the header row itself
        if len(Array) == 17 and Array[14] != 'id':
            Saida += ',      ' + Array[14] + "\n"
print(Saida)

In case you want to test it, here it is: https://repl.it/Jy2Z

Update: I improved the code. In your question you said the file has about 49633 lines, so I assume it covers several dates as well. I therefore wrote a function that returns the results within a date range.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from datetime import datetime
# day   month  year  time
#  28    03    2017  20:17:01
#  %d    %m    %Y    %H:%M:%S
def retornaResultado(DataInicio, DataFinal):
    # datetime objects are compared directly, so datetime.timestamp()
    # (which only exists in Python 3.3+) is not needed.
    Inicio = datetime.strptime(DataInicio, '%d %m %Y %H:%M:%S')
    Final = datetime.strptime(DataFinal, '%d %m %Y %H:%M:%S')
    Saida = ""
    NoIntervalo = False  # True while the current sample is inside the range
    print("data                  id")
    with open('arquivo.txt', 'r') as arq:
        for linha in arq:
            Array = linha.split()
            # Date line, e.g. "-e  Tue Mar 28 20:17:01 -03 2017" (7 fields)
            if len(Array) == 7 and Array[0] == '-e':
                DataArquivo = datetime.strptime(Array[3] + ' ' + Array[2] + ' ' + Array[6] + ' ' + Array[4], '%d %b %Y %H:%M:%S')
                NoIntervalo = Inicio <= DataArquivo <= Final
                if NoIntervalo:
                    Saida += Array[2] + ' ' + Array[3] + ' ' + Array[4]
            # vmstat row (17 fields): "id" is field 14; skip the header row itself
            if len(Array) == 17 and Array[14] != 'id' and NoIntervalo:
                Saida += ',      ' + Array[14] + "\n"
    print(Saida)

# Example usage
retornaResultado('28 03 2017 20:17:01', '28 03 2017 20:17:01')

How to use it: call the function retornaResultado, passing two dates in the format DAY MONTH YEAR TIME.

When calling the function you don’t need to give the exact time of a sample. For example, you can pass 00:00:00 as the start time and 23:59:59 as the end time, and it returns every result from 00:00:00 to 23:59:59.
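
To illustrate the whole-day query described above, here is a minimal sketch that takes the lines as a list instead of opening arquivo.txt, and returns the rows instead of printing them. The helper name idle_between and the three shortened samples are hypothetical, made up for this example; only the 'DAY MONTH YEAR TIME' argument format follows the answer.

```python
from datetime import datetime


def idle_between(lines, start, end):
    """Return (date_text, idle) pairs whose date lies in [start, end]."""
    fmt = "%d %m %Y %H:%M:%S"
    first = datetime.strptime(start, fmt)
    last = datetime.strptime(end, fmt)
    rows = []
    date_text = None  # set only while the current sample is inside the range
    for line in lines:
        fields = line.split()
        if len(fields) == 7 and fields[0] == "-e":
            # fields: -e, weekday, month, day, time, UTC offset, year
            stamp = datetime.strptime(
                " ".join([fields[3], fields[2], fields[6], fields[4]]),
                "%d %b %Y %H:%M:%S",
            )
            in_range = first <= stamp <= last
            date_text = " ".join(fields[2:5]) if in_range else None
        elif len(fields) == 17 and fields[14] != "id" and date_text:
            rows.append((date_text, int(fields[14])))
    return rows


# Three made-up samples, reduced to the two line types the parser looks at.
lines = [
    "-e  Mon Mar 27 23:59:01 -03 2017",
    " 0  0  0 113460 10292 52308 0 0 1706 67 532 828 15 10 50 1 0",
    "-e  Tue Mar 28 20:17:01 -03 2017",
    " 0  0  0 113460 10292 52308 0 0 1706 67 532 828 15 10 74 1 0",
    "-e  Wed Mar 29 00:00:01 -03 2017",
    " 0  0  0 107776 10824 54892 0 0 400 19 147 243 3 2 94 0 0",
]

# Everything recorded on 28 March 2017, from 00:00:00 to 23:59:59.
print(idle_between(lines, "28 03 2017 00:00:00", "28 03 2017 23:59:59"))
```

Returning a list rather than printing makes it easier to check the result or write it to another file afterwards.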

  • I’ll test it! Thank you

  • There are several dates, each with its own data.

  • Right, so using the function is the best option.

  • Here is the file: https://ufile.io/62qph

  • Sorry, I can’t view the file right now: the machine I’m on restricts access to some websites.

  • Errors: http://imgur.com/a/Xixur

  • Haha, like I said, I don’t have access to every URL. Paste the error here in a comment.

  • Here you can (Google Drive): https://drive.google.com/open?id=0B8_itjSdUyTtM3RNR2ttVU5UVWs

  • I put it on Google Drive!

  • I can’t access that either.

  • Traceback (most recent call last): File "/home/Gopala/Desktop/Ler_free_vmstat_output.py", line 33, in <module> retornaResultado('28 03 2017 20:17:01', '28 03 2017 20:17:01') File "/home/Gopala/Desktop/Ler_free_vmstat_output.py", line 10, in retornaResultado Inicio = int(datetime.strptime(DataInicio, '%d %m %Y %X').timestamp()) AttributeError: 'datetime.datetime' object has no attribute 'timestamp'

  • You forgot to import datetime. Start with: from datetime import * .... Check the second code block I posted.

  • Yes, from datetime import * was already there!

  • So I can help you better, put the code on https://repl.it, save it, and send me the URL.

  • https://repl.it/teacher/classrooms/26312/drafts
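
The AttributeError quoted in the comments is what Python 2 raises for that call, because datetime.timestamp() was only added in Python 3.3. Assuming that is the cause here (the thread never confirms which interpreter was used), the sketch below shows two ways around it: compare the datetime objects directly, or build the epoch value with time.mktime. The helper name to_epoch is hypothetical.

```python
import time
from datetime import datetime

FMT = "%d %m %Y %H:%M:%S"


def to_epoch(text):
    """Epoch seconds (local time) for a 'DAY MONTH YEAR TIME' string."""
    parsed = datetime.strptime(text, FMT)
    # time.mktime() exists in Python 2 and 3, unlike datetime.timestamp().
    return int(time.mktime(parsed.timetuple()))


inicio = datetime.strptime("28 03 2017 20:17:01", FMT)
fim = datetime.strptime("28 03 2017 20:19:01", FMT)
amostra = datetime.strptime("28 03 2017 20:18:01", FMT)

# Option 1: datetime objects compare directly, no conversion needed.
print(inicio <= amostra <= fim)

# Option 2: convert to integer epoch seconds when a number is really needed.
print(to_epoch("28 03 2017 20:19:01") - to_epoch("28 03 2017 20:17:01"))
```

Option 1 is usually enough for a range check like the one in the answer, since only the ordering of the dates matters.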
