How to extract specific data from a text file in Python?




I have a **text file of 49633 lines** (txt file) with the following format:

 -e  Tue Mar 28 20:17:01 -03 2017 

              total       used       free     shared    buffers     cached
Mem:        239956     126484     113472       4904      10292      52280
-/+ buffers/cache:      63912     176044
Swap:       496636          0     496636 

 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 113460  10292  52308    0    0  1706    67  532  828 15 10 74  1  0

-e  Tue Mar 28 20:18:01 -03 2017 

              total       used       free     shared    buffers     cached
Mem:        239956     132808     107148       4904      10796      54872
-/+ buffers/cache:      67140     172816
Swap:       496636          0     496636 

 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 107656  10796  54872    0    0   654    29  219  353  6  4 90  0  0

-e  Tue Mar 28 20:19:01 -03 2017 

              total       used       free     shared    buffers     cached
Mem:        239956     132136     107820       4904      10824      54892
-/+ buffers/cache:      66420     173536
Swap:       496636          0     496636 

 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 107776  10824  54892    0    0   400    19  147  243  3  2 94  0  0

I would like to extract the id value from the CPU field given a time interval. For example:

inicio=Mar 28 20:17:01

fim = Mar 28 20:19:01

would print:

data                  id           
Mar 28 20:17:01,      74
Mar 28 20:18:01,      90
Mar 28 20:19:01,      94

I’m trying but I couldn’t write any lines of code other than:

#!/usr/bin/env python

F = open("arquivo.txt", "r")

Could someone help?

  • You haven't even managed to read the start and end dates?

  • @Anderson Carlos Woss: I could only read the file!

  • @Anderson Carlos Woss: I really don’t know!

  • The simplest way out I can see is to use regular expressions. Read about the re library, which ships with Python.
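The regular-expression suggestion in the last comment could look roughly like the sketch below. It is only an illustration: the names DATE_RE, VALUES_RE and extract_idle are made up for this example, and it assumes every date line starts with -e and every vmstat value row has exactly 17 integer columns, as in the sample in the question.

```python
import re

# Matches a date line such as " -e  Tue Mar 28 20:17:01 -03 2017 "
# and captures "Mar 28 20:17:01".
DATE_RE = re.compile(r"^\s*-e\s+\w{3}\s+(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})")
# Matches a vmstat value row: exactly 17 whitespace-separated integers.
VALUES_RE = re.compile(r"^\s*\d+(?:\s+\d+){16}\s*$")


def extract_idle(lines):
    """Return (date, id) pairs, pairing each date line with the next value row."""
    pairs = []
    current_date = None
    for line in lines:
        date_match = DATE_RE.match(line)
        if date_match:
            current_date = date_match.group(1)
        elif current_date is not None and VALUES_RE.match(line):
            # Column order: r b swpd free buff cache si so bi bo in cs us sy id wa st
            pairs.append((current_date, int(line.split()[14])))
            current_date = None
    return pairs


sample = """\
 -e  Tue Mar 28 20:17:01 -03 2017

              total       used       free     shared    buffers     cached
Mem:        239956     126484     113472       4904      10292      52280
-/+ buffers/cache:      63912     176044
Swap:       496636          0     496636

 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 113460  10292  52308    0    0  1706    67  532  828 15 10 74  1  0
""".splitlines()

for date, idle in extract_idle(sample):
    print("%s, %s" % (date, idle))
```

For the real file, the list of lines could come from open("arquivo.txt") instead of the inline sample.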

1 answer


I’m still learning Python, but see if that’s what you need:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
with open('arquivo.txt', 'r') as arq:
    texto = arq.readlines()
Saida = "data                  id\n"  # header, written only once
for linha in texto:
    Array = linha.split()
    # Date line, e.g. "-e Tue Mar 28 20:17:01 -03 2017" -> "Mar 28 20:17:01"
    if len(Array) == 7 and Array[0] == '-e':
        Saida += Array[2] + ' ' + Array[3] + ' ' + Array[4]
    # vmstat rows have 17 columns and index 14 is "id";
    # the header row holds the literal text 'id', so skip it
    if len(Array) > 14 and Array[14] != 'id':
        Saida += ',      ' + Array[14] + "\n"
print(Saida)

If you want to test it, here

Update: I improved the code. Your question says the file has about 49633 lines, so I assume it covers several dates as well. I therefore wrote a function that returns the results within a date range.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from datetime import *
# day   month  year  time
#  28    03    2017  20:17:01
#  %d    %m    %Y    %H:%M:%S
def retornaResultado(DataInicio, DataFinal):
  # Compare datetime objects directly; datetime.timestamp() only exists on Python 3.3+
  Inicio = datetime.strptime(DataInicio, '%d %m %Y %H:%M:%S')
  Final  = datetime.strptime(DataFinal,  '%d %m %Y %H:%M:%S')
  with open('arquivo.txt', 'r') as arq:
    texto = arq.readlines()
  Saida = "data                  id\n"  # header, written only once
  DataArquivo = None
  for linha in texto:
      Array = linha.split()
      # Date line: -e Tue Mar 28 20:17:01 -03 2017
      if (len(Array) == 7 and Array[0] == '-e'):
        DataArquivo = datetime.strptime(Array[3] + ' ' + Array[2] + ' ' + Array[6] + ' ' + Array[4],  '%d %b %Y %H:%M:%S')
        if Inicio <= DataArquivo <= Final:
          Saida += Array[2] + ' ' + Array[3] + ' ' + Array[4]
      # vmstat value row: index 14 is "id"; the header row holds the literal 'id'
      if (len(Array) > 14 and Array[14] != 'id'):
        if DataArquivo is not None and Inicio <= DataArquivo <= Final:
          Saida += ',      ' + Array[14] + "\n"
  return Saida

# Usage example
print(retornaResultado('28 03 2017 20:17:01', '28 03 2017 20:17:01'))

How to use it: call the function retornaResultado, passing two dates in the format DAY MONTH YEAR TIME, for example '28 03 2017 20:17:01'.

When calling the function you do not need to give the exact time of a sample. For example, you can pass 00:00:00 as the start time and 23:59:59 as the end time to get every result for that day.
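As a rough, self-contained sketch of the same date-range idea, the version below works on any iterable of lines (including an open file), so it does not have to load all 49633 lines at once. The names idle_between, make_sample and DATE_FORMAT are made up for this illustration, and it assumes the month abbreviations in the file are in English (the %b directive depends on the locale).

```python
from datetime import datetime

DATE_FORMAT = "%b %d %Y %H:%M:%S"  # e.g. "Mar 28 2017 20:17:01"


def idle_between(lines, start, end):
    """Yield (label, id) for samples whose date falls within [start, end]."""
    in_range = False
    label = None
    for line in lines:
        fields = line.split()
        if len(fields) == 7 and fields[0] == "-e":
            # fields: -e Tue Mar 28 20:17:01 -03 2017
            stamp = datetime.strptime(
                " ".join([fields[2], fields[3], fields[6], fields[4]]), DATE_FORMAT
            )
            in_range = start <= stamp <= end
            label = " ".join(fields[2:5])
        elif in_range and len(fields) == 17 and fields[14].isdigit():
            # Value row of vmstat; the header row fails isdigit() because it holds "id"
            yield label, int(fields[14])
            in_range = False


def make_sample(time_text, idle):
    """Build a shortened block (date, vmstat header, vmstat values) for the demo."""
    return [
        "-e  Tue Mar 28 %s -03 2017" % time_text,
        " r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st",
        " 0  0      0 113460  10292  52308    0    0  1706    67  532  828 15 10 %d  1  0" % idle,
    ]


log = make_sample("20:17:01", 74) + make_sample("20:18:01", 90) + make_sample("20:19:01", 94)
start = datetime(2017, 3, 28, 20, 17, 1)
end = datetime(2017, 3, 28, 20, 18, 1)
result = list(idle_between(log, start, end))

print("data                  id")
for label, idle in result:
    print("%s,      %d" % (label, idle))
```

With the real file, something like `with open('arquivo.txt') as arq:` followed by a loop over `idle_between(arq, start, end)` should behave the same way.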

  • I’ll test it! Thank you

  • There are several dates, each with its own data.

  • Right, so using the function is the best option.

  • here is the file:

  • Sorry, I can't view the file right now because of the settings on the machine I'm using; it restricts access to some websites.

  • errors:

  • Haha, like I said, I don't have access to all URLs. Paste the error here in a comment.

  • here you can (google drive):

  • put in google drive!

  • I can’t do it either.

  • Traceback (most recent call last): File "/home/Gopala/Desktop/", line 33, in <module> retornaResultado('28 03 2017 20:17:01', '28 03 2017 20:17:01') File "/home/Gopala/Desktop/", line 10, in retornaResultado Inicio = int(datetime.strptime(DataInicio, '%d %m %Y %X').timestamp()) AttributeError: 'datetime.datetime' object has no attribute 'timestamp'

  • You forgot to import datetime. Start with: from datetime import * .... Check the second code block I posted.

  • from datetime import * was already there yes!

  • So I can help you better, paste the code here, save it, and send the URL.
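The AttributeError in the traceback above is what Python 2 reports, since datetime.timestamp() was only added in Python 3.3. A possible workaround, sketched under the assumption that the dates are naive local times as in the answer, is to fall back to time.mktime on older interpreters. The helper name to_epoch_seconds is made up for this illustration.

```python
import time
from datetime import datetime


def to_epoch_seconds(moment):
    """Convert a naive local-time datetime to integer seconds since the epoch."""
    if hasattr(moment, "timestamp"):  # Python 3.3 and later
        return int(moment.timestamp())
    # Older interpreters: go through a local-time struct_time instead
    return int(time.mktime(moment.timetuple()))


inicio = to_epoch_seconds(datetime.strptime("28 03 2017 20:17:01", "%d %m %Y %H:%M:%S"))
fim = to_epoch_seconds(datetime.strptime("28 03 2017 20:19:01", "%d %m %Y %H:%M:%S"))
print(fim - inicio)  # number of seconds between the two dates
```

Comparing the datetime objects directly (inicio <= data <= fim) avoids the conversion altogether and is probably the simpler choice when only the ordering matters.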

