Reading an XML file and printing specific fields using the Python language

I have the following XML file (actually it’s just a piece of the file):

<!DOCTYPE sysstat PUBLIC "DTD v2.19 sysstat //EN"
        "http://pagesperso-orange.fr/sebastien.godard/sysstat-2.19.dtd">
        <sysstat>
            <sysdata-version>2.19</sysdata-version>
            <host nodename="ServerLabDoS">
                <sysname>Linux</sysname>
                <release>3.16.0-4-686-pae</release>
                <machine>i686</machine>
                <number-of-cpus>1</number-of-cpus>
                <file-date>2017-04-10</file-date>
                <file-utc-time>10:39:04</file-utc-time>
                <statistics>
                    <timestamp date="2017-04-10" time="07:50:12" utc="0" interval="119">
                        <memory per="second" unit="kB">
                            <memfree>1140168</memfree>
                            <memused>131440</memused>
                            <memused-percent>10.34</memused-percent>
                            <buffers>10928</buffers>
                            <cached>51716</cached>
                            <commit>510544</commit>
                            <commit-percent>28.87</commit-percent>
                            <active>56880</active>
                            <inactive>29832</inactive>
                            <dirty>44</dirty>
                        </memory>
                        <network per="second">
                            <net-dev iface="lo" rxpck="0.00" txpck="0.00" rxkB="0.00" txkB="0.00" rxcmp="0.00" txcmp="0.00" rxmcst="0.00" ifutil-percent="0.00"/>
                            <net-dev iface="eth0" rxpck="12.58" txpck="11.50" rxkB="11.95" txkB="0.85" rxcmp="0.00" txcmp="0.00" rxmcst="0.00" ifutil-percent="0.00"/>
                        </network>
                    </timestamp>
                    <timestamp date="2017-04-10" time="07:52:01" utc="0" interval="107">
                        <memory per="second" unit="kB">
                            <memfree>1140444</memfree>
                            <memused>131164</memused>
                            <memused-percent>10.31</memused-percent>
                            <buffers>11288</buffers>
                            <cached>51932</cached>
                            <commit>509260</commit>
                            <commit-percent>28.80</commit-percent>
                            <active>57024</active>
                            <inactive>29840</inactive>
                            <dirty>28</dirty>
                        </memory>
                        <network per="second">
                            <net-dev iface="lo" rxpck="0.00" txpck="0.00" rxkB="0.00" txkB="0.00" rxcmp="0.00" txcmp="0.00" rxmcst="0.00" ifutil-percent="0.00"/>
                            <net-dev iface="eth0" rxpck="13.89" txpck="12.69" rxkB="13.71" txkB="0.93" rxcmp="0.00" txcmp="0.00" rxmcst="0.00" ifutil-percent="0.00"/>
                        </network>
                    </timestamp>
                    <timestamp date="2017-04-10" time="07:54:01" utc="0" interval="119">
                        <memory per="second" unit="kB">
                            <memfree>1139716</memfree>
                            <memused>131892</memused>
                            <memused-percent>10.37</memused-percent>
                            <buffers>11664</buffers>
                            <cached>52192</cached>
                            <commit>509148</commit>
                            <commit-percent>28.79</commit-percent>
                            <active>57384</active>
                            <inactive>29948</inactive>
                            <dirty>76</dirty>
                        </memory>
                        <network per="second">
                            <net-dev iface="lo" rxpck="0.00" txpck="0.00" rxkB="0.00" txkB="0.00" rxcmp="0.00" txcmp="0.00" rxmcst="0.00" ifutil-percent="0.00"/>
                            <net-dev iface="eth0" rxpck="13.35" txpck="12.40" rxkB="13.68" txkB="0.91" rxcmp="0.00" txcmp="0.00" rxmcst="0.00" ifutil-percent="0.00"/>
                        </network>
                    </timestamp>
                </statistics>
            </host>
        </sysstat>

My goal is, given a timestamp interval such as between date="2017-04-10" time="07:50:12" and date="2017-04-10" time="07:52:01", to print memused and rxpck using Python.

I started with this code:

from xml.dom import minidom

doc = minidom.parse("arq.xml")

# doc.getElementsByTagName returns NodeList
timestamp = doc.getElementsByTagName("timestamp")[0]
print(timestamp.firstChild.data)

But I couldn't get any further. Could someone help?

Suppose the XML covers a day with several different times. What I want is to print these values for all the times contained in the XML file.

Example of such an XML: https://ufile.io/4yd3x

1 answer

First, define the date bounds your script should work with. Use the datetime module for this:

from datetime import datetime

begin = datetime(2017, 4, 10, 7, 50, 12)
end = datetime(2017, 4, 10, 7, 52, 1)

Then iterate over all the timestamp tags, read their date and time, and skip those outside the defined range. Get the date and time attributes with the getAttribute() method and parse the combined string with datetime.strptime(). Note that the comparison treats end as exclusive, so a timestamp exactly equal to end is skipped:

for timestamp in doc.getElementsByTagName('timestamp'):
    date = timestamp.getAttribute('date')
    time = timestamp.getAttribute('time')
    dt = datetime.strptime('%s %s' % (date, time), '%Y-%m-%d %H:%M:%S')
    if dt < begin or dt >= end:
        continue
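
To make the boundary behaviour of that comparison concrete, here is a minimal sketch. The helper names parse_stamp and in_range are hypothetical (they are not part of the script in this answer); the check mirrors the condition above, so begin is included and end is excluded.

```python
from datetime import datetime

def parse_stamp(date, time):
    """Combine the 'date' and 'time' attribute strings into a datetime."""
    return datetime.strptime('%s %s' % (date, time), '%Y-%m-%d %H:%M:%S')

def in_range(dt, begin, end):
    """Half-open check, the negation of (dt < begin or dt >= end)."""
    return begin <= dt < end

begin = datetime(2017, 4, 10, 7, 50, 12)
end = datetime(2017, 4, 10, 7, 52, 1)

first = parse_stamp('2017-04-10', '07:50:12')
second = parse_stamp('2017-04-10', '07:52:01')
print(in_range(first, begin, end))   # True: equal to begin, so it is kept
print(in_range(second, begin, end))  # False: equal to end, so it is skipped
```

If the timestamp equal to end should be reported as well, the condition can be changed to dt > end, or end can be set slightly later.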

Now, still inside the loop, get the contents of the memused tag, then iterate over all network interfaces (the net-dev tags) and read the desired attributes (rxpck and perhaps iface):

memused = timestamp.getElementsByTagName('memused')[0].firstChild.data
for netdev in timestamp.getElementsByTagName('net-dev'):
    iface = netdev.getAttribute('iface')
    rxpck = netdev.getAttribute('rxpck')
    print('date:%s time:%s memused:%s iface:%s rxpck:%s' % (date, time, memused, iface, rxpck))

Below is the complete code for easy testing:

#!/usr/bin/env python

from xml.dom import minidom
from datetime import datetime

doc = minidom.parse('arq.xml')

# Range to report on: begin is included, end is excluded.
begin = datetime(2017, 4, 10, 7, 50, 12)
end = datetime(2017, 4, 10, 7, 52, 1)

for timestamp in doc.getElementsByTagName('timestamp'):
    date = timestamp.getAttribute('date')
    time = timestamp.getAttribute('time')
    dt = datetime.strptime('%s %s' % (date, time), '%Y-%m-%d %H:%M:%S')
    # Skip timestamps outside the range.
    if dt < begin or dt >= end:
        continue
    memused = timestamp.getElementsByTagName('memused')[0].firstChild.data
    # One output line per network interface.
    for netdev in timestamp.getElementsByTagName('net-dev'):
        iface = netdev.getAttribute('iface')
        rxpck = netdev.getAttribute('rxpck')
        print('date:%s time:%s memused:%s iface:%s rxpck:%s' % (date, time, memused, iface, rxpck))
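
To try the approach without the arq.xml file at hand, the sketch below runs the same logic on a trimmed-down inline sample through minidom.parseString. The SAMPLE string and the collect function are illustrative assumptions, not part of the original script; the sample keeps only the elements and attributes the script actually reads.

```python
from datetime import datetime
from xml.dom import minidom

# Trimmed-down sample (assumption: only what the script reads is kept).
SAMPLE = """<sysstat><host nodename="ServerLabDoS"><statistics>
<timestamp date="2017-04-10" time="07:50:12" utc="0" interval="119">
  <memory per="second" unit="kB"><memused>131440</memused></memory>
  <network per="second">
    <net-dev iface="lo" rxpck="0.00"/>
    <net-dev iface="eth0" rxpck="12.58"/>
  </network>
</timestamp>
<timestamp date="2017-04-10" time="07:52:01" utc="0" interval="107">
  <memory per="second" unit="kB"><memused>131164</memused></memory>
  <network per="second">
    <net-dev iface="lo" rxpck="0.00"/>
    <net-dev iface="eth0" rxpck="13.89"/>
  </network>
</timestamp>
</statistics></host></sysstat>"""

def collect(doc, begin, end):
    """Return (date, time, memused, iface, rxpck) tuples with begin <= dt < end."""
    rows = []
    for timestamp in doc.getElementsByTagName('timestamp'):
        date = timestamp.getAttribute('date')
        time = timestamp.getAttribute('time')
        dt = datetime.strptime('%s %s' % (date, time), '%Y-%m-%d %H:%M:%S')
        if dt < begin or dt >= end:
            continue
        memused = timestamp.getElementsByTagName('memused')[0].firstChild.data
        for netdev in timestamp.getElementsByTagName('net-dev'):
            rows.append((date, time, memused,
                         netdev.getAttribute('iface'),
                         netdev.getAttribute('rxpck')))
    return rows

doc = minidom.parseString(SAMPLE)
rows = collect(doc, datetime(2017, 4, 10, 7, 50, 12), datetime(2017, 4, 10, 7, 52, 1))
for row in rows:
    print('date:%s time:%s memused:%s iface:%s rxpck:%s' % row)
```

Returning the rows instead of printing them directly is only a convenience here: it makes the result easy to inspect or to write elsewhere later.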

For more information on handling XML files in Python, I recommend reading the minidom manual (available only in English, unfortunately) or the answers to that question.

  • If, in addition to printing, I wanted to save the values neatly in a text file, would that be difficult?

  • No, just use the open and write functions: open('arquivo.txt', 'a').write(string). If you are unsure how to fit this into the code, I suggest looking for examples on Stack Overflow or asking another question.

  • I think I worded the question badly: suppose the XML covers a day with several different times. What I wanted was to print these values for all the times contained in the file.

  • That is, it should print all the values for each time present!

  • example of an XML like this: https://ufile.io/4yd3x

  • But it already does that; just set the begin and end variables to the correct values. I suggest assigning the start of the day to begin and the start of the next day to end. Example: begin = datetime(2017, 4, 10, 0, 0, 0) and end = datetime(2017, 4, 11, 0, 0, 0).

  • I tried modifying it here, but it only printed two values...

  • You most likely did not modify it correctly. Just replace the two lines that contain begin = and end =.

  • Sorry for my ignorance! You are right, the script works! Could you add some reference links about XML with Python to your answer?
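
The two suggestions from the comments, covering a whole day with begin and end and appending the output to a text file, could be combined roughly as follows. This is only a sketch: day_bounds and append_lines are hypothetical helper names, the output line is a hand-written example, and the file is created in a temporary directory so the demo does not touch real files (the name arquivo.txt is the one used in the comment above).

```python
import os
import tempfile
from datetime import date, datetime, time, timedelta

def day_bounds(day):
    """Return (begin, end) covering the whole of `day`, with end exclusive."""
    begin = datetime.combine(day, time.min)   # 00:00:00 of that day
    end = begin + timedelta(days=1)           # 00:00:00 of the next day
    return begin, end

def append_lines(path, lines):
    """Append each line to a text file; `with` closes the file afterwards."""
    with open(path, 'a') as handle:
        for line in lines:
            handle.write(line + '\n')

begin, end = day_bounds(date(2017, 4, 10))
print(begin, end)

# Inside the script's loop, each formatted string could be collected
# in a list like this one instead of being printed.
lines = ['date:2017-04-10 time:07:50:12 memused:131440 iface:eth0 rxpck:12.58']
path = os.path.join(tempfile.mkdtemp(), 'arquivo.txt')
append_lines(path, lines)
```

Opening the file in a with block, rather than calling open(...).write(...) once per line, ensures the file is flushed and closed when the block ends.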
