8
The case is that I have a file produced by a Garmin (GPS exercise device) and I want to remove all fields related to the heartbeat to pass the file to an athlete who did the exercise with me. The file is in GPX format and is more or less like this:
<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.1" ...>
<metadata>...</metadata>
<trk>
<trkseg>
<trkpt lon="00" lat="00">
<ele>000</ele>
<time>2014-01-01T00:00:00.000Z</time>
<extensions>
<gpxtpx:TrackPointExtension>
<gpxtpx:hr>99</gpxtpx:hr>
</gpxtpx:TrackPointExtension>
</extensions>
</trkpt>
....
<trkpt ...>
...
<extensions>
...
</extensions>
</trkpt>
</trkseg>
</trk>
</gpx>
The system basically generates an element <trkpt>
every reading (geographical + physiological + other devices). I need to remove all instances of the element <extensions>
within the <trkpt>
(i.e., all the contents of it). I tried using the library ElementTree
with the following code:
import xml.etree.ElementTree as ET
tree = ET.parse('input.gpx')
root = tree.getroot()
for ext in root[1][2].iter('{http://www.topografix.com/GPX/1/1}trkpt'):
ext = trkpt.find('{http://www.topografix.com/GPX/1/1}extensions')
root.remove(ext)
tree.write('output.gpx')
The code even removes the elements, but I didn’t like 3 things here:
The first is that the library adds the XML schema Urls to the element names. I lost a lot of time without understanding why my algorithm couldn’t find the elements...
The second is this root[1][2]
to have a pointer to the father of the elements I want to remove. I could access the elements directly by invoking root.iter('{...}extensions')
.
And finally, the most serious issue is that when writing the result in the file I realized that the library renames the tags breaks the original format. The result was so:
<?xml version='1.0' encoding='UTF-8'?>
<ns0:gpx ...>
<ns0:metadata>...</ns0:metadata>
<ns0:trk>...</ns0:trk>
</ns0:gpx>
As I have no experience with this library maybe I’m missing some configuration I didn’t see in my superficial reading of documentation. So I’m looking for a solution to my problem with this or another library.
Does it need to be in Python? It is possible to do this with sed: sed '/<Extensions>/,/</Extensions>/d' input.gpx
– user568459
I appreciate the tip Francisco. I’ve even solved using something very similar inside Vim, but I’m trying my pythonese better. []
– Nigini
Ah, right. It’s just that you said it was just to pass a file to a friend, I thought you just wanted a quick fix. :)
– user568459
I have no time to write an example now, so I will comment and not create a response. But take a look at the lib Beautifulsoup, very used for this type of script.
– Thiago Silva
Thanks @Thiago-silva. I used your recommendation and posted a reply here. Very cool to Beautifulsoup.
– Nigini