Python: extracting a substring from formatted text

Hello.

I have a string formatted with several attributes, and I need to get all the "text" fields. In this example, I need to extract "Gmail" and "Youtube" and discard everything else, using Python.

<node bounds="{0, 0, 540, 36}" 
checked="false" 
class="android.widget.FrameLayout" 
clickable="false" 
enabled="true" 
id="" 
index="1" 
indexId="-1" 
long-clickable="false" 
package="com.android.systemui" 
password="false" 
resource-id="com.android.systemui:id/hided_by_cover_group2" 
scrollable="false" 
selected="false" 
stringname="" 
talkback="" 
text="" 
><node bounds="{0, 0, 540, 36}" 
checked="false" 
class="android.widget.FrameLayout" 
clickable="false" 
enabled="true" 
id="" 
index="2" 
indexId="-1" 
long-clickable="false" 
package="com.android.systemui" 
password="false" 
resource-id="com.android.systemui:id/msim_panel_holder" 
scrollable="false" 
selected="false" 
stringname="" 
talkback="Gmail" 
text="Gmail" 
></node>

<node bounds="{0, 0, 540, 36}" 
checked="false" 
class="android.widget.FrameLayout" 
clickable="false" 
enabled="true" 
id="" 
index="1" 
indexId="-1" 
long-clickable="false" 
package="com.android.systemui" 
password="false" 
resource-id="com.android.systemui:id/hided_by_cover_group2" 
scrollable="false" 
selected="false" 
stringname="" 
talkback="" 
text="" 
><node bounds="{0, 0, 540, 36}" 
checked="false" 
class="android.widget.FrameLayout" 
clickable="false" 
enabled="true" 
id="" 
index="2" 
indexId="-1" 
long-clickable="false" 
package="com.android.systemui" 
password="false" 
resource-id="com.android.systemui:id/msim_panel_holder" 
scrollable="false" 
selected="false" 
stringname="" 
talkback="Youtube" 
text="Youtube" 
></node>

Thank you

  • Changed it to Portuguese. Does anyone know how to answer?

1 answer

Although this example is simple enough that you could get the data by filtering lines or with regular expressions, the text is XML, so the recommended approach is to use XML tools to extract the values you want.

That way, if the formatting of the external file changes at some point, or if you need other fields, the code keeps working. Besides that, XML has other features that are valid in the input data and that reading it as plain text would simply ignore (for example, character entities such as &amp;).
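To make the entity point concrete, here is a small comparison on a made-up sample. The sample string and the "Tom &amp; Jerry" label are hypothetical, not from the question, and the regular expression is just one plausible plain-text approach, shown as a sketch rather than the only way to do it.

```python
import re
from xml.etree import ElementTree as ET

# Hypothetical sample: the second label contains an XML entity (&amp;).
sample = """<root>
<node talkback="Gmail" text="Gmail"></node>
<node talkback="Tom &amp; Jerry" text="Tom &amp; Jerry"></node>
</root>"""

# Plain-text approach: capture whatever sits between text=" and the next quote.
# \b stops it from matching inside longer attribute names, but the value
# comes back raw, with entities left undecoded.
regex_values = re.findall(r'\btext="([^"]*)"', sample)
print(regex_values)  # ['Gmail', 'Tom &amp; Jerry']

# XML approach: the parser decodes entities while reading attributes.
root = ET.fromstring(sample)
xml_values = [node.get("text") for node in root.iter("node")]
print(xml_values)  # ['Gmail', 'Tom & Jerry']
```

Both approaches find the same two labels here, but only the parser returns the text the way it is meant to be read.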

Note, however, that your XML is not well-formed, either because you didn't paste the whole file or because you edited it by hand: in the snippet shown there is no root element that is the parent of all the others, and some of the node elements are missing their closing tag. With well-formed XML stored in a variable named "texto", you can do:

```python
from xml.etree import ElementTree as ET
root = ET.fromstring(texto.strip())  # texto must hold well-formed XML
atributos = [element.get("text") for element in root.iter() if element.tag == "node" and element.get("text")]
```

The first line imports Python's ElementTree module for working with XML. The second builds an in-memory XML tree from the string (ET.parse() can read a file directly instead of ET.fromstring(); call .getroot() on its result to get the root element). The third line uses a list comprehension, an inline for loop, to visit every element of your XML: if the tag is "node" and its "text" attribute is non-empty, that value goes into the list. At the end, the "atributos" variable holds the value of every non-empty "text" attribute in that structure.
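Putting it together, here is a self-contained sketch. It assumes a hypothetical well-formed version of the question's data: the outer node elements are closed, everything is wrapped in an invented <root> element, and most attributes are dropped for brevity. It also shows the ParseError you get when there is no single root element, and the ET.parse() variant reading from a temporary file. The helper name extract_texts is made up for illustration.

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Hypothetical well-formed version of the question's data: the outer nodes
# are closed, everything sits under an invented <root> element, and most
# attributes are omitted for brevity.
texto = """
<root>
  <node index="1" text="">
    <node index="2" talkback="Gmail" text="Gmail"></node>
  </node>
  <node index="1" text="">
    <node index="2" talkback="Youtube" text="Youtube"></node>
  </node>
</root>
"""


def extract_texts(root):
    """Return the non-empty "text" attributes of all <node> elements, in document order."""
    return [el.get("text") for el in root.iter("node") if el.get("text")]


atributos = extract_texts(ET.fromstring(texto.strip()))
print(atributos)  # ['Gmail', 'Youtube']

# Two top-level elements and no common parent: the parser rejects this.
fragment = '<node text="Gmail"></node><node text="Youtube"></node>'
try:
    ET.fromstring(fragment)
    fragment_parsed = True
except ET.ParseError as err:
    fragment_parsed = False
    print("ParseError:", err)

# If each fragment is itself well-formed, wrapping them in a synthetic
# root element is enough to make the whole string parseable.
wrapped = extract_texts(ET.fromstring("<root>" + fragment + "</root>"))
print(wrapped)  # ['Gmail', 'Youtube']

# ET.parse() reads from a file path (or file object) and returns an
# ElementTree; getroot() gives the same kind of root element as fromstring().
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False, encoding="utf-8") as f:
    f.write(texto.strip())
    path = f.name
try:
    from_file = extract_texts(ET.parse(path).getroot())
finally:
    os.remove(path)
print(from_file)  # ['Gmail', 'Youtube']
```

Here root.iter("node") filters by tag directly, which does the same job as checking element.tag inside the comprehension. Note that wrapping in a root element only fixes the missing parent; nodes without a closing tag still need to be repaired at the source.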
