Parse feed in Python

1

I am reading a feed whose layout looks like this:

<horoscope>
	<date>20170627</date>
	<listItem>
		<item>
		<signTitle begin="21/03" end="19/04">Áries</signTitle>
		<content>
			BNononononononononononon
		</content>
		<linktexto>
			<![CDATA[ 
			 <a href='' target='blank'></a> ]]>
		</linktexto>
		<textosaida>
			<![CDATA[ 
			 ]]>
		</textosaida>
		<linksaida>
			<![CDATA[ 
			 <a href='' target='blank'></a> ]]>
		</linksaida>
		</item>
	</listItem>
</horoscope>

When I parse it with the feedparser library, I want to extract the text of the signTitle tag ("Áries" in this case), but instead I get the following output:

{'begin': '21/03', 'end': '19/04'}

These are the tag's "begin" and "end" attributes; the text inside the tag is not returned. My code is below:

import feedparser
d = feedparser.parse(caminho_do_xml)
for post in d.entries:
  print(post.signtitle)

How can I access the tag's content instead of just its attributes? Thank you.

  • Do you need only "Áries", or the text of all the elements?

2 answers

1

How about:

import feedparser

rssfeed = """
<horoscope>
    <date>20170627</date>
    <listItem>
        <item>
        <signTitle begin="21/03" end="19/04">Aries</signTitle>
        <content>
            BNononononononononononon
        </content>
        <linktexto>
            <![CDATA[
             <a href='' target='blank'></a> ]]>
        </linktexto>
        <textosaida>
            <![CDATA[
             ]]>
        </textosaida>
        <linksaida>
            <![CDATA[
             <a href='' target='blank'></a> ]]>
        </linksaida>
        </item>
    </listItem>
</horoscope>"""

d = feedparser.parse(rssfeed)

for e in d.entries:
    print(e['content'][0].value)

Output:

BNononononononononononon

0

If I understand what you need, you don't need a third-party library (which I believe feedparser is). Python's standard library already includes a module for working with XML. See:

import xml.etree.ElementTree

# Root element of the XML:
root = xml.etree.ElementTree.parse("feed.xml").getroot()

# Iterate over all listItem elements:
for listItem in root.iter("listItem"):

    # Iterate over all item elements:
    for item in listItem.iter("item"):

        # Look up the signTitle element:
        signTitle = item.find("signTitle")

        # Print its text content:
        print(signTitle.text)

See it working on Repl.it.
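
If you need the attributes as well as the text, the same approach can be extended. The following is only a minimal sketch, assuming the feed has the layout shown in the question; `SAMPLE` and `read_signs` are names I made up for illustration, and the XML is parsed from a string instead of a file so the snippet runs on its own:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (assumed layout).
SAMPLE = """<horoscope>
    <date>20170627</date>
    <listItem>
        <item>
            <signTitle begin="21/03" end="19/04">Áries</signTitle>
            <content>
                BNononononononononononon
            </content>
        </item>
    </listItem>
</horoscope>"""


def read_signs(xml_text):
    """Return one dict per <item>, with the sign name, its dates and its content."""
    root = ET.fromstring(xml_text)
    signs = []
    for item in root.iter("item"):
        title = item.find("signTitle")
        if title is None:
            # Skip items that have no signTitle element.
            continue
        signs.append({
            "name": title.text,            # text inside the tag
            "begin": title.get("begin"),   # attribute values
            "end": title.get("end"),
            # findtext returns None when the element is missing.
            "content": (item.findtext("content") or "").strip(),
        })
    return signs


signs = read_signs(SAMPLE)
for sign in signs:
    print(sign["name"], sign["begin"], sign["end"], sign["content"])
```

Here `.text` gives the content of the tag and `.get()` reads one attribute, so both pieces of information come from the same element.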
