XPath with Python

14

I have the following XML (simplified):

<produto refid="cat01" idprod="tv01">
    <marca>xxx</marca>
    <modelo>xxxx</modelo>
    <genero>xxx</genero>
</produto>
<v:utilizador iduser="U00000" comercial="12344" loja="xxxxx">
    <nome>xxxxx</nome>
    <contacto>94xxxxxx</contacto>
    <email>loja@xxxxxxx</email>
    <morada>xxxxxx</morada>
    <localidade>xxxxxx</localidade>
    <cesto>
        <encomenda refidprod="rlg04" estado="enviado"/>
        <encomenda refidprod="tv04" estado="enviado"/>
    </cesto>
</v:utilizador>
<u:utilizador iduser="U00003" comercial="" loja="">
    <nome>xxxxx</nome>
    <contacto>93xxxxxx</contacto>
    <email>xxxxxxxx</email>
    <morada>xxxxxx</morada>
    <localidade>xxxxx</localidade>
    <cesto>
        <encomenda refidprod="tlf04" estado="em processo"/>
    </cesto>
</u:utilizador>

I have to write Python code that shows all the products purchased by a particular user. I did the following:

from lxml import etree

u={'u':'Utilizador'} # Namespace declaration

file = "marketplace.xml"
treeDoc = etree.parse(file)

nome=str(raw_input('Insira o nome: '))
print("Cesto de " + nome)

elemList = treeDoc.xpath("//produto[./@idprod=//utilizadores/u:utilizador[./nome='"+ nome +"']/cesto/encomenda/@refidprod]/marca", namespaces=u)

for elem in elemList:
    print("Item: ", elem.get("idprod"))
    nameList = elem.xpath("produto")
    print(nameList[0].tag, ": ", nameList[0].text)

Nothing is returned. Can anyone tell me what the problem is? Another question: how can I make a single XPath query match both namespaces, without having to write separate queries?

  • 3

    I tested it here, and it worked. I had to adapt your example, because: 1. user U00003 did not buy tv01, so with this example nothing should be returned anyway; 2. I added the missing elements (the XML root and the utilizadores element). How did you declare the namespace u in your XML? Like this: xmlns:u="Utilizador"? If the xmlns in the file is one thing and the one in the code is another, no error is raised, but nothing is returned. As for returning both namespaces, I’m not sure what the best way is, but it should involve an or or a union (|).

  • P.S. The XPath worked, but the for loop below it did not. You are selecting the brand (marca), not the product...

  • Yes, I declared it as you said, xmlns:u='Utilizador'. I didn’t understand the part about the for loop; how do I return the tag, then?

  • 1

    The brand (marca) is in fact being returned. The problem is that the for loop tries to read idprod, which is an attribute of produto (the element just above marca in the hierarchy), and then tries to fetch produto elements, which in your example are not sub-elements of marca. As for the problem of nothing being returned, do you have a more complete example of the input, maybe a real XML file posted on Pastebin? I can’t reproduce your problem; for me your XPath worked perfectly...

  • 1

    Hi, thanks for the help, but I already figured out the error: in my XPath expression I was selecting the brand, when I should have selected the produto whose idprod matches the refidprod in the user’s basket:

        elemList = treeDoc.xpath("//produto[./@idprod=//utilizadores/u:utilizador[./nome='"+ nome +"']/cesto/encomenda/@refidprod]", namespaces=u)

        for elem in elemList:
            print("Item: ", elem.get("idprod"))
            nameList = elem.xpath("marca")
            print(nameList[0].tag, ": ", nameList[0].text)

    Thanks all the same.

  • Good! But before closing the question, explain one thing to me: why are there two namespaces, u and v, when the structure of the users looks identical to me?

  • 1

    The two namespaces were there to differentiate users (u:'users') from sellers (v:'sellers'). This XML was adapted from an online marketplace that had categories, products and users/sellers.
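
For reference, here is a minimal sketch of the two points settled in the comments: select the produto (not its marca) whose idprod appears in the user's basket, and bind the prefix to exactly the same URI as the document. It uses only the standard library's xml.etree.ElementTree instead of lxml, so the join between basket and products is done in Python rather than inside one XPath expression. The marketplace root, the produtos wrapper, the brand values, the user name Ana and the function name basket_products are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, reduced version of the file: root element, <produtos> wrapper,
# brand values and the name "Ana" are assumptions, not taken from the real data.
XML = """
<marketplace xmlns:u="Utilizador">
    <produtos>
        <produto refid="cat01" idprod="tv01"><marca>MarcaA</marca></produto>
        <produto refid="cat01" idprod="tlf04"><marca>MarcaB</marca></produto>
    </produtos>
    <utilizadores>
        <u:utilizador iduser="U00003">
            <nome>Ana</nome>
            <cesto><encomenda refidprod="tlf04" estado="em processo"/></cesto>
        </u:utilizador>
    </utilizadores>
</marketplace>
"""


def basket_products(root, nome, ns):
    """Return (idprod, marca) pairs for every product in the named user's basket."""
    # Collect the product ids referenced in the baskets of users with this name.
    wanted = {
        enc.get("refidprod")
        for user in root.findall(".//u:utilizador", ns)
        if user.findtext("nome") == nome
        for enc in user.findall("cesto/encomenda")
    }
    # Select the <produto> elements themselves, then read <marca> below each one.
    return [
        (prod.get("idprod"), prod.findtext("marca"))
        for prod in root.iter("produto")
        if prod.get("idprod") in wanted
    ]


root = ET.fromstring(XML)

# Prefix bound to the same URI as in the document.
print(basket_products(root, "Ana", {"u": "Utilizador"}))  # prints [('tlf04', 'MarcaB')]

# Same prefix bound to a different URI: no error, just an empty result.
print(basket_products(root, "Ana", {"u": "SomethingElse"}))  # prints []
```

Comparing the name in Python also avoids building the XPath string by concatenating user input, which would break on a name containing a quote.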

2 answers

1

I know this doesn’t exactly answer your question, but have you tried using xmltodict?

Module link: https://pypi.python.org/pypi/xmltodict

```python
>>> import xmltodict
>>> doc = xmltodict.parse("""
... <mydocument has="an attribute">
... <and>
... <many>elements</many>
... <many>more elements</many>
... </and>
... <plus a="complex">
... element as well
... </plus>
... </mydocument>
... """)
>>>
>>> doc['mydocument']['@has']
u'an attribute'
>>> doc['mydocument']['and']['many']
[u'elements', u'more elements']
>>> doc['mydocument']['plus']['@a']
u'complex'
>>> doc['mydocument']['plus']['#text']
u'element as well'
```

0

This may solve your problem:

import xml.etree.ElementTree as ET

# namespace dictionary
ns_dic = {'v' : 'http://exemplo.org',
          'u' : 'http://exemplo2.org'}

tree = ET.parse('marketplace.xml')
root = tree.getroot()

# the user enters a name
nome = "'%s'"%input() # on Python 2.7 use raw_input()

utilizadores = [] # list of users
# find all users with the given name, regardless of namespace
for ns in ns_dic:
    utilizadores.append(root.findall("./%s:utilizador[nome=%s]"%(ns, nome), ns_dic))

utilizadores = sum(utilizadores, []) # flatten the list

cestos = {} # dictionary of the form {user_id: list_of_orders}
for utilizador in utilizadores:
    user_id = utilizador.get('iduser')
    cesto = utilizador.findall("cesto/encomenda")
    cestos[user_id] = cesto
    for encomenda in cesto:
        print("id: %s, estado:%s"%(encomenda.get("refidprod"), encomenda.get("estado")))

With your example, if I search for the first user, the code prints:

>>>
id: rlg04, estado:enviado
id: tv04, estado:enviado
>>> utilizadores
[<Element '{http://exemplo.org}utilizador' at 0x00000000035B1958>]
>>> cestos
{'U00000': [<Element 'encomenda' at 0x00000000035B1B88>, <Element 'encomenda' at 0x00000000035B1BD8>]}

Note that I prefer to use more Python than XPath; for me, the code becomes clearer.
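
As a possible alternative to looping over the prefixes, ElementTree in Python 3.8 and later accepts a {*} namespace wildcard, so one findall call can match utilizador in both namespaces. The sketch below is only an illustration: the marketplace root, the namespace URIs Utilizador and Vendedor, and the names Loja and Ana are assumptions, not values from the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: root element, both namespace URIs and the names are assumptions.
XML = """
<marketplace xmlns:u="Utilizador" xmlns:v="Vendedor">
    <utilizadores>
        <v:utilizador iduser="U00000"><nome>Loja</nome></v:utilizador>
        <u:utilizador iduser="U00003"><nome>Ana</nome></u:utilizador>
    </utilizadores>
</marketplace>
"""

root = ET.fromstring(XML)

# "{*}" matches the tag name in any namespace (ElementTree, Python 3.8+).
everyone = root.findall(".//{*}utilizador")
ids = [user.get("iduser") for user in everyone]
print(ids)  # prints ['U00000', 'U00003']

# The wildcard can be combined with a child-text predicate.
ana = root.findall(".//{*}utilizador[nome='Ana']")
print([user.get("iduser") for user in ana])  # prints ['U00003']
```

With lxml, which implements full XPath 1.0, the union operator mentioned in the comments is another option, for example //u:utilizador | //v:utilizador with both prefixes present in the namespaces dictionary.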
