Handling XML: treating strings as if they were files

I’m receiving XML as a string in my response. I want to process this XML and build a list of the <item> tags returned by the web service. Here’s the relevant snippet of my code:

if (op == 2):

    print '\n'*3
    print 'TESTE Viagem'; sleep(2);
    print '\n'*3

    response = oRodov.viagem() #xml

    tree = ET.parse(response)
    root = tree.getroot()
    iterador = root.getiterator()
    for x in iterador:
        if x.tag == "item":
            print x.items()

The code raises an IOError:

Traceback (most recent call last):
  File "rodoviario.py", line 134, in ?
    tree = ET.parse(response)
  File "/usr/local/lib/python2.4/site-packages/elementtree/ElementTree.py", line 1120, in parse
    tree.parse(source, parser)
  File "/usr/local/lib/python2.4/site-packages/elementtree/ElementTree.py", line 642, in parse
    source = open(source, "rb")
IOError: [Errno 2] No such file or directory: '<?xml version="1.0" encoding="ISO-8859-1"?>\n\n<viagem origem="1" destino="2" data="2014-07-02" servico="2311" grupo="DEMON">\n <saida dia="0" hora="02:00" />\n <chegada dia="0" hora="12:15" />\n <empresa>DEMON</empresa>\n <mensagem-servico></mensagem-servico>\n <destino>2</destino>\n <moeda>R$</moeda>\n <preco>51.10</preco>\n <layout>\n  <secao nome="Unica">\n   <item x="0" y="0">\n    <assento ocupado="0" tipo=" " numero="01" />\n   </item>\n   <item x="1" y="0">\n    <assento ocupado="0" tipo=" " numero="05" />\n   </item>\n   <item x="2" y="0">\n    <assento ocupado="0" tipo=" " numero="09" />\n   </item>\n   <item x="3" y="0">\n    <assento ocupado="0" tipo=" " numero="13" />\n   </item>\n   <item x="4" y="0">\n    <assento ocupado="0" tipo=" " numero="17" />\n   </item>\n   <item x="5" y="0">\n    <assento ocupado="0" tipo=" " numero="21" />\n   </item>\n   <item x="6" y="0">\n    <assento ocupado="0" tipo=" " numero="25" />\n   </item>\n   <item x="7" y="0">\n    <assento ocupado="0" tipo=" " numero="29" />\n   </item>\n   <item x="8" y="0">\n    <assento ocupado="0" tipo=" " numero="33" />\n   </item>\n   <item x="9" y="0">\n    <assento ocupado="0" tipo=" " numero="37" />\n   </item>\n   <item x="10" y="0">\n    <assento ocupado="0" tipo=" " numero="42" />\n   </item>\n   <item x="0" y="1">\n    <assento ocupado="0" tipo=" " numero="02" />\n   </item>\n   <item x="1" y="1">\n    <assento ocupado="0" tipo=" " numero="06" />\n   </item>\n   <item x="2" y="1">\n    <assento ocupado="0" tipo=" " numero="10" />\n   </item>\n   <item x="3" y="1">\n    <assento ocupado="0" tipo=" " numero="14" />\n   </item>\n   <item x="4" y="1">\n    <assento ocupado="0" tipo=" " numero="18" />\n   </item>\n   <item x="5" y="1">\n    <assento ocupado="0" tipo=" " numero="22" />\n   </item>\n   <item x="6" y="1">\n    <assento ocupado="0" tipo=" " numero="26" />\n   </item>\n   <item x="7" y="1">\n    <assento ocupado="0" tipo=" " 
numero="30" />\n   </item>\n   <item x="8" y="1">\n    <assento ocupado="0" tipo=" " numero="34" />\n   </item>\n   <item x="9" y="1">\n    <assento ocupado="0" tipo=" " numero="38" />\n   </item>\n   <item x="10" y="1">\n    <assento ocupado="0" tipo=" " numero="41" />\n   </item>\n   <item x="12" y="2">\n    <assento ocupado="0" tipo=" " numero="04" />\n   </item>\n   <item x="1" y="3">\n    <assento ocupado="0" tipo=" " numero="08" />\n   </item>\n   <item x="2" y="3">\n    <assento ocupado="0" tipo=" " numero="12" />\n   </item>\n   <item x="3" y="3">\n    <assento ocupado="0" tipo=" " numero="16" />\n   </item>\n   <item x="4" y="3">\n    <assento ocupado="0" tipo=" " numero="20" />\n   </item>\n   <item x="5" y="3">\n    <assento ocupado="0" tipo=" " numero="24" />\n   </item>\n   <item x="6" y="3">\n    <assento ocupado="0" tipo=" " numero="28" />\n   </item>\n   <item x="7" y="3">\n    <assento ocupado="0" tipo=" " numero="32" />\n   </item>\n   <item x="8" y="3">\n    <assento ocupado="0" tipo=" " numero="36" />\n   </item>\n   <item x="9" y="3">\n    <assento ocupado="0" tipo=" " numero="40" />\n   </item>\n   <item x="10" y="3">\n    <assento ocupado="0" tipo=" " numero="44" />\n   </item>\n   <item x="0" y="4">\n    <assento ocupado="0" tipo=" " numero="03" />\n   </item>\n   <item x="1" y="4">\n    <assento ocupado="0" tipo=" " numero="07" />\n   </item>\n   <item x="2" y="4">\n    <assento ocupado="0" tipo=" " numero="11" />\n   </item>\n   <item x="3" y="4">\n    <assento ocupado="0" tipo=" " numero="15" />\n   </item>\n   <item x="4" y="4">\n    <assento ocupado="0" tipo=" " numero="19" />\n   </item>\n   <item x="5" y="4">\n    <assento ocupado="0" tipo=" " numero="23" />\n   </item>\n   <item x="6" y="4">\n    <assento ocupado="0" tipo=" " numero="27" />\n   </item>\n   <item x="7" y="4">\n    <assento ocupado="0" tipo=" " numero="31" />\n   </item>\n   <item x="8" y="4">\n    <assento ocupado="0" tipo=" " numero="35" />\n   </item>\n   <item 
x="9" y="4">\n    <assento ocupado="0" tipo=" " numero="39" />\n   </item>\n   <item x="10" y="4">\n    <assento ocupado="0" tipo=" " numero="43" />\n   </item>\n  </secao>\n </layout>\n</viagem>\n'

I worked out that the error happens because I am passing the XML itself as a string, rather than the path to an XML file:

tree = ET.parse(response)

But I don’t want to have to save that string as an XML file and then parse and manipulate the file.

How do I handle this string as XML without having to save it as a file in some directory?

  • 1

    I won’t post this as an answer yet because I’m not sure (and I don’t have time to run tests right now), but I think this can be solved using StringIO. Try: import StringIO and tree = ET.parse(StringIO.StringIO(response)). I’ll come back here later. P.S. The line just below is root = response.getroot(); shouldn’t it be tree.getroot()?

  • @mgibsonbr Yes, it is tree.getroot(); I have already made the correction :p

  • @mgibsonbr I managed to solve it with StringIO. Thank you :D

  • 1

    In that case, then, I’m posting as an answer. :)

1 answer


Many Python libraries expect files or "file-like" objects as parameters. Although this may seem restrictive, the standard library has a class made specifically to treat strings as if they were files: StringIO (or its alternative implementation, cStringIO).

from StringIO import StringIO
tree = ET.parse(StringIO(response))

Explaining: StringIO(string) creates a "file-like" object (i.e. an object that behaves as if it were a file) whose "content" is the string passed as the argument. This object implements the file interface, with methods to read, write, seek, etc. When it is passed to ET.parse, the parser can read it as it would an already opened file, accessing its content without the need to create an intermediate file.
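
As a rough illustration of the idea, here is a minimal sketch written for Python 3, where the class lives in the io module (on Python 2 the import shown in the answer applies instead). The `response` value is a shortened, hypothetical stand-in for the web service output, and the use of ET.fromstring is an addition that was not part of the original answer: it parses a string directly and returns the root element.

```python
import io
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the string returned by the web service
response = """<viagem origem="1" destino="2">
 <layout>
  <secao nome="Unica">
   <item x="0" y="0"><assento ocupado="0" tipo=" " numero="01" /></item>
   <item x="1" y="0"><assento ocupado="0" tipo=" " numero="05" /></item>
  </secao>
 </layout>
</viagem>"""

# Wrap the string in a file-like object so ET.parse can read from it
tree = ET.parse(io.StringIO(response))
root = tree.getroot()

# iter("item") walks the whole tree and yields only the <item> elements
items = [item.attrib for item in root.iter("item")]
print(items)

# Alternative: ET.fromstring parses the string directly and returns the root
root_direct = ET.fromstring(response)
seats = [seat.get("numero") for seat in root_direct.iter("assento")]
print(seats)
```

The sketch uses root.iter() rather than getiterator(), since getiterator() is no longer available in recent Python 3 releases.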

  • 2

    Just adding: in Python 3, what used to be StringIO.StringIO (and cStringIO.StringIO) is now io.BytesIO.

  • @jsbueno Does this mean that in Python 3 it would be necessary to encode the string (which is Unicode by default in this version) in some encoding, instead of using it as it is? (Inefficient, but on second thought, strictly speaking I would have to do this in Python 2 as well...) I saw in the documentation that Python 3 still supports StringIO, unless its semantics have changed and it is no longer compatible with methods that expect a file-like object... Would you happen to have a reference where I could read more on the subject?

  • 1

    If you have a Python 3 string, you have text. Text cannot be read from or written to a file as it is; you have to encode it with an encoding. Please read http://local.joelonsoftware.com/wiki/O_M%C3%Adnimo_absoluto_que_todos_programadores_de_software_precise,_Absolutely,Positivamente_de_Saber_Sobre_Unicode_e_Conjuntos_de_Caracteres(Always sorry! ) for your own good. Python 3’s StringIO also works with text and gives you a text buffer, which would have to be encoded if it were written to disk or anywhere else.

  • @jsbueno I am aware of the encoding issues. What I wondered about is that it seems unnecessary to convert a large string to binary and back, when one could simply "decorate" StringIO to behave like a file in a specific encoding (i.e. doing the char -> bytes conversion on demand, instead of converting the entire text and wasting a lot of memory). I thought that maybe Python 3 had something like this built in. The way I see it, this is a step back. But again, I realize that the situation in Python 2 is not much better than this...
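
The bytes-versus-text point from the comments above can be sketched as follows. This is a minimal example assuming Python 3; the sample document is a hypothetical stand-in. It shows the bytes route only: the text is encoded to match the encoding named in the XML declaration, then wrapped in io.BytesIO.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: text with non-ASCII characters and an encoding declaration
text = ('<?xml version="1.0" encoding="ISO-8859-1"?>\n'
        '<viagem><empresa>Viação</empresa></viagem>')

# Encode the text so the bytes match the encoding named in the declaration
data = text.encode("iso-8859-1")

# io.BytesIO is the file-like wrapper for bytes
root = ET.parse(io.BytesIO(data)).getroot()
empresa = root.find("empresa").text

# ET.fromstring also accepts bytes directly, with no wrapper at all
root_direct = ET.fromstring(data)
empresa_direct = root_direct.find("empresa").text
```

In both cases the parser decodes the bytes according to the declaration, so the element text comes back as an ordinary Python 3 string.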
