Poorly formatted XML



I’m trying to read some XML files with Python's ElementTree, but when I parse one of them I get this error:

xml.etree.ElementTree.ParseError: not well-formed (invalid token)

This is the line that raises the error:

xml = ET.parse('./dados_apis/gamesdb/xml/infos_games/31758.xml')

This is the XML file: http://thegamesdb.net/api/GetGame.php?id=31758

In Python, I’m reading it from disk because it is already saved locally.

Does anyone know how I can solve this problem? Apparently it is caused by some special character, but the XML declaration at the top of the file specifies UTF-8 encoding.

  • Could you add your code to the question? I ran a quick test fetching the XML straight from the link and it worked perfectly (https://repl.it/@acwoss/Scrawnyunlinedfunctions)

1 answer



The XML at the link is perfectly valid!

But look at the <Overview> tag, whose text contains a kind of quotation mark after the word Drake:

<Overview>Uncharted: The Nathan Drake Collection combines the three
PlayStation 3 blockbuster Nathan Drake adventures in one package.
Included are the single-player campaigns for Uncharted: Drake’s Fortune,
Uncharted 2: Among Thieves, and Uncharted 3: Drake’s Deception. Thanks to the
power of PlayStation 4 hardware, all three games have been upgraded to run at
1080p and 60fps with better lighting, textures, and models. Also added are a
range of improvements and additions including Photo Mode and new trophies
</Overview>

This character is a Right Single Quotation Mark (U+2019). In your local file it may have been saved in an encoding other than UTF-8, which would cause this error because the XML declaration still says UTF-8.

Another possibility is that copying the XML content from the browser into a local file changed the file's encoding, causing the same error.
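To see why a single character can trigger this, here is a minimal sketch that reproduces the error. The short `sample` document and the `parses` helper are hypothetical stand-ins, not the real file, and Windows-1252 (cp1252) is only an assumed example of a non-UTF-8 encoding the file could have been saved in.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real file: declares UTF-8 and contains U+2019.
sample = ('<?xml version="1.0" encoding="UTF-8"?>'
          '<Overview>Uncharted: Drake\u2019s Fortune</Overview>')

def parses(data: bytes) -> bool:
    """Return True if ElementTree accepts the bytes, False on ParseError."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError as exc:
        print("ParseError:", exc)
        return False

# Saved as real UTF-8, U+2019 becomes the three bytes E2 80 99.
print(parses(sample.encode("utf-8")))

# Saved as cp1252 (assumed), U+2019 becomes the single byte 0x92, which is
# not a valid UTF-8 sequence even though the declaration promises UTF-8.
print(parses(sample.encode("cp1252")))
```

The first call succeeds and the second fails, even though both byte strings look identical when opened in an editor that guesses the encoding.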
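If the saved file turns out not to be valid UTF-8, one possible workaround is to decode it with the encoding it was probably saved in and hand UTF-8 bytes to the parser. This is only a sketch: `parse_with_fallback` is a hypothetical helper, cp1252 is a guess at the real encoding, and it assumes the XML declaration says UTF-8 as in your file.

```python
import xml.etree.ElementTree as ET

def parse_with_fallback(raw: bytes, fallback: str = "cp1252") -> ET.Element:
    """Parse XML bytes that claim to be UTF-8, re-encoding them if they are not."""
    try:
        raw.decode("utf-8")          # only checks that the bytes are valid UTF-8
    except UnicodeDecodeError:
        # Assumption: the file was really saved in `fallback`.
        # Decode with that encoding and re-encode to match the declaration.
        raw = raw.decode(fallback).encode("utf-8")
    return ET.fromstring(raw)

# Usage with the file from the question (read in binary mode):
# with open('./dados_apis/gamesdb/xml/infos_games/31758.xml', 'rb') as f:
#     root = parse_with_fallback(f.read())
```

Re-downloading the file and writing the response bytes to disk unchanged (opened with mode 'wb') avoids the problem at the source, since no re-encoding happens on the way to disk.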
