Decode HTML entities in a Python string

Question

Asked 11 years, 3 months ago

Viewed 481 times

4

I’m using Python 3 to access a web API. The responses come back as JSON, and my problem is that one of the strings contains HTML entities (specifically for accented characters).

For example:

"orienta&ccedil;&atilde;o-a-objetos"

Is there a parser that returns strings with the HTML entities resolved?

2 answers

by Nigini • **1,224** points · Answer 1 · 2014-12-02T18:59:12+00:00

I found this for Python 3.4+:

>>> import html
>>> html.unescape('orienta&ccedil;&atilde;o-a-objetos')
'orientação-a-objetos'

For Python 3 versions prior to 3.4 (HTMLParser.unescape was deprecated in 3.4 and removed in 3.9):

>>> import html.parser
>>> h = html.parser.HTMLParser()
>>> h.unescape('orienta&ccedil;&atilde;o-a-objetos')
'orientação-a-objetos'
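
Since the question mentions that the strings arrive inside a JSON response, here is a minimal sketch of applying html.unescape to every string in the parsed data. The helper name unescape_strings and the sample payload (its "tag" and "count" fields) are made up for illustration; the real API response will look different.

```python
import html
import json


def unescape_strings(value):
    """Recursively apply html.unescape to every string in parsed JSON data."""
    if isinstance(value, str):
        return html.unescape(value)
    if isinstance(value, list):
        return [unescape_strings(item) for item in value]
    if isinstance(value, dict):
        # Only the values are decoded; keys are left as they are.
        return {key: unescape_strings(item) for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# Hypothetical response body; the field names are assumptions.
raw = '{"tag": "orienta&ccedil;&atilde;o-a-objetos", "count": 3}'
data = unescape_strings(json.loads(raw))
print(data["tag"])
```

Decoding after json.loads rather than before keeps the JSON syntax intact, because an entity such as &quot; would otherwise turn into a literal quote inside the raw text.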

by britodfbr • **688** points · Answer 2 · 2018-02-25T11:32:21+00:00

It is also possible to use Beautiful Soup (bs4 for Python 3, or the older BeautifulSoup package for Python 2), which, in addition to converting the HTML entities to Unicode characters, also lets you work with the HTML elements individually (if the input string contains any).

from bs4 import BeautifulSoup

s = 'orienta&ccedil;&atilde;o-a-objetos'
# The parser decodes the entities; get_text() returns the text without any tags.
t = BeautifulSoup(s, 'html.parser')
print(t.get_text())