Question: How to fix the UnicodeEncodeError in Python?

Viewed 1,452 times

2

Since I changed machines, I've been having the following problem with the Python interpreter:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 17: ordinal not in range(128)

Every string I send and receive is being turned into Unicode. How can I fix this?

  • 1

    Can you post the code that is causing the problem?

  • Are you using Python 2 or Python 3? Is there any chance the version changed from 3 to 2 when you changed machines? Have you tried putting # -*- coding: utf-8 -*- at the top of the file? Could you add part of the code to the question (mainly the part that uses the Unicode character)?

1 answer

1

Below is an adapted translation of an answer posted on Stack Overflow (the original) to a problem similar to yours. You can see the original question, with its answers, here.


Translation:

You are probably trying to print text that contains Unicode characters outside the basic 128-character ASCII range, which may well be characters from our own language. Try encoding the Unicode string as ASCII first:

unicodeData.encode('ascii', 'ignore')

The 'ignore' argument tells the encoder to skip any characters it cannot encode. From the Python documentation:

>>> u = unichr(40960) + u'abcd' + unichr(1972)
>>> u.encode('utf-8')
'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
'abcd'
>>> u.encode('ascii', 'replace')
'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
'&#40960;abcd&#1972;'
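For comparison, here is a rough sketch of the same error handlers applied to the character from the question's error message (u'\xe7', that is, "ç"). It assumes Python 3, where every str is already Unicode and encode() returns bytes; the sample word and the variable names are illustrative only and do not come from the question.

```python
# Python 3 sketch. In Python 2 the literal would be written u'cabe\xe7alho'.
# The sample word is hypothetical; '\xe7' is the character named in the error.
text = 'cabe\xe7alho'

# The default error handler is 'strict', which raises on non-ASCII characters.
strict_error = None
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    strict_error = exc  # exc.start holds the index of the offending character

# 'ignore' silently drops the character, so information is lost.
ignored = text.encode('ascii', 'ignore')

# 'replace' substitutes a question mark.
replaced = text.encode('ascii', 'replace')

# 'xmlcharrefreplace' writes a numeric character reference instead.
xml_refs = text.encode('ascii', 'xmlcharrefreplace')

# Encoding as UTF-8 keeps the character and can be decoded back unchanged.
utf8 = text.encode('utf-8')
round_trip = utf8.decode('utf-8')
```

Note that 'ignore' and 'replace' discard data. If the destination (terminal, file, socket) accepts UTF-8, encoding to UTF-8 is usually the less destructive option.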

It might be useful to read the article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"*. I think it is a good tutorial on what is going on. After reading it, you will stop feeling that you are only guessing at what the commands do (at least, that is what happened to me).


NOTE:

As @jsbueno pointed out, there is a translation of the article mentioned above. The article is by Joel Spolsky and the translation is by Paulo André de Andrade. See here.

  • 1

    The article linked in the answer is in fact a very complete introduction to Unicode, and everyone (no matter what language they are using, or whether they think they never need accented characters) should read it. There is a translation of it here: http://local.joelonsoftware.com/wiki/O_M%C3%Adnimo_absoluto_que_todos_programadores_de_software_they need,_Absolutely,Positivamente_de_Saber_Sobre_Unicode_e_Conjuntos_de_Caracteres%28Sem_Desculpas! %29

  • (But this answer is far from enlightening, complete, or good. It addresses part of the same problem, yes, but only that.)

  • @jsbueno I will edit the answer to include this translation! :)
