This is actually the main change from Python 2 to Python 3, and it is basically the reason the developers opted for a transition that broke backward compatibility. Every string in Python 3 is now "text" and no longer has an automatic 1:1 mapping to byte values. In practice, the class str now behaves exactly like the unicode class did in Python 2.
Starting with Python 3.3, the u prefix for strings was reintroduced to make it easier to write programs that run under both Python 2 and Python 3. In Python 2, this prefix means the string is "unicode", not "str". In Python 3, it does absolutely nothing: the string is Unicode text either way. Example: u"maçã", b"nao pode ter acentos" (a bytes literal cannot contain accented characters).
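The effect of the u prefix under Python 3 can be checked with a minimal sketch like the one below. The variable names are illustrative only, and the snippet assumes Python 3.3 or later:

```python
# Under Python 3.3+, the u prefix is accepted but changes nothing:
# both literals below produce the same str object type and value.
plain = "maçã"
prefixed = u"maçã"

print(type(prefixed).__name__)  # str
print(plain == prefixed)        # True
```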
The b prefix, on the other hand, did nothing in Python 2, but in Python 3 it indicates that you are writing an object of type bytes. Such an object can be used in APIs that actually expect byte values, with the text already encoded according to some convention for accented characters, the so-called "encodings" (e.g. latin1, utf-8, cp-852). Above all, the behaviour on indexing differs: if you retrieve a single element of a "str/bytes" object in Python 2, the result is a "str/bytes" object of length 1. In Python 3, you get a number between 0 and 255, as with char pointer strings in C:
Python 2:
Python 2.7.17 (default, Nov 7 2019, 10:07:09)
[GCC 7.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = b"apple"
>>> a[0]
'a'
>>> b = "maçã"
>>> len(b)
6
>>> b = u"maçã"
>>> len(b)
4
Python 3:
Python 3.8.0+ (heads/3.8:d04661f, Oct 24 2019, 09:19:45)
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = b"apple"
>>> a[0]
97
>>> b = "maçã"
>>> len(b)
4
>>> b = u"maçã"
>>> len(b)
4
>>> b[3]
'ã'
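The same distinction can be sketched as a small Python 3 script that converts explicitly between text and bytes. The variable names are hypothetical, and utf-8 and latin1 are just two of the encodings mentioned above:

```python
# Python 3 only: str holds Unicode code points, bytes holds raw byte values.
text = "maçã"

# Encoding turns text into bytes; the result depends on the encoding chosen.
utf8 = text.encode("utf-8")     # ç and ã take two bytes each
latin1 = text.encode("latin1")  # one byte per character

print(len(text), len(utf8), len(latin1))  # 4 6 4

# Indexing bytes yields an integer; slicing yields a bytes object.
print(utf8[0])    # 109
print(utf8[0:1])  # b'm'

# Decoding with the matching encoding recovers the original text.
print(utf8.decode("utf-8") == text)  # True

# There is no implicit conversion: mixing the two types raises TypeError.
try:
    text + utf8
except TypeError as exc:
    print("cannot mix str and bytes:", exc)
```

Decoding with the wrong encoding does not necessarily fail; it may silently produce wrong characters, which is why the encoding should always be stated explicitly.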
For those who are not yet familiar with the difference between "Unicode" and "text as bytes", I suggest reading an article written in 2003 by Joel Spolsky (co-founder of Stack Overflow): https://www.scribd.com/document/3181016/Programacao-Joel-on-Unicode
Can you post the specific code snippet? By the way, see https://docs.python.org/3/howto/unicode.html
– Leonardo Pessoa
No need for code. Any call to unicode(minha_string) generates the error described. – Wallace Maxters