The "\xHH" sequence inside a string indicates that the next two characters (represented here by "H") will be interpreted as hexadecimal digits, and it is therefore a way to represent any arbitrary byte inside a Python string.
Thus, b"\xff"
corresponds to a byte string containing a single byte with the value 255 (ff in hexadecimal).
It is important to keep in mind that in Python 3 text, as in the Unicode strings of Python 2, one of these bytes does not necessarily correspond to a character in general. However, because of the way Python 3 represents text, the code points from 0 to 255 coincide with the character encoding known as "latin1", which is almost identical to the one used in many versions of Windows for Brazilian Portuguese. This means that any arbitrary value specified with the prefix "\xHH"
in a text string corresponds to exactly one Python 3 text character, although not always a printable one.
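As a quick check of the two points above, the following sketch (variable names are illustrative only) confirms that b"\xff" holds a single byte with the value 255, and that decoding every possible byte value as latin1 yields the character whose code point equals that byte:

```python
# A byte string written with the \xHH escape holds exactly one byte.
single = b"\xff"
assert len(single) == 1
assert single[0] == 255  # indexing bytes in Python 3 yields an int

# Decode all 256 possible byte values as latin1.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin1")

# Each byte maps to the character whose code point has the same value.
assert len(text) == 256
assert all(ord(ch) == b for ch, b in zip(text, all_bytes))

# The mapping is reversible: encoding back gives the original bytes.
assert text.encode("latin1") == all_bytes

# Not every resulting character is printable (control codes, for example).
print(text[0].isprintable(), text[255].isprintable())  # False True
```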
An interesting experiment is to write numeric data to a binary file, read it back as text, and see how the representation appears:
In [23]: f = open("teste.bin", "wb")
In [24]: f.write(bytearray((0, 0, 255, 255, 128, 128)))
Out[24]: 6
In [25]: f.close()
In [26]: open("teste.bin", encoding="latin1").read()
Out[26]: '\x00\x00ÿÿ\x80\x80'
(In this case, the character ÿ has the code 255, or 0xff:)
In [30]: print("\xff")
ÿ
Similarly, in Python 3 (and in the Unicode strings of Python 2), the prefix \u
designates a Unicode character directly by its code point value, for code points of up to 16 bits (four hexadecimal digits).
So, for example, the character with code point 0x263A, which is a smiley face, can be placed directly in Python source code:
In [42]: a = "\u263a"
In [43]: print(a)
☺
And for characters that are "further away", the prefix \U
(uppercase "U") takes 8 hex digits, which makes it possible to express characters with a code point greater than 65535 (0xffff). The semantics of "\xHH", "\uHHHH" and "\UHHHHHHHH" are the same.
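To illustrate that the three escapes share the same semantics, this small sketch writes the same character with each form and then uses \U for a code point above 0xffff (U+1F600, chosen here only as an example):

```python
# The same character, capital "A" (code point 0x41), written three ways.
assert "\x41" == "\u0041" == "\U00000041" == "A"

# The smiley from the text fits in four hex digits, so \u and \U both work.
assert "\u263a" == "\U0000263a"

# A code point above 0xffff needs the eight-digit \U form.
grinning = "\U0001f600"
assert len(grinning) == 1  # still a single character in Python 3
assert ord(grinning) == 0x1F600
print(hex(ord(grinning)))  # 0x1f600
```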
Now, what might be interesting is that sometimes we get a string that is "encoded twice", that is, one in which the sequence \xHH
really does consist of four characters (for example, if we save a .txt file containing the sequence \x41
it is a 4-byte file). If we want to read the single character represented by the byte 0x41 (capital "A"), we have to do some maneuvering. To keep things simple, we can reproduce the situation by escaping the "\", typing "\\" in a Python string (always Python 3):
In [36]: a = "\x41"
In [37]: a
Out[37]: 'A'
In [38]: a = "\\x41"
In [39]: len(a)
Out[39]: 4
In [40]: a
Out[40]: '\\x41'
That is, in this case we have the "\" as a separate character, and not as a character that Python combines with the "x" and the next two digits at compile time. To "compile" this into a single character, we have to decode the text using the special "unicode_escape" codec. Only it's not so simple: you can't call "decode" on text in Python 3, because it is already considered decoded; you need a byte string in order to call the decode method. Since our variable "a" is a string, the solution is to convert it to bytes first, using the "encode" method. We use the "latin1" encoding, which leaves every value unchanged as long as each character has a code no greater than 255:
In [41]: a.encode("latin1").decode("unicode_escape")
Out[41]: 'A'
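The same steps can be wrapped in a small helper and applied to text read from a file. This is only a sketch: the name unescape, the file name escaped.txt and the use of a temporary directory are assumptions made for the illustration.

```python
import os
import tempfile


def unescape(text):
    """Turn literal escape sequences such as '\\x41' into the characters they name.

    Hypothetical helper: because of the latin1 step, it only works when
    every character in `text` has a code point of 255 or lower.
    """
    return text.encode("latin1").decode("unicode_escape")


# Write the four characters \, x, 4, 1 to a text file, as described above.
path = os.path.join(tempfile.mkdtemp(), "escaped.txt")
with open(path, "w", encoding="latin1") as f:
    f.write("\\x41")

# Reading it back gives a four-character string, not a single "A".
with open(path, encoding="latin1") as f:
    raw = f.read()
assert len(raw) == 4

decoded = unescape(raw)
print(decoded)  # A
```

If the text contains a character with a code above 255, the encode("latin1") step raises UnicodeEncodeError, so a helper like this suits only input that fits in latin1.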