The "b" prefix indicates that the object you have at hand is not a text string but a sequence of bytes.
In Python 3 the two things are fundamentally different, which is why you always need to know how the text is encoded in bytes before you can transform those bytes into characters. Nowadays it is increasingly common for text to be in the "utf-8" encoding, but some legacy systems and Windows use the "latin-1" encoding (or its close relative cp1252), which fits every character of the Portuguese language in a single byte.
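As a small illustration of that difference, here is a sketch with a made-up sample word ("palavra" is just an illustrative name, not something from the question). The same text becomes different byte sequences depending on the encoding chosen:

```python
# The same text encoded two ways; "palavra" is an illustrative sample.
palavra = "ação"

como_utf8 = palavra.encode("utf-8")      # each accented letter takes 2 bytes
como_latin1 = palavra.encode("latin-1")  # every character takes 1 byte

print(como_utf8)     # b'a\xc3\xa7\xc3\xa3o'
print(como_latin1)   # b'a\xe7\xe3o'
print(len(como_utf8), len(como_latin1))  # 6 4

# Decoding only round-trips when you use the encoding the bytes were written in.
assert como_utf8.decode("utf-8") == palavra
assert como_latin1.decode("latin-1") == palavra
```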
Python objects of type "bytes" have a "decode" method: just call it and the result will be the text string (which Python displays without the 'b' prefix). Besides the "decode" method, the call str(xml, 'utf-8') would also perform this transformation. Once you do that, the error message changes. Since it is no longer a Python error saying that there is an invalid utf-8 sequence, the chances are that your XML really is in utf-8 and it is only ODBC that complains about an invalid character. utf-8 supports all Unicode characters; other encodings, such as latin-1, do not. If the text contains characters from languages such as Greek, Russian or Hebrew, or even punctuation marks that are not defined in latin-1, an error will occur, and that may well be what is happening.
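For instance, with a stand-in value for the "xml" variable (the real content is not shown in the question, so this sample is an assumption), both forms give the same text, and Python itself raises an error when the bytes are not valid utf-8:

```python
# "xml" here is a stand-in bytes value; the real document is not shown.
xml = b'<nome>Jo\xc3\xa3o</nome>'

texto_a = xml.decode("utf-8")   # method on the bytes object
texto_b = str(xml, "utf-8")     # equivalent constructor call

print(texto_a)             # <nome>João</nome>
print(texto_a == texto_b)  # True

# If the bytes were not valid utf-8, Python would complain at this point,
# before anything reaches the database driver:
try:
    b'\xe3o'.decode("utf-8")   # latin-1 bytes wrongly treated as utf-8
except UnicodeDecodeError as erro:
    print("Python error:", erro.reason)
```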
The remedy would be to force an encoding with escaping when passing the data to the driver. Only there is another problem: the function does not accept bytes (the already-encoded text). The result: you will have to mangle the Python text, replacing every character outside "latin1" with "?", turn it back into text and then make your call. Then, if there is no other error in the XML, it should work.
I’d recommend contacting whoever designed the database you’re feeding and asking them to make it accept a universal encoding.
To understand more about these processes, stop everything you’re doing right now and read http://local.joelonsoftware.com/wiki/O_M%C3%ADnimo_Absoluto_Que_Todos_os_Programadores_de_Software_Precisam,_Absolutamente,_Positivamente_de_Saber_Sobre_Unicode_e_Conjuntos_de_Caracteres_(Sem_Desculpas!)
To fix your problem and remove problematic characters from the text:
What is now happening inside the ODBC code is equivalent to the error below, which appears if you try to encode text with Cyrillic characters, for example:
In [119]: a = "texto inválido: Ут пауло интерессет темпорибус пер"
In [120]: a.encode("latin-1")
UnicodeEncodeError Traceback (most recent call last)
So you must: decode your data using utf-8, encode it to latin-1 while swapping the unknown characters for "?", and decode it back to text. The result is data that can be sent to your database:
In [122]: dados
Out[122]: b'texto inv\xc3\xa1lido: \xd0\xa3\xd1\x82'
In [123]: dados_str = dados.decode("utf-8").encode("latin1", errors="replace").decode("latin1")
In [124]: dados_str
Out[124]: 'texto inválido: ??'
(The "dados" variable in this example is equivalent to what you have at the beginning: a bytes object representing text encoded in utf-8, containing characters that are invalid in latin-1.) If you keep getting the same error, "não é possível alternar a codificação" ("unable to switch the encoding"), try filtering out all non-ASCII characters: use "ASCII" instead of "latin-1" in the above code.
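The steps above can be wrapped in a small helper so that switching between "latin-1" and "ascii" is a single argument. This is only a sketch: the name "limitar_a" and its parameters are made up for illustration, and it assumes the incoming bytes are valid utf-8.

```python
def limitar_a(dados: bytes, codificacao: str = "latin-1") -> str:
    """Decode utf-8 bytes, replacing characters the target encoding lacks with "?"."""
    texto = dados.decode("utf-8")  # assumes the input really is utf-8
    # errors="replace" turns each unencodable character into a single "?"
    return texto.encode(codificacao, errors="replace").decode(codificacao)


dados = b'texto inv\xc3\xa1lido: \xd0\xa3\xd1\x82'

print(limitar_a(dados))           # texto inválido: ??
print(limitar_a(dados, "ascii"))  # texto inv?lido: ??
```

Note that with "ascii" the accented Portuguese letters are lost as well, so it is worth trying only if "latin-1" still fails.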
Isn’t it complaining about that comma right at the beginning of the XML?
– Giovanni Nunes
I agree with @Giovanni, I think that comma should be generating this error...
– aa_sp
I’m sorry, the comma at the beginning of the XML was a typo. I added to the question the error message I get when trying to save the XML with the b in front.
– Gustavo Primo
Include the code you use to generate this string and the type of the xml column in your database.
– Leandro Angelo