Convert encoding CP850 to UTF8

I have a Paradox database that returns the following string when I query a table for the sector a given worker belongs to: ManutenþÒo ElÚtrica. It should actually be Manutenção Elétrica (Electrical Maintenance). I need to return this string to a browser.

From my research, this looks like CP850 text that I need to convert to UTF-8, which is the encoding normally used. I saw this at the link:

http://codepage-encoding.online-domain-tools.com/run/?tool=codepage&inputType=frm-text&text=Manuten%C3%BE%C3%92o%20El%C3%9Atrica&sourceCodepage=UTF-8&targetCodepage=CP850&Convert=do

Here’s what I’m trying to do in C#:

Encoding utf8 = Encoding.UTF8;
Encoding cp = Encoding.GetEncoding(850);

byte[] cpBytes = cp.GetBytes(identifColaborador.setor); // at this point it already arrives as ManutenþÒo ElÚtrica

byte[] utf8Bytes = Encoding.Convert(cp, utf8, cpBytes);
string msg = utf8.GetString(utf8Bytes);

Unfortunately it doesn’t work. The string msg still contains ManutenþÒo ElÚtrica.

What am I missing?

1 answer

Your code has no effect on the returned string, because it starts from an abstract representation (a string) and ends at another abstract representation (a string). I’m not sure I can explain it well, so here is a fictional example:

// a and b below stand for two fictional Encoding instances.
// Letter (code point)         Encoding A            Encoding B
// a                           0xAA 0xBB             0xCC
// b                           0xDD                  0xEE 0xFF

string original = "aaba";

byte[] aBytes = a.GetBytes(original);
// aBytes = [0xAA 0xBB 0xAA 0xBB 0xDD 0xAA 0xBB]

byte[] bBytes = Encoding.Convert(a, b, aBytes);
// bBytes = [0xCC 0xCC 0xEE 0xFF 0xCC]

string msg = b.GetString(bBytes);
// msg = "aaba"

Any string that goes through this process comes out unchanged (unless one of the encodings does not support some of the characters). To fix your problem, the contents of identifColaborador.setor need to be interpreted in the correct encoding before they are turned into a string.
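As a concrete check of the claim above, here is a minimal sketch that runs the question's own conversion steps and compares the result with the input. The names RoundTripDemo and RoundTrip are hypothetical. It assumes .NET Core 3.0 or later (or .NET 5+), where code page 850 is only available after registering CodePagesEncodingProvider; on .NET Framework the RegisterProvider line should not be needed.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical helper: string -> CP850 bytes -> UTF-8 bytes -> string.
    public static string RoundTrip(string input)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before Encoding.GetEncoding(850) can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding cp850 = Encoding.GetEncoding(850);
        byte[] cpBytes = cp850.GetBytes(input);
        byte[] utf8Bytes = Encoding.Convert(cp850, Encoding.UTF8, cpBytes);
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        string garbled = "ManutenþÒo ElÚtrica";
        // Every character here exists in CP850, so nothing is lost or changed.
        Console.WriteLine(RoundTrip(garbled) == garbled); // True
    }
}
```

The conversion is lossless in both directions, which is exactly why it cannot repair text that was decoded with the wrong encoding in the first place.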

If you cannot fix the decoding at the source and have to work with the string already in its abstract representation, then the right approach is to recover the bytes behind the string and reinterpret them without converting. In other words, take aBytes and turn them into a string according to encoding B. The code below worked on ideone, but may not work on your system, so try different values for seuEncoding (UTF-8, UTF-16, Cp1252, ISO-Latin, Encoding.Default, etc.).

Encoding seuEncoding = Encoding.GetEncoding("Cp1252");
Encoding cp850 = Encoding.GetEncoding(850);

byte[] cpBytes = cp850.GetBytes("ManutenþÒo ElÚtrica");
string msg = seuEncoding.GetString(cpBytes);
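The same idea can be sketched as a small reusable helper. This is only an illustration: MojibakeRepair and Reinterpret are hypothetical names, and the code assumes .NET Core 3.0 or later (or .NET 5+), where code pages 850 and 1252 must be registered through CodePagesEncodingProvider. It uses numeric code page identifiers instead of names, which sidesteps differences in which encoding names a given runtime accepts.

```csharp
using System;
using System.Text;

public static class MojibakeRepair
{
    // Hypothetical helper: recover the raw bytes by re-encoding with the
    // encoding that was wrongly applied, then decode them with the real one.
    public static string Reinterpret(string garbled, Encoding wronglyApplied, Encoding actual)
    {
        byte[] raw = wronglyApplied.GetBytes(garbled);
        return actual.GetString(raw);
    }

    public static void Main()
    {
        // Assumption: .NET Core / .NET 5+, where legacy code pages need registering.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding cp850 = Encoding.GetEncoding(850);
        Encoding windows1252 = Encoding.GetEncoding(1252);

        // Make sure the console can display the accented characters.
        Console.OutputEncoding = Encoding.UTF8;

        // þ, Ò and Ú are bytes 0xE7, 0xE3 and 0xE9 in CP850,
        // which are ç, ã and é in Windows-1252.
        Console.WriteLine(Reinterpret("ManutenþÒo ElÚtrica", cp850, windows1252));
        // Manutenção Elétrica
    }
}
```

Once the string is correct in memory, it can be sent to the browser as UTF-8 in the usual way; no further conversion of the string itself is needed.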
  • Marcelo, thanks for the tip. I just needed to replace Cp1252 with Windows-1252 and everything worked. Thanks!
