I’m having trouble figuring out the encoding of a string.
The input is:
São Paulo
I don't control how this content is originally read, because the text goes through a Lua wrapper for Java.
On my side, I have already tried the following "brute force" approach, but could not find the correct conversion:
```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// `entrada` is the String received from the Lua wrapper.
// Encode with the platform default charset, then decode with each charset
byte[] bytes1 = entrada.getBytes();
System.out.println(Arrays.toString(bytes1));
System.out.println(new String(bytes1));
System.out.println(new String(bytes1, StandardCharsets.UTF_8));
System.out.println(new String(bytes1, StandardCharsets.ISO_8859_1));
System.out.println(new String(bytes1, StandardCharsets.US_ASCII));
// Encode with UTF-8
byte[] bytes2 = entrada.getBytes(StandardCharsets.UTF_8);
System.out.println(Arrays.toString(bytes2));
System.out.println(new String(bytes2));
System.out.println(new String(bytes2, StandardCharsets.UTF_8));
System.out.println(new String(bytes2, StandardCharsets.ISO_8859_1));
System.out.println(new String(bytes2, StandardCharsets.US_ASCII));
// Encode with ISO-8859-1 (unmappable characters become '?')
byte[] bytes3 = entrada.getBytes(StandardCharsets.ISO_8859_1);
System.out.println(Arrays.toString(bytes3));
System.out.println(new String(bytes3));
System.out.println(new String(bytes3, StandardCharsets.UTF_8));
System.out.println(new String(bytes3, StandardCharsets.ISO_8859_1));
System.out.println(new String(bytes3, StandardCharsets.US_ASCII));
// Encode with US-ASCII (unmappable characters become '?')
byte[] bytes4 = entrada.getBytes(StandardCharsets.US_ASCII);
System.out.println(Arrays.toString(bytes4));
System.out.println(new String(bytes4));
System.out.println(new String(bytes4, StandardCharsets.UTF_8));
System.out.println(new String(bytes4, StandardCharsets.ISO_8859_1));
System.out.println(new String(bytes4, StandardCharsets.US_ASCII));
```
And I got the following output, all of it wrong:
```
[83, -29, -81, -96, 80, 97, 117, 108, 111]
S㯠Paulo
S㯠Paulo
S㯠Paulo
S���Paulo
[83, -29, -81, -96, 80, 97, 117, 108, 111]
S㯠Paulo
S㯠Paulo
S㯠Paulo
S���Paulo
[83, 63, 80, 97, 117, 108, 111]
S?Paulo
S?Paulo
S?Paulo
S?Paulo
[83, 63, 80, 97, 117, 108, 111]
S?Paulo
S?Paulo
S?Paulo
S?Paulo
```
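Reading the printed arrays more closely: bytes1 and bytes2 are identical, so the platform default charset was apparently UTF-8, and the three negative values -29, -81, -96 (0xE3 0xAF 0xA0) look like a single UTF-8 sequence. A minimal sketch, assuming only the byte values printed above (the class and method names are illustrative), decodes that array and lists the code points it contains:

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class DecodeBytes {
    // Decode the bytes as UTF-8 and list each code point as "U+XXXX"
    static String codePoints(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Byte values copied from the first line of the output above
        byte[] bytes = {83, -29, -81, -96, 80, 97, 117, 108, 111};
        // Prints: U+0053 U+3BE0 U+0050 U+0061 U+0075 U+006C U+006F
        System.out.println(codePoints(bytes));
    }
}
```

If this reading is right, the Java string holds a single character, U+3BE0, in place of the expected "ão" and the space, and there is no "o" before the "P". ISO-8859-1 and US-ASCII cannot represent U+3BE0, which is why bytes3 and bytes4 contain 63 ('?') at that position.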
Can anyone help me? Thanks in advance.
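For comparison, the kind of encode/decode combination tried above only helps when a correct string was decoded with the wrong charset. A minimal sketch, using "São Paulo" as a stand-in value (not the actual input) and hypothetical helper names, simulates UTF-8 bytes being decoded as ISO-8859-1 and then reverses it:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Simulate the common mistake: UTF-8 bytes decoded as ISO-8859-1
    static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverse it: recover the raw bytes via ISO-8859-1, then decode them as UTF-8
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "S\u00e3o Paulo"; // "São Paulo"
        String garbled = garble(original);
        System.out.println(garbled);                          // SÃ£o Paulo
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

The repair is lossless because ISO-8859-1 maps every one of the 256 byte values to a character. The string received from the wrapper does not appear to follow this pattern, so no such pair of charsets is expected to restore it.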
Indeed: the "o" in "Paulo" should have the same code as the "o" in "São", so the data is already corrupted in the input. The best thing the author could do would be to show the input byte by byte, in hex, for analysis. – Bacco
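Following the suggestion to show the input byte by byte in hex, a minimal sketch on the Java side: the class and method names are hypothetical, and the sample value is reconstructed from the bytes printed in the question; in practice the helpers would be called with the `entrada` string received from the Lua wrapper.

```java
import java.nio.charset.StandardCharsets;

public class HexDump {
    // UTF-16 code units actually stored in the Java String, e.g. "0053 3BE0 ..."
    static String charsHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Bytes produced when the String is encoded as UTF-8, e.g. "53 E3 AF A0 ..."
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to print 0..FF, not negative
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for `entrada`; replace with the value from the Lua wrapper
        String entrada = "S\u3BE0Paulo";
        System.out.println("chars: " + charsHex(entrada));
        System.out.println("utf-8: " + utf8Hex(entrada));
    }
}
```

Dumping the same value on the Lua side, before it crosses the wrapper, would show whether the damage happens there or in the conversion to a Java String.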
Thank you very much for your attention. I agree with you; I suspected it too. I will see what I can do about the reading on the Lua side.
– Jemerson Damásio