How to identify whether an XML has a BOM?


I have the following problem regarding XML encoding:

Error: Invalid byte 2 of 3-byte UTF-8 sequence.

This error occurs when trying to canonicalize an XML document.

I don't know exactly what causes the error. I suspect the String contains the BOM character. Is there a function or library in Java to check whether the XML has a BOM? Or a function that removes the BOM?

You can use BOMInputStream from the Apache Commons IO library, which does this work for you. I have had this problem myself, and I can safely say that the library makes it easier. A tip, since I have also handled XML: read the content as a byte array, check it with the suggested API, and only then convert it to a String using the UTF-8 charset, so that you do not lose accented characters.
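For comparison, here is a rough sketch of the same byte-level idea using only the JDK, without the Apache library. The class name BomBytes and its method names are made up for illustration. It assumes the XML is available as a byte array and that only the UTF-8 BOM (bytes EF BB BF) matters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of any library): works on the raw bytes of the XML.
final class BomBytes {
    private BomBytes() {}

    // True when the content starts with the UTF-8 BOM bytes EF BB BF.
    static boolean hasUtf8Bom(byte[] content) {
        return content.length >= 3
                && (content[0] & 0xFF) == 0xEF
                && (content[1] & 0xFF) == 0xBB
                && (content[2] & 0xFF) == 0xBF;
    }

    // Decodes the content as UTF-8, skipping the BOM when there is one.
    static String decodeWithoutBom(byte[] content) {
        int offset = hasUtf8Bom(content) ? 3 : 0;
        return new String(content, offset, content.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a small sample document preceded by a UTF-8 BOM.
        byte[] xml = "<root>caf\u00e9</root>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[xml.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, withBom, 3, xml.length);

        System.out.println(hasUtf8Bom(withBom));
        System.out.println(decodeWithoutBom(withBom));
    }
}
```

Checking the bytes before decoding avoids depending on which charset was used to build the String, which seems to be the point of the tip above.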

Snippet that turns the source into an InputStream

String source = FileUtil.takeOffBOM(IOUtils.toInputStream(attachment.getValue(), "UTF-8"));

Method that removes the BOM

public static String takeOffBOM(InputStream inputStream) throws IOException {
    // By default, BOMInputStream detects a UTF-8 BOM and leaves it out of the stream.
    BOMInputStream bomInputStream = new BOMInputStream(inputStream);
    return IOUtils.toString(bomInputStream, "UTF-8");
}

I adapted the class below from the article found at this link: Removing BOM Character from a String in Java


import java.io.UnsupportedEncodingException;

public class BOM {
    private String bomString = "";
    private final static String ISO_ENCODING = "ISO-8859-1";
    private final static String UTF8_ENCODING = "UTF-8";
    private final static int UTF8_BOM_LENGTH = 3;

    // Stores the text to be checked.
    public BOM(String text) {
        this.bomString = text;
    }

    // Returns the text without the leading UTF-8 BOM bytes, if they are present.
    // Assumes the text was decoded as ISO-8859-1, so each char maps back to one raw byte.
    public String removeBOM() throws UnsupportedEncodingException {
        final byte[] bytes = this.bomString.getBytes(ISO_ENCODING);
        if (isUTF8(bytes)) {
            return getSkippedBomString(bytes);
        } else {
            return this.bomString;
        }
    }

    // Copies everything after the first three bytes and rebuilds the String.
    private String getSkippedBomString(final byte[] bytes) throws UnsupportedEncodingException {
        int length = bytes.length - UTF8_BOM_LENGTH;
        byte[] barray = new byte[length];
        System.arraycopy(bytes, UTF8_BOM_LENGTH, barray, 0, barray.length);
        return new String(barray, ISO_ENCODING);
    }

    // True when the array starts with the UTF-8 BOM bytes EF BB BF.
    private boolean isUTF8(byte[] bytes) {
        return bytes.length >= UTF8_BOM_LENGTH &&
            (bytes[0] & 0xFF) == 0xEF &&
            (bytes[1] & 0xFF) == 0xBB &&
            (bytes[2] & 0xFF) == 0xBF;
    }
}


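The class above looks for the three BOM bytes in text that was read as ISO-8859-1. If the String was instead decoded as UTF-8, the BOM shows up as the single character U+FEFF at position 0, and a simpler check may be enough. The sketch below assumes that case. BomText and its method names are hypothetical and do not come from the linked article.

```java
// Hypothetical helper for a String that was already decoded as UTF-8.
final class BomText {
    // The BOM as a single char (U+FEFF) once the bytes have been decoded.
    private static final char BOM_CHAR = '\uFEFF';

    private BomText() {}

    // True when the first character of the text is the BOM.
    static boolean startsWithBom(String text) {
        return !text.isEmpty() && text.charAt(0) == BOM_CHAR;
    }

    // Returns the text without the leading BOM, or the same text when there is none.
    static String stripBom(String text) {
        return startsWithBom(text) ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        String xml = "\uFEFF<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>";
        System.out.println(startsWithBom(xml));
        System.out.println(stripBom(xml));
    }
}
```

Which of the two approaches applies depends on how the String was built from the original bytes, so it may be worth checking that first.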