How to identify if an XML has a BOM?

4

I have the following problem regarding XML encoding:

Error: Invalid byte 2 of 3-byte UTF-8 sequence.

This error occurs when trying to canonicalize an XML document.

I don’t know exactly what the cause might be; I imagine it is because the String contains the BOM (byte order mark) character. Is there a function or library in Java to identify whether the XML contains a BOM? Or a function that removes the BOM?

2 answers

3


You can use the Apache Commons IO class BOMInputStream, which does this work for you. I have had this problem myself, and I can safely say that this library makes it easier. A tip, since I have also manipulated XML: read the content as a byte array, check it with the suggested API, and only then convert it to a String using the UTF-8 charset, so that you don’t lose accented characters.

Snippet that turns the source into an InputStream:

String source = FileUtil.takeOffBOM(IOUtils.toInputStream(attachment.getValue()));

Method to strip the BOM:

public static String takeOffBOM(InputStream inputStream) throws IOException {
    BOMInputStream bomInputStream = new BOMInputStream(inputStream);
    return IOUtils.toString(bomInputStream, "UTF-8");
}
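For readers who cannot add Commons IO, the byte-level check described in this answer can also be sketched with the JDK alone. This is only a minimal illustration, not the library's implementation: the class name Utf8BomBytes and its two methods are hypothetical, and it handles the UTF-8 BOM only (not UTF-16 or UTF-32).

```java
import java.nio.charset.StandardCharsets;

public class Utf8BomBytes {
    // The UTF-8 byte order mark is the three bytes EF BB BF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true when the array starts with the UTF-8 BOM. */
    public static boolean hasUtf8Bom(byte[] bytes) {
        if (bytes.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (bytes[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    /** Decodes the bytes as UTF-8, skipping a leading BOM if one is present. */
    public static String decodeWithoutBom(byte[] bytes) {
        int offset = hasUtf8Bom(bytes) ? UTF8_BOM.length : 0;
        return new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        System.out.println(hasUtf8Bom(withBom));       // true
        System.out.println(decodeWithoutBom(withBom)); // <a/>
    }
}
```

Working on the raw bytes before decoding avoids guessing how an already-built String was decoded.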

1

I adapted the class below from the article found at this link: Removing BOM Character from a String in Java

import java.io.UnsupportedEncodingException;

public class BOM {
    private static final String ISO_ENCODING = "ISO-8859-1";
    private static final String UTF8_ENCODING = "UTF-8";
    private static final int UTF8_BOM_LENGTH = 3;

    private final String bomString;

    // Stores the text that may start with a BOM.
    public BOM(String text) {
        this.bomString = text;
    }

    // Returns the text without its leading BOM, or unchanged if there is none.
    // This works on text whose raw UTF-8 bytes were decoded as ISO-8859-1,
    // so the BOM shows up as the three characters 0xEF 0xBB 0xBF.
    public String removeBOM() throws UnsupportedEncodingException {
        final byte[] bytes = this.bomString.getBytes(ISO_ENCODING);
        if (isUTF8(bytes)) {
            return getSkippedBomString(bytes);
        } else {
            return this.bomString;
        }
    }

    // Copies everything after the first three bytes and rebuilds the String.
    private String getSkippedBomString(final byte[] bytes) throws UnsupportedEncodingException {
        int length = bytes.length - UTF8_BOM_LENGTH;
        byte[] barray = new byte[length];
        System.arraycopy(bytes, UTF8_BOM_LENGTH, barray, 0, barray.length);
        return new String(barray, ISO_ENCODING);
    }

    // True when the bytes start with the UTF-8 BOM (EF BB BF).
    // The length check avoids an ArrayIndexOutOfBoundsException on short input.
    private boolean isUTF8(byte[] bytes) {
        return bytes.length >= UTF8_BOM_LENGTH
            && (bytes[0] & 0xFF) == 0xEF
            && (bytes[1] & 0xFF) == 0xBB
            && (bytes[2] & 0xFF) == 0xBF;
    }
}
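The class above looks for the BOM as three ISO-8859-1 characters. If the String was instead decoded as UTF-8, the BOM normally appears as the single character U+FEFF, and a check like the following sketch may be enough. The class name BomStrings and its methods are hypothetical, introduced only for illustration.

```java
public class BomStrings {
    // U+FEFF is how the BOM appears once UTF-8 bytes have been decoded as UTF-8.
    private static final String BOM_CHAR = "\uFEFF";

    /** Returns true when the text begins with the BOM character. */
    public static boolean startsWithBom(String text) {
        return text != null && text.startsWith(BOM_CHAR);
    }

    /** Returns the text without a leading BOM character, or unchanged if there is none. */
    public static String stripBom(String text) {
        return startsWithBom(text) ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        String xml = "\uFEFF<root/>";
        System.out.println(startsWithBom(xml)); // true
        System.out.println(stripBom(xml));      // <root/>
    }
}
```

Which variant applies depends on how the String was produced, so it is worth checking the first character of the actual input before choosing one.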
