How do I read files with accented characters on Android?

Hello, I’m having trouble reading files with accents on Android.

I am using the following method to read the file:

public String lerAquivo(File arquivo) {
    String texto;
    String linha;
    BufferedReader br;

    try {
        texto = "";
        br = new BufferedReader(new FileReader(arquivo));

        while ((linha = br.readLine()) != null) {
            if (!texto.equals("")) {
                texto += "\n";
            }

            texto += linha;
        }
    } catch (Exception e) {
        texto = "";
    }

    return texto;
}

It works, but with one problem: in some files that contain accents and special characters, those characters are not read correctly. Does anyone know the solution?

I know it probably has to do with the encoding. I searched and saw that I have to specify the file encoding when reading. But if that is really the case, how do I find out the file's encoding?

1 answer


After a lot of research, I managed to solve the problem.

The problem was the file encoding. FileReader uses the platform's default encoding, which on Android is UTF-8. When a file was not saved in UTF-8, its accented characters were lost.
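The effect described above can be reproduced without any file at all. The following is a minimal sketch, not the asker's code: the class and method names (EncodingMismatchDemo, decodeAsUtf8, decodeAsLatin1) are made up for illustration. It takes the bytes of a word as an ISO-8859-1 editor would save them and decodes them twice, once as UTF-8 and once as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Hypothetical helper: decodes the bytes as UTF-8.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: decodes the bytes as ISO-8859-1 (Latin-1).
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "a\u00e7\u00e3o" is the word "acao" with cedilla and tilde,
        // written with escapes so the source file's own encoding does not matter.
        String original = "a\u00e7\u00e3o";
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the wrong charset: the accented bytes are not valid UTF-8,
        // so the decoder substitutes the replacement character U+FFFD.
        String wrong = decodeAsUtf8(latin1Bytes);
        System.out.println(wrong.contains("\uFFFD"));

        // Decoding with the charset the bytes were written in restores the text.
        String right = decodeAsLatin1(latin1Bytes);
        System.out.println(right.equals(original));
    }
}
```

This is why the accents were lost rather than the whole read failing: the unaccented characters have the same byte values in both encodings, and only the accented ones are replaced.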

While researching, I found a library called juniversalchardet (which I believe comes from Mozilla). In most cases it can determine which encoding a file was saved in. I say "in most cases" because, from what I've read, it can't always identify the encoding, and even when it does, it doesn't always get it right. I tested it on about 20 files created on different operating systems and in different programs, and it got it right every time, so I'm quite satisfied with this "most of the time".
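To illustrate why detection only works "most of the time": the encoding is usually not recorded anywhere in a text file, so a detector has to guess from the bytes. One of the few unambiguous signals is a byte order mark (BOM) at the start of the file. The sketch below checks only for that signal. It is not how juniversalchardet is implemented, the names BomSniffer and charsetFromBom are hypothetical, and it deliberately ignores UTF-32. Many files have no BOM at all, which is where statistical guessing (and the possibility of a wrong answer) comes in.

```java
public class BomSniffer {

    // Hypothetical helper: returns the charset name implied by a byte order mark
    // at the start of the data, or null when no known BOM is present.
    static String charsetFromBom(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        // No BOM: the encoding cannot be known for certain from this check alone.
        return null;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 'a'};
        byte[] withoutBom = {(byte) 'a', (byte) 'b'};
        System.out.println(charsetFromBom(withBom));
        System.out.println(charsetFromBom(withoutBom));
    }
}
```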

To add the library to an Android project, declare it in the Gradle dependencies (on older Gradle versions, use compile instead of implementation):

implementation group: 'com.googlecode.juniversalchardet', name: 'juniversalchardet', version: '1.0.3'

A side note is worth making here: juniversalchardet is a plain Java library, so you can also download its JAR or add it from the Maven repository.

The method that reads the file now decodes it using the encoding identified by the getEncoding() method:

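public String lerAquivo(File arquivo) {
    StringBuilder texto = new StringBuilder();
    String linha;
    boolean primeiraLinha = true;

    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(arquivo), getEncoding(arquivo)))) {
        while ((linha = br.readLine()) != null) {
            // Separate lines with "\n", without adding one after the last line
            if (!primeiraLinha) {
                texto.append("\n");
            }

            texto.append(linha);
            primeiraLinha = false;
        }
    } catch (Exception e) {
        // On any error (missing file, unsupported encoding), return an empty string
        return "";
    }

    return texto.toString();
}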

And finally the getEncoding() method, which identifies in which encoding a file has been saved:

// Requires: import org.mozilla.universalchardet.UniversalDetector;
private String getEncoding(File arquivo) {
    String encoding;
    byte[] buf = new byte[4096];
    int nread;

    // try-with-resources closes the stream even if detection fails
    try (java.io.FileInputStream fis = new java.io.FileInputStream(arquivo)) {
        UniversalDetector detector = new UniversalDetector(null);

        // Feed the detector until it has decided or the file ends
        while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, nread);
        }

        detector.dataEnd();

        encoding = detector.getDetectedCharset();

        // null means the detector could not identify the encoding
        if (encoding == null) {
            encoding = "UTF-8";
        }

        detector.reset();
    } catch (Exception e) {
        encoding = "UTF-8";
    }

    return encoding;
}

Note that if the encoding cannot be detected, I fall back to UTF-8, because in my case reading the text with broken accents is better than reading nothing.
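The reading step can be checked on its own, independently of detection. The sketch below is an illustration under stated assumptions rather than the code from this answer: the names CharsetReadDemo and readWithCharset are hypothetical, the charset is passed in explicitly instead of coming from getEncoding(), and it uses java.nio.file.Files to create the test file, which on Android is only available from API level 26. It writes a small file in ISO-8859-1 and reads it back through an InputStreamReader with an explicit charset.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class CharsetReadDemo {

    // Hypothetical helper: reads the whole file, decoding it with the given charset.
    static String readWithCharset(File file, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (!first) {
                    text.append('\n');
                }
                text.append(line);
                first = false;
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Two lines; the first one contains accented characters (as escapes).
        String original = "ma\u00e7\u00e3\nb";

        File tmp = File.createTempFile("accents", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), original.getBytes(StandardCharsets.ISO_8859_1));

        // Reading with the charset the file was written in restores the text.
        System.out.println(readWithCharset(tmp, StandardCharsets.ISO_8859_1).equals(original));

        // Reading the same file as UTF-8 (the fallback) does not.
        System.out.println(readWithCharset(tmp, StandardCharsets.UTF_8).equals(original));
    }
}
```

Reading this file as UTF-8 still returns text, only with the accented characters replaced, which matches the trade-off described above for the fallback.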
