After a lot of research, I managed to solve the problem.
The problem was the file encoding. By default, FileReader uses the platform's default encoding, which on Android is UTF-8. When the file was not saved in UTF-8, the accented characters were lost.
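To illustrate the mismatch, here is a minimal sketch using only the standard library. The class and method names are hypothetical, and the sample word is just an example; it encodes a string as ISO-8859-1 and then decodes those bytes as UTF-8, which is roughly what happens when a non-UTF-8 file is read with a UTF-8 reader.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Hypothetical helper: decodes raw bytes with the given charset,
    // the same way a reader configured with that charset would.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "ação" written with Unicode escapes so the source file encoding does not matter
        String original = "a\u00E7\u00E3o";

        // Simulate a file that was saved as ISO-8859-1
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the wrong charset damages the accented characters
        String wrong = decode(latin1Bytes, StandardCharsets.UTF_8);
        // Decoding with the charset the bytes were written in preserves them
        String right = decode(latin1Bytes, StandardCharsets.ISO_8859_1);

        System.out.println("Decoded as UTF-8 matches original: " + wrong.equals(original));
        System.out.println("Decoded as ISO-8859-1 matches original: " + right.equals(original));
    }
}
```

The bytes for the accented letters are not valid UTF-8 sequences, so the UTF-8 decode replaces them with the Unicode replacement character, while the ISO-8859-1 decode round-trips cleanly.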
During my research I found a library called juniversalchardet (which I believe is a Java port of Mozilla's universalchardet). In most cases it can determine the encoding a file was saved in. I say "in most cases" because, from what I have read, it cannot always identify the encoding, and even when it does, it does not always get it right. I tested about 20 files created on different operating systems and in different programs, and it got every one of them right, so I am quite satisfied with this "most of the time".
To add the library to an Android project, declare it in the Gradle dependencies (on newer Gradle versions, where the compile configuration has been removed, use implementation instead):
compile group: 'com.googlecode.juniversalchardet', name: 'juniversalchardet', version: '1.0.3'
A side note: juniversalchardet is a plain Java library, so you can also download its JAR directly or add it from the Maven repository.
The method that reads the file decodes it using the encoding identified by getEncoding():
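import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public String lerAquivo(File arquivo) {
    StringBuilder texto = new StringBuilder();
    // try-with-resources closes the stream and the reader, even on error
    try (FileInputStream fis = new FileInputStream(arquivo);
         BufferedReader br = new BufferedReader(
                 new InputStreamReader(fis, getEncoding(arquivo)))) {
        String linha;
        while ((linha = br.readLine()) != null) {
            // Separate lines with "\n", without a leading separator
            if (texto.length() > 0) {
                texto.append("\n");
            }
            texto.append(linha);
        }
    } catch (Exception e) {
        // On any failure, return an empty string
        return "";
    }
    return texto.toString();
}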
And finally, the getEncoding() method, which identifies the encoding a file was saved in:
import org.mozilla.universalchardet.UniversalDetector;

private String getEncoding(File arquivo) {
    byte[] buf = new byte[4096];
    UniversalDetector detector = new UniversalDetector(null);
    // try-with-resources closes the stream, even on error
    try (java.io.FileInputStream fis = new java.io.FileInputStream(arquivo)) {
        int nread;
        // Feed the detector until it has decided or the file ends
        while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, nread);
        }
        detector.dataEnd();
        String encoding = detector.getDetectedCharset();
        detector.reset();
        // Fall back to UTF-8 when no encoding could be detected
        return encoding != null ? encoding : "UTF-8";
    } catch (Exception e) {
        return "UTF-8";
    }
}
Note that if the encoding cannot be detected, UTF-8 is used as the default, because in my case reading the text without accents is better than reading nothing.
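A related case is a detected name that the runtime cannot decode, which makes the InputStreamReader constructor throw. One possible way to extend the same UTF-8 fallback to that case is sketched below; the class and method names are hypothetical and not part of the code above, and it assumes java.nio.charset.StandardCharsets is available (Java 7+, Android API 19+).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallback {
    // Hypothetical helper: turns a detected charset name into a Charset,
    // falling back to UTF-8 when the name is missing, malformed, or unsupported.
    public static Charset resolveOrUtf8(String detectedName) {
        if (detectedName == null) {
            return StandardCharsets.UTF_8;
        }
        try {
            if (Charset.isSupported(detectedName)) {
                return Charset.forName(detectedName);
            }
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal names; use the fallback below
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(resolveOrUtf8("ISO-8859-1"));
        System.out.println(resolveOrUtf8(null));
    }
}
```

The resulting Charset could then be passed to the InputStreamReader constructor that accepts a Charset instead of a name, so an unsupported name degrades to UTF-8 rather than to an empty result.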
Is it the file name or the content that has accents?
– Valdeir Psr
The content has accents.
– Jônatas Trabuco Belotti