Convert ISO-8859-1 string to UTF-8 in Java

1

My goal is to write a converter from ISO-8859-1 to UTF-8.

I already have this code:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import org.apache.commons.lang3.StringUtils;

public class Converter {

    public static void main(String... args) throws IOException {

    BufferedReader in = null;
    try {
        File fileDir = new File("Mensagens.java");

        in = new BufferedReader(
           new InputStreamReader(new FileInputStream(fileDir), "ISO-8859-1"));

        String strISO;
        String strUTF8 = null;

        while ((strISO = in.readLine()) != null) {
            byte[] isoBytes = strISO.getBytes("ISO-8859-1");
            String value = new String(isoBytes, "UTF-8"); 
            if(strUTF8 == null ){
                strUTF8 = value;
            }else{
                strUTF8 += value;       
            }   
            System.out.println("ISO : "+strISO);
            System.out.println("UTF : "+value);
        }
        }
        catch (UnsupportedEncodingException e){System.out.println(e.getMessage());}
        catch (IOException e){System.out.println(e.getMessage());}
        catch (Exception e){System.out.println(e.getMessage());}
        finally{
            in.close(); 
        }
        //System.out.println(strUTF8);
    }
}

But the UTF-8 output doesn't work.

My question:

What do I need to change in

byte[] isoBytes = strISO.getBytes("ISO-8859-1");
String value = new String(isoBytes, "UTF-8"); 
if(strUTF8 == null ){
    strUTF8 = value;
}else{
    strUTF8 += value;       
}   
System.out.println("ISO : "+strISO);
System.out.println("UTF : "+value);

so that the ISO and UTF outputs are the same?

Current output:

ISO : "Já existe lançamento com a mesma Nota Fiscal e Fornecedor.";
UTF : "J� existe lan�amento com a mesma Nota Fiscal e Fornecedor.";

Desired output:

ISO : "Já existe lançamento com a mesma Nota Fiscal e Fornecedor.";
UTF : "Já existe lançamento com a mesma Nota Fiscal e Fornecedor.";

Testing:

PrintStream outISO = new PrintStream(System.out, true, "ISO-8859-1");
PrintStream outUTF8 = new PrintStream(System.out, true, "UTF-8");
outISO.println("ISO : " + strISO);
outUTF8.println("UTF : " + value);

Output:

ISO : "J� existe lan�amento com a mesma Nota Fiscal e Fornecedor.";
UTF : "J� existe lan�amento com a mesma Nota Fiscal e Fornecedor.";

PrintStream outISO = new PrintStream(System.out, true, "UTF-8");
PrintStream outUTF8 = new PrintStream(System.out, true, "UTF-8");
outISO.println("ISO : " + strISO);
outUTF8.println("UTF : " + value);

Output:

ISO : "Já existe lançamento com a mesma Nota Fiscal e Fornecedor.";
UTF : "J� existe lan�amento com a mesma Nota Fiscal e Fornecedor.";

PrintStream outISO = new PrintStream(System.out, true, "UTF-8");
PrintStream outUTF8 = new PrintStream(System.out, true, "ISO-8859-1");
outISO.println("ISO : " + strISO);
outUTF8.println("UTF : " + value);

Output:

ISO : "Já existe lançamento com a mesma Nota Fiscal e Fornecedor.";
UTF : "J? existe lan?amento com a mesma Nota Fiscal e Fornecedor.";

PrintStream outISO = new PrintStream(System.out, true, "ISO-8859-1");
PrintStream outUTF8 = new PrintStream(System.out, true, "ISO-8859-1");
outISO.println("ISO : " + strISO);
outUTF8.println("UTF : " + value);

Output:

ISO : "J� existe lan�amento com a mesma Nota Fiscal e Fornecedor.";
UTF : "J? existe lan?amento com a mesma Nota Fiscal e Fornecedor.";

  • Shouldn't it be System.out.println("UTF : "+strUTF8);?

  • No, because strUTF8 is accumulated with strUTF8 += value. Otherwise every print would repeat all the previous lines, growing each time.

  • I'm running it in the terminal; I'm not using an IDE.

  • Was the answer helpful to you? Don’t forget to mark it so it can be used if someone has a similar question!

  • Even with these answers I still haven't been able to find a solution.

1 answer

3

The problem is that System.out.println writes in only one encoding, so to print with different encodings you could use a PrintStream:

PrintStream outISO = new PrintStream(System.out, true, "ISO-8859-1");
PrintStream outUTF8 = new PrintStream(System.out, true, "UTF-8");

outISO.println("ISO : " + strISO);
outUTF8.println("UTF : " + value);

Or:

System.setOut(new PrintStream(System.out, true, "ISO-8859-1"));
System.out.println("ISO : " + strISO);

System.setOut(new PrintStream(System.out, true, "UTF-8"));
System.out.println("UTF : " + value);
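To see what each PrintStream actually sends to the terminal, it can help to capture the bytes instead of printing them. The following is only a minimal sketch: the class name PrintStreamBytes and the helper encodeToHex are made up for illustration, and a ByteArrayOutputStream stands in for System.out.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class PrintStreamBytes {

    // Hypothetical helper: writes text through a PrintStream that uses the
    // given charset and returns the bytes it produced as a hex string.
    static String encodeToHex(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sink.toByteArray()) {
            // Mask to 0..255 so bytes above 0x7F are not printed as negative ints
            hex.append(String.format("%02X ", b & 0xFF));
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) {
        // "Já" written with an escape so the source file encoding does not matter
        String text = "J\u00e1";
        System.out.println("ISO-8859-1: " + encodeToHex(text, "ISO-8859-1")); // 4A E1
        System.out.println("UTF-8     : " + encodeToHex(text, "UTF-8"));      // 4A C3 A1
    }
}
```

The same string becomes different bytes depending on the PrintStream's charset. A terminal that expects UTF-8 cannot interpret the lone byte E1, which is one plausible reason for the � in the outputs above.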

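Internally, Java strings are Unicode (UTF-16), so when you read the file through an InputStreamReader created with "ISO-8859-1" the text has already been decoded from ISO-8859-1 into that internal form. Your variable strISO no longer holds ISO-8859-1 data, so it is better treated as the text to encode (call it strUTF8), with the conversion reversed: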

byte[] utf8Bytes = strUTF8.getBytes("UTF-8");
String value = new String(utf8Bytes, "ISO-8859-1"); 
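
If the goal is a file converter rather than console output, one possible approach is to decode with ISO-8859-1 on the way in and encode with UTF-8 on the way out, without re-decoding the bytes in between. This is a sketch under assumptions, not the asker's code: the class name IsoToUtf8, the helpers transcode and convertFile, and the command-line arguments are all hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IsoToUtf8 {

    // Decode ISO-8859-1 bytes into a String, then encode that String as UTF-8.
    static byte[] transcode(byte[] isoBytes) {
        String text = new String(isoBytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Stream a whole file through the same decode/encode pair.
    // Copying char buffers (not lines) keeps the original line endings.
    static void convertFile(Path source, Path target) throws IOException {
        try (Reader in = Files.newBufferedReader(source, StandardCharsets.ISO_8859_1);
             Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Usage (assumed): java IsoToUtf8 <iso-file> <utf8-file>
            convertFile(Paths.get(args[0]), Paths.get(args[1]));
        } else {
            // In-memory demo: 0x4A 0xE1 is "Já" in ISO-8859-1
            byte[] utf8 = transcode(new byte[] {0x4A, (byte) 0xE1});
            System.out.println(utf8.length + " bytes after conversion"); // 3 bytes after conversion
        }
    }
}
```

The byte 0xE1 on its own is not a valid UTF-8 sequence, so decoding ISO-8859-1 bytes as UTF-8, as the code in the question does, replaces it with U+FFFD. Going through the String and encoding once avoids that.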
  • It didn't work. The two outputs: ISO : "J� existe lan�amento com a mesma Nota Fiscal e Fornecedor."; UTF : "J� existe lan�amento com a mesma Nota Fiscal e Fornecedor.";

  • @Brunorozendo I think your variable strISO is already in UTF-8; try printing it on outUTF8 and see what you get.

  • I added the tests to the answer

  • @Brunorozendo Great, as I suspected: your file is in ISO-8859-1, but internally Java works with Unicode strings, so when you read the file you are already converting it from ISO-8859-1.
