Java: Converting a UTF-8 file to ISO-8859-1

New question:


Source:

C:\\temp\\test.csv
"Русслэнд";"Ελλάς";"Réunion"
Expected result:

C:\\temp\\test.properties
"\u0420\u0443\u0441\u0441\u043b\u044d\u043d\u0434";"\u0395\u03bb\u03bb\u03ac\u03c2";"R\u00e9union"
Current result:

C:\\temp\\test.properties
"????????", "?????","R궮ion"
Code:

try {
    File file = new File("C:\\temp\\test.csv");

    // Decode the input as UTF-8.
    FileInputStream is = new FileInputStream(file);
    InputStreamReader r = new InputStreamReader(is, Charset.forName("UTF-8"));

    // Encode the output as ISO-8859-1.
    FileOutputStream os = new FileOutputStream("C:\\temp\\test.properties");
    OutputStreamWriter ow = new OutputStreamWriter(os, "ISO-8859-1");

    char[] buffer = new char[1024];
    int x;
    // read() may fill only part of the buffer and returns -1 at end of input,
    // so write exactly the number of chars that were read on each pass.
    while ((x = r.read(buffer)) != -1) {
        ow.write(buffer, 0, x);
    }
    ow.flush();

    ow.close();
    r.close();
} catch (IOException e) {
    e.printStackTrace();
}
Old question:

How can I convert a large UTF-8 .csv file to ISO-8859-1 in Java 1.6? I want to read the given file, convert it, and save it.

private byte[] convertToISO(File file, Charset enc) {
    // enc = Charset.forName("UTF-8");
    try {
        FileInputStream is = new FileInputStream(file);
        InputStreamReader r = new InputStreamReader(is, enc);

        char[] buffer = new char[1024];
        StringWriter w = new StringWriter();

        int x;
        // read() may fill only part of the buffer and returns -1 at end of input.
        while ((x = r.read(buffer)) != -1) {
            w.write(buffer, 0, x);
        }
        w.flush();

        String res = w.toString();

        r.close();
        // Characters that ISO-8859-1 cannot represent are replaced with '?'.
        return res.getBytes("ISO-8859-1");

    } catch (IOException e) {
        System.err.println("Failed to read file: " + file.getPath());
        e.printStackTrace();

        return null;
    }
}

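I assume you are trying to print the result to the console. By default, a JDK/JRE prints to the console using its default charset, which is UTF-8 on many systems but is platform-dependent on older JVMs such as Java 1.6.

To use the ISO-8859-1 charset, you can pass this JVM argument:
-Dfile.encoding=ISO-8859-1

Alternatively, you can configure the encoding in your IDE.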
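As a hedged illustration of the console point: instead of changing the JVM-wide default, a PrintStream can be given an explicit charset. The class name ConsoleCharsetSketch and the helper printedBytes are made up for this sketch; only PrintStream, ByteArrayOutputStream and Charset come from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

final class ConsoleCharsetSketch {

    // Hypothetical helper: returns the bytes a PrintStream emits for `text`
    // when it is configured with the given charset name.
    static byte[] printedBytes(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream ps = new PrintStream(sink, true, charsetName);
            ps.print(text);
            ps.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Shows which charset this JVM uses when none is specified.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Wrap System.out so that this stream alone encodes as ISO-8859-1.
        PrintStream latin1Out = new PrintStream(System.out, true, "ISO-8859-1");
        latin1Out.println("R\u00e9union");
    }
}
```

Whether the console then displays the text correctly still depends on the encoding the terminal itself expects.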

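You don't actually want to convert from UTF-8 to ISO-8859-1; you want to escape Unicode characters into an ASCII stream. That is not the same as simply re-encoding.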
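To make the difference concrete, here is a small sketch of what plain re-encoding does to the sample data. ISO-8859-1 has no Greek or Cyrillic letters, so String.getBytes substitutes the charset's replacement byte ('?') for each of them. The class and method names are invented for illustration.

```java
import java.nio.charset.Charset;

final class ReencodeSketch {

    // Encodes `text` as ISO-8859-1 and decodes it again, which exposes
    // what survives a plain re-encoding.
    static String reencodeToLatin1(String text) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        byte[] bytes = text.getBytes(latin1); // unmappable chars become '?'
        return new String(bytes, latin1);
    }

    public static void main(String[] args) {
        // "Réunion" fits into ISO-8859-1 and survives unchanged.
        System.out.println(reencodeToLatin1("R\u00e9union"));
        // The five Greek letters of "Ελλάς" do not fit and come back as five '?'.
        System.out.println(reencodeToLatin1("\u0395\u03bb\u03bb\u03ac\u03c2"));
    }
}
```

This matches the question marks in the current result: the information is already lost at encoding time, so it has to be escaped before it is written.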

Here is a class that escapes Unicode characters on the fly as they are written to the output stream:

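import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class OutputEscapingStreamWriter extends OutputStreamWriter {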

    public OutputEscapingStreamWriter(OutputStream out, Charset cs) {
        super(out, cs);
    }

    public OutputEscapingStreamWriter(OutputStream out) {
        super(out);
    }

    public OutputEscapingStreamWriter(OutputStream out, String cs) throws UnsupportedEncodingException {
        super(out, cs);
    }

    public OutputEscapingStreamWriter(OutputStream out, CharsetEncoder cs) {
        super(out, cs);
    }

    private static final String HEX_DIGITS = "0123456789abcdef";

    @Override
    public void write(int c) throws IOException {
        if (c < 128) {
            super.write(c);
        }
        else {
            super.write(toHexString(c));
        }
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < (off + len); i++) {
            write(str.charAt(i));
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < (off + len); i++) {
            write(cbuf[i]);
        }
    }

    private String toHexString(int c) {
        StringBuilder sb = new StringBuilder("\\u");
        sb.append(HEX_DIGITS.charAt((c & 0xF000) >> 12));
        sb.append(HEX_DIGITS.charAt((c & 0x0F00) >> 8));
        sb.append(HEX_DIGITS.charAt((c & 0x00F0) >> 4));
        sb.append(HEX_DIGITS.charAt((c & 0x000F) ));
        return sb.toString();
    }
}
A quick and dirty unit test showing that it produces the output you expect:

@Test
public void testConversion() throws Exception {
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    OutputEscapingStreamWriter wrapper = new OutputEscapingStreamWriter(output);
    wrapper.write("\"Русслэнд\";\"Ελλάς\";\"Réunion\"");
    wrapper.flush();
    wrapper.close();
    String result = output.toString();

    assertEquals("\"\\u0420\\u0443\\u0441\\u0441\\u043b\\u044d\\u043d\\u0434\";\"\\u0395\\u03bb\\u03bb\\u03ac\\u03c2\";\"R\\u00e9union\"", 
            result);
}
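To tie this back to the file conversion in the question, here is a minimal standalone sketch of the same escaping idea applied to a Reader/Writer copy loop. It deliberately does not reuse OutputEscapingStreamWriter so that it compiles on its own; EscapingCopySketch, copyEscaped and escape are hypothetical names, and the code sticks to APIs available in Java 1.6.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

final class EscapingCopySketch {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Copies all chars from `in` to `out`; every char >= 128 is written
    // as a backslash, the letter u, and four lowercase hex digits.
    static void copyEscaped(Reader in, Writer out) throws IOException {
        char[] buffer = new char[1024];
        int n;
        // read() may return fewer chars than the buffer holds; -1 means end of input.
        while ((n = in.read(buffer)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buffer[i];
                if (c < 128) {
                    out.write(c);
                } else {
                    out.write('\\');
                    out.write('u');
                    out.write(HEX[(c >> 12) & 0xF]);
                    out.write(HEX[(c >> 8) & 0xF]);
                    out.write(HEX[(c >> 4) & 0xF]);
                    out.write(HEX[c & 0xF]);
                }
            }
        }
    }

    // Convenience wrapper for in-memory strings.
    static String escape(String text) {
        StringWriter out = new StringWriter();
        try {
            copyEscaped(new StringReader(text), out);
        } catch (IOException e) {
            // Not expected for in-memory readers and writers.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\"\u0395\u03bb\u03bb\u03ac\u03c2\";\"R\u00e9union\""));
    }
}
```

For real files, `in` would be an InputStreamReader over the .csv opened as UTF-8 and `out` an OutputStreamWriter over the .properties file. Because the escaped output is pure ASCII, writing it as ISO-8859-1 loses nothing.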

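What is the problem? The output file isn't ISO. How can I save the converted data from one file to another file?
Just use a Writer class that takes a charset, such as OutputStreamWriter(outStream, "ISO-8859-1"), instead of StringWriter.
Why use Java 1.6 for this task when there are very good tools (recode/unix2dos) available on all major platforms?
Please post your real code (you show neither the code that calls convertToISO nor the code that writes the file). Also, please post what is wrong with your output, including an example. "Is not ISO" doesn't mean much; there is too little information to help you.
Thanks. How do I iterate over this byte array return value?
I changed my answer based on your new problem description.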
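As a sketch of the OutputStreamWriter suggestion from the comments: when the writer is built from a CharsetEncoder configured with CodingErrorAction.REPORT, unmappable characters raise an exception instead of silently turning into '?'. StrictLatin1WriterSketch, strictLatin1Writer and canWrite are names invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

final class StrictLatin1WriterSketch {

    // Builds a writer that encodes as ISO-8859-1 and fails loudly on
    // characters the charset cannot represent.
    static Writer strictLatin1Writer(OutputStream out) {
        CharsetEncoder encoder = Charset.forName("ISO-8859-1").newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new OutputStreamWriter(out, encoder);
    }

    // Returns true if `text` can be written as ISO-8859-1 without loss.
    static boolean canWrite(String text) {
        Writer w = strictLatin1Writer(new ByteArrayOutputStream());
        try {
            w.write(text);
            w.flush();
            return true;
        } catch (CharacterCodingException e) {
            // Raised for input that ISO-8859-1 cannot encode.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canWrite("R\u00e9union"));
        System.out.println(canWrite("\u0395\u03bb\u03bb\u03ac\u03c2"));
    }
}
```

A strict writer like this is useful as a guard: it reveals early that the data cannot be stored as plain ISO-8859-1 and needs escaping instead.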