Java: how to convert a file to UTF-8 - Fatal编程技术网


I have a file that contains some non-UTF-8 characters (e.g. ISO-8859-1), so I want to convert the file (or read it) to UTF-8. How can I do that?

The code looks like this:

File file = new File("some_file_with_non_utf8_characters.txt");

/* some code to convert the file to an utf8 file */

...

Edit: added an example of the encoding.

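Do you just want to read it as UTF-8? I recently did something similar: I started the JVM with -Dfile.encoding=UTF-8 and read/printed as usual. I'm not sure whether that applies to your case.

With that option:

System.out.println("á é í ó ú");

prints the characters correctly. Without it, a ? symbol is printed instead.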
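The ? mentioned above is what Java substitutes when text passes through a mismatched charset. A minimal sketch (the class name WrongCharsetDemo is hypothetical) showing both directions: decoding the ISO-8859-1 byte for "á" as UTF-8 yields U+FFFD, the replacement character, and encoding "á" into a charset that cannot represent it (here US-ASCII, standing in for a console charset that lacks the character) yields ?.

```java
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    public static void main(String[] args) {
        String aAcute = "\u00E1"; // "á", written as an escape so the source file's encoding does not matter

        // ISO-8859-1 encodes "á" as the single byte 0xE1.
        byte[] latin1 = aAcute.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the right charset gives the character back.
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1).equals(aAcute)); // true

        // Decoding as UTF-8 fails: 0xE1 starts a 3-byte sequence that never completes,
        // so the decoder substitutes U+FFFD (the replacement character).
        System.out.println(new String(latin1, StandardCharsets.UTF_8).equals("\uFFFD")); // true

        // Encoding into a charset that cannot represent "á" substitutes '?' instead.
        byte[] ascii = aAcute.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ?
    }
}
```

The takeaway is the same as in the answers: the bytes are only meaningful together with the charset they were written in.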

You need to know the encoding of the input file. For example, if the file is Latin-1, you can do this:

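        // Decode the input as Latin-1 (ISO-8859-1)...
        FileInputStream fis = new FileInputStream("test.in");
        InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
        Reader in = new BufferedReader(isr);
        // ...and re-encode the same characters as UTF-8.
        FileOutputStream fos = new FileOutputStream("test.out");
        OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
        Writer out = new BufferedWriter(osw);

        // Copy character by character; the reader/writer pair does the transcoding.
        int ch;
        while ((ch = in.read()) > -1) {
            out.write(ch);
        }

        // Closing the writer flushes its buffer; without it the output may be truncated.
        out.close();
        in.close();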
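On Java 7 and later, the same Latin-1 to UTF-8 copy can be written with java.nio.file.Files, which opens the buffered reader and writer with an explicit Charset, and try-with-resources closes both even if an exception is thrown. A sketch, assuming the file names from the snippet above; the class name Latin1ToUtf8 is hypothetical. It copies through a fixed-size buffer, so memory use stays constant even for very large files.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Latin1ToUtf8 {
    // Streams the file in chunks, so large inputs never sit in memory at once.
    public static void convert(Path source, Path target) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.ISO_8859_1);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names, mirroring the snippet above.
        convert(Paths.get("test.in"), Paths.get("test.out"));
    }
}
```

Unlike an InputStreamReader created from a charset name, a reader from Files.newBufferedReader throws on malformed input instead of substituting characters; for ISO-8859-1 that never happens, because every byte is a valid character.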

This decodes the text. You can then write it back out with the symmetric Writer/OutputStream approach, using whichever encoding you like (e.g. UTF-8).
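The decode/encode round trip described above can be seen on a single character. A minimal in-memory sketch (the names Transcode and latin1ToUtf8 are hypothetical): the Latin-1 byte 0xE1 is decoded to the String "á" and re-encoded as the two UTF-8 bytes 0xC3 0xA1. This only suits data that fits in memory; for large files, a streaming reader/writer pair is the better fit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Transcode {
    // Decode with the source charset, then encode the resulting String with the target charset.
    public static byte[] latin1ToUtf8(byte[] latin1) {
        String text = new String(latin1, StandardCharsets.ISO_8859_1); // decode: bytes -> chars
        return text.getBytes(StandardCharsets.UTF_8);                  // encode: chars -> bytes
    }

    public static void main(String[] args) {
        byte[] in = {(byte) 0xE1};                 // "á" in ISO-8859-1
        byte[] out = latin1ToUtf8(in);
        System.out.println(Arrays.toString(out));  // [-61, -95], i.e. 0xC3 0xA1
    }
}
```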

The following code converts a file from srcEncoding to tgtEncoding:

public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
    BufferedReader br = null;
    BufferedWriter bw = null;
    try {
        // Decode the source with srcEncoding and re-encode the text with tgtEncoding.
        br = new BufferedReader(new InputStreamReader(new FileInputStream(source), srcEncoding));
        bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding));
        // Copy in 16K-char chunks, so even very large files are never held in memory at once.
        char[] buffer = new char[16384];
        int read;
        while ((read = br.read(buffer)) != -1)
            bw.write(buffer, 0, read);
    } finally {
        // Nested finally: bw is still closed (and flushed) even if br.close() throws.
        try {
            if (br != null)
                br.close();
        } finally {
            if (bw != null)
                bw.close();
        }
    }
}
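One caveat with transform: an InputStreamReader built from a charset name silently replaces malformed input with U+FFFD, so passing the wrong srcEncoding produces a damaged file rather than an error. A sketch of a stricter variant (StrictTransform and transformStrict are hypothetical names) that configures the decoder and encoder with CodingErrorAction.REPORT, so reading or writing throws a CharacterCodingException instead. If it throws, the target file may already be partially written.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class StrictTransform {
    public static void transformStrict(File source, String srcEncoding,
                                       File target, String tgtEncoding) throws IOException {
        // REPORT makes bad input raise a CharacterCodingException (an IOException)
        // instead of being replaced silently.
        CharsetDecoder decoder = Charset.forName(srcEncoding).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharsetEncoder encoder = Charset.forName(tgtEncoding).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (Reader in = new BufferedReader(new InputStreamReader(new FileInputStream(source), decoder));
             Writer out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), encoder))) {
            char[] buffer = new char[16384];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}
```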
--Edit--

Using try-with-resources (Java 7):

public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
    // Both resources are closed automatically (bw first, then br), even if an exception is thrown.
    try (
      BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(source), srcEncoding));
      BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding)); ) {
          char[] buffer = new char[16384];
          int read;
          while ((read = br.read(buffer)) != -1)
              bw.write(buffer, 0, read);
    }
}

Non-UTF-8? Would you like to narrow that down a bit? Once you know the input encoding this is easy; if you don't, it is basically impossible.

Some things to consider: the files are large (around 1 GB), so I can't hold them in a String object…

What encoding is your file in? If you are on Linux or OS X (or another Unix), you can simply type "file some_file" and it will tell you the encoding.

By the way, on Unix (at least Linux and OS X) you should have the iconv command-line tool. "man iconv" says it converts the encoding of the given files from one encoding to another, which may well handle a 1 GB file better than a home-made Java utility.

Note that UTF-8 can represent every Unicode code point, so saying the file "has some non-UTF-8 characters" sounds suspicious…

@NoozNooz42: the application will run on both Win32 and Unix/Linux.

Summed up: read the file in its own encoding, then write it out in the new encoding.

@McD: I was about to post the same comment. This is a misunderstanding of how -Dfile.encoding is meant to be used.

Ignore my comment, you're right.

By the way, I hadn't seen this style of finally chaining before. Clever.

A potential problem with reading line by line is that you can change the line endings/separators. For example, if the last line has no line terminator, you will add one.

That's entirely right. In fact, this effect is often desirable (less "changing" than "polishing"). But yes, one has to be aware of it.

Hi, what if I don't know the source/input encoding? Could you shed some light on that?
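On the question of an unknown source encoding: it cannot be determined reliably from the bytes alone. One common heuristic, sketched here under that caveat (EncodingGuess, isValidUtf8 and guess are hypothetical names), is to check whether the data decodes as strict UTF-8 and otherwise fall back to an encoding you assume, such as ISO-8859-1. ISO-8859-1 accepts every byte sequence, so the fallback is a guess, not a detection; pure ASCII passes both checks; and a sample cut from the start of a large file may end mid-character and be misreported as invalid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingGuess {
    // Returns true if the bytes form well-formed UTF-8 (strict decode, no replacement).
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Heuristic only: UTF-8 if the sample decodes cleanly, otherwise the caller's assumed fallback.
    public static Charset guess(byte[] sample, Charset fallback) {
        return isValidUtf8(sample) ? StandardCharsets.UTF_8 : fallback;
    }
}
```

The guessed Charset can then be passed by name (charset.name()) as srcEncoding to transform.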