Convert XML: invalid UTF-8 with XMLOutputter (Java)

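I have seen other questions about the same problem, but I still get an error. Here is a small part of the code I use to modify an existing XML file. However, it changes some characters in the text.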

import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;
import java.io.FileOutputStream;
import java.io.IOException;

public class ModyfyXml {

    public static void main(String[] args) {

        try {
            SAXBuilder sax = new SAXBuilder();
            Document doc = sax.build("F:\\c\\test.xml");

            XMLOutputter xmlOutput = new XMLOutputter();
            Format format = Format.getPrettyFormat();
            format.setEncoding("UTF-8");
            xmlOutput.setFormat(format);

            // try-with-resources closes the stream even if output() throws
            try (FileOutputStream out = new FileOutputStream("F:\\c\\test2.xml")) {
                xmlOutput.output(doc, out);
            }
        } catch (IOException io) {
            io.printStackTrace();
        } catch (JDOMException e) {
            e.printStackTrace();
        }
    }
}
In this example I am trying to modify a small XML file, simply by copying it:

<?xml version="1.0" encoding="utf-8"?><page>
 䕶法喇嘛所居此處

Dang, I never noticed this bug in JDOM 2.

You will get the same result with any non-BMP character. Try it with the emoji that have become so popular in recent years and you will see the same behavior.
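To check whether a given piece of text contains such characters, a small sketch using only the Java standard library might help. The class name `NonBmpCheck` and the sample strings are illustrative assumptions, not part of the question. It relies on the fact that Java stores every code point above U+FFFF as a surrogate pair, that is, two `char` values.

```java
public class NonBmpCheck {

    // True if the text holds at least one code point above U+FFFF,
    // which Java represents as a surrogate pair (two char values).
    public static boolean containsNonBmp(String text) {
        return text.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // Three CJK characters that all lie inside the BMP.
        String bmpOnly = "\u6CD5\u5587\u561B";
        // U+1F600 (an emoji) lies outside the BMP.
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println(containsNonBmp(bmpOnly)); // false
        System.out.println(containsNonBmp(emoji));   // true

        // One code point, but two char values: a high and a low surrogate.
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
        System.out.println(emoji.length());                          // 2
        System.out.println(Character.isHighSurrogate(emoji.charAt(0))); // true
    }
}
```

Any code that decides what to do one `char` at a time sees the two halves of such a pair separately, which is where per-character escaping can go wrong.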

It happens because of the escape strategy that is set automatically for UTF-* encodings. What it does is rather wrong.

That will be fixed if you replace the strategy with one that doesn't escape anything besides the XML reserved characters:

format.setEscapeStrategy((c) -> false);
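
To make the effect of that lambda concrete, here is a toy writer that does not use JDOM at all. The `CharEscape` interface and the `write` method are hypothetical stand-ins, and the "default" strategy shown (flagging high surrogates) is only an assumption about how a per-character strategy could behave, not JDOM's actual implementation.

```java
public class EscapeSketch {

    // Hypothetical per-char strategy, shaped like a single-method callback.
    interface CharEscape {
        boolean shouldEscape(char c);
    }

    // Toy writer: always escapes XML reserved characters, and additionally
    // writes a numeric character reference whenever the strategy says so.
    public static String write(String text, CharEscape strategy) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<') {
                out.append("&lt;");
            } else if (c == '>') {
                out.append("&gt;");
            } else if (c == '&') {
                out.append("&amp;");
            } else if (strategy.shouldEscape(c)) {
                int codePoint = c;
                // Join a surrogate pair into one code point before escaping.
                if (Character.isHighSurrogate(c) && i + 1 < text.length()
                        && Character.isLowSurrogate(text.charAt(i + 1))) {
                    codePoint = Character.toCodePoint(c, text.charAt(++i));
                }
                out.append("&#x").append(Integer.toHexString(codePoint)).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600));
        String text = "a<" + emoji;

        // Assumed "default": flag surrogate halves, so the emoji is rewritten.
        System.out.println(write(text, Character::isHighSurrogate)); // a&lt;&#x1f600;

        // Same shape as the fix above: never escape beyond reserved characters.
        System.out.println(write(text, c -> false).equals("a&lt;" + emoji)); // true
    }
}
```

With the never-escape strategy the character reaches the output unchanged, and the UTF-8 encoder of the output stream is then responsible for writing it correctly.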
