Java XML transformation fails

I am using an XML transformer to convert one XML document into another. Some non-English characters are converted incorrectly.

Original XML:

<?xml version="1.0" encoding="UTF-8"?>
<RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0 xmlns:RR_KeyPersonExpanded_2_0="http://apply.grants.gov/forms/RR_KeyPersonExpanded_2_0-V2.0" xmlns:att="http://apply.grants.gov/system/Attachments-V1.0" xmlns:glob="http://apply.grants.gov/system/Global-V1.0" xmlns:globLib="http://apply.grants.gov/system/GlobalLibrary-V2.0" RR_KeyPersonExpanded_2_0:FormVersion="2.0">
   <RR_KeyPersonExpanded_2_0:KeyPerson>
      <RR_KeyPersonExpanded_2_0:Profile>
         <RR_KeyPersonExpanded_2_0:Name>
            <globLib:PrefixName>候.</globLib:PrefixName>
            <globLib:FirstName>Lakshmi</globLib:FirstName>
            <globLib:MiddleName>AB</globLib:MiddleName>
            <globLib:LastName>Sørensen</globLib:LastName>
         </RR_KeyPersonExpanded_2_0:Name>
      </RR_KeyPersonExpanded_2_0:Profile>
   </RR_KeyPersonExpanded_2_0:KeyPerson>
</RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0>
Output XML:

<RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0 xmlns:RR_KeyPersonExpanded_2_0="http://apply.grants.gov/forms/RR_KeyPersonExpanded_2_0-V2.0" xmlns:att="http://apply.grants.gov/system/Attachments-V1.0" xmlns:glob="http://apply.grants.gov/system/Global-V1.0" xmlns:globLib="http://apply.grants.gov/system/GlobalLibrary-V2.0" RR_KeyPersonExpanded_2_0:FormVersion="2.0">
<RR_KeyPersonExpanded_2_0:KeyPerson>
<RR_KeyPersonExpanded_2_0:Profile>
<RR_KeyPersonExpanded_2_0:Name>
<globLib:PrefixName>候.</globLib:PrefixName>
<globLib:FirstName>Lakshmi</globLib:FirstName>
<globLib:MiddleName>AB</globLib:MiddleName>
<globLib:LastName>Sørensen</globLib:LastName>
</RR_KeyPersonExpanded_2_0:Name>
</RR_KeyPersonExpanded_2_0:Profile>
</RR_KeyPersonExpanded_2_0:KeyPerson>
</RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0>

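As you can see, the non-English words have turned into gibberish. I tried changing the encoding in the XSLT to "UTF-16", but that did not work. Has anyone here run into the same problem?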

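To get that many strange characters, you appear to have several encoding problems.

First, when reading the XML into the xml String (code not shown). I can't help much with that one, since we don't know how you are doing it wrong, though you probably forgot to specify the UTF-8 encoding.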

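Second, when calling bos.toString(). If you want the result as a String, don't use an OutputStream. Use a StringWriter (see the code below).

Third, when writing the string to a file (code not shown). Again, I can't help much here, since we don't know how you are doing it, though you probably forgot to specify the UTF-8 encoding.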

public String removeEmptyTags(String xml) {
    try (StringWriter out = new StringWriter()) {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        StreamSource inputXMLSource = new StreamSource(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        StreamSource xsltSource = new StreamSource(getClass().getClassLoader().getResourceAsStream("removeemptytags.xsl"));
        Transformer transformer = transformerFactory.newTransformer(xsltSource);

        transformer.transform(inputXMLSource, new StreamResult(out));
        return out.toString();
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Exception while removing empty tags : ", e);
        throw new ParsingException(e.getMessage());
    }
}

Actually, it would be better to go directly from file to file and let the XML library work out the encoding:

public void removeEmptyTags(Path inFile, Path outFile) {
    try (InputStream in = Files.newInputStream(inFile);
         OutputStream out = Files.newOutputStream(outFile)) {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        StreamSource xsltSource = new StreamSource(getClass().getClassLoader().getResourceAsStream("removeemptytags.xsl"));
        Transformer transformer = transformerFactory.newTransformer(xsltSource);

        transformer.transform(new StreamSource(in), new StreamResult(out));
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Exception while removing empty tags : ", e);
        throw new ParsingException(e.getMessage());
    }
}
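
If the XML really does have to pass through a String, the first and third points above come down to naming the charset explicitly on both the read and the write. The following is only a sketch under that assumption: the class name Utf8FileIo and the methods readXml and writeXml are hypothetical, not part of the original code, and it assumes the file is UTF-8 as its XML declaration says.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileIo {

    // Hypothetical helper: reads the whole file and decodes it as UTF-8,
    // independent of the platform default charset.
    public static String readXml(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Hypothetical helper: encodes the string as UTF-8 and writes it,
    // creating the file or replacing its contents.
    public static void writeXml(Path file, String xml) throws IOException {
        Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("keyperson", ".xml");
        try {
            // \u00F8 is the letter o with stroke, \u5019 is the CJK character from the question.
            String xml = "<Name>S\u00F8rensen \u5019</Name>";
            writeXml(tmp, xml);
            // The round trip preserves the text because both sides use the same charset.
            System.out.println(readXml(tmp).equals(xml));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The unicode escapes are used only so that the example does not depend on the encoding of the Java source file itself.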

Did you set the output encoding to UTF-8?

You are right! I was encoding multiple times. I need the result as a String in the output, and I was simply encoding it with byte[] b = StringUtils.toBytesUTF8(filteredXML).
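
To illustrate what the comments describe, here is a small sketch of how a charset mismatch produces this kind of gibberish. It assumes the transformer output goes to a byte stream, uses an identity transform instead of removeemptytags.xsl, and the class and method names (EncodingDemo, transformToUtf8Bytes) are made up for the example. The output encoding is set explicitly with OutputKeys.ENCODING, and the bytes are then decoded once with the matching charset and once with the wrong one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class EncodingDemo {

    // Hypothetical helper: runs an identity transform (no stylesheet) and
    // returns the serialized result as UTF-8 bytes.
    public static byte[] transformToUtf8Bytes(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Ask the serializer for UTF-8 explicitly instead of relying on defaults.
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        transformer.transform(
                new StreamSource(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))),
                new StreamResult(bos));
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = transformToUtf8Bytes("<LastName>S\u00F8rensen</LastName>");

        // Decoding with the charset the bytes were written in keeps the text intact.
        String right = new String(bytes, StandardCharsets.UTF_8);
        // Decoding the same bytes as ISO-8859-1 turns each UTF-8 byte into its own character.
        String wrong = new String(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right);
        System.out.println(wrong);
    }
}
```

Calling bos.toString() without a charset decodes with the platform default, which is one way to end up with the second, garbled result. Encoding that garbled String to bytes again only compounds the damage.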