Strip whitespace and newlines from XML in Java

java xml

Using Java, I would like to take a document in the following format and strip the whitespace and newlines from it:

<tag1>
 <tag2>
    <![CDATA[  Some data ]]>
 </tag2>
</tag1>

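Walk the document recursively. Remove any text nodes that contain only whitespace. Trim any text nodes that contain non-whitespace content.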

public static void trimWhitespace(Node node)
{
    NodeList children = node.getChildNodes();
    for(int i = 0; i < children.getLength(); ++i) {
        Node child = children.item(i);
        if(child.getNodeType() == Node.TEXT_NODE) {
            child.setTextContent(child.getTextContent().trim());
        }
        trimWhitespace(child);
    }
}
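As a rough usage sketch, the method above can be run against the document from the question. The class name `TrimWhitespaceDemo` and the `parse` helper are hypothetical, added only to make the example self-contained. With the default parser settings the CDATA section is a separate `CDATA_SECTION_NODE`, so the `TEXT_NODE` check leaves its content alone.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TrimWhitespaceDemo {

    // Same logic as the answer: trim every TEXT_NODE in place, then recurse.
    static void trimWhitespace(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); ++i) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                child.setTextContent(child.getTextContent().trim());
            }
            trimWhitespace(child);
        }
    }

    // Hypothetical helper: parse an XML string with default (non-coalescing) settings.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<tag1>\n <tag2>\n    <![CDATA[  Some data ]]>\n </tag2>\n</tag1>");
        trimWhitespace(doc);
        // Only the CDATA content remains as text; brackets make the spaces visible.
        System.out.println("[" + doc.getDocumentElement().getTextContent() + "]");
    }
}
```

Note that the whitespace-only text nodes are emptied rather than removed, so they still exist in the tree as zero-length nodes.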

Try this code. The read and write methods ignore whitespace and indentation.

try {
    File f1 = new File("source.xml");
    File f2 = new File("destination.xml");
    InputStream in = new FileInputStream(f1);  
    OutputStream out = new FileOutputStream(f2);

    byte[] buf = new byte[1024];
    int len;
    while ((len = in.read(buf)) > 0){
        out.write(buf, 0, len);
    }
    in.close();
    out.close();
    System.out.println("File copied.");
} catch(FileNotFoundException ex){
    System.out.println(ex.getMessage() + " in the specified directory.");
    System.exit(0);
} catch(IOException e7){
    System.out.println(e7.getMessage());  
}

A working solution, following the instructions from @Luiggi Mendoza in the comments on the question:

public static String trim(String input) {
    BufferedReader reader = new BufferedReader(new StringReader(input));
    StringBuffer result = new StringBuffer();
    try {
        String line;
        while ( (line = reader.readLine() ) != null)
            result.append(line.trim());
        return result.toString();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
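To see what this line-based approach does, here is a small sketch that applies it to the question's input and to a second, made-up input whose CDATA section spans two lines. The class name `LineTrimDemo` and the second sample are assumptions for illustration. Because the method works on raw text, it cannot tell markup from CDATA content.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class LineTrimDemo {

    // Same idea as the answer: trim each line and join the lines without separators.
    static String trim(String input) {
        BufferedReader reader = new BufferedReader(new StringReader(input));
        StringBuilder result = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                result.append(line.trim());
            }
            return result.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The question's document: whitespace inside the CDATA section survives.
        System.out.println(trim(
            "<tag1>\n <tag2>\n    <![CDATA[  Some data ]]>\n </tag2>\n</tag1>"));
        // prints <tag1><tag2><![CDATA[  Some data ]]></tag2></tag1>

        // Hypothetical input: the newline inside the CDATA section is lost.
        System.out.println(trim("<a><![CDATA[line 1\n  line 2]]></a>"));
        // prints <a><![CDATA[line 1line 2]]></a>
    }
}
```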

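As documented elsewhere, the relevant function would be DocumentBuilderFactory.setIgnoringElementContentWhitespace(), but as already pointed out here, that function requires a validating parser, which in turn needs an XML schema or something similar.

Therefore, your best bet is to iterate through the Document you get from the parser and remove all nodes of type TEXT_NODE (or only those TEXT_NODEs that contain nothing but whitespace).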
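For comparison, here is a minimal sketch of what the validating route would require. It assumes you are able to attach a DTD to the document; the internal DTD below is invented for this example and is not part of the question. The class and method names are hypothetical as well.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IgnorableWhitespaceDemo {

    // The question's document plus an assumed internal DTD.
    static final String XML =
          "<!DOCTYPE tag1 [\n"
        + "  <!ELEMENT tag1 (tag2)>\n"
        + "  <!ELEMENT tag2 (#PCDATA)>\n"
        + "]>\n"
        + "<tag1>\n <tag2>\n    <![CDATA[  Some data ]]>\n </tag2>\n</tag1>";

    // Parse with DTD validation so the parser can tell which whitespace is ignorable.
    static Document parseValidating(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseValidating(XML);
        // tag1 is declared with element-only content, so the whitespace
        // around tag2 is dropped and tag2 is the only child left.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Even with the DTD in place, the whitespace inside tag2 is not element-content whitespace, because tag2 is declared as #PCDATA. That whitespace stays, which is another reason to prefer walking the tree yourself.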

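The Java 8 transformer does not create any empty lines, but the Java 10 transformer puts them everywhere. I still want to keep pretty indentation. This is my helper function for creating an XML string from any DOM Element instance, such as the doc.getDocumentElement() root node.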

public static String createXML(Element elem) throws Exception {
        DOMSource source = new DOMSource(elem);
        StringWriter writer = new StringWriter();
        StreamResult result = new StreamResult(writer);
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        //transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        //transformer.setOutputProperty("http://www.oracle.com/xml/is-standalone", "yes");
        transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,"yes");
        transformer.setOutputProperty("http://www.oracle.com/xml/is-standalone", "yes");
        transformer.transform(source, result);

        // The Java 10 transformer adds unnecessary empty lines; remove them
        BufferedReader reader = new BufferedReader(new StringReader(writer.toString()));
        StringBuilder buf = new StringBuilder();
        try {
            final String NL = System.getProperty("line.separator", "\r\n");
            String line;
            while( (line=reader.readLine())!=null ) {
                if (!line.trim().isEmpty()) {
                    buf.append(line); 
                    buf.append(NL);
                }
            }
        } finally {
            reader.close();
        }
        return buf.toString();  //writer.toString();
    }

I support @jtahlborn's answer. For completeness, I adapted his solution to completely remove the whitespace-only text nodes instead of just clearing them.

public static void stripEmptyElements(Node node)
{
    NodeList children = node.getChildNodes();
    for(int i = 0; i < children.getLength(); ++i) {
        Node child = children.item(i);
        if(child.getNodeType() == Node.TEXT_NODE) {
            if (child.getTextContent().trim().length() == 0) {
                child.getParentNode().removeChild(child);
                i--;
            }
        }
        stripEmptyElements(child);
    }
}
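A possible variant, sketched here under the assumption that you only want to drop whitespace-only text nodes: iterating the child list backwards means a removal never shifts the nodes that are still to be visited, so no index adjustment is needed. The names `BlankTextRemover`, `removeBlankText`, and `parse` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BlankTextRemover {

    // Remove whitespace-only TEXT_NODEs below the given node, last child first.
    static void removeBlankText(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                removeBlankText(child);
            }
        }
    }

    // Hypothetical helper: parse an XML string with default parser settings.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<tag1>\n <tag2>\n    <![CDATA[  Some data ]]>\n </tag2>\n</tag1>");
        removeBlankText(doc);
        Node tag2 = doc.getDocumentElement().getFirstChild();
        // The CDATA section is not a TEXT_NODE, so it is kept untouched.
        System.out.println(tag2.getNodeName() + " has "
                + tag2.getChildNodes().getLength() + " child node(s)");
    }
}
```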

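You could treat it as a text file: open it with a BufferedReader, read each line and store its trimmed value in a StringBuilder, then use a BufferedWriter to save the file with the contents of the StringBuilder.

If you are willing to use something like Xerces-J, you can use OutputFormat to non-pretty-print the result. By the way, the reason setIgnoringElementContentWhitespace has no effect is that you must be using XML Schema/DTD validation so that the parser knows which whitespace is ignorable.

@LuiggiMendoza - you should not edit XML data by hand like that. You are just asking to screw things up.

I tested your solution. Unfortunately, it doesn't work either.

This trims whitespace within a node - there is no whitespace in the text nodes in the example.

@Mark - actually, yes there is. The content of "tag2" includes the leading newline and whitespace, and the trailing newline and whitespace.

WOOOW, the code quality is like decompiled output: the code has no whitespace at all.

Note that the BufferedReader should be closed after use, either in a finally block or, on Java 8, with try-with-resources.

@RikH That makes code analysis tools look cleaner, but in this case (StringReader and BufferedReader) there are no real resources to release. You would merely assign null to some fields of objects that are immediately eligible for garbage collection. That said, I would close it too.

As mentioned in the comments on the question, this solution breaks if a CDATA section contains newlines that should be preserved.

Very nice. This is exactly what I needed too...

The answer demonstrates how to do this.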
