Java: remove repeated newlines/tabs/spaces in XML character elements


java xml parsing


Is there any way to achieve what I need?

Thanks

The first part, replacing multiple spaces, is relatively easy, though I don't think the parser will do it for you:

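  // Parser setup: none of these options collapses whitespace inside text nodes
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
  dbf.setIgnoringComments( true );
  dbf.setNamespaceAware( namespaceAware );
  DocumentBuilder db = dbf.newDocumentBuilder();
  Document doc = db.parse( inputStream );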
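A minimal sketch of the first part done directly on the parsed Document, without XPath: walk the tree recursively and collapse every run of two or more whitespace characters in each text node to a single space. The class name CollapseTextNodes, its helper methods and the sample document are hypothetical, and setNamespaceAware(true) stands in for the question's namespaceAware variable. Like any approach that runs after parsing, it cannot tell a tab written as &#x9; from a literal tab.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class CollapseTextNodes {

    // Recursively replaces runs of 2+ whitespace characters with one space
    // in every text and CDATA node below (and including) the given node.
    static void collapse(Node node) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            node.setNodeValue(node.getNodeValue().replaceAll("\\s{2,}", " "));
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collapse(children.item(i));
        }
    }

    // Parses with settings similar to the question's, then collapses whitespace.
    static Document parseAndCollapse(InputStream in) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setIgnoringComments(true);
        dbf.setNamespaceAware(true); // assumption: stands in for namespaceAware
        Document doc = dbf.newDocumentBuilder().parse(in);
        collapse(doc);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<r>\n    <a>x   y</a>\n</r>";
        Document doc = parseAndCollapse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("[" + doc.getDocumentElement().getTextContent() + "]");
    }
}
```

Single whitespace characters are left alone because the pattern requires at least two, so the lone newline before </r> survives.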



Here is the hard part:
If a node contains XML-encoded characters, namely tab (&#x9;), newline (&#xA;) or space (&#x20;), they should be kept.
The parser always converts "&#x9;" to "\t", so you may need to write your own XML parser.
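Short of writing a full parser, one rough workaround is to collapse whitespace in the raw XML text before parsing, where a reference such as &#x9; is still six ordinary characters rather than whitespace. This is a hypothetical sketch (RawTextCollapse is an illustrative name) that assumes simple input: it treats everything between a '>' and the next '<' as character data, so it will misbehave on comments, CDATA sections, processing instructions, or attribute values that contain '>'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawTextCollapse {
    // Assumption: a run of non-'<' characters preceded by '>' and followed by '<'
    // is character data. Not true for comments, CDATA or attributes containing '>'.
    private static final Pattern TEXT = Pattern.compile("(?<=>)[^<]+(?=<)");

    static String collapse(String xml) {
        Matcher m = TEXT.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // \s matches only literal whitespace; "&#x9;" is plain text at this stage.
            String collapsed = m.group().replaceAll("\\s{2,}", " ");
            m.appendReplacement(out, Matcher.quoteReplacement(collapsed));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<r>\n  <a>x   y&#x9;&#x9;z</a>\n</r>";
        System.out.println(collapse(xml));
    }
}
```

The references survive because the collapsing happens before the parser ever expands them; the trade-off is that this works on text, not on the XML structure.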
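According to the author:
I don't think any XML parser reports numeric character references to the application; they are always expanded. Really, your application shouldn't care about them, any more than it cares how much whitespace there is between attributes.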
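A quick way to see this with the JDK's built-in parser: the two documents below differ only in whether the tab is written as &#x9; or typed literally, and the parsed text is the same. CharRefDemo and textOf are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class CharRefDemo {
    // Returns the text content of the root element after parsing.
    static String textOf(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String viaReference = textOf("<a>x&#x9;y</a>");
        String viaLiteralTab = textOf("<a>x\ty</a>");
        // The DOM offers no way to tell these two apart.
        System.out.println(viaReference.equals(viaLiteralTab));
    }
}
```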
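Try adding this line: dbf.setIgnoringElementContentWhitespace(true)
Unfortunately, that doesn't work. This property controls how whitespace in element-only (non-text) content is handled, and according to the DocumentBuilderFactory documentation it only takes effect when the parser is validating.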
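A small sketch of what that flag does and does not do with the JDK's default parser. Without a DTD the runs of spaces inside a text element are untouched. With a DTD that declares r as element-only and validation switched on, the indentation between <r> and <a> is dropped, but the text inside <a> is still untouched. The class name, the parse helper and the tiny DTD are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class IgnorableWhitespaceDemo {

    // Parses xml with setIgnoringElementContentWhitespace(true).
    static Document parse(String xml, boolean validating) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setIgnoringElementContentWhitespace(true);
        dbf.setValidating(validating); // the flag relies on DTD validation
        DocumentBuilder db = dbf.newDocumentBuilder();
        db.setErrorHandler(new DefaultHandler()); // avoids the "no ErrorHandler" warning
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // No DTD, not validating: the spaces inside <a> survive.
        Document plain = parse("<a>x   y</a>", false);
        System.out.println("[" + plain.getDocumentElement().getTextContent() + "]");

        // r is declared element-only, so whitespace directly inside <r> is ignorable.
        String withDtd = "<!DOCTYPE r [<!ELEMENT r (a*)><!ELEMENT a (#PCDATA)>]>"
                + "<r>\n  <a>x   y</a>\n</r>";
        Document validated = parse(withDtd, true);
        System.out.println(validated.getDocumentElement().getChildNodes().getLength());
        System.out.println("[" + validated.getDocumentElement().getTextContent() + "]");
    }
}
```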

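import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// First part: collapse every run of two or more whitespace characters
// in each text node to a single space (inputStream holds the XML).
InputSource stream = new InputSource(inputStream);
XPath xpath = XPathFactory.newInstance().newXPath();
// Evaluating "/" against an InputSource parses it and returns the document node
Document doc = (Document) xpath.evaluate("/", stream, XPathConstants.NODE);

NodeList nodes = (NodeList) xpath.evaluate("//text()", doc,
    XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
  Text text = (Text) nodes.item(i);
  text.setTextContent(text.getTextContent().replaceAll("\\s{2,}", " "));
}

// check results
TransformerFactory.newInstance()
    .newTransformer()
    .transform(new DOMSource(doc), new StreamResult(System.out));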
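Since the question is tagged SAX, here is a streaming sketch of the same first part: an XMLFilterImpl subclass that buffers character data and collapses it at the next markup event, piped into an identity Transformer. It is a sketch under assumptions, not a hardened implementation: CollapseWhitespaceFilter and collapse are made-up names, DTD and entity boundaries are deliberately not forwarded, and, like the DOM version above, it cannot preserve &#x9;-style references because the parser has already expanded them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class CollapseWhitespaceFilter extends XMLFilterImpl implements LexicalHandler {
    private static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";
    // A parser may split one text node across several characters() calls,
    // so text is buffered and collapsed only at the next markup event.
    private final StringBuilder buffer = new StringBuilder();
    private LexicalHandler downstream; // the Transformer's comment/CDATA handler, if any

    private void flush() throws SAXException {
        if (buffer.length() == 0) return;
        String collapsed = buffer.toString().replaceAll("\\s{2,}", " ");
        buffer.setLength(0);
        super.characters(collapsed.toCharArray(), 0, collapsed.length());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        flush();
        super.processingInstruction(target, data);
    }

    // Put this filter between the parser and the Transformer's LexicalHandler so
    // comments and CDATA markers are not emitted ahead of still-buffered text.
    @Override
    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (LEXICAL_HANDLER.equals(name)) {
            downstream = (LexicalHandler) value;
            value = this;
        }
        super.setProperty(name, value);
    }

    @Override public void comment(char[] ch, int start, int length) throws SAXException {
        flush();
        if (downstream != null) downstream.comment(ch, start, length);
    }
    @Override public void startCDATA() throws SAXException {
        flush();
        if (downstream != null) downstream.startCDATA();
    }
    @Override public void endCDATA() throws SAXException {
        flush();
        if (downstream != null) downstream.endCDATA();
    }
    // DTD and entity boundaries are not forwarded; entity text simply appears expanded.
    @Override public void startDTD(String name, String publicId, String systemId) {}
    @Override public void endDTD() {}
    @Override public void startEntity(String name) {}
    @Override public void endEntity(String name) {}

    // Parses xml, collapses whitespace runs in character data, and re-serializes it.
    static String collapse(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        CollapseWhitespaceFilter filter = new CollapseWhitespaceFilter();
        filter.setParent(spf.newSAXParser().getXMLReader());
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }
}
```

Buffering is the key design choice: collapsing inside each characters() call could leave a double space wherever the parser happened to split a run of whitespace.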