Parse XML and get a DOM tree without binding namespaces - Java


I have an XML-like file:

<p>something</p>
<ac:image>
    <ri:attachment ri:filename="IMAGE.PNG" />
</ac:image>
<ac:macro ac:name="screenshot">
    <ac:default-parameter>IMAGE.ss</ac:default-parameter>
</ac:macro>
<p>something</p>
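The sample XML uses three namespaces: the default one, ac, and ri. Because the code will run on customer-supplied content, there may be other namespaces I don't know about. I cannot bind all of the namespaces before parsing the XML, so I get an exception: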

Caused by: org.jdom.input.JDOMParseException: Error on line 1: Content is not allowed in prolog.
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
Caused by: org.jdom.IllegalNameException: The name "" is not legal for JDOM/XML elements: XML names cannot be null or empty.
    at org.jdom.Element.setName(Element.java:206)
    at org.jdom.Element.<init>(Element.java:140)
    at org.jdom.Element.<init>(Element.java:152)
    at org.jdom.DefaultJDOMFactory.element(DefaultJDOMFactory.java:138)
    at org.jdom.input.SAXHandler.startElement(SAXHandler.java:511)
    at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
    at org.apache.xerces.impl.dtd.XMLDTDValidator.startElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentScannerImpl$ContentDispatcher.scanRootElementHook(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:770)
    at com.screensnipe.confluence.macro.XhtmlImageMacroReplacer.replaceImageMacroInText(XhtmlImageMacroReplacer.java:118)
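I found somewhere on the internet that a SAX parser can parse XML in a mode that does not resolve namespaces. In the default mode you get namespace=ac and element=macro, whereas in non-namespace mode you get no namespace and element=ac:macro. That is exactly what I need. You only have to set two SAX features on the parser: namespaces=false and namespace-prefixes=true.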

final XMLReader sax = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
sax.setFeature("http://xml.org/sax/features/validation", false);
sax.setFeature("http://xml.org/sax/features/namespaces", false);
sax.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
sax.parse(new InputSource(new StringReader(content))); // parse returns void
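
To make the effect of the two features visible, here is a minimal sketch that records the name each startElement callback receives. It is only an illustration under a few assumptions: it uses the JDK's built-in JAXP parser instead of naming the Xerces class directly, the class name RawNameDemo and the wrapping root element are hypothetical, and the input is wrapped because the parser needs a single root element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original code.
final class RawNameDemo {

    /** Returns the qName of every element, parsed without namespace processing. */
    static List<String> elementNames(String xml) throws Exception {
        final List<String> names = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Same two SAX features as in the snippet above.
        reader.setFeature("http://xml.org/sax/features/namespaces", false);
        reader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // With namespace processing off, the prefix stays in qName.
                names.add(qName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        // The "ac" prefix is never declared, yet the parse succeeds.
        System.out.println(elementNames("<root><ac:macro ac:name=\"screenshot\"/></root>"));
    }
}
```

The handler sees root and ac:macro as plain names, so an undeclared prefix is not an error in this mode.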
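The SAX parse with these features set throws no exception, so the XML appears to parse without errors. However, I need a DOM tree so that I can transform it with XSLT. So let's use JDOM: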

// all classes are org.jdom.*
final SAXBuilder sax = new SAXBuilder(false); // validate=false
sax.setFeature("http://xml.org/sax/features/namespaces", false);
sax.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
final Document document = sax.build(new StringInputStream(content));
Unfortunately, I get an exception:

Caused by: org.jdom.input.JDOMParseException: Error on line 1: Content is not allowed in prolog.
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
Caused by: org.jdom.IllegalNameException: The name "" is not legal for JDOM/XML elements: XML names cannot be null or empty.
    at org.jdom.Element.setName(Element.java:206)
    at org.jdom.Element.<init>(Element.java:140)
    at org.jdom.Element.<init>(Element.java:152)
    at org.jdom.DefaultJDOMFactory.element(DefaultJDOMFactory.java:138)
    at org.jdom.input.SAXHandler.startElement(SAXHandler.java:511)
    at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
    at org.apache.xerces.impl.dtd.XMLDTDValidator.startElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentScannerImpl$ContentDispatcher.scanRootElementHook(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:770)
    at com.screensnipe.confluence.macro.XhtmlImageMacroReplacer.replaceImageMacroInText(XhtmlImageMacroReplacer.java:118)
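The question is: how can I get a DOM tree from my XML (in Java) without writing my own version of JDOM? I am hoping for a working solution that simply parses the input and returns a DOM tree, one in which the namespaces are not mangled the way the TagSoup library mangles them.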


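Or, to put the question in terms of the goal: how can I replace one element with another without touching the other tags? (Java)
All other tags, namespaces, and everything else should remain unaffected. (Please do not suggest any regexp.)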

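If you are willing to do some preprocessing, such as adding a surrounding root element, then you could also scan the XML file for namespace prefixes and add a dummy declaration for each prefix to the root element you are adding.

That way you do not need a parser that can be told not to resolve namespace prefixes.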
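
One way this preprocessing might look in Java is sketched below, using only JAXP from the JDK. Everything named here is an assumption made for illustration: the class DummyNamespaceWrapper, the wrapper element root, and the dummy URIs of the form urn:dummy:PREFIX. To avoid regular expressions, the sketch finds the prefixes with a first, namespace-unaware SAX pass and then parses the wrapped text into a namespace-aware org.w3c.dom.Document. It assumes the content has no XML declaration or DOCTYPE of its own, since those cannot appear inside a wrapper element.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; names and dummy URIs are illustrative only.
final class DummyNamespaceWrapper {

    /** First pass: namespace-unaware SAX parse that records every prefix in use. */
    static Set<String> findPrefixes(String fragment) throws Exception {
        final Set<String> prefixes = new TreeSet<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(false); // names arrive as "ac:macro"
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                record(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    record(atts.getQName(i));
                }
            }

            private void record(String qName) {
                int colon = qName.indexOf(':');
                if (colon > 0) {
                    String prefix = qName.substring(0, colon);
                    // "xml" is predeclared and "xmlns" cannot be declared.
                    if (!prefix.equals("xml") && !prefix.equals("xmlns")) {
                        prefixes.add(prefix);
                    }
                }
            }
        };
        String wrapped = "<root>" + fragment + "</root>";
        factory.newSAXParser().parse(new InputSource(new StringReader(wrapped)), handler);
        return prefixes;
    }

    /** Wraps the fragment in a root element that declares a dummy URI per prefix. */
    static String wrap(String fragment) throws Exception {
        StringBuilder sb = new StringBuilder("<root");
        for (String prefix : findPrefixes(fragment)) {
            sb.append(" xmlns:").append(prefix)
              .append("=\"urn:dummy:").append(prefix).append('"');
        }
        return sb.append('>').append(fragment).append("</root>").toString();
    }

    /** Second pass: ordinary namespace-aware DOM parse of the wrapped text. */
    static Document parse(String fragment) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(wrap(fragment))));
    }
}
```

Calling DummyNamespaceWrapper.parse(content) on the sample from the question should yield a document whose root declares xmlns:ac and xmlns:ri, with the original elements as children of root. After the transformation, the wrapper element and its dummy declarations have to be stripped again before the content is written back.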

Hmmm... It is not a perfect approach, but it is very simple and it gets the job done. Thank you, upvoted. If no better answer comes along, I will accept yours. :)