Java XPath API extracting selective text

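I'm using the Java XPath API to extract content from an XHTML file. I'm traversing the HTML and trying to extract the content of a specific div, which contains text and a few images. When I use XPath, it strangely ignores all HTML tags and extracts only the text content. Here is an HTML snippet: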

<html>
<body>
<div class="content">
    <div class="content_wrapper">
        <table border="0" cellspacing="0" cellpadding="0" class="test_class">
            <tr>
                <td>
                    <p>
                        Reading and looking at images or movies is one thing. Experiencing it in 3D the other. If you like to figure out more about what Showcase is, I would really encourage you to
                        download Showcase Viewer and have a look at the demo files also available on this site. Interact with the models and see how real it looks.
                    </p>
                    <p style="text-align: center;">
                        <img src="/testsource/fckdata/208123/image/showcarswatch.jpg" alt="" />
                        <img src="/testsource/fckdata/208123/image/engineswatch.jpg" alt="" />
                        <img src="/th.gen/?:760x0:/userdata/fckdata/208123/image/toasterswatch.jpg" alt="" />
                        <img src="/testsource/fckdata/208123/image/smartphoneswatch.jpg" alt="" />
                    </p>
                    <p>
                        <br />
                        Showcase Viewer is actually a full Showcase install, except data processing and creation tools. This means that you can look at any data created with a regular Showcase you
                        just can´t add any information. But you may join a collaboration session hosed by a Showcase Professional user. Here is where you can get it:<br />
                    </p>
                    <p>
                        <strong>Operating System</strong><br />
                        • Microsoft® Windows® XP Professional (SP 2 or higher)<br />
                        • Windows XP Professional x64 Edition (Autodesk® Showcase® software runs as a 32-bit application on 64-bit operating system)<br />
                        • Microsoft Windows Vista® 32-bit or 64-bit, including Business, Enterprise or Ultimate (SP 1)
                    </p>
                </td>
            </tr>
        </table>
    </div>
</div>
</body>
</html>
Here is the output:


Reading and looking at images or movies is one thing. Experiencing it in 3D the other. If you like to figure out more about what Showcase is, I would really encourage you to
download Showcase Viewer and have a look at the demo files also available on this site. Interact with the models and see how real it looks.

Showcase Viewer is actually a full Showcase install, except data processing and creation tools. This means that you can look at any data created with a regular Showcase you
just can´t add any information. But you may join a collaboration session hosed by a Showcase Professional user. Here is where you can get it

Operating System
• Microsoft® Windows® XP Professional (SP 2 or higher)<br /> 
• Windows XP Professional x64 Edition (Autodesk® Showcase® software runs as a 32-bit application on 64-bit operating system)<br /> 
• Microsoft Windows Vista® 32-bit or 64-bit, including Business, Enterprise or Ultimate (SP 1)

I just need the complete content inside the content_wrapper div.

Any pointers will be highly appreciated.

Thanks

Edit

Sample code in response to yamburg's solution:

// contentPath (the XPath expression) and doc (the parsed Document) are defined elsewhere.
XPathFactory factory = XPathFactory.newInstance();
XPath xpathCompiled = factory.newXPath();
XPathExpression expr = xpathCompiled.compile(contentPath);
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

// Walk the subtree of every node the expression matched.
for (int i = 0; i < nodes.getLength(); i++) {
    Node n = nodes.item(i);
    traverseNodes(n);
}

// Recursively prints the name, value, and type of every descendant of n.
public static void traverseNodes(Node n) {
    NodeList children = n.getChildNodes();
    if (children != null) {
        for (int i = 0; i < children.getLength(); i++) {
            Node childNode = children.item(i);
            System.out.println("node name = " + childNode.getNodeName());
            System.out.println("node value = " + childNode.getNodeValue());
            System.out.println("node type = " + childNode.getNodeType());
            traverseNodes(childNode);
        }
    }
}
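XPath matches a node set: in your case, a node with child element nodes. toString() returns the text representation of that node, which is just that: text, without element names or attributes.

You should get the nodes instead:

NodeSequence nodes = (NodeSequence)XPathAPI.eval();
Then traverse the nodes and dump what you want from them, or convert them into a new DOM document.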
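
The "convert them into a new DOM document" option can be sketched with the standard JAXP and DOM classes. This is a minimal sketch rather than the answerer's own code: the class name `NodeCopySketch`, the helper `copyFirstMatch`, and the shortened `SAMPLE` markup are assumptions made up for illustration, and the input is assumed to be well-formed XHTML without a default namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeCopySketch {
    // Hypothetical, shortened stand-in for the XHTML in the question.
    static final String SAMPLE =
        "<html><body><div class=\"content\"><div class=\"content_wrapper\">"
        + "<p>Some text</p><p><img src=\"a.jpg\" alt=\"\" /></p>"
        + "</div></div></body></html>";

    /** Copies the first node matched by xpath into a new, standalone Document. */
    static Document copyFirstMatch(String xml, String xpath) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, source, XPathConstants.NODESET);
            if (nodes.getLength() == 0) {
                throw new IllegalArgumentException("No node matches " + xpath);
            }
            Document target = builder.newDocument();
            // importNode(node, true) deep-copies the element with its attributes and descendants.
            Node copy = target.importNode(nodes.item(0), true);
            target.appendChild(copy);
            return target;
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            // Parser, I/O, and XPath failures are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document copy = copyFirstMatch(SAMPLE, "//div[@class='content_wrapper']");
        System.out.println(copy.getDocumentElement().getNodeName());
        System.out.println(copy.getElementsByTagName("img").getLength());
    }
}
```

The copied document keeps the element structure (the `p` and `img` elements), which is what the text-only representation loses.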

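Xalan is fine, but modern Java has JAXP. For the sake of code and knowledge portability, I suggest using it (unless Xalan extensions are needed or useful):

XPathFactory factory = XPathFactory.newInstance();
XPath xpathCompiled = factory.newXPath();
XPathExpression expr = xpathCompiled.compile(xpath);

NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

Then convert it to a string (apparently that is what you want):

StringWriter sw = new StringWriter();
Transformer serializer = TransformerFactory.newInstance().newTransformer();
serializer.transform(new DOMSource(nodes.item(0)), new StreamResult(sw));
String result = sw.toString();

Note that it takes only the first element from the NodeList, because XML must have a single root element. In your case, if I understand correctly, that is fine; otherwise you need to add a top-level element over the node set.

@yamburg: thanks for the suggestion. Traversing the node list gives me the node names and the corresponding values. The node name is usually just td, not the tag markup, so rebuilding the content in its exact format gets a bit tedious. Maybe I'm missing something here. I have added sample code to the question.
Updated. Please state what you want in a more sensible way... the exact way.
@yamburg: thanks, man, got the issue. Appreciate your help.
This is not about XPath expressions but about DOM methods applied to an XPath result. Retagged.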
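
The JAXP answer serializes only the first matched node and notes that a match with several nodes needs a top-level element. A minimal sketch of that variant follows. The class name `NodeSetSerializerSketch`, the helper `serializeUnderRoot`, the root name passed in, and the `SAMPLE` markup are all assumptions for illustration, and the XPath is assumed to match element nodes only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeSetSerializerSketch {
    // Hypothetical sample input with two matching <p> elements.
    static final String SAMPLE =
        "<html><body><p>one</p><p><img src=\"a.jpg\" alt=\"\" /></p></body></html>";

    /** Serializes every matched element under a synthetic root element named rootName. */
    static String serializeUnderRoot(String xml, String xpath, String rootName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, source, XPathConstants.NODESET);

            // Build a new document whose single root holds deep copies of all matches.
            Document target = builder.newDocument();
            Element root = target.createElement(rootName);
            target.appendChild(root);
            for (int i = 0; i < nodes.getLength(); i++) {
                root.appendChild(target.importNode(nodes.item(i), true));
            }

            // Identity transform: DOM tree in, markup text out, without the XML declaration.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            serializer.transform(new DOMSource(target), new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            // Parser, XPath, and transformer failures are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeUnderRoot(SAMPLE, "//p", "wrapper"));
    }
}
```

When the expression matches exactly one element, as with the content_wrapper div in the question, the synthetic root is unnecessary and serializing `nodes.item(0)` directly is enough.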