Java: Retrieving the contents of an HTML tag with XPath


I have the following HTML code:

<div id="ipsLayout_contentArea">
<div class="preContentPadding">
<div id="ipsLayout_contentWrapper">
<div id="ipsLayout_mainArea">
<a id="elContent"></a>
<div class="cWidgetContainer " data-widgetarea="header" data-orientation="horizontal" data-role="widgetReceiver" data-controller="core.front.widgets.area">
<div class="ipsPageHeader ipsClearfix">
<div class="ipsClearfix">
<div class="cTopic ipsClear ipsSpacer_top" data-feedid="topic-100269" data-lastpage="" data-baseurl="https://forum.com/forum/topic/100269-topic/" data-autopoll="" data-controller="core.front.core.commentFeed,forums.front.topic.view">
<div class="" data-controller="core.front.core.moderation" data-role="commentFeed">
<form data-role="moderationTools" data-ipspageaction="" method="post" action="https://forum.com/forum/topic/100269-topic/?csrfKey=b092dccccee08fdbc06c26d350bf3c2b&do=multimodComment">
<a id="comment-626016"></a>
<article id="elComment_626016" class="cPost ipsBox ipsComment ipsComment_parent ipsClearfix ipsClear ipsColumns ipsColumns_noSpacing ipsColumns_collapsePhone " itemtype="http://schema.org/Comment" itemscope="">
<aside class="ipsComment_author cAuthorPane ipsColumn ipsColumn_medium">
<div class="ipsColumn ipsColumn_fluid">
<div id="comment-626016_wrap" class="ipsComment_content ipsType_medium ipsFaded_withHover" data-quotedata='{"userid":3859,"username":"Admin","timestamp":1453221383,"contentapp":"forums","contenttype":"forums","contentid":100269,"contentclass":"forums_Topic","contentcommentid":626016}' data-commentid="626016" data-commenttype="forums" data-commentapp="forums" data-controller="core.front.core.comment">
<div class="ipsComment_meta ipsType_light">
<div class="cPost_contentWrap ipsPad">
<div class="ipsType_normal ipsType_richText ipsContained" data-controller="core.front.core.lightboxedImages" itemprop="text" data-role="commentContent">
<p> Hi, </p>
<p>   </p>
<p> This is a post with multiple </p>
<p> lines of text </p>
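The XPath I am using retrieves each line of every post separately (the lines are separated by <p> tags). How can I get the entire content of the post instead, that is, everything inside the div with data-role="commentContent"?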

Retrieving the div that contains the post body:

// forumTemplate.getXpathElements().get(forumTemplate.XPATH_GET_THREAD_POSTS) = //div[@id='ipsLayout_contentArea']/div[2]/div/div[4]/div/form/article/div/div/div[2]/div
List<DomNode> posts = (List<DomNode>) firstPage.getByXPath(forumTemplate.getXpathElements().get(forumTemplate.XPATH_GET_THREAD_POSTS));
for (DomNode post : posts) {
    // Retrieve the contents of the post as a string
    String postContentStr = post.getNodeValue();
}
The variable postContentStr is always null. Why?

The //text() you specified recursively selects every text node under the given path. Depending on what you are using, this may work better:

//div[@data-role='commentContent']

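This matches the comment node you are trying to get. If you evaluate the expression from code, you can take it from there. However, do not match on text(): it selects only text nodes, so it will never match element tags such as <p>.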
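To make the difference concrete, here is a minimal sketch that uses the JDK's built-in DOM and XPath classes rather than HtmlUnit, so it runs without extra dependencies. The class name XPathTextDemo, the helper methods, and the simplified well-formed sample markup are assumptions made for illustration; they are not part of the original code. Since HtmlUnit's DomNode implements org.w3c.dom.Node, the same nodeValue behaviour should apply there, which would explain why postContentStr is null.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTextDemo {
    // Hypothetical, simplified stand-in for the forum markup in the question.
    static final String SAMPLE =
        "<div data-role='commentContent'>"
        + "<p>Hi,</p><p>This is a post with multiple</p><p>lines of text</p>"
        + "</div>";

    // Parse a well-formed markup string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluate an XPath expression and return the matching nodes.
    static NodeList select(Document doc, String expr) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);

        // With //text(), every text node is a separate match: one per <p> here.
        NodeList texts = select(doc, "//div[@data-role='commentContent']//text()");
        System.out.println("text nodes: " + texts.getLength());

        // Without text(), the match is the div element itself.
        Node div = select(doc, "//div[@data-role='commentContent']").item(0);

        // For element nodes the DOM defines nodeValue as null.
        System.out.println("nodeValue: " + div.getNodeValue());

        // getTextContent() concatenates the text of all descendants.
        System.out.println("textContent: " + div.getTextContent());
    }
}
```

In the HtmlUnit loop from the question, calling post.getTextContent() instead of post.getNodeValue() would be the analogous change if plain text is enough.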

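This cannot be done in XPath alone. Have your XPath select the div, and get the contents of the div as text from Java (I cannot help with the Java part, though).

I can get the div as a DOM node, but I cannot get its value (all the tags beneath it). I do not want it rendered as plain text; I just want its contents, with all the tags it may contain, read as text into a Java String. The document is an HTML page, not XML.

It is HTML, but once you process it with XPath and build a DOM tree it is also XML. So, as far as I can see, you are building a DOM tree from the HTML and then matching a specific node in that DOM. Now you are trying to turn the DOM subtree back into HTML. The point is that XPath does not operate on "text", even though I know that is what you ultimately want.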
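The last step described in these comments, turning the matched DOM subtree back into markup, can be sketched as follows. This again uses only JDK classes, not HtmlUnit; the class InnerMarkupDemo and its methods innerMarkup and postMarkup are hypothetical names chosen for this example, and the input is assumed to be well-formed. The identity Transformer emits XML-style markup, so empty elements may come out self-closed. In HtmlUnit itself, DomNode.asXml() may be the more direct route to a similar result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InnerMarkupDemo {
    // Serialize every child of the given node back into a markup string.
    static String innerMarkup(Node parent) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Suppress the <?xml ...?> header that would otherwise be emitted.
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            NodeList children = parent.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                t.transform(new DOMSource(children.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Select the post body with XPath, then return its contents as markup.
    static String postMarkup(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node div = (Node) XPathFactory.newInstance().newXPath().evaluate(
                    "//div[@data-role='commentContent']", doc, XPathConstants.NODE);
            return innerMarkup(div);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "<div data-role='commentContent'>"
                + "<p>Hi,</p><p>lines of text</p></div>";
        // Prints the <p> elements as markup, tags included.
        System.out.println(postMarkup(sample));
    }
}
```

XPath only does the selecting here; the conversion back to a string is a separate serialization step, which matches the point made in the comments.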