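Using PHP and XPath, how do I get the content of the closest 'h3'?
Using PHP and XPath, how can I correctly get the text of the closest h3 tag that holds the date of the Stoke City v Crystal Palace match (i.e. Saturday 4 October)? Essentially, I'm looking for the match date, and my inputs are the home team and the away team.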
HTML snippet (2014/15 Premier League fixtures):
<div class="fixtures">
<h3>Monday 29 September</h3>
<dl class="matches">
<dt class="match">
<span class="match-time">20:00</span>
<span class="home-side">
<span>
<img src="http://omo.akamai.opta.net/image.php?&sport=football&entity=team&description=badges&dimensions=20&id=110" alt="Stoke City">
</span>
<a href="http://www.dailymail.co.uk/sport/teampages/stoke-city.html">Stoke City</a>
</span>
<span>
<span> </span>
<span>vs</span>
<span> </span>
</span>
<span class="away-side">
<span>
<img src="http://omo.akamai.opta.net/image.php?&sport=football&entity=team&description=badges&dimensions=20&id=4" alt="Newcastle United">
</span>
<a href="http://www.dailymail.co.uk/sport/teampages/newcastle-united.html">Newcastle United</a>
</span>
</dt>
</dl>
<dl class="matches">
<dt class="match">
<span class="match-time">15:00</span>
<span class="home-side">
<span>
<img src="http://omo.akamai.opta.net/image.php?&sport=football&entity=team&description=badges&dimensions=20&id=13" alt="Leicester City">
</span>
<a href="http://www.dailymail.co.uk/sport/teampages/leicester.html">Leicester City</a>
</span>
<span>
<span> </span>
<span>vs</span>
<span> </span>
</span>
<span class="away-side">
<span>
<img src="http://omo.akamai.opta.net/image.php?&sport=football&entity=team&description=badges&dimensions=20&id=90" alt="Burnley">
</span>
<a href="http://www.dailymail.co.uk/sport/teampages/burnley.html">Burnley</a>
</span>
</dt>
</dl>
<h3>Saturday 4 October</h3>
<dl class="matches">
<dt class="match">
<span class="match-time">15:00</span>
<span class="home-side">
<span>
<img src="http://omo.akamai.opta.net/image.php?&sport=football&entity=team&description=badges&dimensions=20&id=14" alt="Liverpool">
</span>
<a href="http://www.dailymail.co.uk/sport/teampages/liverpool.html">Liverpool</a>
</span>
<span>
<span> </span>
<span>vs</span>
<span> </span>
</span>
<span class="away-side">
<span>
<img src="http://omo.akamai.opta.net/image.php?&sport=football&entity=team&description=badges&dimensions=20&id=35" alt="West Bromwich Albion">
</span>
<a href="http://www.dailymail.co.uk/sport/teampages/west-bromwich-albion.html">West Bromwich Albion</a>
</span>
</dt>
</dl>
<dl class="matches">
<dt class="match">
<span class="match-time">15:00</span>
<span class="home-side">
<span>
<img src="http://omo.akamai.opta.net/image.php?&sport=football&entity=team&description=badges&dimensions=20&id=110" alt="Stoke City">
</span>
<a href="http://www.dailymail.co.uk/sport/teampages/stoke-city.html">Stoke City</a>
</span>
<span>
<span> </span>
<span>vs</span>
<span> </span>
</span>
<span class="away-side">
<span>
<img src="http://omo.akamai.opta.net/image.php?&sport=football&entity=team&description=badges&dimensions=20&id=31" alt="Crystal Palace">
</span>
<a href="http://www.dailymail.co.uk/sport/teampages/crystal-palace.html">Crystal Palace</a>
</span>
</dt>
</dl>
</div>
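If the structure is always the same, you could first target the img tag that has that alt value and then traverse backwards from it. For example: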
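$dom = new DOMDocument();
$dom->loadHTML($markup); // $markup holds the fixtures HTML
$xpath = new DOMXpath($dom);
$needle = 'Hull City'; // team to look up
// Find the badge image whose alt text contains the team name
$element = $xpath->query("//span/img[contains(@alt, '$needle')]");
if($element->length > 0) {
    $img = $element->item(0);
    // Walk up through the ancestors, then back to the nearest preceding h3 sibling
    $header = $xpath->query('ancestor::node()/preceding-sibling::h3[1]', $img);
    if($header->length > 0) {
        echo $header->item(0)->nodeValue; // Saturday 4 October
    }
}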
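A team can appear more than once on the page (Stoke City is listed twice in the question's snippet), so taking only item(0) may return an earlier fixture. The following is a minimal sketch of a variation that loops over every matching img and uses ancestor::dl[1] so that only the enclosing dl is consulted. The helper name fixtureDatesForTeam, the exact-match test on @alt, and the trimmed $markup are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: returns the h3 text for every fixture that features $team.
function fixtureDatesForTeam(string $markup, string $team): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($markup);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $dates = [];
    // Assumption: team names contain no single quote, so they can be inlined.
    foreach ($xpath->query("//span/img[@alt='$team']") as $img) {
        // Nearest enclosing dl, then the closest h3 that precedes it as a sibling.
        $header = $xpath->query('ancestor::dl[1]/preceding-sibling::h3[1]', $img);
        if ($header->length > 0) {
            $dates[] = trim($header->item(0)->nodeValue);
        }
    }
    return $dates;
}

// Trimmed-down version of the question's snippet (image URLs are placeholders).
$markup = <<<HTML
<div class="fixtures">
<h3>Monday 29 September</h3>
<dl class="matches"><dt class="match">
<span class="home-side"><span><img src="a.png" alt="Stoke City"></span></span>
<span class="away-side"><span><img src="b.png" alt="Newcastle United"></span></span>
</dt></dl>
<h3>Saturday 4 October</h3>
<dl class="matches"><dt class="match">
<span class="home-side"><span><img src="c.png" alt="Liverpool"></span></span>
<span class="away-side"><span><img src="d.png" alt="West Bromwich Albion"></span></span>
</dt></dl>
<dl class="matches"><dt class="match">
<span class="home-side"><span><img src="e.png" alt="Stoke City"></span></span>
<span class="away-side"><span><img src="f.png" alt="Crystal Palace"></span></span>
</dt></dl>
</div>
HTML;

// Lists one heading per Stoke City fixture, in document order.
print_r(fixtureDatesForTeam($markup, 'Stoke City'));
```

Because a single team name is ambiguous here, narrowing the search by both home and away side is usually the safer choice.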
You can try the following XPath:
//h3[following-sibling::dl[1][.//span[contains(concat(' ', normalize-space(@class), ' '), ' home-side ') and span/img[@alt='Hull City']]]]
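Basically, the XPath above selects the <h3> element whose next sibling <dl> element contains a <span> with the class home-side, which in turn has a child <span> containing <img alt="Hull City">.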
Update:
Here is an example XPath that checks both the home and the away team:
//h3[
    following-sibling::dl[1][
        .//span[
            contains(concat(' ', normalize-space(@class), ' '), ' home-side ')
            and
            span/img[@alt='Hull City']
        ]
        and
        .//span[
            contains(concat(' ', normalize-space(@class), ' '), ' away-side ')
            and
            span/img[@alt='Crystal Palace']
        ]
    ]
]
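The contains(concat(' ', normalize-space(@class), ' '), ' home-side ') predicate used above matches home-side as a whole class token. As a rough illustration of why the space padding matters, the sketch below compares it with a plain contains(@class, 'home-side'). The three span elements, including the not-home-side-note class, are made up for the demonstration.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML(
    '<div>'
    . '<span class="home-side">A</span>'                 // exact class
    . '<span class="  featured   home-side ">B</span>'   // extra classes and whitespace
    . '<span class="not-home-side-note">C</span>'        // only a substring match
    . '</div>'
);
$xpath = new DOMXPath($dom);

// Substring test: also matches span C, which is not a home-side element.
$naive = $xpath->query("//span[contains(@class, 'home-side')]");

// Token test: pad the normalized class list with spaces, then look for ' home-side '.
$token = $xpath->query(
    "//span[contains(concat(' ', normalize-space(@class), ' '), ' home-side ')]"
);

echo "naive: ", $naive->length, "\n"; // naive: 3
echo "token: ", $token->length, "\n"; // token: 2
```

If the class attribute is known to hold exactly one class, @class='home-side' would also do; the padded form simply tolerates extra classes and stray whitespace.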
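Update 2:
To be able to handle multiple <dl> elements, I think it would be easier to first find the <dl> that meets the home and away team criteria, and then walk backwards from that <dl> to find the nearest <h3> element: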
//dl[
    .//span[
        contains(concat(' ', normalize-space(@class), ' '), ' home-side ')
        and
        span/img[@alt='Stoke City']
    ]
    and
    .//span[
        contains(concat(' ', normalize-space(@class), ' '), ' away-side ')
        and
        span/img[@alt='Crystal Palace']
    ]
]/preceding-sibling::h3[1]
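To tie this back to the question's inputs (home team and away team), here is a minimal sketch of how the XPath above might be run from PHP. The helper names sidePredicate and findMatchDate and the trimmed $markup are assumptions for illustration, and the team names are inlined into the query on the assumption that they contain no single quote.

```php
<?php
// Hypothetical helper: builds the predicate for one side ('home' or 'away').
function sidePredicate(string $side, string $team): string
{
    return ".//span[contains(concat(' ', normalize-space(@class), ' '), ' {$side}-side ')"
        . " and span/img[@alt='{$team}']]";
}

// Hypothetical helper: returns the h3 text for the fixture, or null if none matches.
function findMatchDate(DOMXPath $xpath, string $home, string $away): ?string
{
    // Find the dl holding both teams, then step back to the closest preceding h3.
    $query = '//dl[' . sidePredicate('home', $home) . ' and ' . sidePredicate('away', $away) . ']'
        . '/preceding-sibling::h3[1]';
    $headers = $xpath->query($query);
    if ($headers === false || $headers->length === 0) {
        return null;
    }
    return trim($headers->item(0)->nodeValue);
}

// Trimmed-down version of the question's snippet (image URLs are placeholders).
$markup = <<<HTML
<div class="fixtures">
<h3>Monday 29 September</h3>
<dl class="matches"><dt class="match">
<span class="home-side"><span><img src="a.png" alt="Stoke City"></span></span>
<span class="away-side"><span><img src="b.png" alt="Newcastle United"></span></span>
</dt></dl>
<dl class="matches"><dt class="match">
<span class="home-side"><span><img src="c.png" alt="Leicester City"></span></span>
<span class="away-side"><span><img src="d.png" alt="Burnley"></span></span>
</dt></dl>
<h3>Saturday 4 October</h3>
<dl class="matches"><dt class="match">
<span class="home-side"><span><img src="e.png" alt="Stoke City"></span></span>
<span class="away-side"><span><img src="f.png" alt="Crystal Palace"></span></span>
</dt></dl>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($markup);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

echo findMatchDate($xpath, 'Stoke City', 'Crystal Palace'), "\n"; // Saturday 4 October
echo findMatchDate($xpath, 'Leicester City', 'Burnley'), "\n";    // Monday 29 September
var_dump(findMatchDate($xpath, 'Stoke City', 'Burnley'));         // NULL
```

The Leicester City v Burnley call covers the case raised in the comments: its dl is the second one after the h3, and preceding-sibling::h3[1] still resolves to the correct heading.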
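Is the structure always the same?
Yes, it just repeats itself, e.g. div.fixtures, then h3, then one or more dl.matches.
My suggestion is to remove the [4]. It only adds a stronger dependency on the document structure, and it isn't needed.
@jlrise That's a great suggestion, thank you for the insight.
Many thanks for the suggestion, @har07; it also seems to let me define the away side in the same query.
@Ghost: what happens if an h3 applies to multiple matches? E.g. how would I get the match date for Leicester City's home game in my snippet?
@u01jmg3 Yes, it still accounts for that: it looks for the preceding h3, so it searches backwards, and [1] then takes the first h3 it finds.
Your answer actually helped solve another problem I had, namely extending my query to define the away side: [.//span[contains(concat(' ', normalize-space(@class), ' '), ' away-side ') and span/img[@alt='West Bromwich Albion']]]. How would I include that in your suggested query? Thanks.
Another thing I didn't mention is that an h3 sometimes refers to several dl elements, so a single h3 can be followed by a run of dl elements before the next h3. Will update my question.
And what do you want to happen when there are multiple dl elements? The XPath in this answer only checks the nearest dl…
It has to traverse until it finds the closest h3 preceding the dl, as a sibling; see the HTML snippet for an example. E.g. Stoke City v Crystal Palace will be played on Saturday 4 October.
@u01jmg3 Check the "Update 2" section.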