Parsing HTML with PHP DOMDocument
I have the following HTML markup:
<div contenteditable="true" class="text"></div>
<div contenteditable="true" class="text"></div>
<div style="display: block;" class="ui-draggable">
<img class='avatar' src=""/>
<p style="">
<img class='pic' src=""/><br>
<span class='fulltext' style="display:none"></span>
</p>-<span class='create'></span>
<a class='permalink' href=""></a>
</div>
<div contenteditable="true" class="text"></div>
<div style="display: block;" class="ui-draggable">
<img class='avatar' src=""/>
<p style="">
<img class='pic' src=""/><br>
<span class='fulltext' style="display:none"></span>
</p><span class='create'></span><a class='permalink' href=""></a>
</div>
There can be more of these parent divs. This is my current code:
$dom = new DOMDocument();
$dom->loadHTML($xml);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div');
$i=0;
$q=1;
foreach($div as $book) {
$attr = $book->getAttribute('class');
//if div contenteditable
if($attr == 'text') {
echo '</br>'.$book->nodeValue."</br>";
}
else {
$new = new DOMDocument();
$newxpath = new DOMXPath($new);
$avatar = $xpath->query("(//img[@class='avatar']/@src)[$q]");
$picture = $xpath->query("(//p/img[@class='pic']/@src)[$q]");
$fulltext = $xpath->query("(//p/span[@class='fulltext'])[$q]");
$permalink = $xpath->query("(//a[@class='permalink'])[$q]");
echo $permalink->item(0)->nodeValue; //date
echo $permalink->item(0)->getAttribute('href');
echo $fulltext->item(0)->nodeValue;
echo $avatar->item(0)->value;
echo $picture->item(0)->value;
$q++;
}
$i++;
}
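But I think there is a better way to parse this HTML. Is there one? Thanks in advance.

In fact, you are going about it the right way: HTML has to be parsed with a DOM object. A few optimizations are possible, though: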
$div = $xpath->query('//div');
is rather greedy; getElementsByTagName should be more appropriate:
$div = $dom->getElementsByTagName('div');
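A minimal sketch, on made-up markup, for comparing the two calls. The $html string is an assumption for illustration, not the asker's page. Both calls should yield the same div elements in document order:

```php
<?php
// Hypothetical sample markup, not the asker's full page.
$html = '<div class="text">a</div><div class="ui-draggable"><p>b</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$viaXpath = $xpath->query('//div');            // DOMNodeList from an XPath query
$viaTag   = $dom->getElementsByTagName('div'); // DOMNodeList from the tag lookup

echo $viaXpath->length, ' ', $viaTag->length, "\n";

// isSameNode() checks that both lists point at the same elements.
$allSame = true;
for ($i = 0; $i < $viaXpath->length; $i++) {
    if (!$viaXpath->item($i)->isSameNode($viaTag->item($i))) {
        $allSame = false;
    }
}
var_dump($allSame);
```

Which of the two is cheaper in memory is not something this sketch measures; it only shows that the selected nodes are identical.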
Note that DOMXPath::query() accepts a second parameter, the context node. Moreover, you don't need a second DOMDocument and DOMXPath inside the loop. Use:
$avatar = $xpath->query("img[@class='avatar']/@src", $book);
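to get the attribute node relative to the div node. If you follow this advice, your example should be fine.
Here is a version of your code that follows the points above: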
$dom = new DOMDocument();
$dom->loadHTML($xml);
$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div');
foreach($divs as $book) {
$attr = $book->getAttribute('class');
if($attr == 'text') {
echo '</br>'.$book->nodeValue."</br>";
} else {
$avatar = $xpath->query("img[@class='avatar']/@src", $book);
$picture = $xpath->query("p/img[@class='pic']/@src", $book);
$fulltext = $xpath->query("p/span[@class='fulltext']", $book);
$permalink = $xpath->query("a[@class='permalink']", $book);
echo $permalink->item(0)->nodeValue; //date
echo $permalink->item(0)->getAttribute('href');
echo $fulltext->item(0)->nodeValue;
echo $avatar->item(0)->value;
echo $picture->item(0)->value;
}
}
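The version above assumes every relative query finds a node. When nothing matches, item(0) returns null and reading a property on it fails. A minimal sketch of a guard follows; firstValue() is a hypothetical helper name and the sample markup is made up, neither comes from the original code:

```php
<?php
/**
 * Hypothetical helper: value of the first node matched by $expr
 * relative to $context, or $default when nothing matches.
 */
function firstValue(DOMXPath $xpath, string $expr, DOMNode $context, string $default = ''): string
{
    $nodes = $xpath->query($expr, $context);
    if ($nodes === false || $nodes->length === 0) {
        return $default;
    }
    return (string) $nodes->item(0)->nodeValue;
}

// Made-up sample: the second block has no <img class="pic">.
$html = '<div class="ui-draggable"><p><img class="pic" src="a.png"/></p></div>'
      . '<div class="ui-draggable"><p></p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$results = [];
foreach ($xpath->query('//div') as $book) {
    // Relative path: evaluated against the current div only.
    $results[] = firstValue($xpath, "p/img[@class='pic']/@src", $book, '(none)');
}
print_r($results);
```

Returning a default instead of null keeps the echo calls in the loop simple; whether an empty string or a visible marker is the better default depends on how the output is used.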
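$avatar = $avatar is useless.
Yes, I missed that. Thanks.
I had my doubts about the use of $q.
@artragis Note that both statements return the same result. In either case the nodes are buffered, so getElementsByTagName is hardly less greedy in memory. Let me find the message on the @internals list and show it to you as proof.
"Trying to get property of non-object" - echo $picture->.., echo $fulltext->..
Can you post the full HTML to a pastebin?
Very good. Thank you very much. One last question: what is the difference between nodeValue, value and textValue?
In the example above you use ->nodeValue on a DOMElement node and ->value on a DOMAttr node. I'm not sure about textValue. It should be the value of a DOMText node, or the flattened text representation of a DOMElement's child nodes.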
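To make the distinction concrete, here is a small sketch on made-up markup. It assumes the property meant by textValue is textContent, which is the name PHP's DOM classes use for the flattened text of a node; nodeValue is available on every DOMNode, while value is specific to attribute nodes (DOMAttr):

```php
<?php
// Made-up sample markup for illustration only.
$html = '<div><a class="permalink" href="/post/1">May <b>2013</b></a></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$link = $xpath->query("//a[@class='permalink']")->item(0);       // DOMElement
$href = $xpath->query("//a[@class='permalink']/@href")->item(0); // DOMAttr

$elementText = $link->nodeValue;   // text of all descendants, flattened
$sameText    = $link->textContent; // same string for an element node
$attrValue   = $href->value;       // DOMAttr-specific property
$attrNodeVal = $href->nodeValue;   // same string via the generic property

echo $elementText, "\n", $attrValue, "\n";
```

For element and attribute nodes the generic nodeValue is therefore enough; value only saves a step when the node is already known to be an attribute.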