
Parsing HTML with PHP DOMDocument

I have the following HTML markup:

<div contenteditable="true" class="text"></div>
<div contenteditable="true" class="text"></div>
<div style="display: block;" class="ui-draggable">
    <img class='avatar' src=""/>
    <p style="">
    <img class='pic' src=""/><br>
    <span class='fulltext' style="display:none"></span>
    </p>-<span class='create'></span>
    <a class='permalink' href=""></a>
    </div>
 <div contenteditable="true" class="text"></div>
 <div style="display: block;" class="ui-draggable">
    <img class='avatar' src=""/>
    <p style="">
    <img class='pic' src=""/><br>
    <span class='fulltext' style="display:none"></span>
    </p><span class='create'></span><a class='permalink' href=""></a>
    </div>


There can be more parent divs. This is my current code:

$dom = new DOMDocument();
$dom->loadHTML($xml);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div');
$i=0;
$q=1;
foreach($div as $book) {
    $attr = $book->getAttribute('class');
    //if div contenteditable
    if($attr == 'text') {
        echo '</br>'.$book->nodeValue."</br>";  
    }
    
    else {
        $new = new DOMDocument();
        $newxpath = new DOMXPath($new);
        $avatar = $xpath->query("(//img[@class='avatar']/@src)[$q]");
        
        $picture = $xpath->query("(//p/img[@class='pic']/@src)[$q]");
        $fulltext = $xpath->query("(//p/span[@class='fulltext'])[$q]");
        $permalink = $xpath->query("(//a[@class='permalink'])[$q]");
        echo $permalink->item(0)->nodeValue; //date
        echo $permalink->item(0)->getAttribute('href');
        echo $fulltext->item(0)->nodeValue;
        echo $avatar->item(0)->value;
        echo $picture->item(0)->value;
        $q++;
    }
    $i++;
}

But I think there must be a better way to parse this HTML. Is there? Thanks in advance.

In fact, your approach is the right one: HTML should be parsed with a DOM object. Some optimizations are possible, though:

$div = $xpath->query('//div');
is quite greedy; getElementsByTagName should be more appropriate:

$div = $dom->getElementsByTagName('div');
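To make the comparison concrete, here is a minimal sketch that runs both calls against the same document and checks that they select the same nodes. The sample markup is a made-up stand-in for the asker's page, and the sketch says nothing about which call is cheaper.

```php
<?php
// Hypothetical sample markup, not the asker's full page.
$html = '<div class="text">a</div>'
      . '<div class="ui-draggable"><div class="text">b</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$viaXPath   = $xpath->query('//div');              // DOMNodeList from XPath
$viaTagName = $dom->getElementsByTagName('div');   // DOMNodeList from the DOM API

// Compare the two lists position by position.
$same = $viaXPath->length === $viaTagName->length;
for ($i = 0; $same && $i < $viaXPath->length; $i++) {
    $same = $viaXPath->item($i)->isSameNode($viaTagName->item($i));
}

echo $viaXPath->length, ' ', $viaTagName->length, ' ', var_export($same, true), PHP_EOL;
```

Both calls return every div in the document, nested ones included, so the loop in the question behaves the same with either of them.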
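Note that DOMXPath::query() supports a second parameter, called contextparam here (the PHP manual names it $contextNode). Also, you do not need a second DOMDocument and DOMXPath inside the loop. Use: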

$avatar = $xpath->query("img[@class='avatar']/@src", $book);
to get the src attribute node relative to the div node. If you follow this advice, your example should be fine.
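One detail worth checking when passing a context node is how the path itself is written. The sketch below, using made-up markup and file names, contrasts a relative path, a descendant path starting with a dot, and a path with a leading //, which is evaluated from the document root even when a context node is supplied.

```php
<?php
// Hypothetical markup with two draggable blocks; the src values are invented.
$html = '<div class="ui-draggable"><img class="avatar" src="a.png"/>'
      . '<p><img class="pic" src="p1.png"/></p></div>'
      . '<div class="ui-draggable"><img class="avatar" src="b.png"/>'
      . '<p><img class="pic" src="p2.png"/></p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Use the second block as the context node.
$second = $xpath->query("//div[@class='ui-draggable']")->item(1);

$children    = $xpath->query('img', $second);    // direct children of $second only
$descendants = $xpath->query('.//img', $second); // any depth, but inside $second
$global      = $xpath->query('//img', $second);  // leading // starts at the document root

$counts = [$children->length, $descendants->length, $global->length];
echo implode(' ', $counts), PHP_EOL;                    // prints "1 2 4"
echo $children->item(0)->getAttribute('src'), PHP_EOL;  // prints "b.png"
```

This is why the relative form replaces the positional [$q] predicate from the question: the context node already narrows the search to the current block.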


Here is a version of your code that follows the advice above:

$dom = new DOMDocument();
$dom->loadHTML($xml);

$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div');

foreach($divs as $book) {
    $attr = $book->getAttribute('class');
    if($attr == 'text') {
        echo '</br>'.$book->nodeValue."</br>";  
    } else {
        $avatar = $xpath->query("img[@class='avatar']/@src", $book);
        $picture = $xpath->query("p/img[@class='pic']/@src", $book);
        $fulltext = $xpath->query("p/span[@class='fulltext']", $book);
        $permalink = $xpath->query("a[@class='permalink']", $book);
        echo $permalink->item(0)->nodeValue; //date
        echo $permalink->item(0)->getAttribute('href');
        echo $fulltext->item(0)->nodeValue;
        echo $avatar->item(0)->value;
        echo $picture->item(0)->value;
    }
}
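The code above calls item(0) without checking that the query matched anything, so a block that lacks one of the elements fails with a "property of non-object" style error. A possible guard is sketched below; firstValue is a hypothetical helper, not part of the DOM extension, and the markup and values are invented for illustration.

```php
<?php
/**
 * Hypothetical helper: string value of the first node matched by $query
 * relative to $context, or $default when nothing matches.
 */
function firstValue(DOMXPath $xpath, string $query, DOMNode $context, string $default = ''): string
{
    $nodes = $xpath->query($query, $context);
    if ($nodes === false || $nodes->length === 0) {
        return $default; // invalid expression or no match
    }
    return (string) $nodes->item(0)->nodeValue;
}

// Sample block that deliberately has no <p><img class="pic"> child.
$html = '<div class="ui-draggable"><img class="avatar" src="a.png"/>'
      . '<a class="permalink" href="/post/1">today</a></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$book = $xpath->query("//div[@class='ui-draggable']")->item(0);

$avatar  = firstValue($xpath, "img[@class='avatar']/@src", $book);
$href    = firstValue($xpath, "a[@class='permalink']/@href", $book);
$picture = firstValue($xpath, "p/img[@class='pic']/@src", $book, '(none)');

echo $avatar, ' ', $href, ' ', $picture, PHP_EOL;
```

Returning a default keeps the loop going for incomplete blocks; whether a missing element should instead be treated as an error depends on the page being parsed.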
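$avatar = $avatar is useless.

Yes, I missed that. Thanks.

I have doubts about the usage of $q.

@artragis Note that both statements will return the same value.

In any case, getElementsByTagName is buffered, so it is less greedy in memory. Let me find the message on the @internals list and show it to you as evidence.

"Trying to get property of non-object" at echo $picture->.. and echo $fulltext->..

Can you post the full HTML to a pastebin?

Very nice. Thank you very much. One last question: what is the difference between nodeValue, value and textValue?

In the example above you sometimes select DOMElement nodes (nodeValue) and sometimes DOMAttribute nodes (value). I'm not sure about textValue. It should be the value of a DOMTextNode, or the flattened text representation of a DOMElementNode's child nodes.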
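The difference asked about in the last two comments can be checked with a short sketch. It assumes that the textValue in the question means the textContent property, which is the name PHP's DOM extension documents; the markup is invented for illustration.

```php
<?php
// Hypothetical permalink with nested markup, to show how text is flattened.
$html = '<div class="ui-draggable">'
      . '<a class="permalink" href="/post/1">posted <b>today</b></a></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$link = $xpath->query("//a[@class='permalink']")->item(0);        // DOMElement
$href = $xpath->query("//a[@class='permalink']/@href")->item(0);  // DOMAttr

$elementNodeValue   = $link->nodeValue;    // text of all descendants, flattened
$elementTextContent = $link->textContent;  // same string for an element node
$attrValue          = $href->value;        // property specific to DOMAttr
$attrNodeValue      = $href->nodeValue;    // same string via the generic DOMNode property

echo $elementNodeValue, ' | ', $attrValue, PHP_EOL;
```

In short, nodeValue is available on every node, value exists only on attribute nodes, and for an element both nodeValue and textContent give the concatenated text of its descendants.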