
How to use the DOM PHP parser


I'm new to DOM parsing in PHP.
I have an HTML file that I'm trying to parse. It has a bunch of divs like this:

<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div id="interestingbox"> 
......

I'm trying to get the contents of the many div boxes using PHP. How can I use the DOM parser to do this?

Thanks!

I started with this:

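// file_get_html() comes from the Simple HTML DOM library
require_once 'simple_html_dom.php';

// a URL needs its scheme; a bare 'example.com' is read as a local file path
$html = file_get_html('http://example.com/');
foreach ($html->find('div[id=interestingbox]') as $result)
{
    echo $result->innertext;
}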
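For comparison, here is a rough sketch of the same loop using PHP's built-in DOM extension instead of the Simple HTML DOM library. `innerHtml()` is a hypothetical helper, not part of either library: it concatenates `saveHTML()` of each child node, which approximates what `->innertext` returns. The sample markup is an assumption.

```php
<?php
// Hypothetical helper: roughly the ext/dom counterpart of simple_html_dom's
// ->innertext, built by serialising each child node as HTML.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Assumed sample markup, shaped like the question's boxes.
$markup = '<div id="interestingbox"><div>Content1</div><div>Content2</div></div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Same idea as find('div[id=interestingbox]'): every div with that id
foreach ($xpath->query("//div[@id='interestingbox']") as $box) {
    echo innerHtml($box), "\n";
}
```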

First I have to tell you that you can't use the same id on two different divs; that's what classes are for. Every element should have a unique id.

Code to get the contents of the div with id="interestingbox":

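$html = '
<html>
<head></head>
<body>
<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>
</body>
</html>';

$dom_document = new DOMDocument();
$dom_document->loadHTML($html);

// use DOMXPath to navigate the HTML through the DOM
$dom_xpath = new DOMXPath($dom_document);

// if you want to get the div with id=interestingbox
$elements = $dom_xpath->query("*/div[@id='interestingbox']");

// query() returns false (not null) on an invalid expression
if ($elements !== false) {
    foreach ($elements as $element) {
        echo "\n[" . $element->nodeName . "]";
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue . "\n";
        }
    }
}

//OUTPUT
[div]  {
        Content1
        Content2
}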
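Because the question's markup repeats `id="interestingbox"`, libxml's HTML parser reports an "ID ... already defined" warning for each repeat. The sketch below buffers those warnings with `libxml_use_internal_errors()` and collects the text of the innermost `<div>`s under every box with a single XPath expression. `extractInterestingContent()` is a hypothetical helper and the sample markup is an assumption.

```php
<?php
// Hypothetical helper: text of the innermost <div>s inside every
// div with id="interestingbox".
function extractInterestingContent(string $html): array
{
    // Repeated ids make libxml report "ID ... already defined"; buffer
    // those errors instead of printing them as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    // //div[@id='interestingbox'] finds every box anywhere in the document;
    // //div[not(*)] then keeps descendant divs with no child elements.
    foreach ($xpath->query("//div[@id='interestingbox']//div[not(*)]") as $leaf) {
        $result[] = trim($leaf->textContent);
    }
    return $result;
}

// Assumed sample markup: two boxes sharing one id, as in the question.
$html = '<div id="interestingbox">
  <div id="interestingdetails" class="txtnormal">
    <div>Content1</div>
    <div>Content2</div>
  </div>
</div>
<div id="interestingbox"><div>Content3</div></div>';

print_r(extractInterestingContent($html));
```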
Example with classes:

$html = '
<html>
<head></head>
<body>
<div class="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div class="interestingbox"><a href="#">a link</a></div>
</body>
</html>';

//the same as before.. just change the xpath

[...]

$elements = $dom_xpath->query("*/div[@class='interestingbox']");

[...]

//OUTPUT
[div]  {
        Content1
        Content2
}

[div]  {
a link
}
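Note that `@class='interestingbox'` compares the whole attribute value, so it would miss an element with `class="interestingbox wide"`. A common XPath 1.0 idiom pads the normalised class list with spaces and tests for ` interestingbox `. A minimal sketch; `findDivsByClass()` is a hypothetical helper and the markup is an assumption:

```php
<?php
// Hypothetical helper: divs whose class list contains $class as a whole word.
// Assumes $class contains no single quote (it is spliced into the XPath string).
function findDivsByClass(DOMXPath $xpath, string $class): DOMNodeList
{
    // normalize-space() collapses runs of whitespace; the padding spaces stop
    // 'box' from matching inside 'interestingbox'.
    $expr = "//div[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    return $xpath->query($expr);
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="interestingbox">A</div>'
    . '<div class="interestingbox wide">B</div>'
    . '<div class="box">C</div>');
$xpath = new DOMXPath($doc);

foreach (findDivsByClass($xpath, 'interestingbox') as $div) {
    echo $div->textContent, "\n"; // A, then B
}
```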
For more details, see this page.

Web Extractor: it can parse pages with CSS, regex and XPath selectors.

Look at the examples in the package and the tests:

use WebExtractor\DataExtractor\DataExtractorFactory;
use WebExtractor\DataExtractor\DataExtractorTypes;
use WebExtractor\Client\Client;

$factory = DataExtractorFactory::getFactory();
$extractor = $factory->createDataExtractor(DataExtractorTypes::CSS);
$client = new Client;
$content = $client->get("");
$extractor->setContent($content);
$h1 = $extractor->setSelector('h1')->extract();


It's easy to use.
function innerXML($node)
{
    $doc  = $node->ownerDocument;
    $frag = $doc->createDocumentFragment();
    foreach ($node->childNodes as $child)
    {
        $frag->appendChild($child->cloneNode(TRUE));
    }
    return $doc->saveXML($frag);
}

$dom = new DOMDocument();
// loadXML() needs well-formed markup, so </body> and </html> must be closed
$dom->loadXML('
<html>
<body>
<table>
<tr>
    <td id="foo">
        The first bit of Data I want
        <br />The second bit of Data I want
        <br />The third bit of Data I want
    </td>
</tr>
</table>
</body>
</html>
');

$xpath = new DOMXPath($dom);
$node = $xpath->evaluate("/html/body//td[@id='foo']");

$dataString = innerXML($node->item(0));
// saveXML() serialises empty elements as <br/> (no space), so split on that
$dataArr = array_map('trim', explode("<br/>", $dataString));

$dataUno = $dataArr[0];
$dataDos = $dataArr[1];
$dataTres = $dataArr[2];

echo "firstdata = $dataUno<br />seconddata = $dataDos<br />thirddata = $dataTres<br />";
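If only the text between the `<br />` tags is needed, an alternative that avoids serialising and string-splitting is to read the `<td>`'s text-node children directly. A minimal sketch; `textSegments()` is a hypothetical helper and the markup is a trimmed copy of the answer's:

```php
<?php
// Hypothetical helper: trimmed, non-empty text nodes that are direct
// children of $node. For <td>a<br/>b<br/>c</td> this yields ['a', 'b', 'c'].
function textSegments(DOMNode $node): array
{
    $segments = [];
    foreach ($node->childNodes as $child) {
        // Skip <br/> elements (and any other non-text children)
        if ($child->nodeType === XML_TEXT_NODE) {
            $text = trim($child->nodeValue);
            if ($text !== '') {
                $segments[] = $text;
            }
        }
    }
    return $segments;
}

$dom = new DOMDocument();
$dom->loadXML('<html><body><table><tr><td id="foo">
    The first bit of Data I want
    <br />The second bit of Data I want
    <br />The third bit of Data I want
</td></tr></table></body></html>');

$xpath = new DOMXPath($dom);
$td = $xpath->query("/html/body//td[@id='foo']")->item(0);

print_r(textSegments($td));
```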