Simple HTML DOM Parser - skipping elements with a specific ID


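I'm using Simple HTML DOM Parser to query Google for specific keywords and then loop through the results. However, I don't want to pick up the ads or the news box. Excluding the ads is easy because their list elements have a different class, but the newsbox element has the same class as the normal results and only differs by an additional id.

Resulting li elements

Does anyone have other ideas, or has anyone solved this before?

Update: I'm still working on this and have nearly hit a dead end. Here is my latest code: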

include('simple_html_dom.php');

$html = file_get_html('https://www.google.co.uk/search?q=football');

// Collected results
$articles = array();

// Find all article blocks
foreach ($html->find('#res h3.r') as $article) {
    $item = array();
    $item['title'] = $article->plaintext;
    $item['intro'] = $article->find('a', 0)->href;
    $articles[] = $item;
}

print_r($articles);

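This is the printed array.

I don't understand why the second result, array[1][title], ends up in the array, because according to the line foreach($html->find('#res h3.r') as $article) it shouldn't be stored. It is neither inside the div with id res nor inside an h3 tag.

Any ideas?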

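Unfortunately, Simple HTML DOM Parser doesn't offer this kind of flexibility, but there is a workaround.

You can remove the unwanted block first and then retrieve the right ones:

$query->find('li#newsbox', 0)->outertext = '';
$li_elements = $query->find('li.g');

Edit: here is some sample code showing how it works: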

include('simple_html_dom.php');

$input = <<<_DATA_
<div class="g" id="newsbox">Bad node</div>
<div class="g">Useful node</div>
_DATA_;

// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($input);

// Remove the bad node
$html->find('div#newsbox', 0)->outertext = ''; // Comment this line to print the original html content

echo $html;
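If the page has no newsbox at all, find('…', 0) returns null, and assigning to outertext on it would produce the kind of "non-object" warning reported in the comments. A minimal sketch that guards against that, and that alternatively just skips the node inside the loop, could look like this. The li markup and the $titles variable are made up for illustration, and it assumes simple_html_dom.php sits next to the script, as in the question.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory, as in the question.
include('simple_html_dom.php');

// Hypothetical sample markup standing in for the real search page.
$input = <<<_DATA_
<ul>
<li class="g" id="newsbox">Bad node</li>
<li class="g">Useful node</li>
</ul>
_DATA_;

$html = new simple_html_dom();
$html->load($input);

// find() with an index returns null when nothing matches,
// so check before touching outertext.
$newsbox = $html->find('li#newsbox', 0);
if ($newsbox !== null) {
    $newsbox->outertext = '';
}

// Alternative: skip the unwanted node by its id while looping.
$titles = array();
foreach ($html->find('li.g') as $li) {
    if ($li->id === 'newsbox') {
        continue; // same class as the real results, but carries the extra id
    }
    $titles[] = trim($li->plaintext);
}

print_r($titles);
```

Skipping inside the loop has the advantage that it doesn't depend on how later find() calls treat a node whose outertext was blanked.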

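simple_html_dom claims to support this, so it looks like a bug.

The correct CSS selector would be li.g:not(#newsbox), which simple_html_dom does not support, though other parsers do.

I just tried this, but I get the warning "Attempt to assign property of non-object in ...". It still queries all the list elements. Any ideas?

It should work, check the demo I added... Maybe the error comes from somewhere else...

It doesn't work for me. Please see my latest update... For some reason, the news section always gets scraped as well.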
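For what it's worth, the same filter can be written without simple_html_dom by using PHP's bundled DOM extension and an XPath query. This is only a sketch of a swapped-in technique, not something from the answers above. The sample markup and variable names are invented for the demo.

```php
<?php
// Sketch using PHP's built-in DOMDocument/DOMXPath instead of simple_html_dom.
// Hypothetical sample markup standing in for the real search page.
$input = <<<_DATA_
<ul>
<li class="g" id="newsbox">Bad node</li>
<li class="g">Useful node</li>
<li class="ads">Ad node</li>
</ul>
_DATA_;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($input);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// XPath counterpart of li.g:not(#newsbox):
// class list contains the token "g" and the id is not "newsbox".
$query = "//li[contains(concat(' ', normalize-space(@class), ' '), ' g ')]"
       . "[not(@id='newsbox')]";

$results = array();
foreach ($xpath->query($query) as $li) {
    $results[] = trim($li->textContent);
}

print_r($results);
```

The concat/normalize-space part is there so that a class attribute such as "g foo" still matches, which a plain @class='g' comparison would miss.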

The selector from the question's first attempt:

$query = file_get_html('https://google.com/search?q=test');
$li_elements = $query->find('li[class=g id!=newsbox]');
