Simple HTML DOM Parser: skipping elements with a specific ID


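I am using Simple HTML DOM Parser to query Google for specific keywords and then loop through the results. However, I don't want to pick up the ads or the news box. Excluding the ads is easy because their list elements have a different class, but the newsbox element has the same class as the normal results and differs only by an additional id.

Result li elements

Does anyone have other ideas, or has anyone solved this problem before?

UPDATE: I am still working on this and have almost hit a dead end. Here is my latest code: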

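include('simple_html_dom.php');

$html = file_get_html('https://www.google.co.uk/search?q=football');

// Collect the results here; initialised so print_r() works even with no matches
$articles = array();

// Find all article blocks
foreach ($html->find('#res h3.r') as $article) {
    $item['title'] = $article->plaintext;           // visible text of the heading
    $item['intro'] = $article->find('a', 0)->href;  // href of the first link in the heading
    $articles[] = $item;
}

print_r($articles);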
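This is the printed array (the print_r output is listed further down).

I don't understand why the second result, array[1][title], ends up in the array, because according to the line foreach($html->find('#res h3.r') as $article) it should not be stored. It is neither inside the div with id res nor inside an h3 tag.


Any ideas?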

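Unfortunately, Simple HTML DOM Parser does not offer that kind of flexibility, but there is a workaround.

You can remove the unwanted block first and then retrieve the right ones:

$query->find('li#newsbox', 0)->outertext = '';
$li_elements = $query->find('li.g');

EDIT: Here is some sample code showing how it works: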

include('simple_html_dom.php');

$input =  <<<_DATA_
<div class="g" id="newsbox">Bad node</div>
<div class="g">Useful node</div>
_DATA_;

// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($input);

// Remove the bad node
$html->find('div#newsbox', 0)->outertext = ''; // Comment out this line to print the original HTML content

echo $html;
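
When the selector matches nothing, there is no node to assign to, which is one plausible source of an "assign property of non-object" warning with the removal approach. The following is only a sketch of the same remove-then-query idea with a guard. It assumes PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than Simple HTML DOM, and the markup is made up for illustration.

```php
<?php
// Hypothetical markup: one normal result and one news box sharing class "g".
$input = <<<HTML
<ul>
<li class="g" id="newsbox">Bad node</li>
<li class="g">Useful node</li>
</ul>
HTML;

libxml_use_internal_errors(true); // keep parser notices out of the output
$doc = new DOMDocument();
$doc->loadHTML($input);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// item(0) returns null when nothing matches, so check before touching the node.
$bad = $xpath->query('//li[@id="newsbox"]')->item(0);
if ($bad !== null) {
    $bad->parentNode->removeChild($bad);
}

// Collect the text of the remaining li elements with class "g".
$remaining = [];
foreach ($xpath->query('//li[@class="g"]') as $li) {
    $remaining[] = trim($li->textContent);
}

print_r($remaining);
```

The same null check can be applied to the result of find(..., 0) in Simple HTML DOM before assigning outertext.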

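simple_html_dom claims to support this, so it looks like a bug.


The proper CSS selector would be li.g:not(#newsbox), which Simple HTML DOM does not support, although other libraries do.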
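
For comparison, the effect of li.g:not(#newsbox) can be approximated with an XPath expression. This is a minimal sketch that assumes PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. The markup, URLs and result titles are invented for illustration and only mimic the selectors used in the question (#res, li.g, h3.r).

```php
<?php
// Hypothetical result list: two normal results and one news box.
$input = <<<HTML
<div id="res"><ol>
<li class="g"><h3 class="r"><a href="/url?q=http://example.com/a">Result A</a></h3></li>
<li class="g" id="newsbox"><h3 class="r"><a href="/search?q=football&amp;tbm=nws">News for football</a></h3></li>
<li class="g extra"><h3 class="r"><a href="/url?q=http://example.com/b">Result B</a></h3></li>
</ol></div>
HTML;

libxml_use_internal_errors(true); // keep parser notices out of the output
$doc = new DOMDocument();
$doc->loadHTML($input);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// "li.g" becomes a class-token test; ":not(#newsbox)" becomes not(@id="newsbox").
$query = '//li[contains(concat(" ", normalize-space(@class), " "), " g ")'
       . ' and not(@id="newsbox")]//h3/a';

$articles = [];
foreach ($xpath->query($query) as $a) {
    $articles[] = [
        'title' => trim($a->textContent),
        'href'  => $a->getAttribute('href'),
    ];
}

print_r($articles);
```

The concat/normalize-space test matches "g" as a whole class token, so an element with class "g extra" is kept while one with class "gx" would not be.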

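I just tried this method but got the error "Warning: Attempt to assign property of non-object in ...". It still queries all the list elements. Any ideas?

It should work, check the demo I added... Maybe the error comes from somewhere else...

It doesn't work for me. Please see my updated answer... For some reason the news section always gets scraped as well.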

Original attempt, using an attribute filter to exclude the id:

$query = file_get_html('https://google.com/search?q=test');
$li_elements = $query->find('li[class=g id!=newsbox]');

Printed array, as output by print_r($articles):

Array
(
[0] => Array
    (
        [title] => BBC Sport - Football
        [intro] => /url?q=http://www.bbc.co.uk/sport/0/football/&amp;sa=U&amp;ei=NkblU-s8h6nQBcCJgOAI&amp;ved=0CBQQFjAA&amp;usg=AFQjCNGHTFqXJoRjHKBSCdKFiW_BX6eGDw
    )

[1] => Array
    (
        [title] => News for football
        [intro] => /search?q=football&amp;ie=UTF-8&amp;prmd=ivnsl&amp;source=univ&amp;tbm=nws&amp;tbo=u&amp;sa=X&amp;ei=NkblU-s8h6nQBcCJgOAI&amp;ved=0CB8QqAI
    )

[2] => Array
    (
        [title] => Football Games, Results, Scores, Transfers, News | Sky Sports
        [intro] => /url?q=http://www1.skysports.com/football/&amp;sa=U&amp;ei=NkblU-s8h6nQBcCJgOAI&amp;ved=0CCgQFjAE&amp;usg=AFQjCNE4VP4WAHIYJAoPIBJoUx1pC-1jBA
    )

[3] => Array
    (
        [title] => Local business results for football near London NW5
        [intro] => https://maps.google.co.uk/maps?um=1&amp;ie=UTF-8&amp;fb=1&amp;gl=uk&amp;q=football&amp;hq=football&amp;hnear=0x48761a535791ef6f:0x493f677c231558c8,London+NW5&amp;sa=X&amp;ei=NkblU-s8h6nQBcCJgOAI&amp;ved=0CC4QtQM
    )

[4] => Array
    (
        [title] => Football news, match reports and fixtures | Football | The Guardian
        [intro] => /url?q=http://www.theguardian.com/football&amp;sa=U&amp;ei=NkblU-s8h6nQBcCJgOAI&amp;ved=0CE4QFjAM&amp;usg=AFQjCNHPhgIljb53cFPRHlb1vCa1fmWJag
    )

[5] => Array
    (
        [title] => NewsNow: Football News | Breaking News &amp; Search 24/7
        [intro] => /url?q=http://www.newsnow.co.uk/h/Sport/Football&amp;sa=U&amp;ei=NkblU-s8h6nQBcCJgOAI&amp;ved=0CFQQFjAN&amp;usg=AFQjCNEmmlrEayvHdebKTfPykGhHxRioLA
    )

[6] => Array
    (
        [title] => Football365 - Football News, Views, Gossip and much more...
        [intro] => /url?q=http://www.football365.com/&amp;sa=U&amp;ei=NkblU-s8h6nQBcCJgOAI&amp;ved=0CFoQFjAO&amp;usg=AFQjCNFKIP3xgtxw9DhNtOhVfpT4pbpLPw
    )

[7] => Array
    (
        [title] => Football - Wikipedia, the free encyclopedia
        [intro] => /url?q=http://en.wikipedia.org/wiki/Football&amp;sa=U&amp;ei=NkblU-s8h6nQBcCJgOAI&amp;ved=0CGAQFjAP&amp;usg=AFQjCNF2Fk8WH4rzEvWzmYIEUycZnjvjpg
    )

[8] => Array
    (
        [title] => Football in London - Things To Do - visitlondon.com
        [intro] => /url?q=http://www.visitlondon.com/things-to-do/whats-on/sport/football&amp;sa=U&amp;ei=NkblU-s8h6nQBcCJgOAI&amp;ved=0CGYQFjAQ&amp;usg=AFQjCNEdSNJc-mlVpaWEY9yPjcoDSaDLIw
    )

[9] => Array
    (
        [title] => London Football Leagues - 5-a-side - 7-a-side - 11-a-side - Midweek ...
        [intro] => /url?q=http://www.londonfootball.co.uk/&amp;sa=U&amp;ei=NkblU-s8h6nQBcCJgOAI&amp;ved=0CHMQFjAR&amp;usg=AFQjCNGnZtZQxUmUYQtDF0Tj5nJRnR2Yig
    )

[10] => Array
    (
        [title] => Football Tickets and Event Details | Ticketmaster UK Sport
        [intro] => /url?q=http://www.ticketmaster.co.uk/browse/football-catid-11/sport-rid-10004&amp;sa=U&amp;ei=NkblU-s8h6nQBcCJgOAI&amp;ved=0CHkQFjAS&amp;usg=AFQjCNFwTfpq-klboIEf0EbhlMQWvzHeKQ
    )