Simple HTML DOM Parser - skip elements with a specific ID
I'm using Simple HTML DOM Parser to query Google for a specific keyword and then loop through the content. However, I don't want to include the ads or the news box. Excluding the ads is easy because their list elements have a different class, but the news box element has the same class "g" and only an extra id on the li element: <li class="g" id="newsbox">. Does anyone have other ideas, or has anyone solved this before?

UPDATE

I'm still working on this and I've almost reached a dead end. Here is my latest code:
include('simple_html_dom.php');
$html = file_get_html('https://www.google.co.uk/search?q=football');
// Initialise the result list so print_r() works even when nothing matches
$articles = array();
// Find all article blocks
foreach($html->find('#res h3.r') as $article) {
    $item['title'] = $article->plaintext;
    $item['intro'] = $article->find('a', 0)->href;
    $articles[] = $item;
}
print_r($articles);
This is the printed array (the full print_r output appears further below):
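I don't understand why the second result, array[1][title], ends up in the array at all, because according to the line $html->find('#res h3.r') as $article it shouldn't: it is neither contained in the div with id res nor in an h3 tag.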
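Any ideas?

Unfortunately, Simple HTML DOM Parser doesn't support that kind of flexibility, but there is a workaround. You can first remove the unwanted block and then retrieve the right ones:
$query->find('li#newsbox', 0)->outertext = '';
$li_elements = $query->find('li.g');
Edit: here is some sample code showing how it works: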
include('simple_html_dom.php');
$input = <<<_DATA_
<div class="g" id="newsbox">Bad node</div>
<div class="g">Useful node</div>
_DATA_;
// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($input);
// Remove the bad node; find() with an index returns null when nothing matches,
// so check it before assigning to outertext
$bad = $html->find('div#newsbox', 0);
if ($bad) {
    $bad->outertext = ''; // Comment out this block to print the original HTML content
}
echo $html;
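Instead of blanking the news box before searching, you can also skip unwanted headings inside the loop by checking whether any ancestor carries the id. This is a minimal sketch that uses PHP's built-in DOMDocument/DOMXPath rather than simple_html_dom; the sample markup and the helper names inside_element_with_id() and organic_results() are hypothetical, modelled on the structure described in the question (li.g elements inside #res, with the news box marked id="newsbox").

```php
<?php
// Sketch: skip result headings that sit inside an element with a given id,
// using PHP's built-in DOM extension (not simple_html_dom).
// The markup below is hypothetical, modelled on the question.
$markup = <<<HTML
<div id="res"><ol>
<li class="g"><h3 class="r"><a href="/url?q=http://www.bbc.co.uk/sport/0/football/">BBC Sport - Football</a></h3></li>
<li class="g" id="newsbox"><h3 class="r"><a href="/search?q=football">News for football</a></h3></li>
<li class="g"><h3 class="r"><a href="/url?q=http://www.theguardian.com/football">The Guardian</a></h3></li>
</ol></div>
HTML;

// Hypothetical helper: true if $node or any of its ancestors has id="$id".
function inside_element_with_id(DOMNode $node, string $id): bool
{
    for ($n = $node; $n !== null; $n = $n->parentNode) {
        if ($n instanceof DOMElement && $n->getAttribute('id') === $id) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: collect title/link pairs like the question's loop,
// leaving out anything inside the news box.
function organic_results(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $articles = [];
    // Same shape as "#res h3.r": any h3 with class "r" at any depth under #res.
    $query = "//*[@id='res']//h3[contains(concat(' ', normalize-space(@class), ' '), ' r ')]";
    foreach ($xpath->query($query) as $h3) {
        if (inside_element_with_id($h3, 'newsbox')) {
            continue; // skip the news box heading
        }
        $link = $h3->getElementsByTagName('a')->item(0);
        $articles[] = [
            'title' => trim($h3->textContent),
            'intro' => $link !== null ? $link->getAttribute('href') : null,
        ];
    }
    return $articles;
}

print_r(organic_results($markup));
```

Because the check walks up from each heading, it works at any nesting depth. That matters here, since the print_r output shows the news heading being matched by the descendant selector #res h3.r.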
simple_html_dom claims to support this, so it looks like a bug.
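The correct CSS selector would be li.g:not(#newsbox), which simple_html_dom does not support (other selector engines do).
I just tried this, but I got the error "Warning: Attempt to assign property of non-object in ..." and it still queries all the list elements. Any ideas?
It should work; check the demo I added... Maybe the error comes from somewhere else.
It doesn't work for me. Please see my update... For some reason the news section is always scraped as well.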
$query = file_get_html('https://google.com/search?q=test');
$li_elements = $query->find('li[class=g id!=newsbox]');
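For comparison, the li.g:not(#newsbox) selector mentioned in the comments can be written as an XPath expression and run with PHP's built-in DOMXPath. This is a swapped-in technique, not something simple_html_dom provides; the result_items() function name and the sample markup (reusing the "Useful node"/"Bad node" idea from the demo above) are hypothetical.

```php
<?php
// XPath counterpart of the CSS selector "li.g:not(#newsbox)", run with
// PHP's built-in DOMXPath (not simple_html_dom).
// Hypothetical sample markup for illustration.
$markup = <<<HTML
<ol>
<li class="g">Useful node</li>
<li class="g" id="newsbox">Bad node</li>
<li class="g extra">Another useful node</li>
<li class="other">Not a result</li>
</ol>
HTML;

function result_items(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // "Has class g" uses the concat/normalize-space idiom so that class="g extra"
    // also matches; not(@id='newsbox') is also true when the <li> has no id
    // attribute at all, just like CSS :not(#newsbox).
    $expr = "//li[contains(concat(' ', normalize-space(@class), ' '), ' g ')"
          . " and not(@id='newsbox')]";

    $texts = [];
    foreach ($xpath->query($expr) as $li) {
        $texts[] = trim($li->textContent);
    }
    return $texts;
}

print_r(result_items($markup));
```

The DOM extension is usually enabled by default in PHP builds, so this avoids the extra include, at the cost of a different API from simple_html_dom.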
Printed array (print_r($articles) output of the question's code):
Array
(
[0] => Array
(
[title] => BBC Sport - Football
[intro] => /url?q=http://www.bbc.co.uk/sport/0/football/&sa=U&ei=NkblU-s8h6nQBcCJgOAI&ved=0CBQQFjAA&usg=AFQjCNGHTFqXJoRjHKBSCdKFiW_BX6eGDw
)
[1] => Array
(
[title] => News for football
[intro] => /search?q=football&ie=UTF-8&prmd=ivnsl&source=univ&tbm=nws&tbo=u&sa=X&ei=NkblU-s8h6nQBcCJgOAI&ved=0CB8QqAI
)
[2] => Array
(
[title] => Football Games, Results, Scores, Transfers, News | Sky Sports
[intro] => /url?q=http://www1.skysports.com/football/&sa=U&ei=NkblU-s8h6nQBcCJgOAI&ved=0CCgQFjAE&usg=AFQjCNE4VP4WAHIYJAoPIBJoUx1pC-1jBA
)
[3] => Array
(
[title] => Local business results for football near London NW5
[intro] => https://maps.google.co.uk/maps?um=1&ie=UTF-8&fb=1&gl=uk&q=football&hq=football&hnear=0x48761a535791ef6f:0x493f677c231558c8,London+NW5&sa=X&ei=NkblU-s8h6nQBcCJgOAI&ved=0CC4QtQM
)
[4] => Array
(
[title] => Football news, match reports and fixtures | Football | The Guardian
[intro] => /url?q=http://www.theguardian.com/football&sa=U&ei=NkblU-s8h6nQBcCJgOAI&ved=0CE4QFjAM&usg=AFQjCNHPhgIljb53cFPRHlb1vCa1fmWJag
)
[5] => Array
(
[title] => NewsNow: Football News | Breaking News & Search 24/7
[intro] => /url?q=http://www.newsnow.co.uk/h/Sport/Football&sa=U&ei=NkblU-s8h6nQBcCJgOAI&ved=0CFQQFjAN&usg=AFQjCNEmmlrEayvHdebKTfPykGhHxRioLA
)
[6] => Array
(
[title] => Football365 - Football News, Views, Gossip and much more...
[intro] => /url?q=http://www.football365.com/&sa=U&ei=NkblU-s8h6nQBcCJgOAI&ved=0CFoQFjAO&usg=AFQjCNFKIP3xgtxw9DhNtOhVfpT4pbpLPw
)
[7] => Array
(
[title] => Football - Wikipedia, the free encyclopedia
[intro] => /url?q=http://en.wikipedia.org/wiki/Football&sa=U&ei=NkblU-s8h6nQBcCJgOAI&ved=0CGAQFjAP&usg=AFQjCNF2Fk8WH4rzEvWzmYIEUycZnjvjpg
)
[8] => Array
(
[title] => Football in London - Things To Do - visitlondon.com
[intro] => /url?q=http://www.visitlondon.com/things-to-do/whats-on/sport/football&sa=U&ei=NkblU-s8h6nQBcCJgOAI&ved=0CGYQFjAQ&usg=AFQjCNEdSNJc-mlVpaWEY9yPjcoDSaDLIw
)
[9] => Array
(
[title] => London Football Leagues - 5-a-side - 7-a-side - 11-a-side - Midweek ...
[intro] => /url?q=http://www.londonfootball.co.uk/&sa=U&ei=NkblU-s8h6nQBcCJgOAI&ved=0CHMQFjAR&usg=AFQjCNGnZtZQxUmUYQtDF0Tj5nJRnR2Yig
)
[10] => Array
(
[title] => Football Tickets and Event Details | Ticketmaster UK Sport
[intro] => /url?q=http://www.ticketmaster.co.uk/browse/football-catid-11/sport-rid-10004&sa=U&ei=NkblU-s8h6nQBcCJgOAI&ved=0CHkQFjAS&usg=AFQjCNFwTfpq-klboIEf0EbhlMQWvzHeKQ
)
)