PHP: extracting data from HTML content by class name or tag name
Here is a source code snippet I took from a web page. I want to scrape the date, comment and rating for every review block on the page. I'm not very familiar with DOM elements, so I'd appreciate it if someone could correct this. The code is as follows:
<?php
// your code goes here
$html = <<< EOF
<div class="review-wrapper">
<div class="review-content">
<div class="biz-rating biz-rating-very-large clearfix">
<div itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">
<div class="rating-very-large">
<i class="star-img stars_5" title="5.0 star rating">
<img alt="5.0 star rating" class="offscreen" height="303" src="http://s3-media3.ak.yelpcdn.com/assets/2/www/img/c2252a4cd43e/ico/stars/v2/stars_map.png" width="84">
</i>
<meta itemprop="ratingValue" content="5.0">
</div>
</div>
<span class="rating-qualifier">
<meta itemprop="datePublished" content="2013-10-28">
10/28/2013
</span>
</div>
<p class="review_comment ieSucks" itemprop="description" lang="en">The reason I started a yelp account, was to write a review for Franchinos. This is my favorite restaurant in the city of San Francisco, and especially, North Beach. <br><br>Where do I start... I take every friend, family member and acquaintance to Franchinos in every opportunity I can. I am a Italy-nut and have been over three times - the mood + atmosphere is almost identical. It is a 100% family-run restaurant and you can taste the expertise and 'home-cooking'. <br><br>Each time, I get a large bottle of wine (One time - they ran out of the wine I had ordered - and instead gave me a larger, more expensive bottle - same price), a wonderful pasta dish (Alfredo, carbonara.. etc.) and a Caesar salad.<br><br>Need I say more? Buenisimo. I look forward to the next time.. and the times after that again and again. <br><br>è perfetto!</p>
</div>
<div class="review-footer clearfix">
<div class="rateReview ufc-feedback clearfix" data-review-id="SnZ4Q97nJdR7a-fot-Slcw">
<p class="review-intro review-message">
Was this review …?
</p>
EOF;
$dom = new DOMDocument();
@$dom->loadHTML($html);
$classname = 'review-content';
$finder = new DomXPath($dom);
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
$tmp_dom = new DOMDocument();
foreach($nodes as $result) {
//getting rate value from '<meta itemprop="ratingValue" content="5.0">'
//getting date from <span class="rating-qualifier"> <meta itemprop="datePublished" content="2013-10-28"> 10/28/2013 </span>
//getting review from ' <p class="review_comment ieSucks" itemprop="description" lang="en">The reason I started a yelp account, was to write a review for Franchinos. This is my favorite restaurant in the city of San Francisco, and especially, North Beach. <br><br>Where do I start... I take every friend, family member and acquaintance to Franchinos in every opportunity I can. I am a Italy-nut and have been over three times - the mood + atmosphere is almost identical. It is a 100% family-run restaurant and you can taste the expertise and 'home-cooking'. <br><br>Each time, I get a large bottle of wine (One time - they ran out of the wine I had ordered - and instead gave me a larger, more expensive bottle - same price), a wonderful pasta dish (Alfredo, carbonara.. etc.) and a Caesar salad.<br><br>Need I say more? Buenisimo. I look forward to the next time.. and the times after that again and again. <br><br>è perfetto!</p> '
}
You can look elements up by their class value or by their tag name, as shown below:
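// Parse the HTML once and reuse the same document and XPath object.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Date: text of the first element whose class attribute is exactly "rating-qualifier".
$classname = 'rating-qualifier';
$results = $xpath->query("//*[@class='" . $classname . "']");
if ($results->length > 0) {
    echo $date = trim($results->item(0)->nodeValue);
}

// Review text: @class is compared as a whole string, so both class names are needed.
$classname = 'review_comment ieSucks';
$results = $xpath->query("//*[@class='" . $classname . "']");
if ($results->length > 0) {
    echo $review = $results->item(0)->nodeValue;
}

// Rating: content attribute of the first <meta> element (itemprop="ratingValue").
$meta = $dom->documentElement->getElementsByTagName("meta");
echo $meta->item(0)->getAttribute('content');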
Obviously, you can use a simple for loop over the rating sections to get every rating on the page.
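The loop described above can be sketched as follows. This is only one possible approach, not the answerer's exact code: it assumes every review sits in its own `review-content` block, and it runs XPath queries relative to each block so that the rating, date and comment stay paired. The two-review sample markup and the function name `extractReviews` are hypothetical, made up for illustration.

```php
<?php
// Hypothetical two-review sample, modelled on the markup in the question.
$html = <<<'HTML'
<div class="review-wrapper">
  <div class="review-content">
    <div class="biz-rating">
      <meta itemprop="ratingValue" content="5.0">
      <span class="rating-qualifier">
        <meta itemprop="datePublished" content="2013-10-28">
        10/28/2013
      </span>
    </div>
    <p class="review_comment ieSucks" itemprop="description" lang="en">Great pasta.</p>
  </div>
</div>
<div class="review-wrapper">
  <div class="review-content">
    <div class="biz-rating">
      <meta itemprop="ratingValue" content="3.0">
      <span class="rating-qualifier">
        <meta itemprop="datePublished" content="2013-11-02">
        11/2/2013
      </span>
    </div>
    <p class="review_comment ieSucks" itemprop="description" lang="en">Too salty for me.</p>
  </div>
</div>
HTML;

// extractReviews() is an illustrative helper name, not part of any library.
function extractReviews(string $html): array
{
    // Collect libxml warnings internally instead of silencing them with "@".
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match "review-content" as one class token among possibly several.
    $blocks = $xpath->query(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' review-content ')]"
    );

    $reviews = [];
    foreach ($blocks as $block) {
        // The leading "." makes each query relative to the current block.
        $comment = $xpath->query(".//p[@itemprop='description']", $block)->item(0);
        $reviews[] = [
            'rating'  => $xpath->evaluate("string(.//meta[@itemprop='ratingValue']/@content)", $block),
            'date'    => $xpath->evaluate("string(.//meta[@itemprop='datePublished']/@content)", $block),
            'comment' => $comment !== null ? trim($comment->textContent) : '',
        ];
    }
    return $reviews;
}

$reviews = extractReviews($html);
foreach ($reviews as $r) {
    echo $r['date'], ' | ', $r['rating'], ' | ', $r['comment'], "\n";
}
```

Selecting by `itemprop` rather than by the full class string avoids depending on presentational class names such as `ieSucks`. Note that `textContent` drops `<br>` elements, so paragraph breaks inside a comment are not preserved by this sketch.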
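Alternatively, you can fetch the page with cURL first and then read the content.

OK, if the HTML contains a " character, the string will be terminated at that point. You have to find every such " and escape it with a backslash so that the whole HTML source works as a string.

@user123 Yes, man, I removed the /div and /li and then the code parsed. Thanks for your help. Your solution worked and I got what I wanted.

@user123 Because of the PHP 5 circular-reference memory leak, after creating a DOM object you must call $dom->clear() to free the memory if you call file_get_dom() more than once.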
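On the quoting issue raised in the comments: the backslash escaping is only needed when the HTML is pasted into a double-quoted (or single-quoted) string literal. A minimal sketch, assuming the markup is embedded in the script rather than downloaded, is to use a nowdoc (a heredoc with a quoted label), where quotes need no escaping and `$` is not interpolated. The one-line sample markup here is hypothetical.

```php
<?php
// Nowdoc: the quoted label 'HTML' disables variable interpolation and escaping,
// so both quote characters and "$" can appear in the markup unchanged.
$html = <<<'HTML'
<p class="review_comment">Costs $5 and it's "great"</p>
HTML;

$previous = libxml_use_internal_errors(true); // keep parser warnings off the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$p = $dom->getElementsByTagName('p')->item(0);
echo $p->getAttribute('class'), "\n";
echo $p->textContent, "\n";
```

A plain heredoc, as used in the question, also accepts unescaped quotes, but it would still try to expand anything that looks like a PHP variable.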