PHP regex to match meta tags

php regex

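Hi, I want to extract the og:image content from a page's source. How can I get the content of the og:image meta tag from the source code?

This is the meta tag:

<meta property="og:image" content="http://www.moneycontrol.com/news_image_files/2013/s/Syrian_diesel_trucks_190.jpg" />

How can I identify this meta tag with a regular expression?

This is my current function, which grabs image URLs from img tags. What modifications are needed to make it use the og:image meta tag instead?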

function feeds_imagegrabber_scrape_images($content, $base_url, array $options = array(), &$error_log = array()) {
  // Merge the default options.
  $options += array(
    'expression' => '//img',
    'getsize' => TRUE,
    'max_imagesize' => 512000,
    'timeout' => 10,
    'max_redirects' => 3,
    'feeling_lucky' => 0,
  );

  // Try the strict XML parser first, then fall back to the HTML parser.
  $doc = new DOMDocument();
  if (@$doc->loadXML($content) === FALSE && @$doc->loadHTML($content) === FALSE) {
    $error_log['code'] = -5;
    $error_log['error'] = "unable to parse the xml//html content";
    return FALSE;
  }

  $xpath = new DOMXPath($doc);
  $hrefs = @$xpath->evaluate($options['expression']);

  if ($options['getsize']) {
    timer_start(__FUNCTION__);
  }

  $images = array();
  $imagesize = 0;
  for ($i = 0; $i < $hrefs->length; $i++) {
    // <img> tags keep the image URL in the "src" attribute.
    $url = $hrefs->item($i)->getAttribute('src');
    if (empty($url)) {
      continue;
    }
    if (function_exists('encode_url')) {
      $url = encode_url($url);
    }
    $url = url_to_absolute($base_url, $url);
    if ($url == FALSE) {
      continue;
    }

    if ($options['getsize']) {
      // Remaining time budget in seconds (timer_read() returns milliseconds).
      if (($imagesize = feeds_imagegrabber_validate_download_size($url, $options['max_imagesize'], ($options['timeout'] - timer_read(__FUNCTION__) / 1000))) != -1) {
        $images[$url] = $imagesize;
        // Was $settings['feeling_lucky'], an undefined variable.
        if ($options['feeling_lucky']) {
          break;
        }
      }
      if (($options['timeout'] - timer_read(__FUNCTION__) / 1000) <= 0) {
        $error_log['code'] = FIG_HTTP_REQUEST_TIMEOUT;
        $error_log['error'] = "timeout occurred while scraping the content";
        break;
      }
    }
    else {
      $images[$url] = $imagesize;
      if ($options['feeling_lucky']) {
        break;
      }
    }
  }
  // The debugging "print_r($images); exit;" that made this return unreachable was removed.
  return $images;
}
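In the function above, switching to og:image most likely needs two changes: an XPath expression that selects the meta tag (for example `"//meta[@property='og:image']"` in place of `'//img'`), and reading the `content` attribute instead of `src`. Changing only the expression would leave `getAttribute('src')` returning an empty string for every `<meta>` node, so every match would be skipped and the result would be empty. The following is a minimal standalone sketch of those two changes; `extract_og_images` is a hypothetical helper, not part of the feeds_imagegrabber module, and it uses only the HTML parser.

```php
<?php
// Hypothetical helper: return every og:image URL found in an HTML string.
function extract_og_images(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string on PHP 8.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($loaded === false) {
        return [];
    }

    $xpath = new DOMXPath($doc);
    // Select meta tags by attribute value rather than <img> tags.
    $nodes = $xpath->query("//meta[@property='og:image']");

    $urls = [];
    foreach ($nodes as $node) {
        // <meta> stores the URL in "content", not "src".
        $url = $node->getAttribute('content');
        if ($url !== '') {
            $urls[] = $url;
        }
    }
    return $urls;
}

$html = '<html><head><meta property="og:image" content="http://www.moneycontrol.com/news_image_files/2013/s/Syrian_diesel_trucks_190.jpg" /></head><body><img src="other.jpg"></body></html>';
print_r(extract_og_images($html));
```

The `<img src="other.jpg">` in the sample is ignored because the query only matches `<meta>` elements.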

Use the DOMDocument class.

If you must use a regular expression, you could use:

<meta.*property="og:image".*content="(.*)".*\/>

For the tag in the question, the capture group contains:

http://www.moneycontrol.com/news_image_files/2013/s/Syrian_diesel_trucks_190.jpg
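The pattern above works on the single tag from the question, but its greedy `.*` can run past the end of that tag when several tags share a line. The sketch below compares it with a tighter variant of my own (not from the answer) that uses `[^>]*` to stay inside one tag and `([^"]*)` to stop at the closing quote. Both still assume double-quoted attributes with `property` written before `content`; `first_capture` is a hypothetical helper.

```php
<?php
// The answer's pattern, wrapped in PHP regex delimiters.
$answerPattern = '/<meta.*property="og:image".*content="(.*)".*\/>/';

// Tighter variant: [^>]* cannot cross a '>' and ([^"]*) stops at the next quote.
$tightPattern = '/<meta[^>]*property="og:image"[^>]*content="([^"]*)"/i';

// Hypothetical helper: first capture group, or null when there is no match.
function first_capture(string $pattern, string $html): ?string
{
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

$twoTags = '<meta property="og:image" content="a.jpg" /><meta property="og:title" content="T" />';

echo first_capture($answerPattern, $twoTags), "\n"; // T  (greedy .* reached the og:title tag)
echo first_capture($tightPattern, $twoTags), "\n";  // a.jpg
```

Even the tighter pattern breaks on single quotes or on `content` appearing before `property`, which is one reason the DOM/XPath approach is usually preferred.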

You could write a complex regular expression to parse the HTML, or you could use PHP's XPath: //meta[@property='og:image'], which seems far more convenient than a regex.

@Wrikken Something like this is what I want, but I tried your solution and it doesn't work. I have updated the question with my current function; please check. I also have a doubt: the meta tag is inside the head tag. Is that a problem?
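The tag being inside `<head>` should not matter: `//` searches the whole document. One possible cause of the XPath failing (an assumption, not confirmed in the thread) is that the function tries `loadXML()` first. If the page is well-formed XHTML with a default namespace, `loadXML()` succeeds and every element lands in the XHTML namespace, so an unprefixed `//meta` matches nothing. The sketch below reproduces that and shows the usual fix, registering a prefix with `DOMXPath::registerNamespace()`; the prefix `x` is arbitrary.

```php
<?php
// Sample XHTML document; the og:image tag sits inside <head>.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    . '<meta property="og:image" content="http://www.moneycontrol.com/news_image_files/2013/s/Syrian_diesel_trucks_190.jpg" />'
    . '</head><body></body></html>';

$doc = new DOMDocument();
$doc->loadXML($xhtml);
$xpath = new DOMXPath($doc);

// Unprefixed element names only match elements in no namespace.
$plain = $xpath->query("//meta[@property='og:image']");
echo $plain->length, "\n"; // 0

// Bind a prefix to the XHTML namespace and use it in the query.
// Attributes such as "property" carry no namespace, so @property still works.
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$nodes = $xpath->query("//x:meta[@property='og:image']");
echo $nodes->item(0)->getAttribute('content'), "\n";
```

Parsing the same string with `loadHTML()` instead produces elements with no namespace, so the plain `//meta[@property='og:image']` query matches there.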