PHP: Extracting text from HTML tags in an RSS feed

We have the following RSS feed:

<title>THIS IS THE TITLE</title>
<link>http://www.website.com/....</link>
<description>
  <div class="primary-image">
   <img typeof="foaf:Image" src="http://website.com/" alt="Drink driving" title="Drink driving" />
  </div>
  <div class="field-group-format group_meta field-group-div group-meta  speed-fast effect-none">
   <span class="field field-name-field-published-date field-type-datetime field-label-hidden">
      <span class="field-item even">
    <span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2014-01-29T17:43:00+00:00">29 Jan, 2014 5:43pm</span>
      </span>
   </span>
   <span class="field field-name-field-author field-type-node-reference field-label-hidden">
      <span class="field-item even"><a href="/authors/joe-finnerty">Joe Finnerty</a></span>
   </span>
  </div>
  <p class="short-desc">TEXT THAT I WANT TO EXTRACT FROM HERE</p>
</description>
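I tried adding code just before `$item = array(` inside the `foreach` loop, but it did not work.

A second attempt did not do the job either, nor did a replacement-based one.

Please help; I have spent days trying to find an answer without success.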

If I understand correctly, you want to strip the tags from the feed, so you could try this:

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
?>

For more details, see the PHP documentation for `strip_tags`.
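
One caveat worth illustrating: `strip_tags` removes the markup but keeps every text node, so on a description like the one in the question it would also return the date and the author name. The sketch below uses a shortened, hypothetical copy of the `<description>` body (the `$description` and `$plain` names are assumptions, not from the original post).

```php
<?php
// Hypothetical sample: a shortened copy of the <description> body from the question.
$description = '<div class="group-meta">'
    . '<span class="date-display-single">29 Jan, 2014 5:43pm</span>'
    . '<span class="field-item even"><a href="/authors/joe-finnerty">Joe Finnerty</a></span>'
    . '</div>'
    . '<p class="short-desc">TEXT THAT I WANT TO EXTRACT FROM HERE</p>';

// strip_tags() drops the tags but keeps all text, not only the short-desc paragraph.
$plain = strip_tags($description);

echo $plain, PHP_EOL;
```

The output still contains the date and "Joe Finnerty", so `strip_tags` alone is probably not enough when only one paragraph is wanted.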

Assuming you pass the HTML content above in the `$html` variable:

$dom = new DOMDocument;
@$dom->loadHTML($html); // @ silences warnings about imperfect markup
foreach ($dom->getElementsByTagName('p') as $tag) {
    if ($tag->getAttribute('class') === 'short-desc') {
        echo $tag->nodeValue; // prints "TEXT THAT I WANT TO EXTRACT FROM HERE"
    }
}
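
Since a complete script was requested in the comments, here is a minimal sketch of how the DOM approach might be wired into a feed loop. It rests on several assumptions: the feed is inlined as a string, the `<description>` HTML is wrapped in CDATA, and the names `extractShortDesc`, `$rss`, and `$items` are hypothetical rather than taken from the asker's code.

```php
<?php
// Hypothetical feed: one <item> whose <description> carries HTML inside CDATA.
$rss = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <item>
      <title>THIS IS THE TITLE</title>
      <link>http://www.website.com/</link>
      <description><![CDATA[
        <div class="primary-image"><img src="http://website.com/" alt="Drink driving" /></div>
        <p class="short-desc">TEXT THAT I WANT TO EXTRACT FROM HERE</p>
      ]]></description>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: return the text of the first <p class="short-desc">, or ''.
function extractShortDesc(string $html): string
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('p') as $tag) {
        if ($tag->getAttribute('class') === 'short-desc') {
            return trim($tag->nodeValue);
        }
    }
    return '';
}

$items = [];
// LIBXML_NOCDATA merges CDATA sections into plain text nodes.
$feed = simplexml_load_string($rss, 'SimpleXMLElement', LIBXML_NOCDATA);
foreach ($feed->channel->item as $entry) {
    $items[] = [
        'title' => (string) $entry->title,
        'link'  => (string) $entry->link,
        'desc'  => extractShortDesc((string) $entry->description),
    ];
}

print_r($items);
```

For a real feed, `simplexml_load_file` with the feed URL could replace the inline string; the extraction helper stays the same.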
Why not use a regular expression?

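// Capture the contents of <p class="short-desc">; the s flag lets . match newlines
$strRegex = '%<p class="short-desc">(.+?)</p>%s';

if (preg_match_all($strRegex, $strContent, $arrMatches)) {
    var_dump($arrMatches[1][0]); // text of the first matching paragraph
}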
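
A regular expression like this only matches when the opening tag is exactly `<p class="short-desc">`. The sketch below compares it with a somewhat more tolerant pattern, which is my own assumption and not part of the original answer. It accepts other attributes and extra class names. The input string is hypothetical; for irregular markup the DOM approach is usually the safer choice.

```php
<?php
// Hypothetical input: the paragraph carries an id and an extra class.
$strContent = '<p id="x" class="intro short-desc">TEXT THAT I WANT <em>TO EXTRACT</em></p>';

// Strict pattern: requires the exact attribute string.
$strict = '%<p class="short-desc">(.+?)</p>%s';

// More tolerant sketch: any attributes, class list containing the short-desc token.
$tolerant = '%<p\b[^>]*\bclass="(?:[^"]*\s)?short-desc(?:\s[^"]*)?"[^>]*>(.+?)</p>%is';

$strictHit = preg_match($strict, $strContent, $m1);     // no match on this input
$tolerantHit = preg_match($tolerant, $strContent, $m2); // matches

// Remove any inline markup left inside the captured paragraph.
$text = $tolerantHit ? trim(strip_tags($m2[1])) : '';
echo $text, PHP_EOL;
```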

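Do you need only the text, or the text together with the `<p>` tag?

Only the text "TEXT THAT I WANT TO EXTRACT FROM HERE", taken from the `p` tag with the specified class.

Could you please post the whole script? I don't know how to link the pieces together. I'm a noob at OOP and DOM.

I meant to say: based on that class tag.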
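In the regular-expression answer, `$strContent` holds the feed content, read for example from a file:
$path = 'path/to/file';
$strContent = file_get_contents($path);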