PHP stristr not working with HTML tags


php html string dom


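I'm building an RSS feed aggregator that visits each link to retrieve not only the article's description but also its full content. I'm using stristr to filter unwanted material out of the articles, such as Facebook and Twitter follower widgets and similar content. It works perfectly for one feed but not for the other. Here is my code: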

<?php
function getcontent($l,$b,$c)
{
    $dom=file_get_html($l);
    $atitle=$dom->find($b);
    $content=$dom->find($c);
    $contents=implode(" ",$content);
foreach($atitle as $t)
            {
                echo "<b>".$t."</b>";

            }
            echo "<br /><br />";
        echo $contents;
        echo "<br />";
}
function filtercontent($strip,$l,$b,$c)
{
    $dom=file_get_html($l);
    $atitle=$dom->find($b);
    $content=$dom->find($c);
    $contents=implode(" ",$content);
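    // Note: stristr(..., true) returns false if $strip does not occur in $contents,
    // and echoing false prints nothing.
    $contents=stristr($contents,$strip,true);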
    foreach($atitle as $t)
            {
                echo "<b>".$t."</b>";

            }
            echo "<br />";
            echo $contents;
            echo "<br /><br />";

}
ini_set('default_charset', 'UTF-8');
ini_set('max_execution_time',0);
ini_set('memory_limit', -1);
include("simple_html_dom.php");

$url=array("http://www.deccanherald.com/rss/news.rss","http://syndication.indianexpress.com/rss/798/latest-news.xml");

$atitle=NULL;
$content=NULL;
foreach($url as $feed)
{
    $f=$feed;
    $feed=simplexml_load_file($feed);
    //echo $feed;
    if($feed)
    {
        //$feed_title=$feed->channel->title;
        //echo "<br />".$feed_title."<br />";
        $items=$feed->channel->item;
        foreach($items as $item)
        {
            //foreach($keywords as $key)
            //{
            //if(strtolower($item->description)==$key || strtolower($item->title)==$key)
            //{

        $title=$item->title;
        //echo "<h1><b>".$title."</b></h1><br />";
        $link=$item->link;
        //echo "<a href='".$link."'>".$link."</a><br />";
        $des=$item->description;
        //echo "<br />".$des."<br />";


            if($f=="http://beta.thehindu.com/news/?service=rss")
            {
            $title_class=".detail-title";
            $content_class=".body";
            getcontent($link,$title_class,$content_class);

            }
            if($f=="http://in.news.yahoo.com/rss/national/")
            {
            $title_class=".headline";
            $content_class=".yom-art-content";
            getcontent($link,$title_class,$content_class);
            }


        if($f=="http://syndication.indianexpress.com/rss/798/latest-news.xml")
            {

            $link=$link."0";
            $title_class=".headstory";
            $content_class=".contentLeftbigstory";
            $strip='<div class="paginationNew">';
            filtercontent($strip,$link,$title_class,$content_class);

            }
            if($f=="http://www.indiatvnews.com/rssfeed/india_news.xml")
            {

            $title_class=".topstorytitsub";
            $content_class=".standard";
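            // $link holds a single URL; the original foreach($link as $post) iterated
            // over the (empty) child elements of <link>, so its body never ran.
            $dom=file_get_html($link);
            $title=$dom->find($title_class);
            $content=$dom->find('div[style=min-height:350px]');
            foreach($title as $t)
                echo "<b>".$t."</b><br />";
            foreach($content as $c)
            {
                echo $c;
            }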


            }
            if($f=="http://beta.thehindu.com/news/?service=rss")
            {
            $title_class=".detail-title";
            $content_class=".body";
            getcontent($link,$title_class,$content_class);

            }
            if($f=="http://www.deccanherald.com/rss/news.rss")
            {
            $title_class=".newsText";
            $content_class=".postedBy";
            $strip='<a href="#top" class="gototop">Go to Top</a>';
            filtercontent($strip,$link,$title_class,$content_class);            
            }


            }
    }
        }


?>
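The chain of if ($f=="...") blocks could be replaced by a lookup table keyed by feed URL. A minimal sketch, assuming a hypothetical FEED_CONFIG constant and feedConfig() function; the selectors and strip markers are copied from the code above. Only the two feeds listed in $url are included, because the other branches (The Hindu, Yahoo, India TV) compare $f against URLs that are not in $url and therefore never run.

```php
<?php
// Hypothetical per-feed configuration replacing the if ($f == "...") chain.
// Selectors and strip markers are copied from the question's code.
const FEED_CONFIG = [
    'http://www.deccanherald.com/rss/news.rss' => [
        'title'   => '.newsText',
        'content' => '.postedBy',
        'strip'   => '<a href="#top" class="gototop">Go to Top</a>',
    ],
    'http://syndication.indianexpress.com/rss/798/latest-news.xml' => [
        'title'   => '.headstory',
        'content' => '.contentLeftbigstory',
        'strip'   => '<div class="paginationNew">',
        'suffix'  => '0', // the question appends "0" to each Indian Express link
    ],
];

// Returns the configuration for a feed URL, or null if the feed is unknown.
function feedConfig(string $feedUrl): ?array
{
    return FEED_CONFIG[$feedUrl] ?? null;
}
```

Inside the item loop, $cfg = feedConfig($f) would take the place of the if chain, followed by filtercontent($cfg['strip'], $link . ($cfg['suffix'] ?? ''), $cfg['title'], $cfg['content']) when $cfg is not null. Adding a feed then means adding one array entry rather than another copy of the branch.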

I think the problem is $content_class=".postedBy". The only content in that class is "Mysore, September 28, 2012, DHNS:", which does not match $strip.
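When the selected content does not contain $strip, the call stristr($contents,$strip,true) in filtercontent() returns false, and echoing false prints an empty string, so the article comes out blank. A minimal sketch of a guard, using a hypothetical helper cutBefore() built on stripos() and substr(), that keeps the full content when the marker is missing:

```php
<?php
// Hypothetical helper: returns the part of $html before the first
// case-insensitive occurrence of $marker. Unlike stristr($html, $marker, true),
// which returns false when $marker is absent, it falls back to the whole string.
function cutBefore(string $html, string $marker): string
{
    $pos = stripos($html, $marker);
    return $pos === false ? $html : substr($html, 0, $pos);
}

$marker = '<a href="#top" class="gototop">Go to Top</a>';
$byline = '<div class="postedBy">Mysore, September 28, 2012, DHNS:</div>';

var_dump(stristr($byline, $marker, true));                  // bool(false)
echo cutBefore($byline, $marker), "\n";                      // the byline, unchanged
echo cutBefore('Body text' . $marker, $marker), "\n";        // Body text
```

Whether the missing marker is the actual cause depends on what the selector really returns for the live page; the guard only makes the failure visible instead of silent.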
Edit:
The postedBy div looks like this:
<div class="postedBy">Mysore, September 28, 2012, DHNS:</div>

It does not include the body of the article.
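That's too much code. Could you give an example of the input and output where your function fails?
@jeroen: OK, I'll add a screenshot.
A screenshot doesn't help much. Please post only the part of the code that causes the problem, the output (including any error messages), and the desired output/result.
@Havelock: Just look at the filtercontent function and the if blocks with $f=="...". I have also updated my screenshot.
Sorry, but you are wrong. div.postedBy is not closed, so it extends over the whole post, and I do get the correct output with it. The only problem is cutting out the unwanted content. Anyway, thanks for your reply.
Are you kidding? Why did you add the closing tag? It isn't in the source code.
I just copied it from your question. I promise I didn't add anything. Could someone else please confirm this isn't my imagination?
Scroll right in the textarea containing the pasted web page and you will see it.
Sorry, I added it by mistake. There is no closing tag in the original post.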
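The thread hinges on whether the postedBy div is closed. A rough illustration, using PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than simple_html_dom, so the parser is swapped in and simple_html_dom's handling of unclosed tags may differ. The markup below is a simplified stand-in for the Deccan Herald page, not the real source:

```php
<?php
// Returns the text content of the first <div class="postedBy"> in $html.
function postedByText(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);  // suppress warnings about malformed HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@class="postedBy"]');
    return $nodes->length > 0 ? $nodes->item(0)->textContent : '';
}

// Simplified stand-in markup (assumption): byline, body, "Go to Top" link.
$closed   = '<html><body><div class="postedBy">Mysore, DHNS:</div><p>Article body</p>'
          . '<a href="#top" class="gototop">Go to Top</a></body></html>';
$unclosed = '<html><body><div class="postedBy">Mysore, DHNS:<p>Article body</p>'
          . '<a href="#top" class="gototop">Go to Top</a></body></html>';

echo postedByText($closed), "\n";    // byline only
echo postedByText($unclosed), "\n";  // byline plus the body and link text
```

With the div closed, the selector yields only the byline, so a strip marker that appears later in the page can never be found inside it; left unclosed, the parser nests the following content inside the div.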