Filtering paragraphs out of XML with PHP using HTMLPurifier, SimpleXmlElement or DOM


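I am trying to remove the social media buttons from the description field of this XML, leaving only the paragraphs (the XML is too large to post here).

Edit: since some people cannot access the XML, here is part of the content of one of the description tags: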

    <description>
 <!-- TWITTER https://twitter.com/about/resources/buttons#tweet --> <script> document.write('<a href="https://www.twitter.com/tst_oficial" class="twitter-follow-button" data-show-count="false" data-lang="pt">Seguir</a>'); !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
 <!-- CURTIR SITE FACEBOOK (Enviar) --> <iframe class="fb_ltr" src="http://www.facebook.com/plugins/like.php?href=https://www.facebook.com/TSTJus&layout=button_count&show_faces=false&action=like&colorscheme=light&width=25&height=25&locale=pt_BR" scrolling="no" frameborder="0" style="border:0px; margin-left:30px; overflow:hidden; width:120px; height:25px;vertical-align:bottom;" allowTransparency="true"></iframe>
 <!-- GOOGLE PLUS +1--> <script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script> 
 <g:plusone size="medium" href="https://plus.google.com/103151838647081346830" style="border-left:-200px"></g:plusone>
 </div> </br></br> 
 <div class="modelo_noticia">
  <div>
   <div style="float: left; width:47%; text-align:center; margin: 0 9px 0 0;"><a href="/image/journal/article?img_id=5733388&t=1377023456174" target="_blank" style="text-decoration:none; color:black;"><img src="/image/journal/article?img_id=5733388&t=1377023456174" style="margin: 0 5px; width:98%;"/><span style="font-style:italic;"></span> </a></div>
   <p> &nbsp;</p>
   <p style="text-align: justify;"> <span style="font-size:12px;">"A CLT continua atual enq...a.</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">...or.</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">O min...do".</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">Ca...as".</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">Ao enc...izou.</span></p> 
   <p style="text-align: justify;"> <span style="font-size:12px;">Também parti...o.</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">Ao a...ócio".</span></p> 
   <p style="text-align: justify;"> <span style="font-size:12px;"><strong>Debate: reforma na CLT</strong></span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">O min...s.</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">Ao...disse.</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">O m...o o país". &nbsp;&nbsp;</span></p>  <p style="text-align: justify;"> <span style="font-size:12px;">(Fernanda Loureiro)</span></p>
  </div>
  <div style="clear:both;"></div>
 </div>
 <DIV style="vertical-align:bottom !important">
  <!-- FACEBOOK CURTIR --> <!-- <script src="http://connect.facebook.net/pt_BR/all.js#xfbml=1"></script>
  <fb:like layout="button_count" show_faces="true" width="80"></fb:like>-->
  <iframe class="fb_ltr" src="http://www.facebook.com/plugins/like.php?href=http://www.tst.jus.br/noticias/-/asset_publisher/89Dk/content/{rss=true}&layout=button_count&show_faces=false&action=like&colorscheme=light&width=25&height=25&locale=pt_BR" scrolling="no" frameborder="0" style="border:none;border:0;margin-left:0; overflow:hidden; width:95px; height:25px;horizontal-align:left;vertical-align:bottom;" allowTransparency="true"></iframe>
  <!-- TWITTAR --> <span style="margin-left:20px;"> <script type="text/javascript"> var endereco; endereco = window.location.href; document.write('<a href="http://twitter.com/share?url=' + endereco + '" class="twitter-share-button" data-text="Presidente do TST diz que trabalho precisa ser valorizado sem perda de competitividade" data-count="horizontal" data-via="tst_oficial">Tweet</a>') </script><script type="text/javascript" src="http://platform.twitter.com/widgets.js"></script> </span>
  <!-- OK FACEBOOK Recomendar --> <!--<iframe id="f2ee48257c" name="f1f8d54994" frameborder="0" scrolling="no" style="border: none; overflow: hidden; height: 20px; width: 200px;" title="Like this content on Facebook." class="fb_ltr" src="http://www.facebook.com/plugins/like.php?api_key=228619377180035&amp;locale=pt_BR&sdk=joey&channel_url=http://www.facebook.com/TSTJus?fref=ts&version=18%23cb%3Df360a99c9c&origin=http://www.tst.jus.br/noticias&href=http://www.tst.jus.br/noticias%26relation%3Dparent.parent&node_type=link&width=150&font=arial&layout=button_count&colorscheme=light&show_faces=false&send=true&extended_social_context=false&action=recommend" allowTransparency="true"></iframe>-->
  <iframe border="0" frameborder="0" scrolling="no" class="fb_ltr" id="f2ee48257c" name="f1f8d54994" style="border:none;margin-left:0; overflow:hidden; width:200px; height:25px;horizontal-align:left;vertical-align:bottom;" allowTransparency="true" title="Enviar notícia no Facebook" class="fb_ltr" src="http://www.facebook.com/plugins/like.php?api_key=228619377180035&locale=pt_BR&sdk=joey&channel_url=http://www.tst.jus.br/noticias%3Fversion%3D18%23cb%3Df360a99c9c%26origin%3Dhttp://www.tst.jus.br/noticias%26relation%3Dparent.parent&amp;href=http://www.tst.jus.br/noticias&node_type=link&amp;width=150&amp;font=arial&amp;layout=button_count&amp;colorscheme=light&show_faces=false&send=true&amp;extended_social_context=false&action=recommend"></iframe> 
  <!-- YOUTUBE --> <a href="http://www.youtube.com/tst" target="_blank"> <img src="http://www.tst.jus.br/image/image_gallery?uuid=49d1dfeb-fba6-48be-9984-c2ba7dac709e&groupId=10157&t=1359131490760" border="0" title="Inscrição no Canal Youtube do TST" alt="Inscrição no Canal Youtube do TST"></a>
 </DIV> </br>
</description>

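I have already tried a regular expression, but could only get the first paragraph ('#<p[^>]*>(.*)</p>#isU'). With SimpleXmlElement and DOM I keep getting errors (I don't know them well, but they seem to be the best approach), and finally HTMLPurifier filters out everything and returns nothing relevant.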
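One possible reason for getting only the first paragraph is that preg_match() stops after the first match, while preg_match_all() collects every match. The sketch below assumes the pattern in the question was meant to be '#<p[^>]*>(.*)</p>#isU' and that preg_match() was the function used; neither is confirmed by the question, and $html is a made-up sample rather than the real feed.

```php
<?php
// Hypothetical sample standing in for the HTML of one <description>.
$html = '<div><p style="text-align: justify;"> <span>First.</span></p>'
      . '<p style="text-align: justify;"> <span>Second.</span></p></div>';

// Assumed reconstruction of the pattern from the question.
$pattern = '#<p[^>]*>(.*)</p>#isU';

// preg_match() returns after the first match only.
preg_match($pattern, $html, $first);

// preg_match_all() keeps going; $all[1] holds the captured group of each match.
preg_match_all($pattern, $html, $all);

// Remove the inner <span> markup and the surrounding whitespace.
$paragraphs = array_map(
    static fn (string $inner): string => trim(strip_tags($inner)),
    $all[1]
);

print_r($paragraphs);
```

Regular expressions remain fragile on markup as irregular as this feed, so this only illustrates the difference between the two functions; a DOM parser is usually the more robust choice.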

Here is how I ended up doing it (following Puggan Se's suggestion); the code is listed at the end of this post.

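Do you know how I could improve it?

Thanks a lot for your help!

Is $description XML? Can you parse it, then use XPath to get all the p elements and just echo the content of each one?

Please post at least one valid fragment of your XML; I can't access the link.

Sorry, I don't know why you can't access it, but I have edited the question to include the most important part of the XML. Thank you!
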
$i = 0;
$feed = '<XML STRING>'; // the whole XML string goes here
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // drop whitespace-only text nodes (must be set before loading)
$dom->loadXML($feed, LIBXML_PARSEHUGE); // LIBXML_PARSEHUGE for very large XML documents
$dom->formatOutput = true; // only affects the output of saveXML()

$xml = new DOMXPath($dom);

// the prefix registered here has to match the one used in the queries below
$xml->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

// run the queries
$source = $xml->evaluate("//channel/title");
$titles = $xml->evaluate("//item/title");
$links = $xml->evaluate("//item/link");
$dates = $xml->evaluate("//item/dc:date");
$descriptions = $xml->evaluate("//item/description");

// echo the channel's title
 if ($source->length > 0) {
  $source = $source->item(0)->nodeValue;
  echo $source . '<br /><br />';
 }

// the embedded HTML is not well formed, so collect parser warnings instead of printing them
libxml_use_internal_errors(true);

// echo the items
 foreach ($titles as $title) {
  echo "{$titles->item($i)->nodeValue}<br /><br />";
  echo "{$links->item($i)->nodeValue}<br /><br />";
  echo "{$dates->item($i)->nodeValue}<br /><br />";
  // keep only the <p><span> text from <description>
  $description = $descriptions->item($i)->nodeValue;
  // convert UTF-8 to HTML entities so that loadHTML() reads accented characters correctly
  $description = mb_convert_encoding($description, 'HTML-ENTITIES', 'UTF-8');
  $domtmp = new DOMDocument();
  $domtmp->loadHTML($description);
  libxml_clear_errors();
  $xmltmp = new DOMXPath($domtmp);
  $desc = $xmltmp->evaluate("//p/span");
   foreach ($desc as $node) {
    echo "<p>{$node->nodeValue}</p>";
   }
  $i++;
 }
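
As one possible way to tidy up the description filtering, the per-item logic can be pulled into a small function. This is a minimal sketch, not a definitive implementation: extractParagraphs() is a hypothetical helper name, $sample is invented test data, and the code assumes the dom and mbstring extensions are enabled. It uses mb_encode_numericentity() in place of the 'HTML-ENTITIES' conversion and skips paragraphs that contain only whitespace or &nbsp;.

```php
<?php
/**
 * Return the text of every non-empty <p> element in an HTML fragment.
 * Hypothetical helper for illustration; not part of any library.
 */
function extractParagraphs(string $html): array
{
    // Turn every non-ASCII character into a numeric entity so that
    // loadHTML() does not misread UTF-8 input as ISO-8859-1.
    $html = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    // Collect libxml warnings about malformed markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $paragraphs = [];
    foreach ($xpath->query('//p') as $p) {
        // Treat non-breaking spaces as ordinary spaces, then trim.
        $text = trim(str_replace("\u{00A0}", ' ', $p->textContent));
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}

// Invented sample: a script, an iframe, an empty paragraph and two real ones.
$sample = '<script>var x = 1;</script>'
        . '<iframe src="http://example.com/like"></iframe>'
        . '<div><p> &nbsp;</p>'
        . '<p style="text-align: justify;"> <span>Olá, mundo.</span></p>'
        . '<p><span><strong>Debate</strong></span></p></div>';

$result = extractParagraphs($sample);
print_r($result);
```

Querying //p rather than //p/span keeps paragraphs whose text is not wrapped in a span; whether that is wanted depends on the feed. Inside the item loop the call would be extractParagraphs($descriptions->item($i)->nodeValue).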