PHP: get data from an HTML tag


I want to get the data inside <b> tags, for example:

<b>
    <sup>1</sup>A, a
</b>
<b> ab </b>
<b><sup>2</sup>A</b>

But I want to get the data ab.

I tried looping over the <b> tags (using Simple HTML DOM):

foreach ($html->find('b') as $word) {
    $words = $word->innertext;
    echo $words . '<br>';
}


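But when there is a <sup> tag, the text inside the <sup> tag is printed as well. How can I avoid getting the data inside the <sup> tag? Thanks.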

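You can use the following jQuery method chain to get the text inside the parent element, i.e. <b>, and ignore <sup> or any other elements inside it.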

$('b')
    .clone()      // clone the element
    .children()   // select all the children
    .remove()     // remove all the children
    .end()        // return to the matched element
    .text();      // get the text
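Since the question is about PHP rather than jQuery, the same idea (keep only the text that sits directly inside each <b> element and skip child elements such as <sup>) can be sketched with PHP's built-in DOM extension. This is a minimal sketch, not part of the original answer: the helper name directText is hypothetical, and it assumes the dom extension is enabled.

```php
<?php
// Hypothetical helper: return only the text that is a direct child of $node,
// skipping child elements such as <sup>.
function directText(DOMNode $node): string
{
    $text = '';
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    return trim($text);
}

// Sample markup from the question.
$html = '<b>
    <sup>1</sup>A, a
</b>
<b> ab </b>
<b><sup>2</sup>A</b>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Collect the direct text of every <b> element, one entry per element.
$results = [];
foreach ($doc->getElementsByTagName('b') as $b) {
    $results[] = directText($b);
}

foreach ($results as $text) {
    echo $text, "\n";
}
```

Unlike the jQuery chain, this does not clone or modify the document; it only reads the text nodes, so the <sup> elements stay in place if they are needed later.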

If there were no sup tag, there would be no data to pull out of a sup tag. If I understand your question correctly, you are attempting something logically impossible, unless there is some difference between the elements that you have not shown in your question. How would you know which part of ab is normally inside the sup tag?

You want to get A, a, ab???

@Salim Yes, I want to get A, a and ab.

Try:

<?php
$html = "<b>
            <sup>1</sup>A, a
        </b>
        <b> ab </b>
        <b><sup>2</sup>A</b>";
// Remove all HTML tags except <sup>
$html = strip_tags($html, "<sup>");
// Remove each <sup> tag together with its content
$html = preg_replace('#<sup>.*?</sup>#s', "", $html);
// Remove tabs, newlines and carriage returns
$html = str_replace(array("\t", "\n", "\r"), "", $html);
// Optionally, also remove all spaces from the string
$html = str_replace(" ", "", $html);
echo $html;
?>
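Note that the snippet above joins everything into a single string (A,aabA for the sample input), so the boundaries between the <b> elements are lost. If each value is needed separately, one possible variation is to match every <b> element first and clean each one on its own. This is only a sketch: it stays regex-based, so it is suitable for simple, well-formed markup like the sample rather than arbitrary HTML, and the variable names are illustrative.

```php
<?php
$html = "<b>
            <sup>1</sup>A, a
        </b>
        <b> ab </b>
        <b><sup>2</sup>A</b>";

// Capture the inner HTML of every <b>...</b> pair
// (the s flag lets . match newlines as well).
preg_match_all('#<b>(.*?)</b>#s', $html, $matches);

$values = [];
foreach ($matches[1] as $inner) {
    // Drop each <sup> element together with its content.
    $inner = preg_replace('#<sup>.*?</sup>#s', '', $inner);
    // Remove any remaining tags and the surrounding whitespace.
    $values[] = trim(strip_tags($inner));
}

print_r($values);
```

Each entry of $values then holds the text of one <b> element without the <sup> content, with inner spacing such as "A, a" preserved.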