Web scraping: getting untagged text with the Simple HTML DOM parser


web-scraping web-crawler


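I want to parse an HTML object in PHP using the Simple HTML DOM parser. The specific parts I want to extract are not properly wrapped in any tag of their own.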

<li class="tags">
   Required text: <span itemprop="testCat"><a href="/topics/new-topic/index.html" title="New Topic" onclick="s_objectID=&quot;http://www.example.com/topics/new-topic/index.html_1&quot;;return this.s_oc?this.s_oc(e):true">New Topic</a></span>, <span itemprop="testCat"><a href="/topics/new-topic-2/index.html" title="New Topic" onclick="s_objectID=&quot;http://www.example.com/topics/new-topic-2/index.html_1&quot;;return this.s_oc?this.s_oc(e):true">New Topic</a></span>, <span itemprop="testCat"><a href="/topics/new-topic-3/index.html" title="New Topic 3" onclick="s_objectID=&quot;http://www.example.com/topics/new-topic-3/index.html_1&quot;;return this.s_oc?this.s_oc(e):true">New Topic 3</a></span>, 
   <div class="more">
      <a href="javascript: void(0);" class="more-trigger" onclick="s_objectID=&quot;javascript: void(0);_1&quot;;return this.s_oc?this.s_oc(e):true">more</a>
      <div class="more-tags" style="top: 15px; left: 0px; display: none;">
         <div class="hd"></div>
         <div class="bd">
            <ul id="topic-filedin">
               <li>Another Required Text :
                  <a href="/topics/new-topic-4/index.html" onclick="s_objectID=&quot;http://www.example.com/topics/new-topic-4/index.html_1&quot;;return this.s_oc?this.s_oc(e):true">New Topic 4</a>
               </li>
               <li>Topic Intended For :
                  <a href="/topics/for-kids/index.html" onclick="s_objectID=&quot;http://www.example.com/topics/for-kids/index.html_1&quot;;return this.s_oc?this.s_oc(e):true">For Kids</a>
               </li>
            </ul>
         </div>
         <div class="ft"></div>
      </div>
      <script type="text/javascript">
         SNI.Node.ArticleInfo.moreTags();
      </script> 
   </div>
</li>



I can get the text inside the span elements with:

$categories = $single_content->find('li[class=tags] span');
foreach ($categories as $key) {
  echo $key->plaintext . '<br>';
}


However, I can't find a way to get "Required text", "Another Required Text" and "Topic Intended For".

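To get "Another Required Text" and "Topic Intended For", you can use this:

// The list is identified by id="topic-filedin", so select on the id rather than a class
$text = $single_content->find('ul[id=topic-filedin] li');
$textArray = array();
foreach ($text as $ta) {
  // plaintext returns the whole text of the <li>: the label plus the link text
  $textArray[] = $ta->plaintext;
}

You will get the required text in the array. Note that each entry contains the label together with the link text that follows it, so the label still has to be split off if only that part is needed.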
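
If only the untagged labels are wanted, one option is to select the text nodes directly. The sketch below is not the Simple HTML DOM approach from the answer: it swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) so that it runs without any third-party file. The function name extractLabels is hypothetical, and the markup is a trimmed copy of the snippet in the question with the onclick attributes and the hidden wrapper divs removed.

```php
<?php
// Trimmed copy of the markup from the question (assumption: attributes that
// do not affect the text content have been dropped for brevity).
$html = <<<'HTML'
<li class="tags">
   Required text: <span itemprop="testCat"><a href="/topics/new-topic/index.html">New Topic</a></span>,
   <div class="more">
      <ul id="topic-filedin">
         <li>Another Required Text :
            <a href="/topics/new-topic-4/index.html">New Topic 4</a>
         </li>
         <li>Topic Intended For :
            <a href="/topics/for-kids/index.html">For Kids</a>
         </li>
      </ul>
   </div>
</li>
HTML;

// Hypothetical helper: returns the text that sits directly inside the
// matching <li> elements, ignoring text inside child tags such as <a>.
function extractLabels(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings off the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // text() selects only direct text-node children of each <li>
    $nodes = $xpath->query(
        '//li[@class="tags"]/text() | //ul[@id="topic-filedin"]/li/text()'
    );

    $labels = [];
    foreach ($nodes as $node) {
        // Strip surrounding whitespace plus the separator commas and colons
        $label = trim($node->nodeValue, " \t\n\r,:");
        if ($label !== '') {
            $labels[] = $label;
        }
    }
    return $labels;
}

print_r(extractLabels($html));
```

With the trimmed markup above, the array should hold the three labels "Required text", "Another Required Text" and "Topic Intended For". On the real page the same XPath expressions should apply as long as the class and id values match those shown in the question.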