PHP: Extract text from HTML?

I have a string like the following and want to extract just its text:

<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>

strip_tags() will remove the tags and trim() should remove the whitespace. I'm not sure it works for non-breaking spaces, though.
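A small check that settles the doubt about non-breaking spaces (a sketch, not part of the original answer): strip_tags() leaves entities such as &nbsp; as literal text, and even once they are decoded, the resulting U+00A0 character is not in trim()'s default character list. The sketch assumes UTF-8 output.

```php
<?php
$html = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";

// strip_tags() removes <p>...</p> but leaves entities as literal text,
// so trim() has nothing to remove here.
$text = trim(strip_tags($html));
echo $text, "\n"; // &nbsp;Hello World, this is StackOverflow&#39;s question details page

// After decoding, &nbsp; becomes U+00A0 ("\xC2\xA0" in UTF-8), which trim()
// does not strip by default (it only strips " \t\n\r\0\x0B").
$decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump(substr($decoded, 0, 2) === "\xC2\xA0"); // bool(true)
var_dump(trim($decoded) === $decoded);           // bool(true)
```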

First you have to call trim() on the HTML to remove the whitespace, then strip_tags(), then html_entity_decode(). So:

html_entity_decode(strip_tags(trim($html)))
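The order of the calls matters. trim() on the raw HTML only reaches whitespace at the very ends of the string, which here starts with "<p>", so the leading &nbsp; inside the paragraph survives and, once decoded, becomes a non-breaking space that trim() does not remove. One way around this, sketched under the assumption of UTF-8 input, is to strip tags first, then decode, then trim with a Unicode-aware pattern. The function name html_to_text is hypothetical.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): strip tags, decode
 * entities, then trim ordinary whitespace and U+00A0 from both ends.
 */
function html_to_text(string $html): string
{
    // 1. Remove tags first so encoded text like "&lt;b&gt;" is not turned
    //    into markup and then stripped by mistake.
    $text = strip_tags($html);

    // 2. Decode every HTML entity: &nbsp; becomes U+00A0, &#39; becomes '.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 3. trim() ignores U+00A0, so use a UTF-8 regex for both ends.
    //    preg_replace() returns null on invalid UTF-8; fall back to $text then.
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text) ?? $text;
}

echo html_to_text("<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>"), "\n";
// Hello World, this is StackOverflow's question details page
```

Decoding before stripping would also work on this particular string, but it would delete text that was deliberately entity-encoded, such as a literal "<b>" written as &lt;b&gt;.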
The best and most reliable way to do this is probably to use a real (X|HT)ML parser, such as the DOMDocument class:
<?php

$str = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";

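$dom = new DOMDocument;
// XML only predefines five entities (&lt; &gt; &amp; &quot; &apos;), so &nbsp;
// must be replaced before parsing; the numeric reference &#39; is valid XML.
$dom->loadXML(str_replace('&nbsp;', ' ', $str));

// The <p> element is the document's first child; its nodeValue is its text.
echo trim($dom->firstChild->nodeValue);
// "Hello World, this is StackOverflow's question details page"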
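A variant worth considering, sketched under the assumption that the input is an HTML fragment rather than well-formed XML: loadXML() fails on any named entity other than the five XML ones (for example &copy;) and on fragments with more than one top-level element. loadHTML() understands HTML named entities, including &nbsp;, so no str_replace() is needed, but the decoded non-breaking space then has to be trimmed separately.

```php
<?php
$str = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";

$dom = new DOMDocument;
// The HTML parser knows &nbsp;, so no str_replace() is required here.
// LIBXML_NOERROR | LIBXML_NOWARNING suppress libxml's messages about sloppy markup.
$dom->loadHTML($str, LIBXML_NOERROR | LIBXML_NOWARNING);

// loadHTML() wraps the fragment in <html><body>, so read the root element's text.
$text = $dom->documentElement->textContent;

// &nbsp; was decoded to U+00A0, which trim() ignores; strip it with a regex.
$text = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);
echo $text, "\n"; // Hello World, this is StackOverflow's question details page
```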

The following works for me, though I had to do a str_replace() on the non-breaking space:
$string = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";
echo htmlspecialchars_decode(trim(strip_tags(str_replace('&nbsp;', '', $string))), ENT_QUOTES);
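Why the str_replace() is needed here can be shown in a few lines (a sketch assuming UTF-8 output): htmlspecialchars_decode() only reverses htmlspecialchars(), that is &amp;, &quot;, &#039; (with ENT_QUOTES), &lt; and &gt;, so &nbsp; passes through untouched. html_entity_decode() also handles named entities such as &nbsp;, but turns it into U+00A0 rather than removing it.

```php
<?php
$string = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";

// htmlspecialchars_decode() leaves &nbsp; as literal text.
$a = htmlspecialchars_decode(strip_tags($string), ENT_QUOTES);
var_dump(substr($a, 0, 6) === '&nbsp;'); // bool(true)

// html_entity_decode() decodes it to U+00A0 ("\xC2\xA0" in UTF-8).
$b = html_entity_decode(strip_tags($string), ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump(substr($b, 0, 2) === "\xC2\xA0"); // bool(true)
```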

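What condition? Don't make us guess! As @jakenoble said, it would help if you posted your sample code, output and errors.

If the string shown is part of a whole HTML page or of a larger fragment containing other markup, please see the link.

I've added my code, please check.

@Gordon It's not a big chunk of HTML, I just want to do it the simple way :(

That seems like a long way round. Since I'm running this in a loop, I thought it would be expensive.

Yes, this works for me too. If there's no solution for &nbsp;, that's fine, we can use replace. Thanks for your help!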
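$dom = new DOMDocument;
while ($rs = mysqli_fetch_assoc($result)) { // or whatever fetch loop you use
    // loadHTML() wraps the fragment in <html><body> and may prepend a DOCTYPE,
    // so read the root element's text instead of $dom->firstChild.
    $dom->loadHTML(str_replace('&nbsp;', ' ', $rs['description']));
    $TMP_DESCR = trim($dom->documentElement->textContent);

    // do something with $TMP_DESCR
}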
