PHP DOMDocument::loadHTML(): warning - htmlParseEntityRef: no name in Entity

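I have found several similar questions, but so far none of them has helped me.

I am trying to output the "src" of every image in a block of HTML, so I am using DOMDocument(). This approach works, but I get a warning on some pages and I cannot work out why. Some posts suggest suppressing the warning, but I would much rather find out what is generating it.

Warning: DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity, line: 10

One example of a post->post_content that generates the error is: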

On Wednesday 21st November specialist rights of way solicitor Jonathan Cheal of Dyne Drewett will be speaking at the Annual Briefing for Rural Practice Surveyors and Agricultural Valuers in Petersfield.
<br>
Jonathan is one of many speakers during the day and he is specifically addressing issues of public rights of way and village greens.
<br>
Other speakers include:-
<br>
<ul>
<li>James Atrrill, Chairman of the Agricultural Valuers Associates of Hants, Wilts and Dorset;</li>
<li>Martin Lowry, Chairman of the RICS Countryside Policies Panel;</li>
<li>Angus Burnett, Director at Martin & Company;</li>
<li>Esther Smith, Partner at Thomas Eggar;</li>
<li>Jeremy Barrell, Barrell Tree Consultancy;</li>
<li>Robin Satow, Chairman of the RICS Surrey Local Association;</li>
<li>James Cooper, Stnsted Oark Foundation;</li>
<li>Fenella Collins, Head of Planning at the CLA; and</li>
<li>Tom Bodley, Partner at Batcheller Monkhouse</li>
</ul>

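This correct answer comes from a comment by @lonesomeday:

My best guess is that there is an unescaped ampersand (&) somewhere in the HTML. This makes the parser think it is inside an entity reference (e.g. &copy;). When it reaches the ";", it thinks the entity is over. It then realises that what it has does not conform to an entity, so it emits a warning and returns the content as plain text.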
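If a bare ampersand is the cause, one option is to escape only the ampersands that do not already begin an entity reference. The following is a minimal sketch under that assumption, not code from the original answers: the helper name escape_bare_ampersands and the sample string are hypothetical.

```php
<?php
/**
 * Escape every "&" that does not already start an entity reference.
 * Hypothetical helper for illustration only.
 */
function escape_bare_ampersands(string $html): string
{
    // Leave &name;, &#123; and &#x1F; untouched; turn any other "&" into "&amp;".
    $pattern = '/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/';

    // preg_replace() returns null on a regex error; fall back to the input in that case.
    return preg_replace($pattern, '&amp;', $html) ?? $html;
}

// Sample modelled on the list item in the question, plus one valid entity.
$html  = '<li>Angus Burnett, Director at Martin & Company; &copy; 2012</li>';
$fixed = escape_bare_ampersands($html);

echo $fixed, PHP_EOL;
```

The pattern only checks the shape of an entity, so text such as "&Company;" would be left alone even though it is not a real entity name.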

Just replace the "&" in the string with "and". Do the same for all the other symbols mentioned here.

You can use:

libxml_use_internal_errors(true);

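libxml_use_internal_errors(true) does not have to mean discarding the messages: they can be collected and inspected, which also shows which line of the input the parser complained about. A rough sketch, with a made-up sample string; which messages are reported (if any) depends on the libxml2 version in use.

```php
<?php
// Hypothetical input: the bare "&" sits on line 2 of the string.
$html = "<ul>\n<li>Angus Burnett, Director at Martin & Company;</li>\n</ul>";

// Collect parser messages internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$messages = [];
foreach (libxml_get_errors() as $error) {
    // Each LibXMLError carries the message and the line within the parsed string.
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}

// Empty the error buffer and restore the earlier setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

print_r($messages);
```

The document is still built either way, so the image lookup from the question can run on $dom afterwards.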

I do not have the reputation required to leave a comment above, but using htmlspecialchars solved my problem:

$inputHTML = htmlspecialchars($post->post_content);
$dom = new DOMDocument();
$dom->loadHTML(apply_filters('the_content', $inputHTML)); // Have tried stripping all tags but <img>, still generates warning
$nodes = $dom->getElementsByTagName('img');
foreach($nodes as $img) :
    $images[] = $img->getAttribute('src');
endforeach;

For my purposes I also used strip_tags($inputHTML, ""), so all the image tags were stripped out as well; I am not sure whether that would otherwise be a problem.
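One caveat with this approach, also raised in the comments: htmlspecialchars escapes the angle brackets as well as the ampersand, so the markup reaches the parser as plain text and no elements are found. A small sketch with a made-up input string:

```php
<?php
// Hypothetical input containing both a bare ampersand and an image tag.
$html    = '<p>Martin & Company</p><img src="a.png">';
$escaped = htmlspecialchars($html);

libxml_use_internal_errors(true);   // keep any parser messages quiet for this demo
$dom = new DOMDocument();
$dom->loadHTML($escaped);
libxml_clear_errors();

// The tags were turned into text, so there is no <img> element to find.
$count = $dom->getElementsByTagName('img')->length;

echo $escaped, PHP_EOL;
echo $count, PHP_EOL;
```

This is fine if only the text is needed, but it defeats the goal in the question, which is to collect image "src" values.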
I ended up solving this with tidy:
// Configuration
$config = array(
    'indent'         => true,
    'output-xhtml'   => true,
    'wrap'           => 200);

// Tidy to avoid errors during load html
$tidy = new tidy;
$tidy->parseString($bill->bill_text, $config, 'utf8');
$tidy->cleanRepair();

$domDocument = new DOMDocument();
$domDocument->loadHTML(mb_convert_encoding($tidy, 'HTML-ENTITIES', 'UTF-8'));
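The tidy approach depends on the tidy extension being installed, which is not the case everywhere. The sketch below adds a class_exists guard and a made-up input string (both are my assumptions, not part of the answer), and reads the repaired markup with tidy_get_output(). As far as I know, the 'HTML-ENTITIES' encoding used with mb_convert_encoding above is deprecated as of PHP 8.2, so this sketch leaves that step out.

```php
<?php
// Hypothetical input with a bare ampersand.
$html = '<ul><li>Angus Burnett, Director at Martin & Company;</li></ul>';

$clean = null;
if (class_exists('tidy')) {
    // Same idea as the answer: let tidy repair the markup before DOMDocument sees it.
    $tidy = new tidy();
    $tidy->parseString($html, ['output-xhtml' => true, 'wrap' => 200], 'utf8');
    $tidy->cleanRepair();
    $clean = tidy_get_output($tidy);
}

if ($clean !== null) {
    $dom = new DOMDocument();
    $dom->loadHTML($clean);
    echo $dom->getElementsByTagName('li')->length, PHP_EOL;
} else {
    echo 'tidy extension not available', PHP_EOL;
}
```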

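Check for "&" characters anywhere in your HTML code. I ran into this problem because of exactly that.

For Laravel, use {{ }} instead of {!! !!}.

I faced this problem and managed to solve it. I found that there was an error in my table tags. There was an extra [...]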
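Showing the line that causes the error would certainly make debugging easier.

The warning occurs on DOMDocument::loadHTML(), so the line causing the error is line 10 of whatever $dom->loadHTML(apply_filters('the_content', $post->post_content)) is parsing...

OK, as you wish. In one case it is "James Cooper, Stnsted Oark Foundation;". I did wonder whether certain characters in it were causing the problem, but removing them all (there used to be several) did not help.

@DavidGard My best guess is that there is an unescaped ampersand (&) somewhere in the HTML. This makes the parser think it is inside an entity reference (e.g. &copy;). When it reaches the ";", it thinks the entity is over. It then realises that what it has does not conform to an entity, so it emits a warning and returns the content as plain text.

So how do I fix it? I cannot call htmlentities on the whole HTML string.

@MavWolverine I know this is many years later, but I just ran into the same problem. The simplest option I found was a plain string replacement, str_replace('&', '&amp;', $string), because htmlentities and htmlspecialchars cause the angle brackets of the HTML tags to be converted. I am 100% sure there is a better way to do this, but it did what I needed for a simple one-off parsing job.

@PanPipes Slightly more restrictive: preg_replace("/&(?!\S+)/", "&amp;", $string).

No, this is bad advice. The "&" is used for a specific purpose, and simply replacing it with "and" does not fit the bill in most cases. Company names are an obvious example.

Loading the HTML like this helped me: @$dom->loadHTML($html)

This solved my problem nicely. Stack Overflow saved me again ;)

Welcome to Stack Overflow. Please explain how your code solves the problem.

I believe the loadHTML method has difficulty with malformed HTML. Using tidy helped me solve this, along with replacing & with &amp;.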