PHP Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity


$html = file_get_contents("http://www.somesite.com/");

$dom = new DOMDocument();
$dom->loadHTML($html);

echo $dom;

throws

Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,
Catchable fatal error: Object of class DOMDocument could not be converted to string in test.php on line 10

The fatal error occurs because DOMDocument has no __toString() method, so it cannot be echoed. You are probably looking for $dom->saveHTML(), which returns the document as a string.
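Since a DOMDocument object cannot be echoed directly, one way to print the parsed markup is to serialize it first. The following is only a minimal sketch: the inline markup is a made-up stand-in for the downloaded page, and $output is just an illustrative variable name.

```php
<?php
// Made-up sample markup standing in for the page fetched with file_get_contents().
$html = '<html><body><p>Hello</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// DOMDocument has no __toString(), so serialize it explicitly instead of echoing the object.
$output = $dom->saveHTML();
echo $output;
```

This removes the fatal error only; the entity warning is a separate problem caused by the input markup.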

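There are two errors. The second occurs because $dom is an object, not a string, and therefore cannot be echoed. The first is a warning from loadHTML, caused by invalid syntax in the HTML document being loaded (probably an & (ampersand) used as a parameter separator rather than escaped as the entity &amp;).

You can ignore and suppress this error message (not the error, just the message!) by calling the function with the error control operator "@".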
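To see the ampersand issue in isolation, here is a small sketch that parses the same link twice, once with a bare & and once with &amp;. The sample markup and the helper name firstHref() are assumptions made for illustration. The parser should recover from the bare ampersand, so both versions are expected to give the same href; only the warning differs.

```php
<?php
// Hypothetical helper: parses a fragment quietly and returns the first link's href.
function firstHref(string $html): string
{
    // Collect parser messages internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected messages and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom->getElementsByTagName('a')->item(0)->getAttribute('href');
}

// Made-up sample markup: the first link uses a bare '&', the second escapes it.
$bare    = '<a href="/script.php?foo=bar&hello=world">link</a>';
$escaped = '<a href="/script.php?foo=bar&amp;hello=world">link</a>';

$hrefFromBare    = firstHref($bare);
$hrefFromEscaped = firstHref($escaped);

echo $hrefFromBare, PHP_EOL;
echo $hrefFromEscaped, PHP_EOL;
```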

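$dom->@loadHTML($html);

This is incorrect; use this instead:

@$dom->loadHTML($html);

I would bet that if you looked at the source of

http://www.somesite.com/

you would find special characters that have not been converted to HTML entities. Maybe something like this:

<a href="/script.php?foo=bar&hello=world">link</a>

It should be

<a href="/script.php?foo=bar&amp;hello=world">link</a>

Regardless of the echo (which would need to be replaced with print_r or var_dump), if an exception is thrown the object should stay empty:

DOMNodeList Object
(
)

Solution

Set recover to true and strictErrorChecking to false.

Use PHP's entity encoding on the markup's contents, which is the most common source of this error.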

To get rid of the warning, you can use libxml_use_internal_errors(true):

// create new DOMDocument
$document = new \DOMDocument('1.0', 'UTF-8');

// set error level
$internalErrors = libxml_use_internal_errors(true);

// load HTML
$document->loadHTML($html);

// Restore error level
libxml_use_internal_errors($internalErrors);

Another possible solution is

$sContent = htmlspecialchars($sHTML);
$oDom = new DOMDocument();
$oDom->loadHTML($sContent);
echo html_entity_decode($oDom->saveHTML());

Replace the simple

$dom->loadHTML($html);

with the more robust

libxml_use_internal_errors(true);

if (!$DOM->loadHTML($page)) {
    $errors = "";
    foreach (libxml_get_errors() as $error) {
        $errors .= $error->message . "<br/>";
    }
    libxml_clear_errors();
    print "libxml errors:<br>$errors";
    return;
}

I know this is an old question, but if you ever want to fix the malformed '&' signs in your HTML, you can use code similar to the fixAmps() function listed further down this page.

Try this

$html = file_get_contents("http://www.somesite.com/");

$dom = new DOMDocument();
$dom->loadHTML(htmlspecialchars($html));

echo $dom;

Another possible solution: maybe your file is an ASCII-type file; just change the type of the file.

Even after this my code still worked fine, so I simply removed all the warning messages with an error_reporting(E_ERROR) statement on line 1 (the last snippet on this page).

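Or $dom->strictErrorChecking = false;

This is a bad solution, because you will make any bug on this line a nightmare to debug. @Dewsworld's solution is much better.

What for?

This is a really dirty solution and will not solve all problems.

While your answer will solve the problem, the line "This is incorrect" is itself incorrect.

In the first solution, you wrote dom instead of doc.

This works for me; I only added $content = mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8');

Just to expand on this: if the & character appears in text rather than in an HTML attribute, it still needs to be escaped as &amp;. The parser throws the error because, after seeing an &, it expects a ; to terminate the HTML entity.

...and to expand further, calling htmlentities() or similar on the string will fix the problem.

This will not work. According to the documentation, all HTML special characters will also be escaped. Take, for example, this piece of HTML: <span>Hello World</span>. Running it through htmlspecialchars produces &lt;span&gt;Hello World&lt;/span&gt;, which is no longer HTML. DOMDocument::loadHTML will no longer treat it as HTML, but as a plain string.

This works for me: $oDom = new DOMDocument(); $oDom->loadHTML($sHTML); echo html_entity_decode($oDom->saveHTML());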
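The objection to htmlspecialchars() is easy to check. The sketch below reuses the <span>Hello World</span> example from the comment and counts the <span> elements DOMDocument finds before and after escaping. The variable names are illustrative only.

```php
<?php
$markup = '<span>Hello World</span>';

// htmlspecialchars() escapes the tag delimiters themselves, not just stray ampersands.
$escaped = htmlspecialchars($markup);
echo $escaped, PHP_EOL;

// Keep the parser quiet while loading the two fragments.
$previous = libxml_use_internal_errors(true);

$plain = new DOMDocument();
$plain->loadHTML($markup);

$mangled = new DOMDocument();
$mangled->loadHTML($escaped);

libxml_clear_errors();
libxml_use_internal_errors($previous);

// Count the <span> elements found in each parsed document.
$spansInPlain   = $plain->getElementsByTagName('span')->length;
$spansInMangled = $mangled->getElementsByTagName('span')->length;
echo $spansInPlain, ' vs ', $spansInMangled, PHP_EOL;
```

Once the whole page has been escaped, the parser sees only text, so there is nothing left to query.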

// Escape malformed '&' signs before handing the page to DOMDocument.
$page = file_get_contents('http://www.example.com');
$page = preg_replace('/\s+/', ' ', trim($page)); // Collapse runs of whitespace.
fixAmps($page, 0);

$dom = new DOMDocument();
$dom->loadHTML($page);


function fixAmps(&$html, $offset) {
    // Visit every '&' from $offset onward. A loop is used instead of recursion
    // so that the scan always continues to the end of the string.
    while (($positionAmp = strpos($html, '&', $offset)) !== false) {
        $positionSemiColumn = strpos($html, ';', $positionAmp + 1);

        // Candidate entity: everything from the '&' up to the next ';', if there is one.
        $string = ($positionSemiColumn === false)
            ? ''
            : substr($html, $positionAmp, $positionSemiColumn - $positionAmp + 1);

        if (preg_match('/^&(#[0-9]+|#x[0-9a-f]+|[a-z0-9]+);$/i', $string) === 1) {
            $offset = $positionAmp + 1; // Already a well-formed entity: skip it.
        } else {
            $html = substr_replace($html, '&amp;', $positionAmp, 1); // Escape the bare '&'.
            $offset = $positionAmp + 5; // Continue after the inserted '&amp;'.
        }
    }
}
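A shorter alternative to walking the string by hand is a single regular-expression pass with a negative lookahead. This is a different technique from the function above, sketched under the assumption that checking entity syntax is enough; it does not verify that a named entity actually exists. The helper name escapeBareAmpersands() and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper: escapes every '&' that does not start a well-formed entity.
function escapeBareAmpersands(string $html): string
{
    return preg_replace(
        // '&' not followed by a decimal, hexadecimal, or named entity ending in ';'
        '/&(?!(?:#[0-9]+|#x[0-9a-f]+|[a-z][a-z0-9]*);)/i',
        '&amp;',
        $html
    );
}

// Made-up sample mixing bare ampersands with valid entities.
$sample = '<a href="/script.php?foo=bar&hello=world">Tom & Jerry &copy; &#169;</a>';
$fixed  = escapeBareAmpersands($sample);
echo $fixed, PHP_EOL;
```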

<?php error_reporting(E_ERROR); ?>