PHP DOMDocument saveHTML breaks formatting

Why does this code:

$doc = new DOMDocument();
$doc->loadHTML($this->content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$imgNodes = $doc->getElementsByTagName('img');

if ($imgNodes->length > 0) {
    $inlineImage = new Image();
    $inlineImage->setPublicDir($publicDirPath);

    foreach ($imgNodes as $imgNode) {
        $inlineImage->setUri($imgNode->getAttribute('src'));
        $inlineImage->setName(basename($inlineImage->getUri()));

        if ($inlineImage->getUri() != $dstPath.$inlineImage->getName()) {
            $inlineImage->move($dstPath);

            $imgNode->setAttribute('src', $dstPath.'/'.$inlineImage->getName());                 
        }
    }

    $this->content = $doc->saveHtml();

}

when run on this HTML:

<p><img alt="fluid cat" src="/images/tmp/fluid-cat.jpg"></p><p><img alt="pandas" src="/images/tmp/pandas.jpg"></p>

produce this HTML:

<p><img alt="fluid cat" src="/images/full/2016-09/fluid-cat.jpg"><p><img alt="pandas" src="/images/full/2016-09/pandas.jpg"></p></p>

Why does it put both img tags inside the first p block?

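Your HTML sample has no root element that contains all the other elements. When libxml parses the HTML to build the DOM tree, it assumes that the first tag it encounters is the root element. As a result, the first </p> is treated as an orphaned closing tag (because content follows it) and is removed automatically, and a </p> is added at the end to close the root element.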

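To avoid these automatic fixes when you work with an HTML fragment (rather than a complete HTML document), add a fake root element. Then, to build the result string, save each child node of that fake root element. For example: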

$html = '<p><img alt="fluid cat" src="/images/tmp/fluid-cat.jpg"></p><p><img alt="pandas" src="/images/tmp/pandas.jpg"></p>';

$doc = new DOMDocument;
$doc->loadHTML( '<div>' . $html . '</div>', LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
#               ^-----------------^----- fake root element
$root = $doc->documentElement;

$result = '';

foreach($root->childNodes as $childNode) {
    $result .= $doc->saveHTML($childNode);
}

echo $result;
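
Applied to the question's use case, the same technique could look like the sketch below. The function name rewriteImageDirs and its parameters are hypothetical, and the file-moving step from the question (the Image class) is left out because its behaviour is not shown in the thread; only the src rewriting and the fake-root serialization are illustrated.

```php
<?php
// Hypothetical helper (not from the original thread): points every <img src>
// in an HTML fragment at $dstDir while keeping the fragment's structure.
function rewriteImageDirs(string $fragment, string $dstDir): string
{
    $doc = new DOMDocument();
    // The <div> is a temporary root, so libxml does not treat the first <p> as the root.
    $doc->loadHTML(
        '<div>' . $fragment . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    foreach ($doc->getElementsByTagName('img') as $img) {
        // Keep the file name, replace the directory part.
        $name = basename($img->getAttribute('src'));
        $img->setAttribute('src', rtrim($dstDir, '/') . '/' . $name);
    }

    // Serialize only the children of the fake root, so the <div> itself is dropped.
    $result = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }

    return $result;
}
```

Depending on the libxml version, saveHTML() may put a newline between the serialized block elements, so compare results after normalizing whitespace rather than byte for byte.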

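Because your HTML sample has no root element. Libxml assumes that the first p is the root element and performs automatic fixes: it removes the "orphaned" closing p tag and puts a closing tag in the "right place", that is, at the end. To solve this, add a fake root element (a <div>, for example, or remove LIBXML_HTML_NOIMPLIED) and extract its child nodes one by one, concatenating them to build the result string.

I'm sure DOMDocument tries to format the HTML correctly. Try adding a / at the end of the img tags so that they are self-closing.

loadHTML() and saveHTML() are very poor and of little use in practice. Consider using a third-party HTML parser and a custom HTML code generator.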
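
The alternative mentioned in the first comment, dropping LIBXML_HTML_NOIMPLIED, can be sketched as follows. Without that flag libxml adds the implied <html> and <body> elements itself, so <body> serves as the root and no fake <div> is needed. The function name bodyChildrenHtml is hypothetical.

```php
<?php
// Hypothetical helper (not from the original thread): parses a fragment with
// the implied <html><body> wrapper and returns only the content of <body>.
function bodyChildrenHtml(string $fragment): string
{
    // loadHTML() rejects an empty string, so handle that case up front.
    if ($fragment === '') {
        return '';
    }

    $doc = new DOMDocument();
    // No LIBXML_HTML_NOIMPLIED here: libxml wraps the fragment in <html><body>.
    $doc->loadHTML($fragment, LIBXML_HTML_NODEFDTD);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Concatenate the serialized children of <body>, one node at a time.
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }

    return $result;
}
```

Both variants serialize the children one by one; the only difference is whether the wrapper element comes from your code or from libxml.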