PHP: Can the meta tag content value returned by an XPath query be trusted?


php xpath


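I have a PHP function that uses an XPath query to extract a meta tag from a URL, e.g.

    $xpath->query('/html/head/meta[@name="my_target"]/@content')

My questions:

Can I trust the returned value, or should I validate it?

=> Is there any possible XSS vulnerability?

=> Should I sanitize the HTML content before loading it into `DOMDocument`?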

    // Another way to say it, with some code:

    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = false;
    libxml_use_internal_errors(true);

    // Option 1: is this trustworthy?
    $doc->loadHTMLFile($url);

    // Option 2: or is this a better practice?
    $html  = file_get_contents($url);
    $trust = $purifier->purify($html);
    $doc->loadHTML($trust);

    libxml_use_internal_errors(false);
    $xpath = new DOMXPath($doc);

    $trustable = $xpath->query('/html/head/meta[@name="my_target"]/@content')->item(0); // ?

==== UPDATE ====

Yes: never trust an external source.

Use `htmlspecialchars($trustable->textContent)` or `strip_tags($trustable->textContent)`.

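If you are pulling HTML content from a source you don't control, then yes, I would consider this code potentially troublesome!

You can use `htmlspecialchars()` to convert any special characters to HTML entities. Or, if you want to keep part of the markup, you can use `strip_tags()`. Another option is `filter_var()`, which gives you more control over the filtering.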

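Or you could use a library such as HTML Purifier, but that may be more than you need. It all depends on the type of content you are working with.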
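To make the difference between these options concrete, here is a minimal sketch that runs the same string through each of them. The sample `$input` is made up for illustration, and the choice of `FILTER_SANITIZE_FULL_SPECIAL_CHARS` for `filter_var()` is an assumption; the answer does not say which filter it had in mind.

```php
<?php
// Made-up sample input containing harmless markup and a script tag.
$input = 'Hi <strong>there</strong> <script>alert("x")</script>';

// 1. Escape everything: all markup is shown as literal text.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// 2. Keep only <strong>: other tags are removed, but their text content stays.
$stripped = strip_tags($input, '<strong>');
// $stripped is: Hi <strong>there</strong> alert("x")

// 3. filter_var() with an escaping filter (assumed filter choice).
$filtered = filter_var($input, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

echo $escaped, "\n", $stripped, "\n", $filtered, "\n";
```

Note that `strip_tags()` keeps the attributes of any tag you allow, so an allowed tag carrying an event-handler attribute would pass through unchanged. If the value is going to be printed into a page, escaping it with `htmlspecialchars()` is usually the safer default.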

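Now, to sanitize the element, you first need to get a string representation of the XPath result, apply your filtering, and then put it back. The following example should do what you want: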

<?php
// The following HTML is what you fetch from your remote source:
$html = <<<EOL
<html>
 <body>
    <h1>Foo, bar!</h1>
    <div id="my-target">
        Here is some <strong>text</strong> <script>javascript:alert('some malicious script!');</script> that we want to sanitize.
    </div>
 </body>
</html>
EOL;

// We instantiate a DOMDocument so we can work with it:
$original = new DOMDocument("1.0", 'UTF-8');
$original->formatOutput = true;
$original->loadHTML($html);

$body = $original->getElementsByTagName('body')->item(0);

// Find the element we need using Xpath:
$xpath = new DOMXPath($original);
$divs  = $xpath->query("//body/div[@id='my-target']");

// The XPath query will return DOMElement objects, so create a string that we can manipulate out of it:
$innerHTML  = '';
if ($divs->length > 0)
{
    $div = $divs->item(0);

    // Now get the innerHTML for this element
    foreach ($div->childNodes as $child) {
        $innerHTML .= $original->saveXML($child);
    }

    // Remove it from the original document because we want to replace it anyway
    $div->parentNode->removeChild($div);
}

// Sanitize our string by removing all tags except <strong> and the container <div>
$innerHTML = strip_tags($innerHTML, '<strong>');
// or htmlspecialchars() or filter_var or HTML Purifier ..

// Now re-import the sanitized string into a blank DOMDocument
$sanitized = new DOMDocument("1.0", 'UTF-8');
$sanitized->formatOutput = true;
$sanitized->loadXML('<div id="my-target">' . $innerHTML . '</div>');

// Now add the sanitized DOMElement back into the original document as a child of <body>
$body->appendChild($original->importNode($sanitized->documentElement, true));

echo $original->saveHTML();


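Hope this helps.

Comments:

We don't know whether you trust the source of the HTML.

@Quentin That's the point: I don't.

Thanks for your answer. My example shows one way to purify my content (in fact, `$purifier` is a reference to the HTMLPurifier tool). The question is whether purifying the whole HTML just to get the meta tag content is the better practice.

An HTML document is completely safe to load into `DOMDocument`. First of all, it cannot run PHP code; `DOMDocument` only parses the string. Even if there is an XSS payload, it would only execute when you output it to a browser. For the content of a meta tag, I definitely would not use a solution like HTMLPurifier, because it is a fairly heavy library. Load the document with `DOMDocument` first, then extract the meta tag content and sanitize it with `htmlspecialchars` and/or `strip_tags`.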
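The approach described in that last comment could be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the inline `$html` string stands in for whatever you fetched from the remote URL, and the payload in the `content` attribute is made up.

```php
<?php
// Stand-in for the remote page; the content attribute carries a made-up payload.
$html = <<<'HTML'
<html><head>
<meta name="my_target" content="&lt;b onmouseover=alert(1)&gt;hello&lt;/b&gt;">
</head><body></body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);              // parsing only builds a tree; nothing is executed
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$node  = $xpath->query('/html/head/meta[@name="my_target"]/@content')->item(0);

// item(0) is null when the meta tag is missing, so guard before reading it.
$raw = $node !== null ? $node->nodeValue : '';

// $raw still holds the markup exactly as the remote page supplied it.
// Sanitize only at this point: drop the tags, then escape what is left.
$text = strip_tags($raw);
$safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
```

The parser decodes the entities in the attribute, so `$raw` contains real markup again. That is why the value needs escaping when it is printed, even though loading it into `DOMDocument` was harmless.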