PHP: DOM-based XSS attacks and innerHTML


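How can I protect against the following DOM-based XSS attack?

Specifically, is there a protect() function that would make the code below safe? If not, is there another solution? For example: give the div an id, then assign an onclick handler to the element.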

<?php
function protect()
{
   // For non-DOM XSS attacks, hex-encoding all non-alphanumeric characters
   // with ASCII values less than 256 works (ie: \xHH)
   // But is it possible to augment this function to protect against
   // the below DOM based XSS attack?
}
?>

<body>
  <div id="mydiv"></div>
  <script type="text/javascript">
    var xss = "<?php echo protect($_GET["xss"]) ?>";
    $("#mydiv").html("<div onclick='myfunc(\""+xss+"\")'></div>")
  </script>
</body>



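I'm hoping for an answer other than "avoid using innerHTML" or "regex the xss variable down to [a-zA-Z0-9]". In other words: is there a more general solution?

Thanks.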

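I'm no PHP expert, but if you want to protect the code sample in its current form against XSS with minimal changes, you can use an encoding library (the follow-up comments refer to ESAPI). Specifically, use its JavaScript encoder to protect the contents of the xss variable, since it appears in a JavaScript context.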
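To see why encoding for the JavaScript string alone may not be enough here, the sketch below replays the question's snippet in plain JavaScript. hexEncodeForJsString is a hypothetical stand-in for protect(), evalJsStringLiteral imitates the browser parsing the generated script, and the payload is just one illustrative input; none of these names come from the original post.

```javascript
// Hypothetical stand-in for protect(): hex-encode every non-alphanumeric
// character so the value is inert inside a JavaScript string literal.
function hexEncodeForJsString(value) {
  return value.replace(/[^a-zA-Z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 256
      ? "\\x" + code.toString(16).padStart(2, "0")
      : "\\u" + code.toString(16).padStart(4, "0");
  });
}

// Imitates the browser evaluating: var xss = "<encoded>";
function evalJsStringLiteral(encoded) {
  return new Function('return "' + encoded + '";')();
}

// The markup the question passes to .html(), which ends up in innerHTML.
function buildMarkup(xss) {
  return "<div onclick='myfunc(\"" + xss + "\")'></div>";
}

const payload = '");alert(1);//';
const encoded = hexEncodeForJsString(payload); // safe as a string literal
const xss = evalJsStringLiteral(encoded);      // escapes are decoded again here
const markup = buildMarkup(xss);               // raw payload is back in the markup

console.log(encoded);
console.log(markup);
```

By the time the value is concatenated into the markup, the \xHH escapes have already been decoded, so the payload closes the myfunc("...") call inside the onclick attribute. The value would need encoding for those inner contexts as well, or it should not pass through markup at all.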

Expanding on Vinet's reply, here is a set of test cases to study:

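I've been looking into PHP's DOMDocument and related classes with a view to writing an HTML parser that handles this kind of thing. It's at an early stage of development and a long way from being usable in practice, but my early experiments suggest the idea has some promise.

Basically, you load the markup into a DOMDocument and then walk the tree. For each node in the tree, you check the node type against a list of allowed node types. If the node type isn't on the list, the node is removed from the tree.

You could use a similar approach to locate all the script tags in the markup and remove them. If you can strip every embedded script out of the supplied markup, DOM-based XSS becomes toothless.

Here is the code I'm using, along with a test case that processes the StackOverflow home page. As I said, it's far from production-quality code and is no more than a proof of concept. Still, I hope you find it useful.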
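As a rough illustration of that walk, independent of DOMDocument, here is a minimal JavaScript sketch over a plain object tree. The node shape ({ name, attrs, children }), the two whitelists, and the sample tree are assumptions made for this example only.

```javascript
// Hypothetical whitelists, kept short for the example.
const TAG_WHITELIST = new Set(["#text", "div", "p", "span", "a"]);
const ATTR_WHITELIST = new Set(["class", "id", "title"]);

// Cleans the tree in place and returns the names of the removed nodes.
function cleanTree(node, removed = []) {
  // Drop every attribute that is not explicitly allowed (e.g. onclick).
  for (const attr of Object.keys(node.attrs || {})) {
    if (!ATTR_WHITELIST.has(attr.toLowerCase())) delete node.attrs[attr];
  }
  const children = node.children || [];
  // Walk backwards so removing a child does not shift unvisited indexes.
  for (let i = children.length - 1; i >= 0; i--) {
    const child = children[i];
    if (TAG_WHITELIST.has(child.name.toLowerCase())) {
      cleanTree(child, removed);
    } else {
      children.splice(i, 1);
      removed.push(child.name);
    }
  }
  return removed;
}

const tree = {
  name: "div",
  attrs: { id: "root", onclick: "evil()" },
  children: [
    { name: "script", attrs: {}, children: [{ name: "#text", text: "alert(1)" }] },
    { name: "p", attrs: { class: "note", onmouseover: "evil()" }, children: [{ name: "#text", text: "hi" }] },
  ],
};
const removedNames = cleanTree(tree);
console.log(removedNames, JSON.stringify(tree));
```

Note that this sketch assumes the root node itself is allowed; only descendants are checked against the tag whitelist.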

<?php
class HtmlClean
{
    private $whiteList      = array (
        '#cdata-section', '#comment', '#text', 'a', 'abbr', 'acronym', 'address', 'b', 
        'big', 'blockquote', 'body', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 
        'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em', 'fieldset', 'h1', 'h2', 'h3', 'h4', 
        'h5', 'h6', 'head', 'hr', 'html', 'i', 'img', 'ins', 'kbd', 'li', 'link', 'meta', 
        'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strike', 'strong', 'style', 'sub', 
        'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'title', 'tr', 'tt', 'ul', 
        'var'
    );

    private $attrWhiteList  = array (
        'class', 'id', 'title'
    );

    private $dom            = NULL;

    /**
     * Get current tag whitelist
     * @return array
     */
    public function getWhiteListTags ()
    {
        $this -> whiteList  = array_values ($this -> whiteList);
        return ($this -> whiteList);
    }

    /**
     * Add tag to the whitelist
     * @param string $tagName
     */
    public function addWhiteListTag ($tagName)
    {
        $tagName    = strtolower (trim ($tagName));
        if (!in_array ($tagName, $this -> whiteList))
        {
            $this -> whiteList []   = $tagName;
        }
    }

    /**
     * Remove a tag from the whitelist
     * @param string $tagName
     */
    public function removeWhiteListTag ($tagName)
    {
        // array_search returns 0 for the first entry, so compare against false explicitly
        if (($index = array_search ($tagName, $this -> whiteList)) !== false)
        {
            unset ($this -> whiteList [$index]);
        }
    }

    /**
     * Load document markup into the class for cleaning
     * @param string $html The markup to clean
     * @return bool
     */
    public function loadHTML ($html)
    {
        if (!$this -> dom)
        {
            $this -> dom    = new DOMDocument();
        }
        $this -> dom -> preserveWhiteSpace  = false;
        $this -> dom -> formatOutput        = true;
        return $this -> dom -> loadHTML ($html);
    }

    public function outputHtml ()
    {
        $ret    = '';
        if ($this -> dom)
        {
            $ret    = $this -> dom -> saveXML ();
        }
        return ($ret);
    }

    private function cleanAttrs (DOMNode $elem)
    {
        $attrs  = $elem -> attributes;
        $index  = $attrs -> length;
        while (--$index >= 0)
        {
            $attrName   = strtolower ($attrs -> item ($index) -> name);
            if (!in_array ($attrName, $this -> attrWhiteList))
            {
                $elem -> removeAttribute ($attrName);
            }
        }       
    }

    /**
     * Recursively remove elements from the DOM that aren't whitelisted
     * @param DOMNode $elem
     * @return array List of elements removed from the DOM
     * @throws Exception If removal of a node fails
     */
    private function cleanNodes (DOMNode $elem)
    {
        $removed    = array ();
        if (in_array (strtolower ($elem -> nodeName), $this -> whiteList))
        {
            // Remove non-whitelisted attributes
            if ($elem -> hasAttributes ())
            {
                $this -> cleanAttrs ($elem);
            }
            /*
             * Iterate over the element's children. The reason we go backwards is because
             * going forwards will cause indexes to change when elements get removed
             */
            if ($elem -> hasChildNodes ())
            {
                $children   = $elem -> childNodes;
                $index      = $children -> length;
                while (--$index >= 0)
                {
                    $removed = array_merge ($removed, $this -> cleanNodes ($children -> item ($index)));
                }
            }
        }
        else
        {
            // The element is not on the whitelist, so remove it
            if ($elem -> parentNode -> removeChild ($elem))
            {
                $removed [] = $elem;
            }
            else
            {
                throw new Exception ('Failed to remove node from DOM');
            }
        }
        return ($removed);
    }

    /**
     * Perform the cleaning of the document
     */
    public function clean ()
    {
        $removed    = $this -> cleanNodes ($this -> dom -> getElementsByTagName ('html') -> item (0));
        return ($removed);
    }
}

$test       = file_get_contents( ('http://www.stackoverflow.com/'));
// Windows-style line breaks really foul up the works. There's probably a better fix for this
$test       = str_replace (chr (13), '', $test);

$cleaner    = new HtmlClean ();
$cleaner -> loadHTML ($test);

echo ('<h1>Before</h1><pre>' . htmlspecialchars ($cleaner -> outputHtml ()) . '</pre>');

$start      = microtime (true);
$removed    = $cleaner -> clean ();
$cleanTime  = microtime (true) - $start;

echo ('<h1>Removed tag list</h1>');
foreach ($removed as $elem)
{
    var_dump ($elem -> nodeName);
}

echo ('<h1>After</h1><pre>' . htmlspecialchars ($cleaner -> outputHtml ()) . '</pre>');

// benchmark
var_dump ($cleanTime);
?>


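Thanks for the reply. I tried that library and it made a mess of my application. The library is thorough, but it can't be used in a production environment because it is as slow as molasses. It also doesn't address the scenario above. It provides JavaScript, HTML and URL encoding functions, but doesn't really tell you how to use them. I've read their articles, but the scenario above isn't explicitly covered, or at least it isn't clear to me.

Well, that's bad luck. My experience with the Java version has been much better; I suppose the same can't be guaranteed for PHP. By the way, this isn't DOM-based XSS. You'll find the notes on JavaScript escaping more relevant. As for ESAPI's performance problems, I don't think the encoding routines are the cause, since well-optimized encoding routines slow an application down by only about 1-5%. You may need to refactor the ESAPI code to suit your needs.

Have you looked at the ESAPI code? It is absolute bloatware and frankly horrible; there is no way to "optimize" it easily. Regarding the XSS Prevention Cheat Sheet: yes, I've read it three times, along with the DOM-based XSS FAQ, but it doesn't seem to address my question directly. Finally, this is a DOM-based XSS case, because it uses innerHTML.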
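For reference, the alternative raised in the question itself (attach a click handler to the element instead of building an onclick attribute) keeps the untrusted value out of markup altogether. A minimal sketch, assuming a DOM-style document object; appendClickableDiv and its parameters are hypothetical names for this example:

```javascript
// Builds the inner div with DOM calls instead of an HTML string, so the
// untrusted value is only ever handled as data and never parsed as markup.
// `doc` is any object with a DOM-style createElement, e.g. window.document.
function appendClickableDiv(doc, container, untrustedValue, handler) {
  const inner = doc.createElement("div");
  inner.addEventListener("click", () => handler(untrustedValue));
  container.appendChild(inner);
  return inner;
}

// In a browser, with the question's names:
//   appendClickableDiv(document, document.getElementById("mydiv"), xss, myfunc);
```

The xss value still has to be encoded correctly for the JavaScript string literal that PHP emits, but once it is a JavaScript string it no longer passes through innerHTML.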