PHP HTML Purifier: Converting <body> to <div>


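Premise

I'd like to use HTML Purifier to convert <body> tags to <div> tags, in order to preserve inline styling on the <body> element. For example, <body style="background:color#000000;">Hi there.</body> would become <div style="background:color#000000;">Hi there.</div>. I'm looking at a combination of a custom element definition and a TagTransform class.

Current setup

In my configuration section, I'm currently doing the following: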

$htmlDef  = $this->configuration->getHTMLDefinition(true);
// defining the element to avoid triggering 'Element 'body' is not supported'
$bodyElem = $htmlDef->addElement('body', 'Block', 'Flow', 'Core');
$bodyElem->excludes = array('body' => true);
// add the transformation rule
$htmlDef->info_tag_transform['body'] = new HTMLPurifier_TagTransform_Simple('div');

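...as well as allowing <body> and its style (and class, and id) attributes via the configuration directives (they are part of a large, working list that is parsed into HTML.AllowedElements and HTML.AllowedAttributes).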

I have turned definition caching off:

$config->set('Cache.DefinitionImpl', null);

Unfortunately, in this setup, HTMLPurifier_TagTransform_Simple never seems to have its transform() method called.

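HTML.Parent?

I presume the culprit is my HTML.Parent, which is set to 'div', since, quite naturally, <div> does not allow a child <body> element. However, setting HTML.Parent to 'html' gets me:

ErrorException: Cannot use unrecognized element as parent

Adding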

$htmlElem = $htmlDef->addElement('html', 'Block', 'Flow', 'Core');
$htmlElem->excludes = array('html' => true);


...gets rid of that error message, but still doesn't transform the tag; it is removed instead.

Adding


$htmlElem = $htmlDef->addElement('html', 'Block', 'Custom: head?, body', 'Core');
$htmlElem->excludes = array('html' => true);

...doesn't help either, since it gets me an error message:

ErrorException: Trying to get property of non-object       

[...]/library/HTMLPurifier/Strategy/FixNesting.php:237
[...]/library/HTMLPurifier/Strategy/Composite.php:18
[...]/library/HTMLPurifier.php:181
[...]

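I'm still tweaking that last option, trying to figure out the exact syntax I need to provide, but if anyone knows how to help from their own past experience, I'd greatly appreciate any pointers in the right direction.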

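HTML.TidyLevel?

The only other culprit I can imagine is my HTML.TidyLevel, which is set to 'heavy'. I haven't tried every possible combination yet, but so far it has made no difference.

(Since I have only looked into this in passing, I struggle to recall which combinations I have already tried. I would list them here otherwise, but I'm not confident I wouldn't omit or misreport something. I may edit this section later, once I've done some dedicated testing.)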

Full configuration

My configuration data is stored as JSON and then parsed into HTML Purifier. Here is the file:

{
    "CSS" : {
        "MaxImgLength" : "800px"
    },
    "Core" : {
        "CollectErrors" : true,
        "HiddenElements" : {
            "script"   : true,
            "style"    : true,
            "iframe"   : true,
            "noframes" : true
        },
        "RemoveInvalidImg" : false
    },
    "Filter" : {
        "ExtractStyleBlocks" : true
    },
    "HTML" : {
        "MaxImgLength" : 800,
        "TidyLevel"    : "heavy",
        "Doctype"      : "XHTML 1.0 Transitional",
        "Parent"       : "html"
    },
    "Output" : {
        "TidyFormat"   : true
    },
    "Test" : {
        "ForceNoIconv" : true
    },
    "URI" : {
        "AllowedSchemes" : {
            "http"     : true,
            "https"    : true,
            "mailto"   : true,
            "ftp"      : true
        },
        "DisableExternalResources" : true
    }
}

(URI.Base, URI.Munge and Cache.SerializerPath are also set, but I have removed them from this paste. Also, a caveat about HTML.Parent: as mentioned above, it is usually set to 'div'.)

Wouldn't it be much easier to do this:

$search = array('<body', 'body>');
$replace = array('<div', 'div>');

$html = '<body style="background:color#000000;">Hi there.</body>';

echo str_replace($search, $replace, $html);

>> '<div style="background:color#000000;">Hi there.</div>';


This code is the reason what you are doing doesn't work:

/**
 * Takes a string of HTML (fragment or document) and returns the content
 * @todo Consider making protected
 */
public function extractBody($html)
{
    $matches = array();
    $result = preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches);
    if ($result) {
        return $matches[1];
    } else {
        return $html;
    }
}

You can turn it off by setting %Core.ConvertDocumentToFragment to false; if the rest of your code is bug-free, it should work straight from there. I don't think your bodyElem definition is necessary.
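To see why the transform never gets a chance to run, it can help to try the pattern from extractBody() in isolation. The sketch below copies that regular expression into a standalone function (the name extractBodySketch is hypothetical and not part of HTML Purifier) and shows that the <body> tag, together with its style attribute, is already discarded at this stage.

```php
<?php
// Hypothetical standalone copy of the pattern used by extractBody().
function extractBodySketch($html)
{
    $matches = array();
    // Capture everything between the first <body ...> and the last </body>.
    $found = preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches);
    return $found ? $matches[1] : $html;
}

$document = '<body style="background-color:#000000;">Hi there.</body>';
$fragment = '<p>No body tag here.</p>';

$fromDocument = extractBodySketch($document);
$fromFragment = extractBodySketch($fragment);

echo $fromDocument, "\n"; // Hi there.
echo $fromFragment, "\n"; // <p>No body tag here.</p>
```

Input without a <body> tag passes through unchanged, which is why the problem only shows up for document-style input.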

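On the final output of HTML Purifier, once I know that nothing malicious has survived the process, that might indeed be an option. However, before I resort to a simple string replacement, I'd rather know that I can rely on the solution. HTML Purifier parses and tokenizes HTML reliably, and since I'm fairly sure that whatever I'm overlooking is something small, I would definitely prefer that route. Thanks anyway, though. :)

Ambush Commander to the rescue! Thanks, that's awesome, it works. :D For completeness' sake (in case anyone else stumbles across this): the $bodyElem definition does still seem to be necessary. I was also a little worried because blah from the <head> showed up in the final fragment, but then I remembered that I could add 'head' to the Core.HiddenElements list. Now it works like a charm!

Another quick addendum for completeness: <body> and its style attribute do not need to be in the tag whitelist; only the tag it is transformed to (and that tag's attributes) needs to be there.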
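For anyone who wants the pieces from this thread in one place, here is a minimal sketch that combines the custom body element, the tag transform, Core.ConvertDocumentToFragment set to false, and 'head' added to Core.HiddenElements. The include path and the sample input are assumptions, and the whitelist directives (HTML.AllowedElements, HTML.AllowedAttributes) are left at their defaults rather than copied from the original configuration, so treat it as a starting point rather than a verified setup.

```php
<?php
// Assumption: HTML Purifier's standard autoload stub is on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Disable definition caching while experimenting with raw definitions.
$config->set('Cache.DefinitionImpl', null);

// Keep the <body> tag instead of extracting only its contents.
$config->set('Core.ConvertDocumentToFragment', false);

// Drop the contents of <head> along with the tag itself.
$config->set('Core.HiddenElements', array(
    'script' => true,
    'style'  => true,
    'head'   => true,
));

$htmlDef = $config->getHTMLDefinition(true);

// Define <body> so that it is not rejected as an unsupported element.
$bodyElem = $htmlDef->addElement('body', 'Block', 'Flow', 'Core');
$bodyElem->excludes = array('body' => true);

// Rewrite <body> to <div>.
$htmlDef->info_tag_transform['body'] = new HTMLPurifier_TagTransform_Simple('div');

$purifier = new HTMLPurifier($config);

// Sample input (an assumption for illustration only).
$dirty = '<body style="background-color:#000000;">Hi there.</body>';
echo $purifier->purify($dirty), "\n";
```

The exact output may depend on the HTML Purifier version and on which lexer is in use, so it is worth checking the result against your own input before relying on it.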