PHP: How to keep line breaks using nl2br() and HTML Purifier?

php html regex line-breaks nl2br

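Problem: when user-entered content is processed with HTML Purifier, line breaks are not converted to <br /> tags.

Consider the following user-entered content: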

Lorem ipsum dolor sit amet.
This is another line.

<pre>
.my-css-class {
    color: blue;
}
</pre>

Lorem ipsum:

<ul>
<li>Lorem</li>
<li>Ipsum</li>
<li>Dolor</li>
</ul>

Dolor sit amet,
MyName

Lorem ipsum:

Lorem
ipsum
Dolor

.my-css-class {
    color: blue;  
}

.my-css-class {

    color: blue; 

}
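
The two <pre> renderings above appear to differ because nl2br() inserts a <br /> before every newline it sees, including the ones inside <pre>. A minimal sketch that reproduces this; the input string is made up for illustration:

```php
<?php
// Hypothetical input: a <pre> block whose internal newlines must stay untouched.
$input = "<pre>\n.my-css-class {\n    color: blue;\n}\n</pre>";

// nl2br() knows nothing about HTML structure: it prefixes every newline with <br />.
$output = nl2br($input);

echo $output, PHP_EOL;

// Count how many <br /> tags ended up in the markup.
$brCount = substr_count($output, '<br />');
echo $brCount, PHP_EOL;
```

Inside <pre>, a browser renders each of those <br /> tags in addition to the newline that follows it, which would explain the double-spaced block.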

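function custom_nl2br($html) {
    // Match the first <ul>...</ul> block (the "s" flag lets "." span newlines)
    $pattern = "/<ul>(.*?)<\/ul>/s";
    if (!preg_match($pattern, $html, $matches)) {
        // No list found: plain nl2br() is enough
        return nl2br($html);
    }

    // Hide the list behind a placeholder, convert the rest, then restore the list
    $html = nl2br(str_replace($matches[0], '[placeholder]', $html));
    $html = str_replace('[placeholder]', $matches[0], $html);

    return $html;
}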
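
custom_nl2br() only protects the first <ul> it finds. As a rough sketch of how the same placeholder idea could be stretched to several blocks and to <pre> and <ol> as well, the following uses preg_replace_callback(). The function name nl2br_outside_blocks() and its $tags parameter are assumptions made for this example, not part of the original code, and the lazy regex still cannot cope with a list nested inside a list of the same type:

```php
<?php
// Sketch: protect every <pre>, <ul> and <ol> block with a placeholder,
// run nl2br() on what is left, then put the blocks back.
// The function name and the $tags default are illustrative assumptions.
function nl2br_outside_blocks(string $html, array $tags = ['pre', 'ul', 'ol']): string
{
    $stash = [];
    $names = implode('|', array_map('preg_quote', $tags));
    // Lazy match from an opening tag to the first matching closing tag.
    $pattern = '~<(' . $names . ')\b[^>]*>.*?</\1>~is';

    $protected = preg_replace_callback(
        $pattern,
        function (array $m) use (&$stash): string {
            // NUL bytes make a collision with user text unlikely.
            $key = "\0BLOCK" . count($stash) . "\0";
            $stash[$key] = $m[0];
            return $key;
        },
        $html
    );

    // Fall back to the raw input if the regex engine reported an error.
    $converted = nl2br($protected ?? $html);

    // Restore the original blocks in a single pass.
    return $stash ? strtr($converted, $stash) : $converted;
}

$input = "a\nb\n<ul>\n<li>x</li>\n</ul>\nc\n<ul>\n<li>y</li>\n</ul>";
$result = nl2br_outside_blocks($input);
echo $result, PHP_EOL;
```

Both lists come back byte for byte, while the newlines outside them gain a <br />. Because this is still regex matching rather than parsing, it should be treated as a stopgap.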

Lorem
Ipsum
Dolor

function nl2br_special($string){

    // Step 1: Add <br /> tags for each line-break
    $string = nl2br($string); 

    // Step 2: Remove the actual line-breaks
    $string = str_replace("\n", "", $string);
    $string = str_replace("\r", "", $string);

    // Step 3: Restore the line-breaks that are inside <pre></pre> tags
    if (preg_match_all('/<pre>(.*?)<\/pre>/s', $string, $match)) {
        // $match[1] holds the inner content of each <pre> block
        foreach ($match[1] as $b) {
            $string = str_replace('<pre>'.$b.'</pre>', '<pre>'.str_replace('<br />', PHP_EOL, $b).'</pre>', $string);
        }
    }

    // Step 4: Removes extra <br /> tags

    // Before <pre> tags
    $string = str_replace("<br /><br /><br /><pre>", '<br /><br /><pre>', $string);
    // After </pre> tags
    $string = str_replace("</pre><br /><br />", '</pre><br />', $string);

    // Around <ul></ul> tags
    $string = str_replace("<br /><br /><ul>", '<br /><ul>', $string);
    $string = str_replace("</ul><br /><br />", '</ul><br />', $string);
    // Inside <ul> </ul> tags
    $string = str_replace("<ul><br />", '<ul>', $string);
    $string = str_replace("<br /></ul>", '</ul>', $string);

    // Around <ol></ol> tags
    $string = str_replace("<br /><br /><ol>", '<br /><ol>', $string);
    $string = str_replace("</ol><br /><br />", '</ol><br />', $string);
    // Inside <ol> </ol> tags
    $string = str_replace("<ol><br />", '<ol>', $string);
    $string = str_replace("<br /></ol>", '</ol>', $string);

    // Around <li></li> tags
    $string = str_replace("<br /><li>", '<li>', $string);
    $string = str_replace("</li><br />", '</li>', $string);

    return $string;
}

// Process line-breaks
$string = nl2br_special($string);

// Initiate HTML Purifier config
$purifier_config = HTMLPurifier_Config::createDefault();
$purifier_config->set('HTML.Allowed', 'p,ul,ol,li,strong,b,em,i,u,a[href],code,pre,blockquote,cite,img[src|alt],br,hr,h3,h4');
//$purifier_config->set('AutoFormat.AutoParagraph', true); // Make sure to NOT use this

// Initiate HTML Purifier
$purifier = new HTMLPurifier($purifier_config);

// Purify the content!
$string = $purifier->purify($string);

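... remove all <br /> tags from the <li> child elements, unless we use a more complex regex that removes only the <br /> tags that are inside <ul> elements but outside <li> elements. But what about a <ul> nested inside an <li> item? Handling all of these cases would require an even more complex regex.
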
// === Declare functions ===

function nl2br_special($string){

    // Step 1: Add <br /> tags for each line-break
    $string = nl2br($string); 

    // Step 2: Remove the actual line-breaks
    $string = str_replace("\n", "", $string);
    $string = str_replace("\r", "", $string);

    // Step 3: Restore the line-breaks that are inside <pre></pre> tags
    if (preg_match_all('/<pre>(.*?)<\/pre>/s', $string, $match)) {
        // $match[1] holds the inner content of each <pre> block
        foreach ($match[1] as $b) {
            $string = str_replace('<pre>'.$b.'</pre>', '<pre>'.str_replace('<br />', PHP_EOL, $b).'</pre>', $string);
        }
    }

    // Step 4: Removes extra <br /> tags

    // Before <pre> tags
    $string = str_replace("<br /><br /><br /><pre>", '<br /><br /><pre>', $string);
    // After </pre> tags
    $string = str_replace("</pre><br /><br />", '</pre><br />', $string);

    // Around <ul></ul> tags
    $string = str_replace("<br /><br /><ul>", '<br /><ul>', $string);
    $string = str_replace("</ul><br /><br />", '</ul><br />', $string);
    // Inside <ul> </ul> tags
    $string = str_replace("<ul><br />", '<ul>', $string);
    $string = str_replace("<br /></ul>", '</ul>', $string);

    // Around <ol></ol> tags
    $string = str_replace("<br /><br /><ol>", '<br /><ol>', $string);
    $string = str_replace("</ol><br /><br />", '</ol><br />', $string);
    // Inside <ol> </ol> tags
    $string = str_replace("<ol><br />", '<ol>', $string);
    $string = str_replace("<br /></ol>", '</ol>', $string);

    // Around <li></li> tags
    $string = str_replace("<br /><li>", '<li>', $string);
    $string = str_replace("</li><br />", '</li>', $string);

    return $string;
}


function custom_code_tag_callback($code) {

    return '<code>'.trim(htmlspecialchars($code[1])).'</code>';
}

function custom_pre_tag_callback($code) {

    return '<pre><code>'.trim(htmlspecialchars($code[1])).'</code></pre>';
}



// === Process user's input ===

// Process line-breaks
$string = nl2br_special($string);

// Allow simple <code> or <pre> tags for posting code
$string = preg_replace_callback("/\<code\>(.*?)\<\/code\>/is", 'custom_code_tag_callback', $string);
$string = preg_replace_callback("/\<pre\>(.*?)\<\/pre\>/is", 'custom_pre_tag_callback', $string);


// Initiate HTML Purifier config
$purifier_config = HTMLPurifier_Config::createDefault();
$purifier_config->set('HTML.Allowed', 'p,ul,ol,li,strong,b,em,i,u,a[href],code,pre,blockquote,cite,img[src|alt],br,hr,h3,h4');
$purifier_config->set('AutoFormat.Linkify', true); // Make links clickable
//$purifier_config->set('HTML.TargetBlank', true); // Uncomment if you want links to open new tabs
//$purifier_config->set('AutoFormat.AutoParagraph', true); // Leave this commented as it conflicts with nl2br


// Initiate HTML Purifier
$purifier = new HTMLPurifier($purifier_config);

// Purify the content!
$string = $purifier->purify($string);


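If this is the right approach, could you help me with the regex?
If it is not the right approach, how could I solve this problem? I am also open to alternatives to HTML Purifier.


Other resources I have already looked at: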




Maybe this will help:
<![CDATA[
Place code here
]]>

This problem can be partially (if not completely) solved with a custom nl2br() function, such as nl2br_special() above.
function custom_code_tag_callback($code) {

    return '<code>'.trim(htmlspecialchars($code[1])).'</code>';
}
function custom_pre_tag_callback($code) {

    return '<pre><code>'.trim(htmlspecialchars($code[1])).'</code></pre>';
}

// Don't require HTMLPurifier's CDATA enclosing, instead allow simple <code> or <pre> tags
$string = preg_replace_callback("/\<code\>(.*?)\<\/code\>/is", 'custom_code_tag_callback', $string);
$string = preg_replace_callback("/\<pre\>(.*?)\<\/pre\>/is", 'custom_pre_tag_callback', $string);
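
To see what these callbacks do to a submitted snippet, here is a small self-contained sketch. It uses a closure rather than a named function and passes explicit htmlspecialchars() flags; both choices, and the sample input, are assumptions made for this example:

```php
<?php
// Escape whatever sits between <code> and </code> so it is displayed literally.
$escapeCode = function (array $m): string {
    return '<code>' . trim(htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8')) . '</code>';
};

// Hypothetical user input containing markup that should be shown, not rendered.
$input = 'Use <code><b>bold</b></code> for emphasis.';

$output = preg_replace_callback('~<code>(.*?)</code>~is', $escapeCode, $input);

echo $output, PHP_EOL;
// prints: Use <code>&lt;b&gt;bold&lt;/b&gt;</code> for emphasis.
```

One thing to watch: if line breaks were already converted before this step, any <br /> left inside a multi-line <code> element would be escaped too and show up as literal text.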

That's it.

Also, since allowing basic HTML tags was the original intent, you will probably want to let users post code, especially HTML code, without HTML Purifier interpreting or removing it.
HTML Purifier currently allows posting code, but requires cumbersome CDATA tags (<![CDATA[ ... ]]>); the callbacks above accept plain <code> or <pre> tags instead.

Cheers!
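nl2br should be used on plain text when putting it into an HTML context. In your case, you already have HTML. Why doesn't your HTML already contain proper <br> tags for its line breaks?

So if the user is essentially writing HTML, then he should also be writing <br> tags. Maybe he is using line breaks in the HTML the way they were intended: to make the markup more readable without actually introducing line breaks in the text. As far as I am concerned, you cannot have it both ways. :) You really need to parse the HTML and apply nl2br only to specific text nodes, excluding certain elements (such as <pre>).

Yes. SO uses Markdown for line breaks! SO does leave some HTML tags in place, but it has its own markup for basic text formatting, including line breaks.

That is my point: if you ask users to write HTML and only HTML, then that is the trade-off. And if you want line breaks converted to <br> tags, you need to parse the HTML so that this is done only for certain elements. But you would have to do this before sanitizing the HTML, which means you may not be able to parse the HTML correctly. It is a very tricky problem. I understand what you are trying to do, but the fact that this is so tricky and error-prone is the reason Markdown & co. exist in the first place. So SO is not a good example, because it does not work that way.

OK, to sum up my point in a more helpful direction: you need to sanitize your HTML first, which is tricky. HTML Purifier seems to be one of the few libraries, if not the only one, said to do this properly. After that, you should walk through the HTML with a DOM processor and apply nl2br. If the purifier by default messes up the line breaks inherent in the input, so that you cannot perform the second step afterwards, you need to ...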
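
The DOM-based route suggested in the comments could look roughly like the sketch below: parse the (already purified) HTML, visit only the text nodes that are not inside <pre>, <ul> or <ol>, and replace their newlines with <br> elements. The function name nl2br_dom(), the $skip parameter and the wrapper id are assumptions made for this example; it relies on PHP's bundled DOM extension and has not been tested against HTML Purifier output.

```php
<?php
// Sketch: convert newlines to <br> only in text nodes that are not inside
// <pre>, <ul> or <ol>. Function name, $skip default and wrapper id are
// illustrative assumptions; run this on already-purified HTML.
function nl2br_dom(string $html, array $skip = ['pre', 'ul', 'ol']): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div id="nl2br-root">' . $html . '</div></body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="nl2br-root"]')->item(0);

    // Builds e.g. not(ancestor::pre) and not(ancestor::ul) and not(ancestor::ol)
    $conditions = array_map(fn(string $tag): string => "not(ancestor::$tag)", $skip);
    $expr = './/text()' . ($conditions ? '[' . implode(' and ', $conditions) . ']' : '');

    // Copy the node list first, because the loop modifies the tree.
    foreach (iterator_to_array($xpath->query($expr, $root)) as $textNode) {
        $parts = preg_split('/\r\n|\r|\n/', $textNode->nodeValue);
        if (count($parts) < 2) {
            continue; // no line break in this text node
        }
        $parent = $textNode->parentNode;
        foreach ($parts as $i => $part) {
            if ($i > 0) {
                $parent->insertBefore($doc->createElement('br'), $textNode);
            }
            if ($part !== '') {
                $parent->insertBefore($doc->createTextNode($part), $textNode);
            }
        }
        $parent->removeChild($textNode);
    }

    // Serialise only the children of the wrapper element.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$result = nl2br_dom("one\ntwo\n<pre>a\nb</pre>");
echo $result, PHP_EOL;
```

Because the exclusion is expressed through ancestor checks rather than regex matching, nested lists need no special handling. The trade-off is that text inside <li> items is skipped as well, which may or may not be what is wanted.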