Java regex: replace all \n in a string, but not those inside [code][/code] tags


java regex


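I need help replacing all \n (newline) characters in a String with <br />, except for those inside [code][/code] tags. My brain is burning and I can't solve this on my own :(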

Example:

test test test
test test test
test
test

[code]some
test
code
[/code]

more text

Should become:

test test test<br />
test test test<br />
test<br />
test<br />
<br />
[code]some
test
code
[/code]<br />
<br />
more text<br />


Thank you for your time. Best regards.

I would suggest using a (simple) parser rather than a regular expression. Something like this (bad pseudocode):

stack elementStack;
foreach (char in string) {
    if (string-from-char == "[code]") {
        elementStack.push("code");
        string-from-char = "";
    }
    if (string-from-char == "[/code]") {
        elementStack.popTo("code");
        string-from-char = "";
    }
    if (char == "\n" && !elementStack.contains("code")) {
        char = "<br />\n";
    }
}
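The pseudocode above could be turned into Java along the following lines. This is only a minimal sketch, assuming the tags are matched literally and that a simple depth counter can stand in for the element stack; the class and method names (CodeAwareBreaks, addBreaks) are made up for illustration.

```java
public class CodeAwareBreaks {
    private static final String OPEN = "[code]";
    private static final String CLOSE = "[/code]";

    // Inserts "<br />" before every "\n" that is not inside [code]...[/code].
    static String addBreaks(String input) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // stands in for the pseudocode's element stack
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith(OPEN, i)) {
                depth++;
                out.append(OPEN);
                i += OPEN.length();
            } else if (input.startsWith(CLOSE, i)) {
                if (depth > 0) {
                    depth--; // a stray closing tag is copied but otherwise ignored
                }
                out.append(CLOSE);
                i += CLOSE.length();
            } else {
                char c = input.charAt(i++);
                if (c == '\n' && depth == 0) {
                    out.append("<br />");
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "test\n[code]some\ncode\n[/code]\nmore text\n";
        System.out.print(addBreaks(input));
    }
}
```

Because the counter is incremented on every opening tag, nested [code] blocks stay protected until the outermost one closes.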

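You have tagged the question regex, but that may not be the best tool for this job.

You may be better off using basic compiler-construction techniques (i.e. a lexer feeding a simple state-machine parser).

Your lexer would identify five tokens ("[code]", "\n", "[/code]", EOF, :all other strings:), and your state machine looks like this: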

state    token      action
------------------------
begin    :none:     --> out
out      [code]     OUTPUT(token), --> in
out      \n         OUTPUT(break), OUTPUT(token)
out      *          OUTPUT(token)
in       [/code]    OUTPUT(token), --> out
in       *          OUTPUT(token)
*        EOF        --> end

EDIT: I see another poster discussing the possible need for nested blocks. This state machine won't handle that. For nested blocks, use a recursive descent parser (not quite as simple, but still easy enough, and extensible).

EDIT: Axeman points out that this design does not allow "[/code]" to appear inside the code. An escape mechanism can be used to get around this. Something like adding '\' to your tokens and adding these rows to the state machine:

state    token      action
------------------------
in       \          --> esc-in
esc-in   *          OUTPUT(token), --> in
out      \          --> esc-out
esc-out  *          OUTPUT(token), --> out

The usual arguments in favor of machine-generated lexers and parsers apply.
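As a rough illustration of the lexer-plus-state-machine idea, here is a hand-written Java sketch. It assumes a regex can serve as the lexer and leaves out the escape states; the names CodeTagStateMachine, TOKEN and convert are hypothetical, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodeTagStateMachine {
    private enum State { OUT, IN }

    // Lexer: "[code]", "[/code]", "\n", or a run of other text. The last
    // alternative consumes a lone '[' that does not start a tag.
    private static final Pattern TOKEN =
            Pattern.compile("\\[code\\]|\\[/code\\]|\\n|[^\\[\\n]+|\\[");

    static String convert(String input) {
        StringBuilder out = new StringBuilder();
        State state = State.OUT;              // begin --> out
        Matcher lexer = TOKEN.matcher(input);
        while (lexer.find()) {                // running out of tokens plays the role of EOF
            String token = lexer.group();
            switch (state) {
                case OUT:
                    if (token.equals("[code]")) {
                        state = State.IN;     // --> in
                    } else if (token.equals("\n")) {
                        out.append("<br />"); // OUTPUT(break)
                    }
                    break;
                case IN:
                    if (token.equals("[/code]")) {
                        state = State.OUT;    // --> out
                    }
                    break;
            }
            out.append(token);                // every row of the table ends with OUTPUT(token)
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(convert("test\n[code]some\ncode\n[/code]\nmore text\n"));
    }
}
```

Like the table it follows, this sketch has only two states, so it does not track nesting: the first [/code] always returns to the out state.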

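To get it right, you really need to make three passes:

1. Find the [code] blocks and replace each with a unique token + index (saving the original block), e.g. "foo [code]abc[/code] bar[code]efg[/code]" becomes "foo TOKEN-1 barTOKEN-2".

2. Do your newline replacement.

3. Scan for the escape tokens and restore the original blocks.

The code looks something like: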

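// Assumed setup, not shown in the original snippet:
//   Pattern escapePattern  = Pattern.compile("\\[code\\].*?\\[/code\\]", Pattern.DOTALL);
//   Pattern newlinePattern = Pattern.compile("\n");
//   String newlineReplacement = "<br />\n";
//   Map<String, String> escaped = new HashMap<>();
//   nextKey() is a hypothetical helper returning "1", "2", ... (digits only)

// Pass 1: swap every [code] block for a placeholder and remember the original.
StringBuffer output1 = new StringBuffer();
Matcher m = escapePattern.matcher(input);
while (m.find()) {
    String key = nextKey();
    escaped.put(key, m.group());
    m.appendReplacement(output1, "TOKEN-" + key);
}
m.appendTail(output1);

// Pass 2: replace the newlines that remain outside the code blocks.
StringBuffer output2 = new StringBuffer();
Matcher m2 = newlinePattern.matcher(output1);
while (m2.find()) {
    m2.appendReplacement(output2, Matcher.quoteReplacement(newlineReplacement));
}
m2.appendTail(output2);

// Pass 3: put the saved blocks back (quoteReplacement protects '$' and '\').
StringBuffer finalOutput = new StringBuffer();
Matcher m3 = Pattern.compile("TOKEN-(\\d+)").matcher(output2);
while (m3.find()) {
    m3.appendReplacement(finalOutput, Matcher.quoteReplacement(escaped.get(m3.group(1))));
}
m3.appendTail(finalOutput);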

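This is the quick and dirty approach. There are more efficient ways (others have mentioned parsers/lexers), but unless you are processing millions of lines, your code is CPU bound (rather than I/O bound, like most web apps), and you have confirmed with a profiler that this is the bottleneck, they are probably not worth it.

* I haven't run it; this is all from memory. Just check the API and you'll be able to work it out.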

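As other posters have mentioned, regular expressions are not the best tool for this job, because their quantifiers are almost universally greedy by default. This means that even if you tried to match the code blocks with something like: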

(\[code\].*\[/code\])

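then the expression would match everything from the first [code] tag to the last [/code] tag, which is clearly not what you want. While there are ways around this, the resulting regular expressions are usually brittle, unintuitive, and downright ugly. Something like the following Python code would work much better.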

# `text` is assumed to hold the input string
output = []
def add_brs(s):
    return s.replace('\n', '<br/>\n')
# the first block will *not* have a matching [/code] tag
blocks = text.split('[code]')
output.append(add_brs(blocks[0]))
# for all the rest of the blocks, only add <br/> tags to
# the segment after the [/code] tag
for block in blocks[1:]:
    # exactly one [/code] per block means the split yields two parts
    if len(block.split('[/code]')) != 2:
        raise ValueError('Too many or few [/code] tags')
    else:
        # the segment in the code block is pre, everything
        # after is post
        pre, post = block.split('[/code]')
        # split() drops the tags, so put them back around the code
        output.append('[code]' + pre + '[/code]')
        output.append(add_brs(post))
# finally join all the processed segments together
output = "".join(output)


Note that the code above is untested; it is only a rough idea of what you need to do.
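Since the question is about Java, the same split-based idea might look roughly like this there. It is a sketch under the same assumptions (no nesting, every [code] has exactly one [/code]); SplitApproach, addBrs and convert are illustrative names. Note that String.split takes a regex, hence Pattern.quote, and that a limit of -1 keeps trailing empty segments.

```java
import java.util.regex.Pattern;

public class SplitApproach {
    static String addBrs(String s) {
        return s.replace("\n", "<br/>\n");
    }

    static String convert(String input) {
        // Everything before the first [code] tag lies outside any code block.
        String[] blocks = input.split(Pattern.quote("[code]"), -1);
        StringBuilder out = new StringBuilder(addBrs(blocks[0]));
        for (int i = 1; i < blocks.length; i++) {
            // Each remaining block must contain exactly one [/code] tag.
            String[] parts = blocks[i].split(Pattern.quote("[/code]"), -1);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Too many or too few [/code] tags");
            }
            // split() removes the tags, so they are written back explicitly.
            out.append("[code]").append(parts[0]).append("[/code]");
            out.append(addBrs(parts[1]));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(convert("test\n[code]some\ncode\n[/code]\nmore text\n"));
    }
}
```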

This seems to do it:

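import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodeDepthDemo { // wrapper class added so the snippet compiles
    // Demo pattern: strips runs of '*' outside [code] blocks. For the question's
    // case, match "\n" instead and replace it with "<br />\n".
    private final static String PATTERN = "\\*+";

    public static void main(String args[]) {
        Pattern p = Pattern.compile("(.*?)(\\[/?code\\])", Pattern.DOTALL);
        String s = "test 1 ** [code]test 2**blah[/code] test3 ** blah [code] test * 4 [code] test 5 * [/code] * test 6[/code] asdf **";
        Matcher m = p.matcher(s);
        StringBuffer sb = new StringBuffer(); // before Java 9, Matcher.appendReplacement only accepts a StringBuffer
        int codeDepth = 0; // number of currently open [code] tags
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the text from being read as group references
            if (codeDepth == 0) {
                m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1).replaceAll(PATTERN, "")));
            } else {
                m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1)));
            }
            if (m.group(2).equals("[code]")) {
                codeDepth++;
            } else {
                codeDepth--;
            }
            sb.append(m.group(2));
        }
        // Text after the last tag: process it only if no block is left open.
        if (codeDepth == 0) {
            StringBuffer sb2 = new StringBuffer();
            m.appendTail(sb2);
            sb.append(sb2.toString().replaceAll(PATTERN, ""));
        } else {
            m.appendTail(sb);
        }
        System.out.printf("Original: %s%n", s);
        System.out.printf("Processed: %s%n", sb);
    }
}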

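It's not a simple regex, but I don't think you can do what you want with a simple regex, not when handling nested elements and so forth.

This is hard because, while regexes are good at finding something, they are not so good at matching everything except something... So you have to use a loop; I doubt you can do it in one go.

After searching, I found something close to cletus's solution, except that I assumed code blocks cannot be nested, which leads to simpler code. Choose whichever suits your needs.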

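import java.util.regex.*;

class Test
{
  static final String testString = "foo\nbar\n[code]\nprint'';\nprint{'c'};\n[/code]\nbar\nfoo";
  static final String replaceString = "<br>\n";
  public static void main(String args[])
  {
    // Group 1: text outside any code block (possibly empty).
    // Group 2: the next [code]...[/code] block, or the end of input (\z).
    // Using (.*?) instead of (.+?) also covers input that starts with [code]
    // and code blocks that directly follow one another.
    Pattern p = Pattern.compile("(.*?)(\\[code\\].*?\\[/code\\]|\\z)", Pattern.DOTALL);
    Matcher m = p.matcher(testString);
    StringBuilder result = new StringBuilder();
    while (m.find())
    {
      // Only the text outside the code block gets its newlines replaced.
      result.append(m.group(1).replace("\n", replaceString));
      result.append(m.group(2));
    }
    System.out.println(result.toString());
  }
}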


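This is a crude quick test; you need more (null, empty string, no code tags, multiple blocks, etc.).

For this use case there will apparently be no nested [code] blocks, so a reluctant quantifier takes care of it. For example, "\\[code\\].*?\\[/code\\]" stops as soon as it reaches the first "[/code]".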
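The difference between the greedy and the reluctant quantifier can be checked with a small experiment. The class and method names below (GreedyVsReluctant, countMatches) are made up for this sketch; Pattern.DOTALL is used so that "." also matches the newlines inside a block.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Counts how many times the regex matches in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "a [code]x\ny[/code] b [code]z[/code] c";
        // Greedy: one match, running from the first [code] to the last [/code].
        System.out.println(countMatches("\\[code\\].*\\[/code\\]", input));  // prints 1
        // Reluctant: each block is matched separately.
        System.out.println(countMatches("\\[code\\].*?\\[/code\\]", input)); // prints 2
    }
}
```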
