Extracting a bbcode quote with Java and Android, but not the content inside the quote tags


java regex string


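I am trying to extract bbcode quotes, but the actual output is not what I need.

I want to implement a bbcode parsing module that turns quoted content into the desired output below. Since the number of quotes can vary, this should be handled by a recursive method or some other approach.

Input:

Testing [quote]http://www.yourube.com?watch?v=asasdsadsa [url] aisa [/url] [/quote] Testing

Desired output:

Testing [url]aisa[/url] aisa Testing

Actual output:

http://www.yourube.com?watch?v=asasdsadsa [url] aisa [/url]
http://www.yourube.com?watch?v=asasdsadsa  aisa

Here is my code: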

    // Requires: import java.util.HashMap; import java.util.Map;
    String s = "[quote]http://www.yourube.com?watch?v=asasdsadsa [url] aisa [/url][/quote]";
    String t = bbcode(s);   // removes the [quote] tags, keeps their content
    System.out.println(t);
    String u = bbcode2(t);  // additionally removes the [url] tags
    System.out.println(u);

    public static String bbcode(String text) {
        String html = text;

        // regex -> replacement; "$1" keeps only the text between the tags
        Map<String, String> bbMap = new HashMap<String, String>();
        bbMap.put("\\[quote\\](.+?)\\[/quote\\]", "$1");

        for (Map.Entry<String, String> entry : bbMap.entrySet()) {
            html = html.replaceAll(entry.getKey(), entry.getValue());
        }

        return html;
    }

    public static String bbcode2(String text) {
        String html = text;

        Map<String, String> bbMap = new HashMap<String, String>();
        bbMap.put("\\[quote\\](.+?)\\[/quote\\]", "$1");
        bbMap.put("\\[url\\](.+?)\\[/url\\]", "$1");

        for (Map.Entry<String, String> entry : bbMap.entrySet()) {
            html = html.replaceAll(entry.getKey(), entry.getValue());
        }

        return html;
    }


Not the tidiest way, but here is a non-regex approach:

// `string` holds the BBCode text to scan
int lastIndex = 0;
String startString = "[quote]";
String endString = "[/quote]";
while (lastIndex != -1) {
   int start = string.indexOf(startString, lastIndex);
   if (start == -1) {
      break;
   }
   // look for the closing tag only after the opening tag
   int end = string.indexOf(endString, start + startString.length());
   if (end == -1) {
      break;
   }
   // print only the text between the two tags
   System.out.println(string.substring(start + startString.length(), end));
   // continue scanning after this closing tag
   lastIndex = end + endString.length();
}

Here is a generic Java regex for matching paired BBCode tags:

\\[([^\\]]+)\\](.+?)\\[/\\1\\]

This returns the top-level matches. For example, in

[a][b]hi[/b]hello[/a][c]yo[/c]

group 2 will match

[b]hi[/b]hello

and

yo
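The behaviour described above can be checked with a small sketch. The class name `TopLevelTags` and the helper `topLevelBodies` are made up for this illustration; only the pattern itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TopLevelTags {
    // Paired-tag pattern from the answer: group 1 = tag name, group 2 = tag body.
    static final Pattern TAG = Pattern.compile("\\[([^\\]]+)\\](.+?)\\[/\\1\\]");

    // Hypothetical helper: collects group 2 of every top-level match, left to right.
    static List<String> topLevelBodies(String input) {
        List<String> bodies = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            bodies.add(m.group(2));
        }
        return bodies;
    }

    public static void main(String[] args) {
        // prints [[b]hi[/b]hello, yo]
        System.out.println(topLevelBodies("[a][b]hi[/b]hello[/a][c]yo[/c]"));
    }
}
```

The nested `[b]` tag is returned as part of the body of `[a]`, not as a match of its own, which is why a second pass over each body is needed to reach inner tags.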

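In my opinion, any regex solution will need recursion (outside the regex itself) to find all matches. You have to find all top-level matches and add them to some array, then recursively apply the same regex to each match, adding the results to the same array, until no more matches are found.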

In that example, you can see that you need to run the regex again on

[b]hi[/b]hello

to return the content of

[b]hi[/b]

that is,

hi

For example, take the following input:

[A] outer [B] [C] last one left [/C] middle [/B] [/A]  [A] out [B] in [/B] [/A]

First, run the regex against that string and look at the group 2 matches:

outer [B] [C] last one left [/C] middle [/B]
out [B] in [/B]

Add those to the result array, then run the regex against those matches and get:

 [C] last one left [/C] middle
 in

Add those to the result array, then run it once more against those matches and get:

 last one left
 [no matches]

Finally, you run it against the remaining

 last one left

and get no more matches, so you are done.
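Raju, if you are not familiar with recursion, it would be really beneficial to stop reading here and try to solve the problem yourself. Come back if you give up. That said...

The Java solution to this problem is: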
public static void getAllMatches(Pattern p, String in, List<String> out) {
  Matcher m = p.matcher(in);           // get matches in input
  while (m.find()) {                   // for each match
    out.add(m.group(2));               // add match to result array
    getAllMatches(p, m.group(2), out); // call function again with match as input
  }
}
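As a usage sketch, the method can be wrapped in a small class and run on the walkthrough input. The class name `RecursiveBbcode` and the wrapper `allBodies` are assumptions for this example; `getAllMatches` is the method from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecursiveBbcode {
    // Paired-tag pattern from the answer: group 1 = tag name, group 2 = tag body.
    static final Pattern TAG = Pattern.compile("\\[([^\\]]+)\\](.+?)\\[/\\1\\]");

    // Method from the answer: adds each tag body, then recurses into that body.
    public static void getAllMatches(Pattern p, String in, List<String> out) {
        Matcher m = p.matcher(in);
        while (m.find()) {
            out.add(m.group(2));
            getAllMatches(p, m.group(2), out);
        }
    }

    // Hypothetical convenience wrapper that returns a fresh result list.
    static List<String> allBodies(String input) {
        List<String> out = new ArrayList<>();
        getAllMatches(TAG, input, out);
        return out;
    }

    public static void main(String[] args) {
        String input = "[A] outer [B] [C] last one left [/C] middle [/B] [/A]"
                     + "  [A] out [B] in [/B] [/A]";
        for (String body : allBodies(input)) {
            // trim() only tidies the printed line; the stored bodies keep their spaces
            System.out.println(body.trim());
        }
    }
}
```

Because the recursive call happens inside the `while` loop, the results arrive depth-first (each tag body is followed immediately by the bodies nested in it) rather than level by level as in the walkthrough. The set of collected bodies is the same.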

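Are you trying to parse HTML?

No, I am using "$1" to extract the raw content inside the tags.

Can you explain a bit more what this program is supposed to do?

Please read the input and the desired output. I will parse content that contains bbcode and extract everything the bbcode contains, for content management.

So what is the problem you are having in bbcode2()?

Raju, I have added working code to my answer. However, if you know nothing about the concept of recursion, I strongly recommend trying to solve this yourself first! It is a very useful thing to learn.

For recursive regex matching, using recursion seems to cut down a lot of the error handling. So... if my message has some arbitrary string at the end (not a bbcode closing tag), do I have to construct the regex with two groups, one for the string embedded in the bbcode and another for the arbitrary string after the bbcode tag?

I am not sure what you mean. What is your end goal? If it takes too long to explain, add an example.

Thanks a lot for your earlier help. Question edited, OGHaza. I am now using (.)[^]](.) and the tool you used for testing. Once that runs successfully, the recursive rest can split the tokens and add them to the output list.

Consider adding ([^\[]*) to the start and the end of the regex. This captures the text before and after the BBCode (because a group is added at the start, the group numbers increase by 1). Note that for most matches the text at the start will be empty, because it was already captured at the end of the previous match.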
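The suggestion in the last comment can be sketched as follows. This is one reading of that comment, not code from the thread: the class name `SurroundingText`, the helper `split`, and the exact form of the added groups, `([^\[]*)`, are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SurroundingText {
    // Assumed extension of the answer's pattern:
    // group 1 = text before the tag, group 2 = tag name,
    // group 3 = tag body,            group 4 = text after the closing tag.
    static final Pattern TAG_WITH_TEXT = Pattern.compile(
            "([^\\[]*)\\[([^\\]]+)\\](.+?)\\[/\\2\\]([^\\[]*)");

    // Hypothetical helper: returns {before, tag, body, after} for the first
    // match, or null when the input contains no paired tag.
    static String[] split(String input) {
        Matcher m = TAG_WITH_TEXT.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String input = "Testing [quote]http://www.yourube.com?watch?v=asasdsadsa"
                     + " [url] aisa [/url] [/quote] Testing";
        String[] parts = split(input);
        System.out.println("before: '" + parts[0] + "'");
        System.out.println("tag:    '" + parts[1] + "'");
        System.out.println("body:   '" + parts[2] + "'");
        System.out.println("after:  '" + parts[3] + "'");
    }
}
```

Note that the backreference becomes `\2` because the new leading group shifts the numbering, as the comment points out. The body in group 3 still contains the nested `[url]` tag, so it would need to be processed again to reach inner tags.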