Java regex with escaped and unescaped delimiters

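I have a string

a\;b\\;c;d

which in Java looks like

String s = "a\\;b\\\\;c;d";

I need to split it on semicolons according to the following rules:

If a semicolon is preceded by a backslash, it should not be treated as a delimiter (between a and b).

If the backslash itself is escaped, and therefore does not escape the semicolon, that semicolon should be a delimiter (between b and c).

So a semicolon should be treated as a delimiter if it is preceded by zero or an even number of backslashes.

For the example above, I want to get the following strings (backslashes doubled for the Java compiler): "a\\;b\\\\", "c" and "d".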
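The even/odd rule can be checked mechanically by counting the backslashes directly in front of each semicolon. The following is only a minimal sketch of that rule; the class and method names (`DelimiterRule`, `isDelimiter`, `delimiterPositions`) are made up for illustration and are not part of the question.

```java
import java.util.ArrayList;
import java.util.List;

class DelimiterRule {
    /** True if the character at pos is a ';' preceded by zero or an even number of backslashes. */
    static boolean isDelimiter(String s, int pos) {
        if (s.charAt(pos) != ';') {
            return false;
        }
        int backslashes = 0;
        // Walk left from the semicolon and count the consecutive backslashes.
        for (int i = pos - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 0;
    }

    /** Indices of all semicolons that count as delimiters under the rule. */
    static List<Integer> delimiterPositions(String s) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (isDelimiter(s, i)) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String s = "a\\;b\\\\;c;d"; // actual text: a\;b\\;c;d
        System.out.println(delimiterPositions(s)); // prints [6, 8]
    }
}
```

The semicolon at index 2 follows one backslash and is skipped. The ones at indices 6 and 8 follow two and zero backslashes respectively, so they are reported as delimiters.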

You can use the regex

(?:\\.|[^;\\]++)*

to match all the text between unescaped semicolons:

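List<String> matchList = new ArrayList<>();
try {
    // Each match is one field: escaped characters (\\.) or runs of characters other than ';' and '\'
    Pattern regex = Pattern.compile("(?:\\\\.|[^;\\\\]++)*");
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        matchList.add(regexMatcher.group());
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}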

The possessive quantifier (++) is important here to avoid catastrophic backtracking caused by the nested quantifiers.
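To try the pattern end to end, here is a small self-contained sketch. It is an illustration and not the answer's exact code: the class name `FindFieldsDemo` is hypothetical, and the outer quantifier is `+` rather than `*`, so zero-length matches are never collected. The trade-off is that genuinely empty fields (as in `a;;b`) are dropped as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFieldsDemo {
    // Same alternation as above, but repeated with '+' instead of '*' (assumption of this sketch).
    private static final Pattern FIELD = Pattern.compile("(?:\\\\.|[^;\\\\]++)+");

    /** Collects every non-empty run of text between unescaped semicolons. */
    static List<String> fields(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = FIELD.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fields("a\\;b\\\\;c;d")); // prints [a\;b\\, c, d]
        System.out.println(fields("a;;b"));          // prints [a, b] (the empty field is lost)
    }
}
```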

String[] splitArray = subjectString.split("(?<!(?<!\\\\)\\\\);");

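This will handle any odd number of backslashes. Of course, it will fail if you have more than 4000000 backslashes. Explanation of the edited answer: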

String[] splitArray = subjectString.split("(?<!(?<!\\\\(\\\\\\\\){0,2000000})\\\\);");

// (?<!(?<!\\(\\\\){0,2000000})\\);
//
// Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!(?<!\\(\\\\){0,2000000})\\)»
//    Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\\(\\\\){0,2000000})»
//       Match the character “\” literally «\\»
//       Match the regular expression below and capture its match into backreference number 1 «(\\\\){0,2000000}»
//          Between zero and 2000000 times, as many times as possible, giving back as needed (greedy) «{0,2000000}»
//          Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «{0,2000000}»
//          Match the character “\” literally «\\»
//          Match the character “\” literally «\\»
//    Match the character “\” literally «\\»
// Match the character “;” literally «;»
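For comparison, the shorter nested-lookbehind pattern shown at the start of this answer can be exercised with the sketch below (the class name `LookbehindSplitDemo` is hypothetical). It splits the question's example as requested, but it also splits after three backslashes, where the even/odd rule says the semicolon is escaped.

```java
import java.util.Arrays;

class LookbehindSplitDemo {
    // A ';' that is not preceded by a backslash which itself has no backslash before it.
    static final String SIMPLE = "(?<!(?<!\\\\)\\\\);";

    static String[] split(String input) {
        return input.split(SIMPLE);
    }

    public static void main(String[] args) {
        // Actual text: a\;b\\;c;d
        System.out.println(Arrays.toString(split("a\\;b\\\\;c;d"))); // prints [a\;b\\, c, d]
        // Actual text: a\\\;b (three backslashes, so the semicolon is escaped under the rule)
        System.out.println(Arrays.toString(split("a\\\\\\;b")));     // prints [a\\\, b]
    }
}
```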

I don't trust any kind of regex to detect these cases. I usually write a simple loop for this sort of thing. I'll sketch it in C, since it has been a long time since I last touched Java ;-)

int i, len, state;
char c;

for (len = myString.size(), state = 0, i = 0; i < len; i++) {
    c = myString[i];
    if (state == 0) {
        if (c == '\\') {
            state++;                          /* the next character is escaped */
        } else if (c == ';') {
            printf("; at offset %d", i);      /* unescaped delimiter found */
        }
    } else {
        state--;                              /* skip the escaped character */
    }
}

The advantages are:

You can perform semantic actions at each step.

It is easy to port to another language.

You do not need to pull in a complete regex library for this simple task, which improves portability.

It should be much faster than a regex matcher.

This approach assumes the string contains no '\0' character. If it does, you can use some other character.

public static String[] split(String s) {
    // Temporarily replace each escaped semicolon with '\0', then split on the remaining ones
    String[] result = s.replaceAll("([^\\\\])\\\\;", "$1\0").split(";");
    for (int i = 0; i < result.length; i++) {
        // Restore the escaped semicolons
        result[i] = result[i].replaceAll("\0", "\\\\;");
    }
    return result;
}

I think this is the real answer. In my case I am trying to split on |, and the escape character is &.

final String regx = "(?<!((?:[^&]|^)(&&){0,10000}&))\\|";
String[] res = "&|aa|aa|&|&&&|&&|s||||e|".split(regx);
System.out.println(Arrays.toString(res));

Where did the double backslashes go? Gone?

I'm not sure this is the regex you want. I'm also not sure that a regex is the best tool for this task. But you chose to ignore my answer below ;-/

This will fail for a\\\\\\;b;c and other cases with more than two backslashes.

Can someone explain the downvote? Unless I'm missing something obvious.

I don't know, it wasn't me. Maybe the nested backreferences are a bit too complicated?

It wasn't me either, but don't expect an upvote from me. (The {0,many} hack is untrustworthy, because Java's variable-width lookbehind support is notoriously buggy. But even in .NET, which places no restrictions on lookbehind, I would not use this approach. A positive matching approach like Tim's is more readable, more reliable, and more portable (the possessive quantifiers are not required).)

@Alanmore Agreed. But that doesn't mean the solution is wrong.

It also returns empty strings, so I get [a\;b\\, c, d, ]. Is there some way to prevent that, other than checking the return value of group()?

Yes, using + instead of * gets rid of the empty strings, but in my tests (in RegexBuddy) it doesn't do that.

If you don't want empty matches, change * to +, but then you also won't get "real" empty matches such as the one in a;;b.

Yes, real empty matches are fine.

FYI: when the last field ends with the escape character "\", or the input consists of the escape character alone, the last escape character is lost, i.e. "a\" => ["a", "", ""]. The following expression seems to fix that edge case, "(?:\\\\(.|$)|[^;\\\\]++)*", but I'm not sure whether it creates another one. My expression (so far) that also fixes the false empty fields but keeps the true empty fields is "(?
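The loop-based answer only reports delimiter offsets and is written in C style. A possible Java port that actually builds the fields could look like the sketch below. This is an assumption about how one might adapt it, not the answerer's code, and the names `LoopSplitDemo` and `split` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LoopSplitDemo {
    /** Splits on ';' unless it is escaped by a backslash; escape sequences are kept verbatim. */
    static List<String> split(String input) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false; // true when the previous character was an unescaped backslash
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (escaped) {
                escaped = false;          // this character is escaped, copy it as is
                current.append(c);
            } else if (c == '\\') {
                escaped = true;           // the next character is escaped
                current.append(c);
            } else if (c == ';') {
                fields.add(current.toString()); // unescaped delimiter closes the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());   // the last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("a\\;b\\\\;c;d")); // prints [a\;b\\, c, d]
        System.out.println(split("a;;b"));          // prints [a, , b]
    }
}
```

Unlike the find-based regex variants, this keeps genuinely empty fields and does not drop a trailing lone backslash.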