Java regex: extracting strings inside delimiters


java regex


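I am trying to extract the occurrences of strings inside delimiters (parentheses in this case), but not the ones that sit inside quotes (single or double). Here is what I tried: the regex fetches every occurrence inside parentheses, including the ones inside quotes, which I do not want.

Note: this is not a final answer, because I am not familiar with Java, but I believe it can still be translated to Java.

As far as I am concerned, the simplest approach is to replace the quoted parts of the string with an empty string and then look for matches. Hopefully you are somewhat familiar with PHP; here is the idea: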

$str = "Rhyme (Jack) and (Jill) went up the hill on \" (Peter's)\" request.";

preg_match_all(
    $pat = '~(?<=\().*?(?=\))~',
    // anything inside parentheses
    preg_replace('~([\'"]).*?\1~','',$str),
    // this replaces quoted strings with ''
    $matches
    // and assigns the result into this variable
);
print_r($matches[0]);
// $matches[0] returns the matches in preg_match_all

// [0] => Jack
// [1] => Jill
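
Since the question is about Java, here is a minimal sketch of the same two-step idea (strip the quoted runs, then match what is left). The class name `QuoteStripExtractor` and the method `extract` are hypothetical names chosen for illustration; the two patterns are carried over from the PHP snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class QuoteStripExtractor {
    // A quoted run: an opening ' or ", the shortest possible body, then the same quote again.
    private static final Pattern QUOTED = Pattern.compile("(['\"]).*?\\1");
    // The shortest text that sits between "(" and ")".
    private static final Pattern IN_PARENS = Pattern.compile("(?<=\\().*?(?=\\))");

    static List<String> extract(String input) {
        // Step 1: drop every quoted run, including anything parenthesized inside it.
        String withoutQuotes = QUOTED.matcher(input).replaceAll("");
        // Step 2: collect what remains inside parentheses.
        List<String> found = new ArrayList<>();
        Matcher m = IN_PARENS.matcher(withoutQuotes);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
        System.out.println(extract(s)); // [Jack, Jill]
    }
}
```

Note that an unpaired apostrophe outside any quotes (for example in `Jack's`) would be treated as an opening quote by step 1, so this sketch assumes quotes in the input are balanced.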


You can try:

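import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMain {
    // First alternative: captures the text inside parentheses in group 1.
    // Second alternative: consumes a double-quoted string without capturing anything.
    static final String PATTERN = "\\(([^)]+)\\)|\"[^\"]*\"";
    static final Pattern CONTENT = Pattern.compile(PATTERN);

    public static void main(String[] args) {
        String testString = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
        Matcher match = CONTENT.matcher(testString);
        while (match.find()) {
            // group(1) is null when the quoted alternative matched
            if (match.group(1) != null) {
                System.out.println(match.group(1)); // prints Jack, Jill
            }
        }
    }
}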

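This pattern matches quoted strings as well as parenthesized ones, but only the parenthesized ones put something into group(1). Because the engine scans from left to right and + and * are greedy, it reaches the opening quote first and prefers to match "(Peter's)" as a whole rather than (Peter's) on its own.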
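
To make the role of the two groups concrete, the following sketch lists every match of the pattern above together with its group(1). The class name `GroupInspector` and the method `describe` are hypothetical; only the pattern comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper used only to inspect what each match contains.
class GroupInspector {
    // Same pattern as in the answer: parenthesized text (captured) or a double-quoted string.
    private static final Pattern CONTENT = Pattern.compile("\\(([^)]+)\\)|\"[^\"]*\"");

    static List<String> describe(String input) {
        List<String> rows = new ArrayList<>();
        Matcher m = CONTENT.matcher(input);
        while (m.find()) {
            // group(0) is the whole match; group(1) is null for the quoted alternative.
            rows.add(m.group(0) + " -> " + m.group(1));
        }
        return rows;
    }

    public static void main(String[] args) {
        String s = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
        for (String row : describe(s)) {
            System.out.println(row);
        }
        // (Jack) -> Jack
        // (Jill) -> Jill
        // "(Peter's)" -> null
    }
}
```

The quoted string still appears as a match; it is simply filtered out afterwards by the null check on group(1).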

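In this case you can elegantly use look-behind and look-ahead operators to get what you want. Here is a solution in Python (I always use it to try things out quickly on the command line), but the regular expression should be the same in Java code.

This regex matches content that is preceded by an opening parenthesis (positive look-behind) and followed by a closing parenthesis (positive look-ahead). However, it rejects the match when the opening parenthesis is preceded by a single or double quote (negative look-behind), or when the closing parenthesis is followed by a single or double quote (negative look-ahead).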

In [1]: import re

In [2]: s = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request."

In [3]: re.findall(r"""
   ...:     (?<=               # start of positive look-behind
   ...:         (?<!           # start of negative look-behind
   ...:             [\"\']     # avoids matching opening parenthesis preceded by single or double quote
   ...:         )              # end of negative look-behind
   ...:         \(             # matches opening parenthesis
   ...:     )                  # end of positive look-behind
   ...:     \w+ (?: \'\w* )?   # matches whatever your content looks like (configure this yourself)             
   ...:     (?=                # start of positive look-ahead
   ...:         \)             # matches closing parenthesis 
   ...:         (?!            # start of negative look-ahead
   ...:             [\"\']     # avoids matching closing parenthesis succeeded by single or double quote
   ...:         )              # end of negative look-ahead  
   ...:     )                  # end of positive look-ahead
   ...:     """, 
   ...:     s, 
   ...:     flags=re.X)
Out[3]: ['Jack', 'Jill']
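
A possible Java rendering of the same look-around idea is sketched below. It is not a literal port: the nested look-arounds are flattened into adjacent fixed-width ones, because Java requires look-behind to have a bounded length and the flat form is the conservative choice. The class name `LookaroundExtractor` and the method `extract` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a flattened variant of the Python pattern above.
class LookaroundExtractor {
    private static final Pattern CONTENT = Pattern.compile(
        "(?<=\\()"            // preceded by "("
        + "(?<![\"']\\()"     // ... but not by a quote followed by "("
        + "\\w+(?:'\\w*)?"    // the content: word characters, optionally with an apostrophe part
        + "(?=\\))"           // followed by ")"
        + "(?!\\)[\"'])");    // ... but not by ")" followed by a quote

    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = CONTENT.matcher(input);
        while (m.find()) {
            found.add(m.group()); // the whole match is the content, no capture group needed
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
        System.out.println(extract(s)); // [Jack, Jill]
    }
}
```

This only checks the characters directly next to the parentheses, so a quoted string with other text around the parentheses, such as `"Mr. (Peter's) Jr."`, still yields `Peter's`.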

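If I were you, I would first replace the quoted parts with an empty string and then look for matching groups. Isn't that an option?

+1, I modified the regex to "\\(([^)]+)\\)|\"[^\"]*\"|'[^']*'" so that it also checks for strings in single quotes. As an alternative, we can still use match.group(0) (and only process strings that start with a parenthesis). However, I will wait before accepting this answer, because I believe there should be a way to do this directly with the regex, without having to deal with group(0) and group(1); I just don't know how.

Unfortunately, I don't understand the Python construct (re.findall) well enough to try it in Java.

@Scorpion, as you can see in my example, re.findall returns all non-overlapping occurrences of the pattern in the string. Your Java code achieves basically the same thing with the while loop. I am not a Java expert, but you probably just need to add every match to a list or something similar. It depends on what you want to do with the matches.

@Scorpion There were some errors in my solution. Please look at the corrected version. It now works the way you expect.

+1, thanks a lot for your research; however, it still does not cover the full use case, for example when the quoted text is "Mr. (Peter's)", that is, with text before or after the parentheses. I guess the pattern can be customized to fix that behavior, though.

@Scorpion You did not say in your question that there can be text between the quotes and the parentheses. In the future, please be more specific about your use case from the start. But I believe you now know how to adjust the regex to get what you need.

There can be both single and double quotes, and the replacement sounds like a workaround. Ideally I would need a single regex to do the job.

@Scorpion There are many things that could break a pattern. And since look-behind and look-ahead are not flexible enough to handle special characters like *, ? and +, it may actually be impossible to write one regex that does what you want. But I will keep an eye on this question; I am curious what others come up with.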
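
For the case raised in the comments (single and double quotes, with extra text between the quotes and the parentheses), the alternation approach can be sketched as follows. The pattern is the extended one mentioned in the comments, written here as an assumption about what was meant; `QuoteAwareExtractor` and `extract` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes quotes in the input are balanced.
class QuoteAwareExtractor {
    // Parenthesized text (captured in group 1), or a double-quoted run, or a single-quoted run.
    private static final Pattern CONTENT =
        Pattern.compile("\\(([^)]+)\\)|\"[^\"]*\"|'[^']*'");

    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = CONTENT.matcher(input);
        while (m.find()) {
            // Quoted runs are consumed whole and leave group(1) null, so they are skipped.
            if (m.group(1) != null) {
                found.add(m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "Rhyme (Jack) and 'Mr. (Tom)' met \"Mr. (Peter's) Jr.\" and (Jill)";
        System.out.println(extract(s)); // [Jack, Jill]
    }
}
```

Because each quoted run is consumed as a single match, it does not matter where inside the quotes the parentheses appear. The trade-off is that a stray apostrophe outside quotes would open a single-quoted run and could swallow later parentheses.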