Split a string in Java on spaces, except those enclosed in double or single quotes or preceded by a \


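I am clueless about regular expressions. I am trying to put together an expression that splits the example string on every space that is neither surrounded by single or double quotes nor preceded by a "\".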

For example, the sample string must be split into:

He
is 
a 
"man of his"
words\ always
I understand that this code:

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
Matcher regexMatcher = regex.matcher(StringToBeMatched);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}
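splits the example string on all spaces that are not surrounded by single or double quotes.

How do I incorporate the third condition of ignoring a space if it is preceded by a \?

You can use this regex: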

((["']).*?\2|(?:[^\\ ]+\\\s+)+[^\\ ]+|\S+)

In Java:

Pattern regex = Pattern.compile ( 
"(([\"']).*?\\2|(?:[^\\\\ ]+\\\\\\s+)+[^\\\\ ]+|\\S+)" );
Explanation:

This regex works by alternation:

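  • First it matches
    (["']).*?\2
    to match any quoted (double or single) string
  • then it matches
    (?:[^\\ ]+\\\s+)+[^\\ ]+
    to match any string with escaped spaces
  • finally it uses
    \S+
    to match any word without spaces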
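
The three alternatives above can be tried out with a small sketch like the following. The class name `QuoteAwareSplit`, the helper `tokenize`, and the sample input are assumptions made for illustration; the input is reconstructed from the expected output listed in the question, since the original sample string is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // The pattern from the answer above, written as a Java string literal.
    private static final Pattern TOKEN = Pattern.compile(
            "(([\"']).*?\\2|(?:[^\\\\ ]+\\\\\\s+)+[^\\\\ ]+|\\S+)");

    // Hypothetical helper: collects every match of TOKEN.
    // Whitespace that no alternative matches acts as the separator.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed sample input, rebuilt from the expected tokens in the question.
        String sample = "He is a \"man of his\" words\\ always";
        for (String token : tokenize(sample)) {
            System.out.println(token);
        }
    }
}
```

Note that the approach finds tokens with `Matcher.find()` rather than calling `String.split`, so the quotes and the backslash stay part of the returned tokens.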

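A regex that represents a backslash followed by a space can look like
    \\\s
where \\ represents a literal \ and \s represents any whitespace. A Java string representing this regex needs to be written as
    "\\\\\\s"
because every \ in the regex must itself be escaped in the string literal by putting another \ in front of it.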
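
To make the double escaping concrete, here is a minimal sketch; the class name `EscapedSpaceLiteral` and the helper `hasEscapedSpace` are made up for this illustration. It compiles the six-backslash string literal and prints the regex as the engine receives it.

```java
import java.util.regex.Pattern;

class EscapedSpaceLiteral {
    // The Java literal "\\\\\\s" holds four characters: two backslashes
    // (a literal backslash in regex syntax) followed by the two characters
    // of the whitespace class.
    static final Pattern ESCAPED_SPACE = Pattern.compile("\\\\\\s");

    // Hypothetical helper: true if s contains a backslash followed by whitespace.
    static boolean hasEscapedSpace(String s) {
        return ESCAPED_SPACE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(ESCAPED_SPACE.pattern());            // the regex after Java unescaping
        System.out.println(hasEscapedSpace("words\\ always"));  // true
        System.out.println(hasEscapedSpace("words always"));    // false
    }
}
```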

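So now we want our pattern to find

  • "..." -> "[^"]*"
  • '...' -> '[^']*'
  • or non-whitespace characters (\S), but also whitespace that is preceded by a \ (\\\s). This part is a bit tricky, because \S can also consume the \ placed before a space, which would prevent \\\s from being matched. That is why we want the regex engine to

      • first try \\\s
      • and only then \S

    So we need to write this part of the regex as
      (\\\s|\S)+
    rather than
      (\S|\\\s)+

(The regex engine tests the alternatives separated by | from left to right. For example, in a regex like a|ab the ab alternative will never be matched, because a is consumed by the left-hand alternative first.)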
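
The effect of alternative order can be checked directly. The sketch below is only an illustration; the class `AlternationOrder` and the helper `firstMatch` are hypothetical names. It uses `Matcher.find()`, which accepts the first alternative that succeeds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrder {
    // Hypothetical helper: returns the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The left alternative wins, so "ab" is never reached in a|ab.
        System.out.println(firstMatch("a|ab", "ab"));   // a
        System.out.println(firstMatch("ab|a", "ab"));   // ab

        String s = "words\\ always";                    // words\ always
        // \S first: the backslash is consumed as a plain non-whitespace
        // character and the match stops at the space.
        System.out.println(firstMatch("(\\S|\\\\\\s)+", s));   // words\
        // Escaped space first: backslash and space are consumed together.
        System.out.println(firstMatch("(\\\\\\s|\\S)+", s));   // words\ always
    }
}
```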

So your pattern can look like

    Pattern regex = Pattern.compile("\"[^\"]*\"|'[^']*'|(\\\\\\s|\\S)+");
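
A short sketch of how this pattern might be used on a sample that also contains a single-quoted part. The class name `EscapedSpaceTokenizer`, the helper `tokenize`, and the sample string are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedSpaceTokenizer {
    // Double-quoted text, single-quoted text, or a run of
    // escaped whitespace and non-whitespace characters.
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|'[^']*'|(\\\\\\s|\\S)+");

    // Hypothetical helper: returns the matched tokens in order of appearance.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed sample input with both kinds of quotes and one escaped space.
        String sample = "He is a \"man of his\" words\\ always 'and forever'";
        tokenize(sample).forEach(System.out::println);
    }
}
```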
    
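Comments:

  • What if the input is He is a "man of his \"bar\" words"\ always? Does that not match your intent? What is your expected output?
  • It must be He, is, a, "man of his \"bar\" words"\ always.
  • Please add the expected output for this input to your question.
  • You wouldn't happen to be parsing CSV, would you?
  • Thanks, anubhava. Can you explain this expression? Doesn't it handle single quotes as well?
  • Cool solution, though it is a bit heavy on backtracking.
  • Thanks. So to handle both single and double quotes, it must be Pattern regex=Pattern.compile(“(\”)[^\“]*\”[^']*'\\S+?(?:\S+\\S*)+\\S+); right?
  • My apologies. I was using a crappy phone and got bitten by a misfire while navigating to another page.
  • Absolutely no problem @EddieB, criticism is always taken in the right spirit. I didn't want to complicate the regex unless it needs to process a large amount of data. Happy New Year to you.

The solution above is nice. I especially like the use of \S+. My solution is similar in its grouping, except that the third alternative captures the word boundaries at the beginning and end of the word.

Regex:

    (?i)((?:(['|"]).+\2)|(?:\w+\\\s\w+)+|\b(?=\w)\w+\b(?!\w))

For Java:

    (?i)((?:(['|\"]).+\\2)|(?:\\w+\\\\\\s\\w+)+|\\b(?=\\w)\\w+\\b(?!\\w))

Example:

    String subject = "He is a \"man of his\" words\\ always 'and forever'";
    Pattern pattern = Pattern.compile( "(?i)((?:(['|\"]).+\\2)|(?:\\w+\\\\\\s\\w+)+|\\b(?=\\w)\\w+\\b(?!\\w))" );
    Matcher matcher = pattern.matcher( subject );
    while( matcher.find() ) {
        // group(0) is the whole token matched by one of the three alternatives
        System.out.println( matcher.group(0) );
    }

Results:

    He
    is
    a
    "man of his"
    words\ always
    'and forever'

Detailed explanation:
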
    "(?i)" +                 // Match the remainder of the regex with the options: case insensitive (i)
    "(" +                    // Match the regular expression below and capture its match into backreference number 1
                                // Match either the regular expression below (attempting the next alternative only if this one fails)
          "(?:" +                  // Match the regular expression below
             "(" +                    // Match the regular expression below and capture its match into backreference number 2
                "['|\"]" +                // Match a single character present in the list “'|"”
             ")" +
             "." +                    // Match any single character that is not a line break character
                "+" +                    // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
             "\\2" +                   // Match the same text as most recently matched by capturing group number 2
          ")" +
       "|" +                    // Or match regular expression number 2 below (attempting the next alternative only if this one fails)
          "(?:" +                  // Match the regular expression below
             "\\w" +                   // Match a single character that is a “word character” (letters, digits, etc.)
                "+" +                    // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
             "\\\\" +                   // Match the character “\” literally
             "\\s" +                   // Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
             "\\w" +                   // Match a single character that is a “word character” (letters, digits, etc.)
                "+" +                    // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
          ")+" +                   // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
       "|" +                    // Or match regular expression number 3 below (the entire group fails if this one fails to match)
          "\\b" +                   // Assert position at a word boundary
          "(?=" +                  // Assert that the regex below can be matched, starting at this position (positive lookahead)
             "\\w" +                   // Match a single character that is a “word character” (letters, digits, etc.)
          ")" +
          "\\w" +                   // Match a single character that is a “word character” (letters, digits, etc.)
             "+" +                    // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
          "\\b" +                   // Assert position at a word boundary
          "(?!" +                  // Assert that it is impossible to match the regex below starting at this position (negative lookahead)
             "\\w" +                   // Match a single character that is a “word character” (letters, digits, etc.)
          ")" +
    ")"