Split a string in Java on spaces, except those escaped by double or single quotes or preceded by \

java regex string

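I know very little about regular expressions. I am trying to put together an expression that splits a sample string on every space that is not enclosed in single or double quotes and is not preceded by a "\".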

For example:

He is a "man of his" words\ always

must be split into

He
is 
a 
"man of his"
words\ always

This is what I have so far:

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
Matcher regexMatcher = regex.matcher(StringToBeMatched);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}
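
To see what the pattern above does with the sample string, here is a minimal self-contained sketch. The class name CurrentPatternDemo and the tokenize helper are illustrative names, not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CurrentPatternDemo {
    // The pattern from the question: a bare word, a "double-quoted" run, or a 'single-quoted' run.
    static final Pattern REGEX = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");

    // Collects every match of REGEX in the input, in order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = REGEX.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "He is a \"man of his\" words\\ always";
        for (String token : tokenize(input)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

With this input the quoted phrase stays together, but the last two tokens come out as [words\] and [always], because nothing in the pattern treats a backslash before a space specially.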

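This splits the sample string on every space that is not surrounded by single or double quotes.

How can I add the third condition, ignoring a space when it is preceded by a \?

You can use this regex: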

((["']).*?\2|(?:[^\\ ]+\\\s+)+[^\\ ]+|\S+)

In Java:

Pattern regex = Pattern.compile ( 
"(([\"']).*?\\2|(?:[^\\\\ ]+\\\\\\s+)+[^\\\\ ]+|\\S+)" );

Explanation:

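This regex works by alternation:

It first tries

(["']).*?\2

to match any quoted (double or single) string,

then tries

(?:[^\\ ]+\\\s+)+[^\\ ]+

to match any string containing escaped spaces,

and finally uses

\S+

to match any word without spaces.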
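
The three alternatives described above can be exercised with a short sketch. The class name EscapedSpaceTokenizer and the tokenize helper are assumptions made for illustration. The regex itself is the one quoted in this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedSpaceTokenizer {
    // Alternatives, tried left to right:
    //   1. a quoted run closed by the same quote character (backreference to group 2)
    //   2. words joined by backslash-escaped whitespace
    //   3. any other run of non-whitespace characters
    static final Pattern REGEX = Pattern.compile(
            "(([\"']).*?\\2|(?:[^\\\\ ]+\\\\\\s+)+[^\\\\ ]+|\\S+)");

    // Collects every match of REGEX in the input, in order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = REGEX.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "He is a \"man of his\" words\\ always";
        for (String token : tokenize(input)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

For the sample string this yields five tokens, with "man of his" and words\ always each kept whole. The sketch does not strip the quotes or the backslash. That would be a separate post-processing step.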

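A regex that represents a \ followed by a space can look like \\\s, where \\ represents \ and \s represents any whitespace character. The string representing this regex has to be written as "\\\\\\s", because inside a Java string literal every \ must itself be escaped by putting another \ in front of it.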
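
The two layers of escaping can be checked with a few lines of Java. This is only a sketch. The class name BackslashEscapeDemo and the helper containsEscapedSpace are made up for the example.

```java
import java.util.regex.Pattern;

class BackslashEscapeDemo {
    // Six backslashes in Java source become three in the regex text:
    // an escaped backslash followed by the whitespace class.
    static final Pattern ESCAPED_SPACE = Pattern.compile("\\\\\\s");

    // True if the input contains a backslash immediately followed by whitespace.
    static boolean containsEscapedSpace(String s) {
        return ESCAPED_SPACE.matcher(s).find();
    }

    public static void main(String[] args) {
        // The regex text as the engine sees it (4 characters).
        System.out.println(ESCAPED_SPACE.pattern());
        // Input with a backslash before the space.
        System.out.println(containsEscapedSpace("words\\ always"));
        // Input with a plain space only.
        System.out.println(containsEscapedSpace("words always"));
    }
}
```

The first line printed is the four-character regex text, and the two checks print true and false respectively.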

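So now we want our pattern to find

"..." -> "[^"]*"
or '...' -> '[^']*'
or non-whitespace characters (\S), but also whitespace that is preceded by \ (\\\s).

This last part is a little tricky, because \S can also consume the \ placed before a space, which would prevent \\\s from being matched. That is why we want the regex engine to

search for \\\s first
and only afterwards for \S.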

So we need to write this part of the regex as (\\\s|\S)+.
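
The effect of the order of alternatives can be checked directly. The sketch below compares both orderings on the escaped-space fragment and adds a smaller a|ab case for comparison. The class name AlternationOrderDemo and the findAll helper are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {
    // Returns every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The left alternative wins as soon as it matches.
        System.out.println(findAll("a|ab", "ab"));
        System.out.println(findAll("ab|a", "ab"));

        String input = "words\\ always";
        // \S first: the backslash is consumed as an ordinary non-whitespace character.
        System.out.println(findAll("(\\S|\\\\\\s)+", input));
        // Escaped whitespace first: backslash and space are consumed together.
        System.out.println(findAll("(\\\\\\s|\\S)+", input));
    }
}
```

With \S listed first the fragment is split into words\ and always. With the escaped-whitespace alternative listed first it is returned as the single token words\ always.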

The order matters because the regex engine tries the alternatives separated by | from left to right. For example, with a regex like a|ab, the ab alternative will never match, because a is already consumed by the left-hand alternative.

So your pattern can look like

Pattern regex = Pattern.compile("\"[^\"]*\"|'[^']*'|(\\\\\\s|\\S)+");

Comments:

What if the input is He is a "man" of "his" "bar" words\ \always? Does your expression still match? What is your expected output?

It must be He, is, a, "man" of "his" "bar" words\ \always for He is a "man" of "his" "bar" words\ \always. Please add the expected output for this input to your question.

He, is, a, "man" of "his bar" \"bar" words\ always

You wouldn't happen to be parsing CSV, would you?

Thanks anubhava. Can you explain the expression? Doesn't it handle single quotes as well?

Cool solution... a bit heavy on backtracking, though.

Thanks. So to handle both single and double quotes it has to be Pattern regex = Pattern.compile("(\"[^\"]*\"|'[^']*'|\\S+?(?:\\\\\\s+\\S*)+|\\S+)"); right??

I'm sorry. I was on a crappy phone and downvoted by mistake while navigating to another page.

Absolutely no problem @EddieB, criticism is always taken in the right spirit. I didn't want to complicate the regex unless it has to process a large amount of data. Happy New Year to you.

Nice solution. I especially like his use of \S+. My solution is similar in its grouping, except that the third alternative captures the word boundaries at the beginning and end of a word.

Regular expression

(?i)((?:(['|"]).+\2)|(?:\w+\\\s\w+)+|\b(?=\w)\w+\b(?!\w))

For Java

(?i)((?:(['|\"]).+\\2)|(?:\\w+\\\\\\s\\w+)+|\\b(?=\\w)\\w+\\b(?!\\w))

Example

String subject = "He is a \"man of his\" words\\ always 'and forever'";
Pattern pattern = Pattern.compile( "(?i)((?:(['|\"]).+\\2)|(?:\\w+\\\\\\s\\w+)+|\\b(?=\\w)\\w+\\b(?!\\w))" );
Matcher matcher = pattern.matcher( subject );
while( matcher.find() ) {
    // Group 1 wraps the whole expression, so it holds the complete token.
    System.out.println( matcher.group(1) );
}

Result

He
is
a
"man of his"
words\ always
'and forever'

Detailed explanation

"(?i)" +            // Match the remainder of the regex with the options: case insensitive (i)
"(" +               // Match the regular expression below and capture its match into backreference number 1
                    // Match either the regular expression below (attempting the next alternative only if this one fails)
   "(?:" +          // Match the regular expression below
      "(" +         // Match the regular expression below and capture its match into backreference number 2
         "['|\"]" + // Match a single character present in the list '|"
      ")" +
      "." +         // Match any single character that is not a line break character
         "+" +      // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      "\\2" +       // Match the same text as most recently matched by capturing group number 2
   ")" +
   "|" +            // Or match regular expression number 2 below (attempting the next alternative only if this one fails)
   "(?:" +          // Match the regular expression below
      "\\w" +       // Match a single character that is a "word character" (letters, digits, etc.)
         "+" +      // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      "\\\\" +      // Match the character "\" literally
      "\\s" +       // Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.)
      "\\w" +       // Match a single character that is a "word character" (letters, digits, etc.)
         "+" +      // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ")+" +           // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   "|" +            // Or match regular expression number 3 below (the entire group fails if this one fails to match)
   "\\b" +          // Assert position at a word boundary
   "(?=" +          // Assert that the regex below can be matched, starting at this position (positive lookahead)
      "\\w" +       // Match a single character that is a "word character" (letters, digits, etc.)
   ")" +
   "\\w" +          // Match a single character that is a "word character" (letters, digits, etc.)
      "+" +         // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   "\\b" +          // Assert position at a word boundary
   "(?!" +          // Assert that it is impossible to match the regex below starting at this position (negative lookahead)
      "\\w" +       // Match a single character that is a "word character" (letters, digits, etc.)
   ")" +
")"
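
As a rough cross-check, the three patterns quoted in this thread can be run against the same input. This is only a sketch. The class name PatternComparison, the constant names, and the tokenize helper are assumptions for illustration, and a single sample string says nothing about unbalanced quotes or other edge cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternComparison {
    // Quoted run | words joined by escaped whitespace | any non-whitespace run.
    static final Pattern ANSWER_1 = Pattern.compile(
            "(([\"']).*?\\2|(?:[^\\\\ ]+\\\\\\s+)+[^\\\\ ]+|\\S+)");
    // Double-quoted | single-quoted | run of escaped whitespace or non-whitespace.
    static final Pattern ANSWER_2 = Pattern.compile(
            "\"[^\"]*\"|'[^']*'|(\\\\\\s|\\S)+");
    // Quoted run (greedy) | word-chars joined by escaped whitespace | single word.
    static final Pattern ANSWER_3 = Pattern.compile(
            "(?i)((?:(['|\"]).+\\2)|(?:\\w+\\\\\\s\\w+)+|\\b(?=\\w)\\w+\\b(?!\\w))");

    // Collects every match of the given pattern in the input, in order.
    static List<String> tokenize(Pattern pattern, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "He is a \"man of his\" words\\ always 'and forever'";
        System.out.println(tokenize(ANSWER_1, input));
        System.out.println(tokenize(ANSWER_2, input));
        System.out.println(tokenize(ANSWER_3, input));
    }
}
```

On this particular input all three patterns produce the same six tokens. They can diverge on other inputs, for example the third pattern's greedy .+ spans from the first quote to the last matching quote on a line.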