Splitting a string in Java on spaces, except spaces inside double or single quotes and spaces preceded by \
I know nothing about regular expressions. I am trying to put together an expression that splits the sample string on every space that is not surrounded by single or double quotes and is not preceded by a "\".
For example:
He is a "man of his" words\ always
must be split into:
He
is
a
"man of his"
words\ always
I understand that the following code
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
Matcher regexMatcher = regex.matcher(StringToBeMatched);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}
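will split the sample string on all spaces that are not surrounded by single or double quotes.
How can I incorporate a third condition that ignores a space when it is preceded by a \?
You can use this regex: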
((["']).*?\2|(?:[^\\ ]+\\\s+)+[^\\ ]+|\S+)
In Java:
Pattern regex = Pattern.compile (
"(([\"']).*?\\2|(?:[^\\\\ ]+\\\\\\s+)+[^\\\\ ]+|\\S+)" );
Explanation:
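This regex works by alternation:
(["']).*?\2 matches any quoted (double or single) string.
(?:[^\\ ]+\\\s+)+[^\\ ]+ matches any string that contains escaped spaces.
\S+ matches any word that contains no spaces.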
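The three alternatives can be checked against the sample string from the question. The following is only a minimal sketch: the class name `QuoteAwareSplit` and the helper `tokenize` are hypothetical names introduced for illustration, and the pattern is copied from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // Pattern copied from the answer: quoted string | escaped-space run | plain word.
    private static final Pattern TOKEN = Pattern.compile(
            "(([\"']).*?\\2|(?:[^\\\\ ]+\\\\\\s+)+[^\\\\ ]+|\\S+)");

    // Hypothetical helper: every successive match becomes one token.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Java literal for: He is a "man of his" words\ always
        String sample = "He is a \"man of his\" words\\ always";
        for (String token : tokenize(sample)) {
            System.out.println(token);
        }
    }
}
```

For the sample input this prints the five tokens listed in the question, with the quotes and the backslash left in place.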
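A regex representing \ followed by a space can look like \\\s, where \\ represents \ and \s represents any whitespace. The string representing such a regex needs to be written as "\\\\\\s", because each \ inside a Java string literal has to be escaped by putting another \ in front of it.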
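The two layers of escaping are easy to mix up, so here is a small sketch that prints what the regex engine actually receives. The class name `EscapeLayers` and the helper `isEscapedSpace` are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

class EscapeLayers {
    // Java source literal with six backslashes; the compiler turns it into
    // the four characters \\\s that are handed to the regex engine.
    static final String REGEX_TEXT = "\\\\\\s";

    // Hypothetical helper: true if s is exactly a backslash followed by one whitespace.
    static boolean isEscapedSpace(String s) {
        return Pattern.matches(REGEX_TEXT, s);
    }

    public static void main(String[] args) {
        System.out.println(REGEX_TEXT);             // prints \\\s
        System.out.println(REGEX_TEXT.length());    // prints 4
        System.out.println(isEscapedSpace("\\ "));  // prints true
        System.out.println(isEscapedSpace(" "));    // prints false
    }
}
```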
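So now we may want our pattern to find
- "…" -> "[^"]*"
- or '…' -> '[^']*'
- or non-whitespace characters (\S), but also whitespace characters that have a \ before them (\\\s).
This last part is a little tricky, because \S can also consume the \ placed before a space, which would prevent \\\s from being matched. That is why we want the regex engine to
- first search for \\\s
- and only then for \S
So this part of the regex needs to be written as (\\\s|\S)+, because the regex engine tries the alternatives separated by OR (|) from left to right. For instance, with a regex like a|ab, ab will never be matched, because a will be consumed by the left part of the regex.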
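The effect of the left-to-right rule can be observed directly. This is a minimal sketch, with `AlternationOrder` and `firstMatch` as hypothetical names, that runs each alternation in both orders using `Matcher.find()`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrder {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("a|ab", "ab"));  // prints a
        System.out.println(firstMatch("ab|a", "ab"));  // prints ab

        // Java literal for: words\ always
        String escaped = "words\\ always";
        // Escaped space tried first: the backslash and the space are consumed together.
        System.out.println(firstMatch("(\\\\\\s|\\S)+", escaped));  // prints words\ always
        // \S tried first: it eats the backslash, so the match stops at the space.
        System.out.println(firstMatch("(\\S|\\\\\\s)+", escaped));  // prints words\
    }
}
```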
So your pattern can look like
Pattern regex = Pattern.compile("\"[^\"]*\"|'[^']*'|(\\\\\\s|\\S)+");
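To see this pattern at work, here is a short sketch that applies it with the same `find()` loop used in the question. The class name `EscapedSpaceTokenizer` and the method `tokenize` are assumptions made for the example, and the input adds a single-quoted part to exercise the second alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedSpaceTokenizer {
    // Pattern copied from the line above: "..." | '...' | run of escaped spaces and non-spaces.
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|'[^']*'|(\\\\\\s|\\S)+");

    // Hypothetical helper: collects each match as one token.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Java literal for: He is a "man of his" words\ always 'and forever'
        String sample = "He is a \"man of his\" words\\ always 'and forever'";
        tokenize(sample).forEach(System.out::println);
    }
}
```

The tokens keep their quotes and backslashes; removing them, if needed, would be a separate step after the split.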
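Comments:
- What if the input is […]?
- Doesn't yours match it? What is your expected output?
- It must be […]. Please add the expected output for this input to your question.
- The expected output is […].
- You wouldn't happen to be parsing CSV, would you?
- Thanks anubhava. Can you explain the expression? Does it not handle single quotes as well?
- Cool solution, though it is a bit heavy on backtracking.
- Thanks. So to handle both single and double quotes, it must be Pattern regex=Pattern.compile(“(\”)[^\“]*\”[^']*'\\S+?(?:\S+\\S*)+\\S+); right?
- I apologize. I was on a crappy phone and hit something by mistake while navigating to another page.
- Absolutely no problem @EddieB, criticism is always taken in the right spirit. I didn't want to complicate the regex unless it needs to handle large amounts of data. Happy New Year to you.

anubhava's solution is nice; I particularly like his use of \S+. My solution is similar in its grouping, except that the third alternative also captures the word boundaries at the beginning and end. The regex, its Java string form, an example, the result, and a detailed explanation follow in that order.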
(?i)((?:(['|"]).+\2)|(?:\w+\\\s\w+)+|\b(?=\w)\w+\b(?!\w))
(?i)((?:(['|\"]).+\\2)|(?:\\w+\\\\\\s\\w+)+|\\b(?=\\w)\\w+\\b(?!\\w))
String subject = "He is a \"man of his\" words\\ always 'and forever'";
Pattern pattern = Pattern.compile( "(?i)((?:(['|\"]).+\\2)|(?:\\w+\\\\\\s\\w+)+|\\b(?=\\w)\\w+\\b(?!\\w))" );
Matcher matcher = pattern.matcher( subject );
while( matcher.find() ) {
    // group(0) is the whole token; no further replacement is needed
    System.out.println( matcher.group(0) );
}
He
is
a
"man of his"
words\ always
'and forever'
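The quoted alternative in the pattern above uses `.+`, which is greedy. The sketch below is not part of the original answer; it only illustrates what that implies when one line holds two quoted sections. The lazy variant `.+?` is a swapped-in alternative for comparison, and `GreedyQuotes`, `GREEDY`, `LAZY`, and `tokens` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyQuotes {
    // Pattern from the answer, with a greedy .+ between the quotes.
    static final String GREEDY =
            "(?i)((?:(['|\"]).+\\2)|(?:\\w+\\\\\\s\\w+)+|\\b(?=\\w)\\w+\\b(?!\\w))";
    // Assumed variant for comparison: the only change is the lazy .+? between the quotes.
    static final String LAZY =
            "(?i)((?:(['|\"]).+?\\2)|(?:\\w+\\\\\\s\\w+)+|\\b(?=\\w)\\w+\\b(?!\\w))";

    // Hypothetical helper: collects each match of regex in input.
    static List<String> tokens(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Java literal for: say "a b" and "c d" now
        String input = "say \"a b\" and \"c d\" now";
        System.out.println(tokens(GREEDY, input)); // prints [say, "a b" and "c d", now]
        System.out.println(tokens(LAZY, input));   // prints [say, "a b", and, "c d", now]
    }
}
```

With a single quoted section per line, as in the example above, both variants give the same tokens.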
"(?i)" + // Match the remainder of the regex with the options: case insensitive (i)
"(" + // Match the regular expression below and capture its match into backreference number 1
// Match either the regular expression below (attempting the next alternative only if this one fails)
"(?:" + // Match the regular expression below
"(" + // Match the regular expression below and capture its match into backreference number 2
"['|\"]" + // Match a single character present in the list “'|"”
")" +
"." + // Match any single character that is not a line break character
"+" + // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"\\2" + // Match the same text as most recently matched by capturing group number 2
")" +
"|" + // Or match regular expression number 2 below (attempting the next alternative only if this one fails)
"(?:" + // Match the regular expression below
"\\w" + // Match a single character that is a “word character” (letters, digits, etc.)
"+" + // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"\\\\" + // Match the character “\” literally
"\\s" + // Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
"\\w" + // Match a single character that is a “word character” (letters, digits, etc.)
"+" + // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
")+" + // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"|" + // Or match regular expression number 3 below (the entire group fails if this one fails to match)
"\\b" + // Assert position at a word boundary
"(?=" + // Assert that the regex below can be matched, starting at this position (positive lookahead)
"\\w" + // Match a single character that is a “word character” (letters, digits, etc.)
")" +
"\\w" + // Match a single character that is a “word character” (letters, digits, etc.)
"+" + // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"\\b" + // Assert position at a word boundary
"(?!" + // Assert that it is impossible to match the regex below starting at this position (negative lookahead)
"\\w" + // Match a single character that is a “word character” (letters, digits, etc.)
")" +
")"