Regex: how do I return only the non-empty capture group for each match?
Tags: regex, salesforce, apex-code
I am using this regular expression to parse CSV lines in Apex:
Pattern csvPattern = Pattern.compile('(?:^|,)(?:\"([^\"]+|\"\")*\"|([^,]+)*)');
It works well, but each match returns two groups (one for quoted values and one for unquoted values). See below:
Matcher csvMatcher = csvPattern.matcher('"hello",world');
Integer m = 1;
while (csvMatcher.find()) {
    System.debug('Match ' + m);
    for (Integer i = 1; i <= csvMatcher.groupCount(); i++) {
        System.debug('Capture group ' + i + ': ' + csvMatcher.group(i));
    }
    m++;
}
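I would like each match to return only the non-empty capture. Is this possible?
Inspired by ruakh, I have updated the regex so that it returns only one capture group per match (and handles quotes in fields and whitespace).
This is actually a hard thing to do.
It can be done with lookahead/lookbehind assertions.
It is not very intuitive, though. It looks like this: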
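One simple way to get only the non-empty capture without changing the pattern is to pick, for each match, whichever group actually took part in it. The sketch below is in Python rather than Apex (an assumption made so that it can be run as-is), and the helper name `non_empty_captures` is hypothetical. The same idea should carry over to the Apex loop by skipping any `csvMatcher.group(i)` that is null.

```python
import re

# The question's pattern, rewritten as a Python raw string (assumption:
# the original targets Apex, whose string escaping differs).
CSV_PATTERN = re.compile(r'(?:^|,)(?:"([^"]+|"")*"|([^,]+)*)')


def non_empty_captures(line):
    """Return, for each match, the one capture group that took part in it."""
    values = []
    for match in CSV_PATTERN.finditer(line):
        # groups() yields None for a group that did not participate.
        captured = [g for g in match.groups() if g is not None]
        # Fall back to an empty string when neither group captured anything.
        values.append(captured[0] if captured else '')
    return values
```

For the line `"hello",world` this yields one value per match instead of a pair containing a null.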
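(?:^|,)(\s*"(?=(?:[^"]+|"")*"\s*(?:,|$)))?((?…
Your regex is a bit incomplete; note that something like (a)* will capture only one of the a's that it matches. So you need to change ([^\"]+|\"\")* to ((?:[^\"]+|\"\")*) if you want to capture the entire contents of a double-quoted string that contains "". (There are some other problems as well; your regex is not written very defensively.)
Also, you seem to want the two variants to show up in different capture groups, don't you? The quoted variant will need post-processing to convert "" to ", whereas the unquoted variant will not.
I asked this question because I wanted your NSHO! Thanks for the thoughts. I am just getting started with this, so they are very helpful.
Wow. I learned a lot from this answer, and it works well. I agree that it is not very intuitive, but I will link to this answer in my code for future reference in case I run into trouble.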
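The comment's point about repeated groups can be checked directly. The following is a small Python sketch (Python is an assumption here; the sample field and the variable names are made up for illustration). It shows that a repeated capturing group keeps only its last repetition, that wrapping the repetition in a single group captures the whole body, and that a quoted body then needs its doubled quotes collapsed.

```python
import re

# Hypothetical sample: one quoted CSV field containing escaped quotes.
field = '"say ""hi"" now"'

# Repeated capturing group: only the final repetition is kept.
last_only = re.fullmatch(r'"([^"]+|"")*"', field).group(1)

# Non-capturing group inside, one capturing group around the whole repetition.
whole_body = re.fullmatch(r'"((?:[^"]+|"")*)"', field).group(1)

# Post-processing for quoted fields: a doubled quote stands for one quote.
unescaped = whole_body.replace('""', '"')
```

Here `last_only` holds just the tail of the field, while `whole_body` holds everything between the outer quotes.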
Debug output of the loop in the question:
[5]|DEBUG|Match 1
[7]|DEBUG|Capture group 1: hello
[7]|DEBUG|Capture group 2: null
[5]|DEBUG|Match 2
[7]|DEBUG|Capture group 1: null
[7]|DEBUG|Capture group 2: world
Updated regex from the question:
(?:^|[\s]*?,[\s]*)(\"(?:(?:[^\"]+|\"\")*)[^,]*|(?:[^,])*)
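The single-group pattern above appears to be the updated regex mentioned in the question. A quick way to see what its one capture group returns is the Python sketch below (Python and the helper name `single_group_fields` are assumptions for illustration; the Apex string escapes `\"` are written as plain `"` in the raw string).

```python
import re

# The single-group pattern, as a Python raw string.
UPDATED = re.compile(r'(?:^|[\s]*?,[\s]*)("(?:(?:[^"]+|"")*)[^,]*|(?:[^,])*)')


def single_group_fields(line):
    """Return group 1 of every match found in the line."""
    return [m.group(1) for m in UPDATED.finditer(line)]
```

Note that with this pattern a quoted field comes back with its surrounding quotes still attached, so `"hello",world` gives `"hello"` and `world`. Stripping the outer quotes and collapsing doubled quotes would still be a separate step.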
Test in Perl:
$samp = ' "hello " , world",,me,and,th""is, or , "tha""t" ';
$regex = '
(?: ^ | , )
(\s*" (?= (?:[^"]+|"")* " \s*(?:,|$) ) )?
(
(?<=") (?:[^"]+|"")* (?="\s*(?:,|$) )
|
[^,]*
)
';
while ($samp =~ /$regex/xg)
{
print "'$2'\n";
}
Output:
'hello '
' world"'
''
'me'
'and'
'th""is'
' or '
'tha""t'
The same regex, expanded with comments:
(?: ^ | , ) # Consume comma (or BOL is fine)
( # Capture group 1, capture '"' only if a complete quoted field
\s* # Optional many spaces
"
(?= # Lookahead, check for a valid quoted field, determines if a '"' will be consumed
(?:[^"]+|"")*
"
\s*
(?:,|$)
)
)? # End capt grp 1. 0 or 1 quote
( # Capture group 2, the body of text
(?<=") # If there is a '"' behind us, we have consumed a '"' in capture grp 1, so this is valid
(?:[^"]+|"")*
(?="\s*(?:,|$) )
| # OR,
[^,]* # Just get up to the next ',' This could be incomplete quoted fields
) # End capt grp 2
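The commented pattern above can also be tried outside Perl. The following is a rough Python port using `re.VERBOSE`, which plays the role of Perl's `/x` flag. The names `FIELD`, `field_bodies` and `SAMPLE` are hypothetical, and Python is an assumption rather than the answer's language. As in the Perl test, the field body is read from capture group 2.

```python
import re

# Lookahead/lookbehind version: group 1 optionally consumes the opening
# quote, group 2 is always the field body.
FIELD = re.compile(r'''
    (?: ^ | , )
    ( \s* " (?= (?:[^"]+|"")* " \s* (?:,|$) ) )?
    (
        (?<=") (?:[^"]+|"")* (?= " \s* (?:,|$) )
      |
        [^,]*
    )
''', re.VERBOSE)

# Same sample string as the Perl test.
SAMPLE = ' "hello " , world",,me,and,th""is, or , "tha""t" '


def field_bodies(line):
    """Return capture group 2 (the field body) of every match."""
    return [m.group(2) for m in FIELD.finditer(line)]
```

Calling `field_bodies(SAMPLE)` gives the same eight values as the Perl output listed earlier.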
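A variant that captures the body of the quoted field inside the lookahead and consumes it afterwards with a backreference:
(?: ^|, )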
(?: \s* " (?= ( (?:[^"]+|"")* ) " \s* (?: ,|$ ) ))?
( (?<=") \1 | [^,]* )
Expanded with comments:
(?: ^ | , ) # Consume comma (or BOL is fine)
(?: # Start grouping
\s* # Spaces, then double quote '"' (consumed if valid quoted field)
" #
(?= # Lookahead, nothing consumed (check for valid quoted field)
( # Capture grp 1
(?:[^"]+|"")* # Body of quoted field (stored for later consumption)
) # End capt grp 1
" # Double quote '"'
\s* # Optional spaces
(?: , | $ ) # Comma or EOL
) # End lookahead
)? # End grouping, optionally matches and consumes '\s*"'
( # Capture group 2, consume FIELD BODY
(?<=") # Lookbehind, if there is a '"' behind us the field is quoted
\1 # Consume capt grp 1
| # OR,
[^,]* # Invalid-quoted or Non-quoted field, get up to the next ','
) # End capt grp 2
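In the backreference variant, capture group 1 is only set when a field is validly quoted, so it can double as a flag for the post-processing step raised in the comments (collapsing `""` into `"` for quoted fields only). The sketch below is a Python port under that reading. `FIELD_BACKREF` and `parse_fields` are hypothetical names, and the unescaping step is an addition for illustration that is not part of the original answer.

```python
import re

# Backreference version: group 1 (inside the lookahead) holds the quoted
# body, group 2 consumes either that body or an unquoted field.
FIELD_BACKREF = re.compile(r'''
    (?: ^ | , )
    (?: \s* " (?= ( (?:[^"]+|"")* ) " \s* (?: , | $ ) ) )?
    ( (?<=") \1 | [^,]* )
''', re.VERBOSE)


def parse_fields(line):
    """Return field bodies, unescaping doubled quotes in quoted fields only."""
    fields = []
    for m in FIELD_BACKREF.finditer(line):
        body = m.group(2)
        # Group 1 is None unless the field was a complete quoted field.
        if m.group(1) is not None:
            body = body.replace('""', '"')
        fields.append(body)
    return fields
```

With the sample string from the Perl test, the unquoted field `th""is` is left untouched, while the quoted field `tha""t` is unescaped.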