Regex to extract group 1 + group 2, or group 1 only (when there is no group 2), including variations
The best way to explain this is to show exactly what I'm trying to achieve:
- Case 1: "search for Fenway park in Boston"
  Extract: group 1 --> "Fenway park", group 2 --> "Boston"
- Case 2: "search for Fenway park"
  Extract: group 1 --> "Fenway park"
Note that in both cases I want to accommodate variations of "search for" ("look for", "find", etc.) and of "in" ("at", "around", etc.).
I tried many different variations, but either "Fenway park in Boston" ends up in group 1 with nothing in group 2, or, if I get case 1 right, case 2 stops working.

This should work for you:
^(?:search for|look for|find)\s*(.*?)(?:\s*(?:in|around|at)\s*(.*))?$
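As a quick check of the two cases from the question, here is a minimal sketch in Python. The choice of Python and the `extract` helper name are assumptions made for illustration; the pattern itself is the one shown above.

```python
import re

# Pattern from the answer: the preposition and group 2 sit together
# inside a single optional non-capturing group.
PATTERN = re.compile(
    r"^(?:search for|look for|find)\s*(.*?)(?:\s*(?:in|around|at)\s*(.*))?$"
)

def extract(query):
    """Return (group 1, group 2), or None if the query does not match.

    Group 2 is None when the query has no "in/around/at ..." part.
    `extract` is a hypothetical helper, not part of the original answer.
    """
    m = PATTERN.match(query)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(extract("search for Fenway park in Boston"))  # ('Fenway park', 'Boston')
print(extract("search for Fenway park"))            # ('Fenway park', None)
```

The lazy `(.*?)` only grows until the optional tail can match, which is why group 1 stops at "Fenway park" in the first case and takes the whole remainder in the second.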
You can support more variations (of find/in/at, for example) by adding more "or" alternatives to the non-capturing groups.
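One way to make that extension less error-prone is to build the alternations from word lists instead of editing the pattern by hand. The following is only a sketch in Python: `build_pattern` is a hypothetical helper, and the extra words "locate" and "near" are made-up additions that do not appear in the original answer.

```python
import re

def build_pattern(verbs, prepositions):
    """Build the answer's regex shape from lists of alternatives.

    Hypothetical helper: each word is escaped and joined with "|"
    inside the two non-capturing groups.
    """
    verb_alt = "|".join(re.escape(v) for v in verbs)
    prep_alt = "|".join(re.escape(p) for p in prepositions)
    return re.compile(
        rf"^(?:{verb_alt})\s*(.*?)(?:\s*(?:{prep_alt})\s*(.*))?$"
    )

pattern = build_pattern(
    ["search for", "look for", "find", "locate"],  # "locate" is an assumed extra
    ["in", "around", "at", "near"],                # "near" is an assumed extra
)

print(pattern.match("locate Fenway park near Boston").groups())
print(pattern.match("locate Fenway park").groups())
```

The structure of the regex is unchanged; only the contents of the two alternations vary.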
Explanation of the regex:
@"
^ # Assert position at the beginning of a line (at beginning of the string or after a line break character)
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
search\ for # Match the characters “search for” literally
| # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
look\ for # Match the characters “look for” literally
| # Or match regular expression number 3 below (the entire group fails if this one fails to match)
find # Match the characters “find” literally
)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 1
. # Match any single character that is not a line break character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?: # Match the regular expression below
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
in # Match the characters “in” literally
| # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
around # Match the characters “around” literally
| # Or match regular expression number 3 below (the entire group fails if this one fails to match)
at # Match the characters “at” literally
)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
$ # Assert position at the end of a line (at the end of the string or before a line break character)
"
Thanks for the detailed answer. Unfortunately, I get the same results: it works for "search for Fenway park in Boston" and correctly extracts each group separately, but it doesn't match anything in "search for Fenway park", because the "in|around|at" group isn't optional. If I make it optional, everything is matched into one group (not what I want).

@BlazingFrog I updated the regex. Check it now.

Awesome! I now understand what I think I was doing wrong: I was making both groups ("on|in" and "boston") optional, but separately. You wrapped both in a single group and made that group optional, and it looks like that did the trick. Thanks again. I'd vote twice if I could!

Awesome. My only change was to turn both \s* into \s+; otherwise you can match "no space" and find that "search for Fenway parkin boston" matches.

Thanks for your comment. I was thinking about the \s+ you mention, and I wonder whether \s* might be a feature rather than a bug, since it goes some way toward handling typos ;)
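To make the \s* versus \s+ trade-off from the comments concrete, here is a small Python sketch comparing the two. It assumes that "both \s*" refers to the two around the in|around|at group (the comment does not say which ones); the names `LOOSE`, `STRICT`, and `groups` are hypothetical.

```python
import re

VERBS = r"(?:search for|look for|find)"
PREPS = r"(?:in|around|at)"

# Original answer: whitespace around the preposition is optional.
LOOSE = re.compile(rf"^{VERBS}\s*(.*?)(?:\s*{PREPS}\s*(.*))?$")
# Commenter's variant (as assumed here): whitespace is required on both sides.
STRICT = re.compile(rf"^{VERBS}\s*(.*?)(?:\s+{PREPS}\s+(.*))?$")

def groups(pattern, query):
    """Return the (group 1, group 2) tuple, or None if there is no match."""
    m = pattern.match(query)
    return m.groups() if m else None

typo = "search for Fenway parkin boston"
print(groups(LOOSE, typo))   # ('Fenway park', 'boston')
print(groups(STRICT, typo))  # ('Fenway parkin boston', None)

# The flip side of the "feature": \s* also splits inside ordinary words.
print(groups(LOOSE, "find paintings in Boston"))   # ('pa', 'tings in Boston')
print(groups(STRICT, "find paintings in Boston"))  # ('paintings', 'Boston')
```

With \s* the typo is tolerated, but any "in" or "at" inside a word becomes a split point; with \s+ the split only happens at a standalone preposition.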