Regex to extract group 1 and group 2, or just group 1 (only when there is no group 2), including variations


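The best way to explain is to show exactly what I'm trying to achieve:

  • Case 1:
    "search for fenway park in boston"

    Extract: group 1 --> "fenway park", group 2 --> "boston"

  • Case 2:
    "search for fenway park"

    Extract: group 1 --> "fenway park"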

Note that in both cases I want to be able to accommodate variations of "search for" ("look for", "find", etc.) and of "in" ("at", "around", etc.).


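I've tried many different variations, but either I get "fenway park in boston" extracted into group 1 and nothing in group 2, or, if I get case 1 right, case 2 doesn't work.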
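To see why these two failure modes happen, here is a minimal Python sketch. The asker's actual patterns aren't shown, so both patterns below are hypothetical reconstructions: one with a greedy group 1, and one where the "in|around|at" part is mandatory.

```python
import re

# Hypothetical attempt 1: group 1 is greedy (.*), so it swallows
# "fenway park in boston" and the optional location part is simply skipped.
GREEDY = r"^(?:search for|look for|find)\s*(.*)(?:\s*(?:in|around|at)\s*(.*))?$"

# Hypothetical attempt 2: the "in|around|at" part is mandatory, so a query
# without a location cannot match at all.
MANDATORY = r"^(?:search for|look for|find)\s*(.*?)\s*(?:in|around|at)\s*(.*)$"

def match_groups(pattern, text):
    """Return (group1, group2) for a match, or None if the pattern fails."""
    m = re.match(pattern, text)
    return m.groups() if m else None

print(match_groups(GREEDY, "search for fenway park in boston"))
print(match_groups(MANDATORY, "search for fenway park in boston"))
print(match_groups(MANDATORY, "search for fenway park"))
```

The greedy version returns `('fenway park in boston', None)`; the mandatory version splits case 1 correctly into `('fenway park', 'boston')` but returns `None` for case 2.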

This should work for you:

^(?:search for|look for|find)\s*(.*?)(?:\s*(?:in|around|at)\s*(.*))?$
You can support more variations by adding more alternatives to the non-capturing groups, like find or in/at.
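A quick check of this pattern against the two cases from the question, sketched in Python. The `build_pattern` helper is hypothetical: it just shows one way to add more alternatives by generating the two non-capturing groups from word lists (escaped with `re.escape`) instead of editing the pattern by hand.

```python
import re

def build_pattern(verbs, preps):
    """Hypothetical helper: rebuild the answer's pattern from word lists."""
    verb_alt = "|".join(re.escape(v) for v in verbs)
    prep_alt = "|".join(re.escape(p) for p in preps)
    # Group 1 is lazy; the preposition and group 2 sit inside ONE optional
    # group, so they are either matched together or not at all.
    return rf"^(?:{verb_alt})\s*(.*?)(?:\s*(?:{prep_alt})\s*(.*))?$"

PATTERN = build_pattern(["search for", "look for", "find"], ["in", "around", "at"])

for query in ["search for fenway park in boston", "search for fenway park"]:
    m = re.match(PATTERN, query)
    print(m.groups() if m else None)
# ('fenway park', 'boston')
# ('fenway park', None)
```

When there is no location, group 2 is simply `None`, which is how the caller can tell case 1 from case 2.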

Explanation:

@"
^                   # Assert position at the beginning of a line (at beginning of the string or after a line break character)
(?:                 # Match the regular expression below
                       # Match either the regular expression below (attempting the next alternative only if this one fails)
      search\ for         # Match the characters “search for” literally
   |                   # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
      look\ for           # Match the characters “look for” literally
   |                   # Or match regular expression number 3 below (the entire group fails if this one fails to match)
      find                # Match the characters “find” literally
)
\s                  # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   *                   # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
(                   # Match the regular expression below and capture its match into backreference number 1
   .                   # Match any single character that is not a line break character
      *?                  # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?:                 # Match the regular expression below
   \s                  # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
      *                   # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   (?:                 # Match the regular expression below
                          # Match either the regular expression below (attempting the next alternative only if this one fails)
         in                  # Match the characters “in” literally
      |                   # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
         around              # Match the characters “around” literally
      |                   # Or match regular expression number 3 below (the entire group fails if this one fails to match)
         at                  # Match the characters “at” literally
   )
   \s                  # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
      *                   # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   (                   # Match the regular expression below and capture its match into backreference number 2
      .                   # Match any single character that is not a line break character
         *                   # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   )
)?                  # Between zero and one times, as many times as possible, giving back as needed (greedy)
$                   # Assert position at the end of a line (at the end of the string or before a line break character)
"
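The commented breakdown above is close to what Python's `re.VERBOSE` flag accepts directly: whitespace in the pattern is ignored and `#` starts a comment, so the literal spaces in "search for" and "look for" have to be escaped. A compact sketch of the same pattern in that style (using Python here is an assumption; the answer itself doesn't name a language):

```python
import re

QUERY = re.compile(r"""
    ^
    (?: search\ for | look\ for | find )    # leading verb; spaces escaped in VERBOSE mode
    \s*
    (.*?)                                   # group 1: the thing to find (lazy)
    (?:                                     # optional location part, as ONE group
        \s* (?: in | around | at ) \s*      # preposition
        (.*)                                # group 2: the location (greedy)
    )?
    $
""", re.VERBOSE)

m = QUERY.match("look for fenway park around boston")
print(m.group(1), "|", m.group(2))  # fenway park | boston
```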


Thanks for the detailed answer. Unfortunately, I get the same result: it works for "search for fenway park in boston" and correctly extracts each group separately, but it doesn't match anything in "search for fenway park", because the "in|around|at" group isn't optional. If I make it optional, everything is matched into a single group (not what I want).

@BlazingFrog Updated the regex. Check it now.

Awesome! I now understand what I was doing wrong: I had made both groups ("on|in" and "boston") optional, but separately. You wrapped both in a single group and made that group optional, and that seems to do the trick. Thanks again. If I could upvote twice, I would!

Great. The only change I'd make is to change both \s* to \s+; otherwise you can match "no space" and find that "search for Fenway parkin boston" matches.

@ Thanks for your comment. I was thinking about the \s+ you mentioned, and I wonder whether \s* might be a feature rather than a bug, since it tolerates typos to some extent ;)
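The `\s*` versus `\s+` trade-off discussed in the comments above can be checked directly. This sketch compares the answer's pattern with a variant where the two `\s*` around the preposition become `\s+` (reading "both \s*" that way is an assumption; the `\s*` after the verb is left alone). The second query is a made-up example.

```python
import re

# The answer's pattern: \s* around the preposition allows zero spaces.
STAR = r"^(?:search for|look for|find)\s*(.*?)(?:\s*(?:in|around|at)\s*(.*))?$"
# Suggested variant: \s+ requires whitespace on both sides of the preposition.
PLUS = r"^(?:search for|look for|find)\s*(.*?)(?:\s+(?:in|around|at)\s+(.*))?$"

def split_query(pattern, query):
    """Return (group1, group2) for a match, or None if the pattern fails."""
    m = re.match(pattern, query)
    return m.groups() if m else None

for query in ["search for Fenway parkin boston", "search for printing shop in boston"]:
    print(query)
    print("  \\s*:", split_query(STAR, query))
    print("  \\s+:", split_query(PLUS, query))
```

With `\s*`, the typo "parkin boston" is still split into "Fenway park" / "boston", but "printing" is split too, into "pr" / "ting shop in boston", because the lazy group 1 stops at the first "in" it can reach. With `\s+`, the typo stays whole in group 1 and "printing shop" / "boston" come out as intended.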