Regex to extract 1st group + 2nd group, or 1st group only when there is no 2nd group (including variations)


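The best way to explain this is to show exactly what I am trying to achieve:

Case 1: "search for Fenway park in Boston"
Extract: group 1 --> "Fenway park", group 2 --> "Boston"

Case 2: "search for Fenway park"
Extract: group 1 --> "Fenway park"

Note that in both cases I want to accommodate variations of "search for" ("look for", "find", etc.) and of "in" ("at", "around", etc.).

I have tried many different variations, but either group 1 captures "Fenway park in Boston" and group 2 captures nothing, or, once I get case 1 right, case 2 stops working.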

This should work for you:

^(?:search for|look for|find)\s*(.*?)(?:\s*(?:in|around|at)\s*(.*))?$
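As a quick check of the pattern above against the two cases from the question, here is a minimal sketch. Python and the helper name `extract` are assumptions made for illustration; the thread does not say which regex engine is in use.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(
    r"^(?:search for|look for|find)\s*(.*?)(?:\s*(?:in|around|at)\s*(.*))?$"
)


def extract(query):
    """Return (group 1, group 2) for a query, or None if it does not match.

    `extract` is a hypothetical helper, not something from the thread.
    Group 2 is None when the optional "in/around/at ..." tail is absent.
    """
    m = PATTERN.match(query)
    if m is None:
        return None
    return m.group(1), m.group(2)


print(extract("search for Fenway park in Boston"))  # ('Fenway park', 'Boston')
print(extract("search for Fenway park"))            # ('Fenway park', None)
```

Group 1 is lazy, so it grows one character at a time and stops at the first point where the optional tail (or the end of the string) can match. That is why case 2 still puts everything into group 1.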

More variations can be handled by adding further "or" alternatives to the non-capturing groups, such as the find/in/at ones.
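One way to keep the alternatives easy to extend is to build the two non-capturing groups from word lists. This is only a sketch: `build_pattern` and the extra words "locate" and "near" are hypothetical and not part of the original answer.

```python
import re


def build_pattern(verbs, prepositions):
    """Build the answer's pattern from lists of alternatives.

    `build_pattern` is a hypothetical helper. re.escape keeps any regex
    metacharacters in the words literal.
    """
    verb_alt = "|".join(re.escape(v) for v in verbs)
    prep_alt = "|".join(re.escape(p) for p in prepositions)
    return re.compile(
        rf"^(?:{verb_alt})\s*(.*?)(?:\s*(?:{prep_alt})\s*(.*))?$"
    )


pattern = build_pattern(
    ["search for", "look for", "find", "locate"],  # "locate" is an added example
    ["in", "around", "at", "near"],                # "near" is an added example
)

m = pattern.match("locate Fenway park near Boston")
print(m.groups())  # ('Fenway park', 'Boston')
```

If one alternative is a prefix of another (for example "search" and "search for"), list the longer one first, since alternation tries the options from left to right.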

Explanation:

@"
^                   # Assert position at the beginning of a line (at beginning of the string or after a line break character)
(?:                 # Match the regular expression below
                       # Match either the regular expression below (attempting the next alternative only if this one fails)
      search\ for         # Match the characters “search for” literally
   |                   # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
      look\ for           # Match the characters “look for” literally
   |                   # Or match regular expression number 3 below (the entire group fails if this one fails to match)
      find                # Match the characters “find” literally
)
\s                  # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   *                   # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
(                   # Match the regular expression below and capture its match into backreference number 1
   .                   # Match any single character that is not a line break character
      *?                  # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?:                 # Match the regular expression below
   \s                  # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
      *                   # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   (?:                 # Match the regular expression below
                          # Match either the regular expression below (attempting the next alternative only if this one fails)
         in                  # Match the characters “in” literally
      |                   # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
         around              # Match the characters “around” literally
      |                   # Or match regular expression number 3 below (the entire group fails if this one fails to match)
         at                  # Match the characters “at” literally
   )
   \s                  # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
      *                   # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   (                   # Match the regular expression below and capture its match into backreference number 2
      .                   # Match any single character that is not a line break character
         *                   # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   )
)?                  # Between zero and one times, as many times as possible, giving back as needed (greedy)
$                   # Assert position at the end of a line (at the end of the string or before a line break character)
"

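The annotated breakdown above is written in free-spacing style, which is why the spaces in "search for" and "look for" are escaped. A condensed version can be compiled directly, sketched here with Python's `re.VERBOSE` flag. Python is an assumption; the `@"` opener suggests the original was a C# verbatim string.

```python
import re

# Free-spacing version of the same pattern. Unescaped whitespace and
# "#" comments inside the pattern are ignored under re.VERBOSE.
VERBOSE_PATTERN = re.compile(
    r"""
    ^
    (?: search\ for | look\ for | find )   # leading verb, not captured
    \s*
    (.*?)                                  # group 1: what to look for (lazy)
    (?:                                    # optional tail, made optional as one unit
        \s* (?: in | around | at ) \s*     # preposition, not captured
        (.*)                               # group 2: the place (greedy)
    )?
    $
    """,
    re.VERBOSE,
)

print(VERBOSE_PATTERN.match("look for Fenway park at Boston").groups())
# ('Fenway park', 'Boston')
```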

Thanks for your detailed answer. Unfortunately, I get the same result: it works for "search for Fenway park in Boston" and correctly extracts each group separately, but it does not match anything in "search for Fenway park", because the "in|around|at" group is not optional. If I make it optional, everything is matched into a single group (not what I want).

@BlazingFrog Updated the regex. Check it now.

Awesome! I now understand what I think I was doing wrong: I was making both groups ("on|in" and "boston") optional, but separately. You wrapped both in a single group and made that group optional, and it looks like that did the trick. Thanks again. If I could vote twice, I would!

Awesome. My only change was to turn both \s* into \s+; otherwise the pattern can match with no whitespace at all, and you will find that "search for Fenway parkin boston" matches.

Thanks for your comment. I was thinking about the \s+ you mention, and I wonder whether \s* might be a feature rather than a bug, since it handles typos to some extent ;)
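The trade-off discussed in these comments can be seen by running both variants side by side. In this sketch, `LOOSE` is the answer's pattern and `STRICT` applies the commenter's \s+ change. Which two \s* were meant is not stated, so the two around the preposition are assumed here. The "Washington" query is an added example, not one from the thread.

```python
import re

# Answer's pattern: whitespace around the preposition is optional.
LOOSE = re.compile(
    r"^(?:search for|look for|find)\s*(.*?)(?:\s*(?:in|around|at)\s*(.*))?$"
)
# Commenter's variant (assumed reading): whitespace around the preposition is required.
STRICT = re.compile(
    r"^(?:search for|look for|find)\s*(.*?)(?:\s+(?:in|around|at)\s+(.*))?$"
)

for query in (
    "search for Fenway park in Boston",
    "search for Fenway parkin boston",
    "search for Washington park",
):
    print(query)
    print("  loose: ", LOOSE.match(query).groups())
    print("  strict:", STRICT.match(query).groups())

# search for Fenway park in Boston
#   loose:  ('Fenway park', 'Boston')
#   strict: ('Fenway park', 'Boston')
# search for Fenway parkin boston
#   loose:  ('Fenway park', 'boston')
#   strict: ('Fenway parkin boston', None)
# search for Washington park
#   loose:  ('Wash', 'gton park')
#   strict: ('Washington park', None)
```

So \s* does tolerate a missing space, but it also splits any name that happens to contain "in", "at" or "around", as the last query shows.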