Regex in Python 3: match everything after a number or an optional period but before an optional comma

I'm trying to return the ingredients from a recipe without any measurements or instructions. The ingredients are a list, like this:

['1  medium tomato, cut into 8 wedges',
 '4  c. torn mixed salad greens',
 '1/2  small red onion, sliced and separated into rings',
 '1/4  small cucumber, sliced',
 '1/4  c. sliced pitted ripe olives',
 '2  Tbsp. reduced-calorie Italian salad dressing',
 '2  Tbsp. lemon juice',
 '1  Tbsp. water',
 '1/2  tsp. dried mint, crushed',
 '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']
I want to return the following list:

['medium tomato',
 'torn mixed salad greens',
 'small red onion',
 'small cucumber',
 'sliced pitted ripe olives',
 'reduced-calorie Italian salad dressing',
 'lemon juice',
 'water',
 'dried mint',
 'crumbled Blue cheese']
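The closest pattern I've found is:

import re

pattern = r'[\s\d\.]* ([^\,]+).*'
But when testing it:

for ing in ingredients:
    print(re.findall(pattern, ing))
the measurement abbreviation and the period after it are also returned, for example: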

['c. torn mixed salad greens']


Thanks in advance.

The problem is that you are pairing the digits with the period.

\s\d*\.?

should correctly match the number (with or without a period).
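One way to act on this answer's point is to stop treating the period as part of the number, because in lines like '4  c. ...' it actually belongs to the unit abbreviation. The sketch below is my own illustration, not the answerer's code. It consumes the quantity, then an optional abbreviated unit (letters followed by a period), and captures the name up to the first comma. The names QUANTITY_THEN_NAME and ingredient_name are hypothetical. Splitting on " or " to keep the last alternative is an added assumption, based on the question's expected output.

```python
import re

# Hypothetical pattern (not from the answer): a leading quantity made of
# digits, slashes and spaces, then an optional abbreviated unit such as "c."
# or "Tbsp." (letters followed by a period), then the name up to the first comma.
QUANTITY_THEN_NAME = re.compile(r"[\d/\s]*(?:[A-Za-z]+\.\s+)?([^,]+)")


def ingredient_name(line):
    # Assumption: for "X or Y" lines keep only the last alternative,
    # matching the expected output in the question.
    last_choice = line.split(" or ")[-1]
    match = QUANTITY_THEN_NAME.match(last_choice)
    return match.group(1).strip() if match else last_choice.strip()


print(ingredient_name("4  c. torn mixed salad greens"))  # prints: torn mixed salad greens
```

This only recognizes units that end in a period. A spelled-out unit such as "tablespoon" would be left in the result.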

You can use this pattern:

import re

for ing in ingredients:
    print(re.search(r'(?i)[a-z][^.,]*(?![^,])', ing).group())
Pattern details:

(?i)          # make the pattern case-insensitive (in Python 3.11+ this
              # inline flag must be at the very start of the pattern)
[a-z][^.,]*   # a substring that starts with a letter and contains
              # no period or comma
(?![^,])      # not followed by a character that is not a comma
              # (in other words, followed by a comma or the end of the string)
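Below is a self-contained version of this answer's approach, run against the question's list. It passes re.IGNORECASE as an argument instead of using an inline (?i). Since Python 3.11, a global inline flag that is not at the start of the pattern raises re.error. The helper name extract_name and the None guard are my additions, not part of the answer.

```python
import re

ingredients = [
    '1  medium tomato, cut into 8 wedges',
    '4  c. torn mixed salad greens',
    '1/2  small red onion, sliced and separated into rings',
    '1/4  small cucumber, sliced',
    '1/4  c. sliced pitted ripe olives',
    '2  Tbsp. reduced-calorie Italian salad dressing',
    '2  Tbsp. lemon juice',
    '1  Tbsp. water',
    '1/2  tsp. dried mint, crushed',
    '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese',
]

# A letter, then no period or comma, ending right before a comma or the end
# of the string; case-insensitivity is passed as a flag argument.
NAME = re.compile(r"[a-z][^.,]*(?![^,])", re.IGNORECASE)


def extract_name(line):
    """Return the ingredient name, or None if the pattern finds nothing."""
    match = NAME.search(line)
    return match.group() if match else None


print([extract_name(ing) for ing in ingredients])
```

The pattern depends on every unit being abbreviated with a trailing period. If the last line said "2 tablespoon" instead of "2 Tbsp.", it would return the whole "crumbled Feta cheese or 2 tablespoon crumbled Blue cheese" span.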
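Description
I suggest using the following regex to find the substrings you're not interested in and replace them. Because it spells out the units of measure, it also handles units that aren't abbreviated (cup, teaspoon, tablespoon).

\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)|,.*|.*\bor\b

Enable the case-insensitive flag so that "Tbsp." is matched by the tbsp\. alternative.

Replace with: nothing (an empty string)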

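Example

This shows how the pattern would match.

Sample string

Note that the last line contains two ingredients separated by "or". According to the OP, the first one should be dropped.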

1  medium tomato, cut into 8 wedges
4  c. torn mixed salad greens
1/2  small red onion, sliced and separated into rings
1/4  small cucumber, sliced
1 1/4  c. sliced pitted ripe olives
2  Tbsp. reduced-calorie Italian salad dressing
2  Tbsp. lemon juice
1  Tbsp. water
1/2  tsp. dried mint, crushed
1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese
After replacement

medium tomato
torn mixed salad greens
small red onion
small cucumber
sliced pitted ripe olives
reduced-calorie Italian salad dressing
lemon juice
water
dried mint
crumbled Blue cheese
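The find-and-replace above can be reproduced in Python with re.sub. The sketch below is my reconstruction, not the answerer's exact code. It splits the pattern from this answer across adjacent raw strings for readability, compiles it with re.IGNORECASE, and applies it to each line of the answer's sample separately. Applying it per line matters: on one multi-line string, the leading \s* could also consume line breaks and join the results. The names JUNK, sample and cleaned are illustrative.

```python
import re

# The answer's pattern, split into adjacent raw strings; re.IGNORECASE lets
# the lowercase "tbsp\." alternative match "Tbsp." in the data.
JUNK = re.compile(
    r"\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*"
    r"(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)"
    r"|,.*"
    r"|.*\bor\b",
    re.IGNORECASE,
)

sample = [
    '1  medium tomato, cut into 8 wedges',
    '4  c. torn mixed salad greens',
    '1/2  small red onion, sliced and separated into rings',
    '1/4  small cucumber, sliced',
    '1 1/4  c. sliced pitted ripe olives',
    '2  Tbsp. reduced-calorie Italian salad dressing',
    '2  Tbsp. lemon juice',
    '1  Tbsp. water',
    '1/2  tsp. dried mint, crushed',
    '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese',
]

# Replace every unwanted piece with an empty string, one line at a time.
cleaned = [JUNK.sub('', line) for line in sample]
for name in cleaned:
    print(name)
```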
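Comments:

Welcome to Stack Overflow!

Thanks a lot. I was able to tweak your pattern slightly to make it work: [\d\w\s]*[\.]+(?:.*or[\d\w\s]*[\.]+)?([^\,]+).* I had to add a non-capturing group to handle the case where there are two options, e.g. '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese'.

The idea is interesting, but keeping an exhaustive list of possible terms such as teaspoon, tablespoon... is not very convenient. I would say:

I thought so too, but then the OP said the last line contains two ingredients and they only want the last one... If the measurement in the last line were "tablespoon" instead of "tbsp.", your suggested regex would not clean up the first ingredient. That's a bit odd. And it's really nice to run into you again :)

Explanation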
NODE                     EXPLANATION
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
----------------------------------------------------------------------
        [0-9]                    any character of: '0' to '9'
----------------------------------------------------------------------
        \s*                      whitespace (\n, \r, \t, \f, and " ")
                                 (0 or more times (matching the most
                                 amount possible))
----------------------------------------------------------------------
      )?                       end of grouping
----------------------------------------------------------------------
      [0-9]+                   any character of: '0' to '9' (1 or
                               more times (matching the most amount
                               possible))
----------------------------------------------------------------------
      \/                       '/'
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      (?:                      group, but do not capture:
----------------------------------------------------------------------
        c                        'c'
----------------------------------------------------------------------
        \.                       '.'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        cup                      'cup'
----------------------------------------------------------------------
        s?                       's' (optional (matching the most
                                 amount possible))
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tsp                      'tsp'
----------------------------------------------------------------------
        \.                       '.'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        teaspoon                 'teaspoon'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tbsp                     'tbsp'
----------------------------------------------------------------------
        \.                       '.'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tablespoon               'tablespoon'
----------------------------------------------------------------------
      )                        end of grouping
----------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ")
                               (0 or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  ,                        ','
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  or                       'or'
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
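The explanation table maps node by node onto a pattern written with re.VERBOSE. The sketch below is my own restructuring of the same tokens, not code from the answer. Each alternative and group sits on its own line with a comment, which may make the pattern easier to extend with more units. JUNK_VERBOSE is an illustrative name.

```python
import re

# The answer's pattern rewritten in verbose mode; whitespace is ignored
# and "#" starts a comment, so each line mirrors a node in the table.
JUNK_VERBOSE = re.compile(
    r"""
    \s*
    (?:
        (?:                      # optional fraction part, e.g. "1/" or "1 1/"
            (?:[0-9]\s*)?        #   optional whole number of a mixed number
            [0-9]+ /             #   numerator followed by a slash
        )?
        [0-9]+ \s*               # whole number or denominator
        (?:                      # optional unit of measure
            (?: c\. | cups? | tsp\. | teaspoon | tbsp\. | tablespoon )
            \s*
        )?
    )
    | , .*                       # a comma and everything after it
    | .* \b or \b                # everything up to and including a standalone "or"
    """,
    re.IGNORECASE | re.VERBOSE,
)

print(JUNK_VERBOSE.sub('', '1 1/4  c. sliced pitted ripe olives'))
```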