Regular expressions in Python 3: match everything after a number or optional period but before an optional comma
I am trying to return the ingredients from a recipe without any measurements or instructions. The ingredients are lists like the following:
['1 medium tomato, cut into 8 wedges',
'4 c. torn mixed salad greens',
'1/2 small red onion, sliced and separated into rings',
'1/4 small cucumber, sliced',
'1/4 c. sliced pitted ripe olives',
'2 Tbsp. reduced-calorie Italian salad dressing',
'2 Tbsp. lemon juice',
'1 Tbsp. water',
'1/2 tsp. dried mint, crushed',
'1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']
I want to return the following list:
['medium tomato',
'torn mixed salad greens',
'small red onion',
'small cucumber',
'sliced pitted ripe olives',
'reduced-calorie Italian salad dressing',
'lemon juice',
'water',
'dried mint',
'crumbled Blue cheese']
The closest pattern I have found is:
pattern = r'[\s\d\.]* ([^\,]+).*'
But when I test it:
for ing in ingredients:
    print(re.findall(pattern, ing))
the period after each measurement abbreviation is returned as well, e.g.:
['c. torn mixed salad greens']
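when the desired result is ['torn mixed salad greens'].
Thanks in advance.

The problem is that you are pairing the digits with the dots.
\s\d*\.?
should match the number correctly (with or without a dot).

You can use the following pattern: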
import re
for ing in ingredients:
    # (?i) has to come first: Python 3.11+ rejects global flags anywhere else in the pattern
    print(re.search(r'(?i)[a-z][^.,]*(?![^,])', ing).group())
Pattern details:
(?i)         # make the pattern case insensitive
[a-z][^.,]*  # a substring that starts with a letter and that doesn't contain a period
             # or a comma
(?![^,])     # not followed by a character that is not a comma
             # (in other words, followed by a comma or the end of the string)
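As a hedged illustration of the pattern above, here is a minimal sketch that wraps it in a helper and runs it over the list from the question. The name `extract_ingredient` and the `None` fallback for lines without a match are assumptions made for this example, and the case-insensitive flag is passed as `re.IGNORECASE` rather than inline.

```python
import re

# Same pattern as above; case insensitivity is passed as a flag instead of (?i).
INGREDIENT_RE = re.compile(r'[a-z][^.,]*(?![^,])', re.IGNORECASE)

def extract_ingredient(line):
    """Return the first substring that starts with a letter, contains no
    period or comma, and is followed by a comma or the end of the string."""
    match = INGREDIENT_RE.search(line)
    # Hypothetical fallback: None when the line contains no such substring.
    return match.group() if match else None

ingredients = [
    '1 medium tomato, cut into 8 wedges',
    '4 c. torn mixed salad greens',
    '1/2 small red onion, sliced and separated into rings',
    '1/4 small cucumber, sliced',
    '1/4 c. sliced pitted ripe olives',
    '2 Tbsp. reduced-calorie Italian salad dressing',
    '2 Tbsp. lemon juice',
    '1 Tbsp. water',
    '1/2 tsp. dried mint, crushed',
    '1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese',
]

result = [extract_ingredient(ing) for ing in ingredients]
print(result)
```

Note that this relies on every unit abbreviation ending in a period; an unabbreviated unit such as "cup" would be returned as part of the ingredient.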
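
Description
I suggest using the regular expression below to find and replace the substrings you are not interested in. Because it spells out the units of measure, it also handles units that are not abbreviated. Match case-insensitively, since the sample input contains Tbsp. while the pattern lists tbsp\.
\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)|,.*|.*\bor\b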
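Replace with: nothing
Example
Sample strings
Note that the last line has two ingredients separated by "or"; according to the OP, the first ingredient should be dropped.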
1 medium tomato, cut into 8 wedges
4 c. torn mixed salad greens
1/2 small red onion, sliced and separated into rings
1/4 small cucumber, sliced
1 1/4 c. sliced pitted ripe olives
2 Tbsp. reduced-calorie Italian salad dressing
2 Tbsp. lemon juice
1 Tbsp. water
1/2 tsp. dried mint, crushed
1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese
After replacement
medium tomato
torn mixed salad greens
small red onion
small cucumber
sliced pitted ripe olives
reduced-calorie Italian salad dressing
lemon juice
water
dried mint
crumbled Blue cheese
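For anyone who wants to try the find-and-replace in Python, here is a minimal sketch. The pattern is written out as it reads in the node-by-node explanation of this answer, `re.IGNORECASE` is an assumption added so that `Tbsp.` is covered by `tbsp\.`, and the names `UNWANTED`, `lines` and `cleaned` are made up for the example.

```python
import re

# Pattern as described node by node in this answer's explanation.
# Three alternatives: a leading quantity with an optional unit,
# a comma plus everything after it, and everything up to the word "or".
# re.IGNORECASE is an assumption so that 'Tbsp.' matches 'tbsp\.'.
UNWANTED = re.compile(
    r'\s*(?:(?:(?:[0-9]\s*)?[0-9]+/)?[0-9]+\s*'
    r'(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)'
    r'|,.*|.*\bor\b',
    re.IGNORECASE,
)

lines = [
    '1 medium tomato, cut into 8 wedges',
    '4 c. torn mixed salad greens',
    '1/2 small red onion, sliced and separated into rings',
    '1/4 small cucumber, sliced',
    '1 1/4 c. sliced pitted ripe olives',
    '2 Tbsp. reduced-calorie Italian salad dressing',
    '2 Tbsp. lemon juice',
    '1 Tbsp. water',
    '1/2 tsp. dried mint, crushed',
    '1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese',
]

# Replace every unwanted substring with nothing.
cleaned = [UNWANTED.sub('', line) for line in lines]
for item in cleaned:
    print(item)
```

Because the unit list is spelled out, any unit missing from the alternation (for example "oz.") would be left in the output, so the list may need extending for other recipes.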
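
Welcome to stackoverflow!
Thanks a lot. I was able to tweak your pattern slightly to make it work: [\d\w\s]*[\.]+(?:.*or[\d\w\s]*[\.]+)?([^\,]+).* I had to add a passive (non-capturing) group to handle the case where there are two options, e.g. '1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese'.
The idea is interesting, but building an exhaustive list of possible terms such as teaspoon, tablespoon is not very convenient... I would write it like this:
I thought so too, but then the OP said that the last line contains two ingredients and that they only want the last one... If the measurement in the last line were tablespoon instead of tbsp., a regex that relies on the period would not clean up the first ingredient, which is a bit odd. And it's really nice to run into you again :)

Explanation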
NODE                      EXPLANATION
----------------------------------------------------------------------
\s*                       whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
(?:                       group, but do not capture:
  (?:                     group, but do not capture (optional (matching the most amount possible)):
    (?:                   group, but do not capture (optional (matching the most amount possible)):
      [0-9]               any character of: '0' to '9'
      \s*                 whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
    )?                    end of grouping
    [0-9]+                any character of: '0' to '9' (1 or more times (matching the most amount possible))
    \/                    '/'
  )?                      end of grouping
  [0-9]+                  any character of: '0' to '9' (1 or more times (matching the most amount possible))
  \s*                     whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
  (?:                     group, but do not capture (optional (matching the most amount possible)):
    (?:                   group, but do not capture:
      c                   'c'
      \.                  '.'
     |                    OR
      cup                 'cup'
      s?                  's' (optional (matching the most amount possible))
     |                    OR
      tsp                 'tsp'
      \.                  '.'
     |                    OR
      teaspoon            'teaspoon'
     |                    OR
      tbsp                'tbsp'
      \.                  '.'
     |                    OR
      tablespoon          'tablespoon'
    )                     end of grouping
    \s*                   whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
  )?                      end of grouping
)                         end of grouping
|                         OR
,                         ','
.*                        any character except \n (0 or more times (matching the most amount possible))
|                         OR
.*                        any character except \n (0 or more times (matching the most amount possible))
\b                        the boundary between a word char (\w) and something that is not a word char
or                        'or'
\b                        the boundary between a word char (\w) and something that is not a word char
----------------------------------------------------------------------
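The node-by-node breakdown above maps fairly directly onto Python's `re.VERBOSE` mode, which lets the comments live inside the pattern itself. The following is only a sketch: the name `VERBOSE_UNWANTED` is made up, and `re.IGNORECASE` is an assumption added so that `Tbsp.` is matched by `tbsp\.`.

```python
import re

# In re.VERBOSE mode, unescaped whitespace and '#' comments inside the
# pattern are ignored, so each node can be annotated in place.
VERBOSE_UNWANTED = re.compile(r"""
    \s*                              # leading whitespace
    (?:
        (?: (?: [0-9] \s* )?         # optional whole number before a fraction
            [0-9]+ /                 # numerator and slash
        )?
        [0-9]+                       # quantity, or the denominator of the fraction
        \s*
        (?: (?: c\. | cups? | tsp\. | teaspoon | tbsp\. | tablespoon )
            \s*
        )?                           # optional unit of measure
    )
    | , .*                           # OR a comma and everything after it
    | .* \b or \b                    # OR everything up to and including the word 'or'
""", re.VERBOSE | re.IGNORECASE)

mixed = VERBOSE_UNWANTED.sub('', '1 1/4 c. sliced pitted ripe olives')
either = VERBOSE_UNWANTED.sub(
    '', '1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese')
print(mixed)
print(either)
```

The alternation is at the top level, so at any position the quantity branch is tried first, then the comma branch, then the "or" branch.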