Python: how do I extract a two-digit number from a sentence with a regular expression?
I'm trying to write a function that extracts only the two-digit integer matched by a specific regular expression:
import re

def extract_number(message_text):
    regex_expression = 'What are the top ([0-9]{2}) trends on facebook'
    regex = re.compile(regex_expression)
    matches = regex.finditer(message_text)
    for match in matches:
        return match.group()
    # if there were no matches, return None
    return None
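So when I print:
message_text = 'What are the top 54 trends on facebook today'
print(extract_number(message_text))
I want to get 54.
If I write the version below instead, I get back the characters I typed for (.+). Why doesn't it work for the digits?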
def extract_number(message_text):
    regex_expression = 'What are the top (.+) trends on facebook'
    regex = re.compile(regex_expression)
    matches = regex.finditer(message_text)
    for match in matches:
        return match.group()

message_text = 'What are the top fifty trends on facebook today'
print(extract_number(message_text))
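The only problem with both snippets is that you return the overall match rather than the result of the capture group you are interested in:
return match.group()
is the same as returning match.group(0), i.e. it returns the whole match, which in your case is the full matched phrase, 'What are the top 54 trends on facebook'.
Instead, you want index 1, i.e. the first capture group (the first subexpression enclosed in (…)), which holds what ([0-9]{2}) matched:
return match.group(1)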
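To see the difference between the two indices side by side, here is a minimal sketch. It uses the pattern and input from the question, but calls search() instead of finditer() purely for brevity; the variable names are illustrative.

```python
import re

pattern = re.compile(r'What are the top ([0-9]{2}) trends on facebook')
match = pattern.search('What are the top 54 trends on facebook today')

whole = match.group()    # same as match.group(0): the entire matched text
number = match.group(1)  # only what the first capture group matched

print(whole)   # What are the top 54 trends on facebook
print(number)  # 54
```

Note that the whole match stops at "facebook", because the trailing " today" is not part of the pattern.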
Putting it all together:
import re

def extract_number(message_text):
    regex_expression = 'What are the top ([0-9]{2}) trends on facebook'
    regex = re.compile(regex_expression)
    matches = regex.finditer(message_text)
    # (See bottom of this answer for a loop-less alternative.)
    for match in matches:
        return match.group(1)  # index 1 returns what the 1st capture group matched
    # if there were no matches, return None
    return None

message_text = 'What are the top 54 trends on facebook today'
print(extract_number(message_text))
This yields the desired output:
54
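Note: as @EvanL00 points out, since only one match is needed, using regex.finditer() followed by a for loop (which unconditionally returns on the first iteration) is unnecessary and can obscure the intent of the code; a simpler, clearer approach is:
match = regex.search(message_text)  # Get first match only.
if match:
    return match.group(1)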
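For completeness, the loop-less fragment above could be dropped into the function roughly as follows. This is a sketch rather than the answerer's exact code: the module-level name TOP_TRENDS is an assumption introduced here for illustration.

```python
import re

# Hypothetical module-level constant; the pattern itself is the one from the question.
TOP_TRENDS = re.compile(r'What are the top ([0-9]{2}) trends on facebook')

def extract_number(message_text):
    """Return the two-digit number as a string, or None if nothing matches."""
    match = TOP_TRENDS.search(message_text)  # first match only
    if match:
        return match.group(1)
    return None

print(extract_number('What are the top 54 trends on facebook today'))     # 54
print(extract_number('What are the top fifty trends on facebook today'))  # None
```

The result is a string; wrap it in int() if an integer is needed.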
This works for both numbers and words:
import re

def extract_number(message_text):
    regex_expression = 'What are the top ([a-zA-Z0-9]+) trends on facebook'
    regex = re.compile(regex_expression)
    matches = regex.findall(message_text)
    if matches:
        return matches[0]

message_text = 'What are the top fifty trends on facebook today'
print(extract_number(message_text))
message_text = 'What are the top 50 trends on facebook today'
print(extract_number(message_text))
message_text = 'What are the top -- trends on facebook today'
print(extract_number(message_text))
Output:
fifty
50
None
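Since there is only one number in the string, use (\d+) to extract it, and access the number through the first captured group.
With the first code, I get 'What are the top 54 trends on facebook' instead of '54'.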
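The (\d+) suggestion above might look like the sketch below. The function names are hypothetical. Keep in mind that (\d+) accepts a run of digits of any length anywhere in the text, so it is looser than the original pattern; the second function shows one possible way to insist on exactly two digits using word boundaries, which is an addition here and not part of the original suggestion.

```python
import re

def extract_first_number(message_text):
    """Return the first run of digits as a string, or None if there is none."""
    match = re.search(r'(\d+)', message_text)
    return match.group(1) if match else None

def extract_two_digit_number(message_text):
    """Return the first standalone two-digit number as a string, or None."""
    # \b on both sides keeps '123' from yielding '12' or '23'.
    match = re.search(r'\b(\d{2})\b', message_text)
    return match.group(1) if match else None

print(extract_first_number('What are the top 54 trends on facebook today'))       # 54
print(extract_two_digit_number('What are the top 123 trends on facebook today'))  # None
```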