Regex Python - Return tuples containing the keywords from a list of strings

python regex python-3.x nlp

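I have a list of keywords, and I want to go through a long list of strings and extract the keyword, any price in currency format, and any other number in the string that is less than 10. For example: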

keywords = ['Turin', 'Milan', 'Nevada']
strings = ['This is a sentence about Turin with 5 and $10.00 in it.', ' 2.5 Milan is a city with £1,000 in it.', 'Nevada and $1,100,000. and 10.09']

I would like to get back the following:

final_list = [('Turin', '$10.00', '5'), ('Milan', '£1,000', '2.5'), ('Nevada', '$1,100,000', '')]

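I already have the function and working regular expressions below, but I don't know how to combine the output into a list of tuples. Is there a simpler way to do this? Should I split by word and then look for matches?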

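import re

def find_keyword_comments(list_of_strings, keywords_a):
    list_of_tuples = []
    # Build the alternation once; it does not change between strings.
    keywords = '|'.join(keywords_a)
    for string in list_of_strings:
        # Note: with the ^...$ anchors, these two patterns only match when the
        # whole string is a keyword or a price (e.g. after splitting into words).
        keyword_rx = re.findall(r"^\b({})\b$".format(keywords), string, re.I)
        price_rx = re.findall(r'^[\$\£\€]\s?\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{1,2})?$', string)
        number_rx1 = re.findall(r'\b\d[.]\d{1,2}\b', string)
        number_rx2 = re.findall(r'\s\d\s', string)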

You can use re.findall:

import re

keywords = ['Turin', 'Milan', 'Nevada']
strings = ['This is a sentence about Turin with 5 and $10.00 in it.', '2.5 Milan is a city with £1,000 in it.', 'Nevada and $1,100,000. and 10.09']
# Pair each keyword with the strings that contain it.
grouped_strings = [(i, [b for b in strings if i in b]) for i in keywords]
# From the first matching string, collect runs of digits, currency symbols, dots and commas, keeping only runs that contain a digit.
new_groups = [(a, filter(lambda x: re.findall(r'\d', x), [re.findall(r'[\$\d\.£,]+', c) for c in b][0])) for a, b in grouped_strings]
# Keep prices as they are; keep plain numbers only if they are below 10.
last_groups = [(a, list(filter(lambda x: re.findall(r'\d', x) and float(x) < 10 if x[0].isdigit() else True, b))) for a, b in new_groups]
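To get the exact (keyword, price, number) tuple shape asked for in the question, one possible sketch follows. The names `extract_tuples`, `PRICE_RX` and `NUMBER_RX` are illustrative assumptions and not part of the answer above. The price pattern is the question's own pattern without the `^`/`$` anchors. The sketch assumes one keyword per string and takes the first price and the first standalone number below the limit.

```python
import re

# The question's price pattern, minus the ^/$ anchors, so it can match inside a sentence.
PRICE_RX = re.compile(r'[$£€]\s?\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{1,2})?')
# Hypothetical pattern for a standalone number (not part of a price or a 1,000-style group).
NUMBER_RX = re.compile(r'(?<![\d.,$£€])\d+(?:\.\d+)?(?![\d,])')

def extract_tuples(strings, keywords, limit=10):
    """Return one (keyword, price, small_number) tuple per string that contains a keyword."""
    keyword_rx = re.compile(r'\b({})\b'.format('|'.join(map(re.escape, keywords))), re.I)
    results = []
    for s in strings:
        kw = keyword_rx.search(s)
        if kw is None:
            continue  # no keyword in this string
        price = PRICE_RX.search(s)
        # Blank out prices first so their digits are not counted as plain numbers.
        rest = PRICE_RX.sub(' ', s)
        small = next((n for n in NUMBER_RX.findall(rest) if float(n) < limit), '')
        results.append((kw.group(1), price.group(0) if price else '', small))
    return results

keywords = ['Turin', 'Milan', 'Nevada']
strings = ['This is a sentence about Turin with 5 and $10.00 in it.',
           ' 2.5 Milan is a city with £1,000 in it.',
           'Nevada and $1,100,000. and 10.09']
print(extract_tuples(strings, keywords))
# [('Turin', '$10.00', '5'), ('Milan', '£1,000', '2.5'), ('Nevada', '$1,100,000', '')]
```

Searching the whole string with unanchored patterns avoids having to split by word, which would also break prices such as "$ 10.00" that contain a space.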

How would you handle a sentence like "A ticket from Turin to Milan cost me $100"? Ideally, the result would capture the value closest to the price and the number.
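One possible way to approach the multi-keyword case raised in this comment is to pick the keyword whose match lies closest to the price. This is a minimal sketch, assuming that distance in characters is an acceptable notion of "closest". `nearest_keyword` is a hypothetical helper, and the price pattern is again the question's pattern without anchors.

```python
import re

# The question's price pattern without the ^/$ anchors.
PRICE_RX = re.compile(r'[$£€]\s?\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{1,2})?')

def nearest_keyword(sentence, keywords):
    """Return (keyword, price) for the keyword closest to the first price, or None."""
    price = PRICE_RX.search(sentence)
    if price is None:
        return None
    keyword_rx = re.compile(r'\b({})\b'.format('|'.join(map(re.escape, keywords))), re.I)
    hits = list(keyword_rx.finditer(sentence))
    if not hits:
        return None

    def gap(m):
        # Number of characters between the keyword match and the price match.
        if m.end() <= price.start():
            return price.start() - m.end()
        return m.start() - price.end()

    best = min(hits, key=gap)
    return best.group(1), price.group(0)

print(nearest_keyword('A ticket from Turin to Milan cost me $100',
                      ['Turin', 'Milan', 'Nevada']))
# ('Milan', '$100')
```

Character distance is only a heuristic; a sentence such as "Milan, unlike Turin, cost me $100" would still return Turin, so a real solution may need tokenization or parsing.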