Python: grab one or two capitalized words after a pattern and match the result against another list
I need to extract unique names that carry a title such as Lord, Baroness, Lady, or Baron from a text, and then match them against another list. I'm struggling to get the right result and hope the community can help. Thanks.
import re

def get_names(text):
    # find noble titles and grab each one together with the name that follows it
    match = re.compile(r'(Lord|Baroness|Lady|Baron) ([A-Z][a-z]+) ([A-Z][a-z]+)')
    names = list(set(match.findall(text)))
    # remove duplicates based on the index in tuples
    names_ = list(dict((v[1], v) for v in sorted(names, key=lambda names: names[0])).values())
    names_lst = list(set([' '.join(map(str, name)) for name in names_]))
    return names_lst

text = 'Baroness Firstname Surname and Baroness who is also known as Lady Anothername and Lady Surname or Lady Firstname.'
names_lst = get_names(text)
print(names_lst)
Current output: ['Baroness Firstname Surname']
Desired output: ['Baroness Firstname Surname', 'Lady Anothername']
but not Lady Surname or Lady Firstname.
I then need to match the result against this list:
other_names = ['Firstname Surname', 'James', 'Simon Smith']
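and remove the element 'Firstname Surname' from it, because it matches the Baroness's first name and surname in the desired output.

I suggest the following solution: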
import re

def get_names(text):
    # find noble titles and grab each one together with the one or two words that follow it
    match = re.compile(r'(Lord|Baroness|Lady|Baron) ([A-Z][a-z]+)[ ]?([A-Z][a-z]+)?')
    names = list(match.findall(text))
    # keep only the first name encountered for each title
    d = {}
    for name in names:
        if name[0] not in d:
            d[name[0]] = ' '.join(name[1:3]).strip()
    return d

text = 'Baroness Firstname Surname and Baroness who is also known as Lady Anothername and Lady Surname or Lady Firstname.'
other_names = ['Firstname Surname', 'James', 'Simon Smith']
names_dict = get_names(text)
print(names_dict)
# {'Baroness': 'Firstname Surname', 'Lady': 'Anothername'}
print([' '.join([k, v]) for k, v in names_dict.items()])
# ['Baroness Firstname Surname', 'Lady Anothername']
other_names_dropped = [name for name in other_names if name not in names_dict.values()]
print(other_names_dropped)
# ['James', 'Simon Smith']
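
For comparison, here is a slightly stricter variant of the same idea, offered as a sketch rather than a definitive answer. It assumes a title is always followed by one or two capitalized words separated by single spaces, and that only the first name seen for each title should be kept. The names TITLE_RE, first_name_per_title, and remaining are made up for this illustration. The optional second word sits in a non-capturing group together with its leading space, so a missing space can never glue two words together.

```python
import re

# Assumption: title, one space, then one or two capitalized words.
# The second word is optional and must be preceded by exactly one space.
TITLE_RE = re.compile(r'\b(Lord|Baroness|Lady|Baron) ([A-Z][a-z]+)(?: ([A-Z][a-z]+))?')

def first_name_per_title(text):
    """Map each title to the first name that follows it in the text."""
    found = {}
    for m in TITLE_RE.finditer(text):
        title, first, last = m.group(1), m.group(2), m.group(3)
        # last is None when only one capitalized word follows the title
        full_name = ' '.join(part for part in (first, last) if part)
        # setdefault leaves an existing entry alone, so the first occurrence wins
        found.setdefault(title, full_name)
    return found

text = 'Baroness Firstname Surname and Baroness who is also known as Lady Anothername and Lady Surname or Lady Firstname.'
other_names = ['Firstname Surname', 'James', 'Simon Smith']

names = first_name_per_title(text)
matched = set(names.values())  # set lookup instead of scanning dict values each time
remaining = [name for name in other_names if name not in matched]

print(names)      # {'Baroness': 'Firstname Surname', 'Lady': 'Anothername'}
print(remaining)  # ['James', 'Simon Smith']
```

Building the set once only matters if other_names is long. For a handful of names the list comprehension above it behaves the same way.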
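
First of all, you did provide an example, but you should explain further what you intend to accomplish, particularly the last point: "I then need to match the result against this list" ... "and remove the element" ... Explaining why you need to do this would make the question clearer. Also, how much data are we talking about here? Do you really only need to match the few examples you gave (Baroness, Lady, etc.)?

Thank you very much. This will definitely help me move forward, but I need to remove the matched name 'Firstname Surname' from the other_names list, not the other way around.

I didn't understand at first, but I think I do now. I've updated my post. Does it work for you now?

Thank you! Happy Friday!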