Approximate string matching against a provided reference list in Python


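I want to find the most frequent approximate matches in a long string, on the condition that each matched word or phrase also comes from a provided list.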

For example:

# provided list 
>> jobskill = ["scrum", "customer experience improvement", "python"]

# long string 
>> jobtext = ["We are looking for Graduates in our Customer Experience department in Swindon, you will be responsible for improving customer experience and will also be working with the digital team. Send in your application by 31st December 2018", 
"If you are ScrumMaster at the top of your game with ability to communicate inspire and take people with you then there could not be a better time, we are the pioneer in digital relationship banking, and we are currently lacking talent in our Scrum team, if you are passionate about Scrum, apply to our Scrum team, knowledge with python is a plus!"]

# write a function that returns most frequent approximate match
>> mostfrequent(input = jobtext, lookup = jobskill)
# desired_output: {"customer experience improvement", "scrum"}
Any help is appreciated, thanks.

Use fuzzywuzzy.
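What have you tried so far? If you have no idea at all, you can start by looping through both lists and comparing the strings.

There is no "customer experience improvement" string in your long string.

@HenryYik I tried process.extractOne from the fuzzywuzzy library, but it did not give the best match. Would you mind showing how the loop would produce the result?

@ThatBird "customer experience" is the most frequent match for "customer experience improvement", which is why I want the function to return approximate matches rather than exact ones.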
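The loop suggested in the comments (go through the skill list and compare each skill against the text) can be sketched with the standard library alone. This is a minimal sketch, assuming "comparing the strings" means case-insensitive exact substring counting; the helper name `exact_counts` is hypothetical. It makes the asker's objection concrete: the exact phrase "customer experience improvement" never occurs, so an exact count scores it zero.

```python
# provided list
jobskill = ["scrum", "customer experience improvement", "python"]

# long string
jobtext = [
    "We are looking for Graduates in our Customer Experience department in Swindon, you will be responsible for improving customer experience and will also be working with the digital team. Send in your application by 31st December 2018",
    "If you are ScrumMaster at the top of your game with ability to communicate inspire and take people with you then there could not be a better time, we are the pioneer in digital relationship banking, and we are currently lacking talent in our Scrum team, if you are passionate about Scrum, apply to our Scrum team, knowledge with python is a plus!",
]


def exact_counts(texts, queries):
    """Hypothetical helper: count case-insensitive, non-overlapping exact occurrences."""
    joined = " ".join(texts).lower()
    # str.count matches substrings, so "ScrumMaster" also counts as "scrum"
    return {query: joined.count(query.lower()) for query in queries}


print(exact_counts(jobtext, jobskill))
# {'scrum': 4, 'customer experience improvement': 0, 'python': 1}
```

Because substring counting also catches "ScrumMaster", exact matching both overcounts "scrum" and misses every paraphrase of the longer skill, which is why a fuzzy score is needed.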
from collections import defaultdict
from fuzzywuzzy import fuzz

# provided list
jobskill = ["scrum", "customer experience improvement", "python"]

# long string
jobtext = [
    "We are looking for Graduates in our Customer Experience department in Swindon, you will be responsible for improving customer experience and will also be working with the digital team. Send in your application by 31st December 2018",
    "If you are ScrumMaster at the top of your game with ability to communicate inspire and take people with you then there could not be a better time, we are the pioneer in digital relationship banking, and we are currently lacking talent in our Scrum team, if you are passionate about Scrum, apply to our Scrum team, knowledge with python is a plus!",
]


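def k_most_frequent(k, text, queries, threshold=70):
    """Return the k queries with the most fuzzy matches in text (uses fuzzywuzzy)."""

    frequency = defaultdict(int)
    # Join all documents and split them into a single list of words
    text = " ".join(text).split()
    for query in queries:
        # Try windows of 1..len(query words) consecutive words
        for window in range(1, len(query.split()) + 1):
            # Count every full-length window whose fuzz.ratio exceeds the threshold
            frequency[query] += sum(
                fuzz.ratio(query, " ".join(text[i : i + window])) > threshold
                for i in range(len(text) - window + 1)
            )

    # Rank queries by match count, highest first
    return sorted(frequency.keys(), key=frequency.get, reverse=True)[:k]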


print(k_most_frequent(2, jobtext, jobskill))

# output: ['customer experience improvement', 'scrum']
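If fuzzywuzzy is not installed, the same sliding-window idea can be approximated with the standard library's `difflib.SequenceMatcher`, whose `ratio()` returns a similarity between 0 and 1 (fuzzywuzzy's `fuzz.ratio` reports a similar score scaled to 0 to 100, though exact values may differ depending on its backend). This sketch additionally lowercases the text and strips punctuation before matching, which is an assumption beyond the answer above; the function name `k_most_frequent_difflib` is hypothetical.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# provided list
jobskill = ["scrum", "customer experience improvement", "python"]

# long string
jobtext = [
    "We are looking for Graduates in our Customer Experience department in Swindon, you will be responsible for improving customer experience and will also be working with the digital team. Send in your application by 31st December 2018",
    "If you are ScrumMaster at the top of your game with ability to communicate inspire and take people with you then there could not be a better time, we are the pioneer in digital relationship banking, and we are currently lacking talent in our Scrum team, if you are passionate about Scrum, apply to our Scrum team, knowledge with python is a plus!",
]


def k_most_frequent_difflib(k, texts, queries, threshold=0.7):
    """Hypothetical stdlib variant: rank queries by number of approximate matches."""
    # Lowercase and keep only alphanumeric words (drops "," and "!")
    words = re.findall(r"[a-z0-9]+", " ".join(texts).lower())
    # Start every query at 0 so unmatched skills still appear in the ranking
    frequency = Counter({query: 0 for query in queries})
    for query in queries:
        target = query.lower()
        # Compare the query with every window of 1..len(query words) words
        for size in range(1, len(target.split()) + 1):
            for i in range(len(words) - size + 1):
                window = " ".join(words[i : i + size])
                if SequenceMatcher(None, target, window).ratio() > threshold:
                    frequency[query] += 1
    return [query for query, _ in frequency.most_common(k)]


print(k_most_frequent_difflib(2, jobtext, jobskill))
# ['customer experience improvement', 'scrum']
```

Here windows such as "customer experience" and "customer experience department" clear the 0.7 threshold for "customer experience improvement", while "scrummaster" does not count as "scrum" because whole-word windows are compared.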