Python: find the most common words in a list of sets


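I am currently working on a university NLP project. I want to display the most common words in this list of sets:

[{'allow', 'feel', 'fear', 'situat', 'properti', 'skitt', 'face', 'ani'}, {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'}]

This is what I have so far: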

def word_list(sent):
   if isinstance(sent, str):
       tokens = set(word_tokenize(sent.lower()))
   else:
       tokens = set([t for s in sent for t in word_tokenize(s.lower())])
   tokens = set([stemmer.stem(t) for t in tokens])

   for w in stopword_final:
       tokens.discard(w)
   return tokens
   
def get_most_relevant_words(definitions):
   list_of_words = list()
   most_common_word_dict = dict()
   for d1 in definitions:
       list_of_words.append(word_list(d1))

   for elem in list_of_words:
    for word in elem:
        print(word)
        word_counter = Counter(word)
        most_occurrences = word_counter.most_common(3)
        most_common_word_dict.update({word: most_occurrences})
        return most_common_word_dict

The desired output should be:

{'fear': 2, 'feel': 2}

The output it actually prints is:

{'feel': [('e', 2), ('f', 1), ('l', 1)]}

Use collections.Counter:

from collections import Counter

list_of_sets = [{'allow', 'feel', 'fear', 'situat', 'properti', 'despit', 'face', 'ani'}, {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'}]
words = [word for my_set in list_of_sets for word in my_set]
c = Counter(words)
print(c)

Output:

Counter({
    'fear': 2, 
    'face': 2, 
    'feel': 2, 
    'properti': 1, 
    'despit': 1, 
    'allow': 1, 
    'situat': 1, 
    'ani': 1, 
    'someth': 1, 
    'unpleas': 1, 
    'make': 1, 
    'abil': 1, 
    'us': 1, 
    'scar': 1
})
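To get from the full Counter to only the words that occur more than once, one possible follow-up is to filter the counts. The sketch below also shows why the code in the question printed per-letter counts: passing a single string to Counter counts its characters, not words. The names letter_counts and repeated are illustrative only, and the threshold of "more than one occurrence" is an assumption about what "most common" should mean here.

```python
from collections import Counter
from itertools import chain

list_of_sets = [
    {'allow', 'feel', 'fear', 'situat', 'properti', 'despit', 'face', 'ani'},
    {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'},
]

# Counter on one string iterates over its characters,
# which explains the per-letter output seen in the question.
letter_counts = Counter('feel').most_common(3)
print(letter_counts)
# [('e', 2), ('f', 1), ('l', 1)]

# Flatten all sets into one stream of words and count each word once per set.
counts = Counter(chain.from_iterable(list_of_sets))

# Keep only the words that appear in more than one set (assumed threshold).
repeated = {word: n for word, n in counts.items() if n > 1}
print(sorted(repeated.items()))
# [('face', 2), ('fear', 2), ('feel', 2)]
```

If only the top few words are needed regardless of their count, counts.most_common(3) would serve instead of the filter.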

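You can simply iterate over the two sets, look for the terms they have in common, and update the counts in a dictionary. By the way, 'face' should also be included in your result.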

lst = [{'allow', 'feel', 'fear', 'situat', 'properti', 'despit', 'face', 'ani'}, {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'}]
dic = {}
for word1 in lst[0]:
    for word2 in lst[1]:
        if word1 == word2:
            dic[word1] = dic.get(word1, 0) + 2

print(dic)
#{'fear': 2, 'feel': 2, 'face': 2}
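The nested loop above handles exactly two sets. A minimal sketch that generalizes the same idea to any number of sets could use set.intersection, assuming "common" means the word is present in every set. The function name common_words is hypothetical and not part of the original answer.

```python
def common_words(list_of_sets):
    """Return {word: number_of_sets} for words present in every set."""
    if not list_of_sets:
        return {}
    # Intersect all sets at once; only words shared by every set remain.
    shared = set.intersection(*list_of_sets)
    # Each shared word occurs once per set, so its count is the number of sets.
    # Sorting makes the key order deterministic.
    return {word: len(list_of_sets) for word in sorted(shared)}


lst = [
    {'allow', 'feel', 'fear', 'situat', 'properti', 'despit', 'face', 'ani'},
    {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'},
]
print(common_words(lst))
# {'face': 2, 'fear': 2, 'feel': 2}
```

This avoids the quadratic pairwise comparison, but it is stricter than a Counter: a word that appears in most sets yet not all of them is dropped.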

When I run this code, it prints nothing. If I try to pass the list of sets to either of the two functions you defined, I get the error: name 'word_tokenize' is not defined.