Python: find the most common words in a list of sets
I'm currently working on a university NLP project. I want to display the most common words in this list of sets:
[{'allow','feel','fear','situat','properti','skitt','face','ani',},{'unpleas','someth','fear','make','abil','face','scar','us','feel'}]
This is what I have so far:
def word_list(sent):
    if isinstance(sent, str):
        tokens = set(word_tokenize(sent.lower()))
    else:
        tokens = set([t for s in sent for t in word_tokenize(s.lower())])
    tokens = set([stemmer.stem(t) for t in tokens])
    for w in stopword_final:
        tokens.discard(w)
    return tokens

def get_most_relevant_words(definitions):
    list_of_words = list()
    most_common_word_dict = dict()
    for d1 in definitions:
        list_of_words.append(word_list(d1))
    for elem in list_of_words:
        for word in elem:
            print(word)
    word_counter = Counter(word)
    most_occurrences = word_counter.most_common(3)
    most_common_word_dict.update({word: most_occurrences})
    return most_common_word_dict
The desired output should be: {fear:2,feel:2}
The output it prints is:
{'feel':[('e',2),('f',1),('l',1)]}
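The printed result follows from `Counter(word)` receiving a single string: iterating a string yields its characters, so the counter tallies letters rather than words. A minimal sketch of the difference, using a made-up word list that is not taken from the question's data:

```python
from collections import Counter

# Passing one string counts its characters.
char_counts = Counter("feel")
print(char_counts.most_common(3))
# [('e', 2), ('f', 1), ('l', 1)]

# Passing a list of strings counts whole words.
# (This list is illustrative only.)
word_counts = Counter(["feel", "fear", "feel"])
print(word_counts.most_common(1))
# [('feel', 2)]
```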
Use collections.Counter:
from collections import Counter
list_of_sets = [{'allow', 'feel', 'fear', 'situat', 'properti', 'despit', 'face', 'ani'}, {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'}]
words = [word for my_set in list_of_sets for word in my_set]
c = Counter(words)
print(c)
Output:
Counter({
'fear': 2,
'face': 2,
'feel': 2,
'properti': 1,
'despit': 1,
'allow': 1,
'situat': 1,
'ani': 1,
'someth': 1,
'unpleas': 1,
'make': 1,
'abil': 1,
'us': 1,
'scar': 1
})
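To get from the full tally to the kind of output the question asks for, one option is to keep only the words counted more than once. This is a sketch that assumes "most common" means "appears in more than one set"; the name `repeated` is illustrative and not part of the original answer.

```python
from collections import Counter
from itertools import chain

list_of_sets = [
    {'allow', 'feel', 'fear', 'situat', 'properti', 'despit', 'face', 'ani'},
    {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'},
]

# Flatten all sets into one stream of words and count them.
c = Counter(chain.from_iterable(list_of_sets))

# Keep only the words that occur in more than one set.
repeated = {word: count for word, count in c.items() if count > 1}

# Sort before printing, because set iteration order is not fixed.
print(sorted(repeated.items()))
# [('face', 2), ('fear', 2), ('feel', 2)]
```

Because each set holds a word at most once, a word's count here equals the number of sets that contain it.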
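You can simply iterate over the two sets, look for the terms they share, and update the counts in a dictionary. By the way, 'face' should also be included in your result.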
lst = [{'allow', 'feel', 'fear', 'situat', 'properti', 'despit', 'face', 'ani'}, {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'}]
dic = {}
# Compare every word in the first set with every word in the second.
for word1 in lst[0]:
    for word2 in lst[1]:
        if word1 == word2:
            # A shared word occurs once in each set, so its count is 2.
            dic[word1] = dic.get(word1, 0) + 2
print(dic)
# {'fear': 2, 'feel': 2, 'face': 2}  (key order may vary)
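The nested loops compare exactly two sets. For a list of any length, set intersection gives the words shared by every set. A sketch under that assumption; the names `common` and `dic` are illustrative only:

```python
lst = [
    {'allow', 'feel', 'fear', 'situat', 'properti', 'despit', 'face', 'ani'},
    {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'},
]

# Words present in every set (requires at least one set in lst).
common = set.intersection(*lst)

# Each shared word occurs once per set, so its count is the number of sets.
dic = {word: len(lst) for word in common}

# Sort before printing, because set iteration order is not fixed.
print(sorted(dic.items()))
# [('face', 2), ('fear', 2), ('feel', 2)]
```

This finds only the words common to all sets. A word shared by some sets but not others would need a count-based approach instead.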
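When I run this code, it doesn't print anything. If I try to pass the list of sets to either of the two functions you defined, I get an error:
NameError: name 'word_tokenize' is not defined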