Python: Finding the most common words in a list of sets

I'm currently working on a university project in NLP. I want to show the most common words in this list of sets:

[{'allow','feel','fear','situat','properti','skitt','face','ani',},{'unpleas','someth','fear','make','abil','face','scar','us','feel'}]

This is what I have so far:

def word_list(sent):
    if isinstance(sent, str):
        tokens = set(word_tokenize(sent.lower()))
    else:
        tokens = set([t for s in sent for t in word_tokenize(s.lower())])
    tokens = set([stemmer.stem(t) for t in tokens])

    for w in stopword_final:
        tokens.discard(w)
    return tokens

def get_most_relevant_words(definitions):
    list_of_words = list()
    most_common_word_dict = dict()
    for d1 in definitions:
        list_of_words.append(word_list(d1))

    for elem in list_of_words:
        for word in elem:
            print(word)
            word_counter = Counter(word)
            most_occurrences = word_counter.most_common(3)
            most_common_word_dict.update({word: most_occurrences})
            return most_common_word_dict

The desired output should be:
{fear:2,feel:2}


The output it prints is:
{'feel':[('e',2),('f',1),('l',1)]}
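
The printed result is consistent with two things in the code above: Counter(word) receives a single string, so it counts the characters of that word, and the return statement sits inside the innermost loop, so the function exits after the first word. A minimal sketch of the difference (the sample word list is made up for illustration):

```python
from collections import Counter

# Counter over a single string counts its characters, not words.
letters = Counter("feel")
print(letters.most_common(3))  # [('e', 2), ('f', 1), ('l', 1)]

# Counter over a list of words counts whole words.
# This word list is a made-up example, not data from the question.
words = Counter(["feel", "fear", "feel"])
print(words.most_common(1))  # [('feel', 2)]
```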

Use collections.Counter:

from collections import Counter

list_of_sets = [{'allow', 'feel', 'fear', 'situat', 'properti', 'despit', 'face', 'ani'}, {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'}]
words = [word for my_set in list_of_sets for word in my_set]
c = Counter(words)
print(c)
Output:

Counter({
    'fear': 2, 
    'face': 2, 
    'feel': 2, 
    'properti': 1, 
    'despit': 1, 
    'allow': 1, 
    'situat': 1, 
    'ani': 1, 
    'someth': 1, 
    'unpleas': 1, 
    'make': 1, 
    'abil': 1, 
    'us': 1, 
    'scar': 1
})
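
If only the top entries are wanted, Counter.most_common(n) can trim the result. The sketch below assumes the top three words are enough; top_three is a name chosen here for illustration. The order among words with equal counts depends on set iteration order, so it can vary between runs.

```python
from collections import Counter

list_of_sets = [
    {'allow', 'feel', 'fear', 'situat', 'properti', 'despit', 'face', 'ani'},
    {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'},
]

# Count every word across all sets; each set contributes a word at most once.
c = Counter(word for my_set in list_of_sets for word in my_set)

# Keep only the three most frequent words as a plain dict.
top_three = dict(c.most_common(3))
print(top_three)
```

With this data exactly three words occur twice, so those three make up the result.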

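You can simply iterate over the two sets, look for the common terms, and update the count in a dictionary. By the way, 'face' should also be included in your result.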

lst = [{'allow', 'feel', 'fear', 'situat', 'properti', 'despit', 'face', 'ani'}, {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'}]
dic = {}
for word1 in lst[0]:
    for word2 in lst[1]:
        if word1 == word2:
            dic[word1] = dic.get(word1, 0) + 2

print(dic)
#{'fear': 2, 'feel': 2, 'face': 2}
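
The nested loops above are tied to exactly two sets. As a hedged alternative, a set intersection finds the words present in every set and works for any number of sets; note that this swaps in a different technique from the loop version, and that it needs at least one set in the list.

```python
lst = [
    {'allow', 'feel', 'fear', 'situat', 'properti', 'despit', 'face', 'ani'},
    {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'},
]

# Words that appear in every set (raises TypeError if lst is empty).
common = set.intersection(*lst)

# A word found in all sets occurs once per set, so its count is len(lst).
dic = {word: len(lst) for word in common}
print(dic)
```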

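When I run this code, it doesn't print anything. If I try to pass the list of sets to either of the two functions you defined, I get an error:
NameError: name 'word_tokenize' is not defined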
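
That error suggests word_tokenize was never imported; in NLTK it is normally brought in with "from nltk.tokenize import word_tokenize". The question also does not show where stemmer and stopword_final are defined. Since the sets in the question are already tokenized and stemmed, one way around this is a function that takes the sets directly. This is only a sketch: most_relevant_words and its min_count parameter are hypothetical names, not part of the original code.

```python
from collections import Counter
from itertools import chain

def most_relevant_words(word_sets, min_count=2):
    """Return {word: count} for words found in at least min_count sets.

    Hypothetical helper: expects sets that are already tokenized.
    """
    # Flatten all sets into one stream of words and count them.
    counts = Counter(chain.from_iterable(word_sets))
    return {word: n for word, n in counts.items() if n >= min_count}

# The list of sets from the question.
sets = [
    {'allow', 'feel', 'fear', 'situat', 'properti', 'skitt', 'face', 'ani'},
    {'unpleas', 'someth', 'fear', 'make', 'abil', 'face', 'scar', 'us', 'feel'},
]
result = most_relevant_words(sets)
print(result)
```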