Least common words in a Python list

I counted the most common words so that I could keep only the 128 most common words of my list, in their original order:

from collections import Counter, OrderedDict

words = my_list
mcommon_words = [word for word, word_count in Counter(words).most_common(128)]
my_list = [x for x in my_list if x in mcommon_words]
my_list = OrderedDict.fromkeys(my_list)
my_list = list(my_list.keys())

But now I want to use the same approach to get the 128 least common words. A faster solution would also help me a lot.

You can try the following:

from collections import Counter

def common_words(words, number_of_words, reverse=False):
    # Ascending count by default (least common first); reverse=True puts the most common first.
    counter = Counter(words)
    return sorted(counter, key=counter.get, reverse=reverse)[:number_of_words]

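Here we sort the keys of the Counter by their counts. After sorting, we return the least common words, or the most common ones when reverse=True. Here is a test example: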

words = []
for i,num in enumerate('one two three four five six seven eight nine ten'.split()):
    words.extend([num]*(i+1))

print(common_words(words,5))

This example gets the 5 least common words from the word list.

Result:

['one', 'two', 'three', 'four', 'five']

We can also get the most common words:

print(common_words(words,5, reverse=True))

Result:

['ten', 'nine', 'eight', 'seven', 'six']
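
To tie this back to the question, here is a minimal sketch that filters a list down to its n least common words while keeping the order of first appearance. The helper name keep_least_common and the sample data are illustrative assumptions, not part of the original answer. A set is used for the membership test, since checking `x in some_list` scans the list for every element, which is the likely slow spot in the question's code.

```python
from collections import Counter

def keep_least_common(items, n):
    """Return the n least common distinct items, in order of first appearance."""
    counter = Counter(items)
    # sorted() is stable, so items with equal counts stay in first-seen order.
    rare = set(sorted(counter, key=counter.get)[:n])
    # dict.fromkeys removes duplicates while preserving order (Python 3.7+).
    return list(dict.fromkeys(x for x in items if x in rare))

# Hypothetical sample data: 'c' and 'd' each occur once.
sample = ["b", "a", "b", "c", "a", "b", "d"]
print(keep_least_common(sample, 2))  # ['c', 'd']
```

For the question's case this would be called as keep_least_common(my_list, 128).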

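most_common returns the words and their counts as a list of tuples. Because the method returns a list, you can use slicing to get both the first and the last elements.

Demo: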
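
A minimal sketch of the slicing idea, reusing the test data from the earlier answer. The variable names ranked, most and least are illustrative assumptions. Calling most_common() without an argument returns every (word, count) pair, ordered from most to least common.

```python
from collections import Counter

words = []
for i, num in enumerate('one two three four five six seven eight nine ten'.split()):
    words.extend([num] * (i + 1))

# All (word, count) pairs, most common first.
ranked = Counter(words).most_common()

most = [word for word, _ in ranked[:3]]       # first three entries
least = [word for word, _ in ranked[:-4:-1]]  # last three entries, least common first

print(most)   # ['ten', 'nine', 'eight']
print(least)  # ['one', 'two', 'three']
```

For the 128 least common words the slice would be ranked[:-129:-1].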
