Sorting a Python dictionary of word counts by descending value, then by ascending key

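I'm working on a programming exercise in which I have to write a program that accepts input and prints the list of unique words contained in the string, together with the number of times each word occurs. The list should be sorted by frequency in descending order, and words that occur with the same frequency should be sorted alphabetically in ascending order.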

Example input: "This is a test. That is not a test. Test"

Expected output:

test 3
a 2
is 2
not 1
that 1
this 1

I tried to implement this in Python 3 with the code below, but I can't seem to get the keys in the right order. Any help would be appreciated:

import operator
import re

a = 'This is a test. That is not a test. Test'
b = re.split('[\s,.;!?]', a.lower())

words = {}

for i in b:
    if i is not '':
        if i not in words:
            words[i] = 1
        else:
            words.update({i: words.get(i) + 1})

for key, value in sorted(words.items(), key = lambda kv: kv[1], reverse = True):
    print(key, value)

Output:

test 3
is 2
a 2
this 1
not 1
that 1

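You have to sort the data twice, once for each of the two criteria. Sorting in Python is guaranteed to be stable, so you can sort by word first and then by frequency. If two or more words have the same frequency, they stay in alphabetical order. Also note that tuples are compared element by element, starting with the first, so sorting the (word, count) pairs with no key orders them by word. Finally, a string is false in a boolean context if and only if it is empty, so `if i:` is enough to skip empty strings.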

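import re

a = 'This is a test. That is not a test. Test'
# Raw string, so that \s is passed to the regex engine unchanged
b = re.split(r'[\s,.;!?]', a.lower())

words = {}

for i in b:
    if i:  # skip the empty strings that re.split produces
        words[i] = words.get(i, 0) + 1

# Inner sorted(): alphabetical by word. Outer sorted(): by count, descending.
# The outer sort is stable, so words with equal counts stay alphabetical.
for key, value in sorted(sorted(words.items()), key=lambda kv: kv[1], reverse=True):
    print(key, value)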
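
The two-pass idea described above can also be written as two separate in-place sorts, which may make the order of the passes easier to see. This is only a sketch: the function name `word_frequencies` is hypothetical and not part of the original answer.

```python
import re
from operator import itemgetter


def word_frequencies(text):
    """Return (word, count) pairs: count descending, then word ascending."""
    counts = {}
    for token in re.split(r'[\s,.;!?]', text.lower()):
        if token:  # re.split yields '' between adjacent separators
            counts[token] = counts.get(token, 0) + 1

    pairs = list(counts.items())
    # Pass 1: the secondary criterion (word, ascending).
    pairs.sort(key=itemgetter(0))
    # Pass 2: the primary criterion (count, descending). The sort is stable,
    # even with reverse=True, so ties keep the alphabetical order from pass 1.
    pairs.sort(key=itemgetter(1), reverse=True)
    return pairs


for word, freq in word_frequencies('This is a test. That is not a test. Test'):
    print(word, freq)
```

The least significant criterion is sorted first and the most significant one last; swapping the two passes would give an alphabetical list with the counts ignored.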

Python has a useful class called Counter that can reduce your workload.

from collections import Counter
import re

a = 'This is a test. That is not a test. Test'
counts = Counter(re.findall(r'[\w]+', a.lower()))
sort_func = lambda x: (-x[1], x[0])
for word, freq in sorted(counts.items(), key=sort_func):
    print(word, freq)

Longer code showing the intermediate steps:

from collections import Counter
import re

a = 'This is a test. That is not a test. Test'
print(re.findall(r'\w+', a.lower()))
# ['this', 'is', 'a', 'test', 'that', 'is', 'not', 'a', 'test', 'test']

counts = Counter(re.findall(r'[\w]+', a.lower()))
print(counts)
# Counter({'test': 3, 'is': 2, 'a': 2, 'this': 1, 'that': 1, 'not': 1})

print(counts.items())
# dict_items([('this', 1), ('is', 2), ('a', 2), ('test', 3), ('that', 1), ('not', 1)])

# Set up a sort function for the tuples above
sort_func = lambda x: (-x[1], x[0])
for word, freq in sorted(counts.items(), key=sort_func):
    print(word, freq)

Output:

test 3
a 2
is 2
not 1
that 1
this 1
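
It may be tempting to use `Counter.most_common()` here, but it orders by count only. The sketch below assumes Python 3.7 or later, where elements with equal counts come back in the order they were first encountered, and compares that with the explicit `(-count, word)` key. The variable names are illustrative.

```python
from collections import Counter
import re

text = 'This is a test. That is not a test. Test'
counts = Counter(re.findall(r'\w+', text.lower()))

# most_common() sorts by count only; ties keep first-seen order
# (assumption: Python 3.7+), so 'is' comes before 'a' here.
by_count_only = counts.most_common()

# The composite key breaks ties alphabetically, as the exercise requires.
by_count_then_word = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(by_count_only)
print(by_count_then_word)
```

So `most_common()` alone does not satisfy the tie-breaking rule, and the explicit sort key is still needed.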

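The result is: test 3 is 2 a 2 this 1 that not

My mistake, ignore that.

No problem. Do you have any other ideas on how to get the correct solution?

You can remove reverse=True and try this key: key=lambda kv: (-kv[1], len(kv[0]), kv[0]). However, that only works for this exact data. A better solution might be to simply predefine the order in which the key-value pairs should be displayed (for example, as a list of key strings), and then iterate over the keys in the order they appear in that list.

How would I do that, @PaulM? I previously tried making a list of the dictionary values, sorting it in descending order, and then printing the dictionary key corresponding to each value, but the keys printed were not necessarily in alphabetical order. Thanks.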
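
To illustrate why the key with `len(kv[0])` only happens to work for the sample sentence, here is a small sketch with made-up counts (the dictionary contents are hypothetical). Putting the length before the word sorts shorter words first, which is not the same as alphabetical order.

```python
# Hypothetical data: four words that all occur once.
counts = {'bb': 1, 'a': 1, 'ab': 1, 'c': 1}

# Key from the comment: count descending, then word length, then word.
with_len = sorted(counts.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0]))

# Key without the length: count descending, then word.
without_len = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print([word for word, _ in with_len])     # ['a', 'c', 'ab', 'bb']
print([word for word, _ in without_len])  # ['a', 'ab', 'bb', 'c']
```

Only the second key gives plain alphabetical order among words with equal counts.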