Python: Top K Frequent Words, stuck on one part


This is about the LeetCode problem Top K Frequent Words. Here is my code:

import heapq

class Solution:
    # def topKFrequent(self, words: List[str], k: int) -> List[str]:
    def topKFrequent(self, words, k):
        results = []
        wordTable = {}
        for word in words:
            if (wordTable.get(word) is None):
                wordTable[word] = 1
                continue
            wordTable[word] = (wordTable.get(word)) + 1

        heap = []
        # print(wordTable)
        heapSize = 0

        for word in wordTable.keys():
            node = [wordTable[word], word]
            if(heapSize<k):
                heapq.heappush(heap,node)
                heapSize += 1
                continue
            if(heapSize>=k):
                if (heap[0][0]< node[0]):
                    heapq.heappushpop(heap,node)
                    heapSize -= 1
                    continue
                if heap[0][0] == node[0] and heap[0][1]>node[1]:
                    heapq.heappop(heap)
                    heapq.heappush(heap,node)
                    heapSize -= 1
                    continue

        # heap.sort(key = lambda x: x.freq, reverse=True);
        print(heap)

        for i in reversed(range(k)):
            results.append(heap[i][1])
        return results


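The code works if all the words have different frequencies, because it uses a min-heap. It fails when words share the same frequency, because they come out in reverse order, so the alphabetically larger word comes first, which is not accepted. For example, with 4 words of equal frequency, say a, b, c, d, my result is d, c, b, a, which is not acceptable. I don't know how to account for this case, and I have been stuck on this problem for 3 hours.

Can anyone help?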

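Use functools.cmp_to_key:

from functools import cmp_to_key

def cmp(a, b):
    # a and b are [count, word] nodes
    if a[0] == b[0]:
        # same count: the alphabetically smaller word comes first
        return 1 if a[1] > b[1] else -1
    # otherwise: the higher count comes first
    return 1 if a[0] < b[0] else -1

# at the end of topKFrequent, in place of the reversed loop
return [word for _, word in sorted(heap, key=cmp_to_key(cmp))]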
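The ordering this answer describes (higher count first, ties broken alphabetically) can be tried out on its own. The sketch below is a minimal, self-contained version that sorts every distinct word rather than maintaining the size-k heap from the question. The function names `top_k_by_cmp` and `top_k_by_key` are illustrative and not part of the original answer; the second one shows that the same ordering can likely be expressed with a plain key tuple instead of a comparator.

```python
from collections import Counter
from functools import cmp_to_key


def cmp(a, b):
    """Compare [count, word] nodes: higher count first, ties alphabetical."""
    if a[0] != b[0]:
        return -1 if a[0] > b[0] else 1
    if a[1] != b[1]:
        return -1 if a[1] < b[1] else 1
    return 0


def top_k_by_cmp(words, k):
    # Hypothetical helper: build [count, word] nodes and sort them all.
    nodes = [[count, word] for word, count in Counter(words).items()]
    ordered = sorted(nodes, key=cmp_to_key(cmp))
    return [word for _, word in ordered[:k]]


def top_k_by_key(words, k):
    # Same ordering without a comparator: negate the count so that
    # an ascending sort puts the most frequent words first.
    counts = Counter(words)
    return sorted(counts, key=lambda w: (-counts[w], w))[:k]


print(top_k_by_cmp(["a", "b", "c", "d"], 4))
print(top_k_by_key(["i", "love", "leetcode", "i", "love", "coding"], 2))
```

Both helpers cost O(n log n) in the number of distinct words, which is usually fine unless the size-k heap is a hard requirement.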

This one-liner should help you:

from collections import Counter

topk = lambda words, k: [t[0] for t in Counter(list(sorted(words))).most_common(k)]

print(topk(["i", "love", "leetcode", "i", "love", "coding"], k=2))
print(topk(["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k=4))

# Output
['i', 'love']
['the', 'is', 'sunny', 'day']

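The first step is to sort the words with list(sorted(words)).

Counter then converts the list into frequencies. It has a built-in, heapq-based most_common(k) which, as the name says, gives you the k most common words. Note that we have already sorted the words lexicographically, so words with equal counts keep their alphabetical order.

The outer list comprehension simply takes the first value (the word) of each tuple returned by most_common(k).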
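For readers who want to keep the size-k min-heap idea from the question, the tie problem can be handled by telling the heap which entry is "worst". The sketch below is one possible approach, not the asker's or either answerer's code. The `Entry` class and `top_k_heap` function are hypothetical names introduced here. It relies on the fact that heapq orders items using the `<` operator only, so defining `__lt__` is enough.

```python
import heapq
from collections import Counter


class Entry:
    """Heap entry whose 'smallest' value is the worst candidate."""

    def __init__(self, count, word):
        self.count = count
        self.word = word

    def __lt__(self, other):
        # A lower count is worse. On equal counts the alphabetically
        # larger word is worse, so it sits at the top of the min-heap
        # and is the first one to be evicted.
        if self.count != other.count:
            return self.count < other.count
        return self.word > other.word


def top_k_heap(words, k):
    heap = []
    for word, count in Counter(words).items():
        heapq.heappush(heap, Entry(count, word))
        if len(heap) > k:
            heapq.heappop(heap)  # evict the current worst candidate
    # Popping yields worst-to-best order, so reverse at the end.
    result = []
    while heap:
        result.append(heapq.heappop(heap).word)
    return result[::-1]


print(top_k_heap(["a", "b", "c", "d"], 4))
```

Using `len(heap)` instead of a separate size counter avoids having to keep the two in sync, and popping the heap until it is empty gives a fully ordered result, which indexing into the heap list does not guarantee.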