Output order of heapq.nlargest changes with a key function (Python)
Can someone explain why the output order changes when heapq.nlargest is called with a key function that looks only at the first element of each tuple?
import heapq
heap_arr = [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c'), (3, 'd')]
heapq.nlargest(2, heap_arr)
# Perfectly fine - output is [(3, 'd'), (3, 'c')]
# This is similar to heapq.nlargest(2, heap_arr, key=lambda a: (a[0], a[1]))
heapq.nlargest(2, heap_arr, key=lambda a: a[0])
# Output is [(3, 'c'), (3, 'd')]... Why?
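Why does (3, 'c') come before (3, 'd') in the second example? I am asking because the order of the tuples in the output list matters.
Short answer:
heapq.nlargest(2, heap_arr)
returns [(3, 'd'), (3, 'c')] because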
In [6]: (3, 'd') > (3, 'c')
Out[6]: True
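heapq.nlargest(2, heap_arr, key=lambda a: a[0])
returns [(3, 'c'), (3, 'd')] because heapq.nlargest, like sorted, behaves as a stable sort. Since the keys are equal (both are 3), a stable sort returns the items in the order in which they appear in the input: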
In [8]: heapq.nlargest(2, [(3, 'c'), (3, 'd')], key=lambda a: a[0])
Out[8]: [(3, 'c'), (3, 'd')]
In [9]: heapq.nlargest(2, [(3, 'd'), (3, 'c')], key=lambda a: a[0])
Out[9]: [(3, 'd'), (3, 'c')]
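Longer answer:
According to the documentation, heapq.nlargest(n, iterable, key) is equivalent to
sorted(iterable, key=key, reverse=True)[:n]
(although heapq.nlargest computes its result in a different way).
Nevertheless, we can use this equivalence to check that heapq.nlargest behaves as we should expect: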
import heapq
heap_arr = [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c'), (3, 'd')]
assert heapq.nlargest(2, heap_arr) == sorted(heap_arr, reverse=True)[:2]
assert heapq.nlargest(2, heap_arr, key=lambda a: a[0]) == sorted(heap_arr, key=lambda a: a[0], reverse=True)[:2]
So if you accept this equivalence, you only need to confirm that
In [47]: sorted(heap_arr, reverse=True)
Out[47]: [(3, 'd'), (3, 'c'), (2, 'b'), (2, 'b'), (1, 'a')]
In [48]: sorted(heap_arr, key=lambda a: a[0], reverse=True)
Out[48]: [(3, 'c'), (3, 'd'), (2, 'b'), (2, 'b'), (1, 'a')]
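With key=lambda a: a[0], both (3, 'c') and (3, 'd') are sorted by the same key value, 3. Because the sort is stable, two items with equal keys (such as (3, 'c') and (3, 'd')) appear in the result in the same order in which they appear in heap_arr.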
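The stability claim can be checked on a small example. The sketch below uses made-up records (the names `records` and `by_score` are illustrative and not part of the question) to show that ties keep their input order even with reverse=True, and that heapq.nlargest agrees with the sorted-based form.

```python
import heapq

# Hypothetical data: three records share the key 5.
records = [(5, 'first'), (1, 'low'), (5, 'second'), (5, 'third'), (2, 'mid')]

def by_score(rec):
    """Key function that looks only at the first field."""
    return rec[0]

# reverse=True keeps the sort stable: equal keys stay in input order.
ranked = sorted(records, key=by_score, reverse=True)

# nlargest is documented as equivalent to the sorted(...)[:n] form.
top3 = heapq.nlargest(3, records, key=by_score)

print(ranked[:3])
print(top3)
```

Both lines should print the three tied records in the order 'first', 'second', 'third'.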
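More detailed answer:
To see what is actually happening under the hood, you can use a debugger, or simply copy the code for heapq into a file and add print statements to study how the heap (i.e. the variable result) changes as the elements of the iterable are examined and possibly pushed onto the heap. Running this code: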
def heapreplace(heap, item):
    """Pop and return the current smallest value, and add the new item.
    This is more efficient than heappop() followed by heappush(), and can be
    more appropriate when using a fixed-size heap. Note that the value
    returned may be larger than item! That constrains reasonable uses of
    this routine unless written as part of a conditional replacement:
        if item > heap[0]:
            item = heapreplace(heap, item)
    """
    returnitem = heap[0]  # raises appropriate IndexError if heap is empty
    heap[0] = item
    _siftup(heap, 0)
    return returnitem
def heapify(x):
    """Transform list into a heap, in-place, in O(len(x)) time."""
    n = len(x)
    # Transform bottom-up. The largest index there's any point to looking at
    # is the largest with a child index in-range, so must have 2*i + 1 < n,
    # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
    # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is
    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
    for i in reversed(range(n//2)):
        _siftup(x, i)
# 'heap' is a heap at all indices >= startpos, except possibly for pos. pos
# is the index of a leaf with a possibly out-of-order value. Restore the
# heap invariant.
def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem
def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1  # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now. Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)
def nlargest(n, iterable, key=None):
    """Find the n largest elements in a dataset.
    Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
    """
    # Short-cut for n==1 is to use max()
    if n == 1:
        it = iter(iterable)
        sentinel = object()
        if key is None:
            result = max(it, default=sentinel)
        else:
            result = max(it, default=sentinel, key=key)
        return [] if result is sentinel else [result]
    # When n>=size, it's faster to use sorted()
    try:
        size = len(iterable)
    except (TypeError, AttributeError):
        pass
    else:
        if n >= size:
            return sorted(iterable, key=key, reverse=True)[:n]
    # When key is none, use simpler decoration
    if key is None:
        it = iter(iterable)
        result = [(elem, i) for i, elem in zip(range(0, -n, -1), it)]
        print('result: {}'.format(result))
        if not result:
            return result
        heapify(result)
        top = result[0][0]
        order = -n
        _heapreplace = heapreplace
        for elem in it:
            print('elem: {}'.format(elem))
            if top < elem:
                _heapreplace(result, (elem, order))
                print('result: {}'.format(result))
                top, _order = result[0]
                order -= 1
        result.sort(reverse=True)
        return [elem for (elem, order) in result]
    # General case, slowest method
    it = iter(iterable)
    result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
    print('result: {}'.format(result))
    if not result:
        return result
    heapify(result)
    top = result[0][0]
    order = -n
    _heapreplace = heapreplace
    for elem in it:
        print('elem: {}'.format(elem))
        k = key(elem)
        if top < k:
            _heapreplace(result, (k, order, elem))
            print('result: {}'.format(result))
            top, _order, _elem = result[0]
            order -= 1
    result.sort(reverse=True)
    return [elem for (k, order, elem) in result]

heap_arr = [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c'), (3, 'd')]
nlargest(2, heap_arr)
print('-'*10)
nlargest(2, heap_arr, key=lambda a: a[0])
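prints the heap after every replacement. After the last element has been processed, the heap is
[((3, 'c'), -3), ((3, 'd'), -4)]   (1)
in the first call and
[(3, -4, (3, 'd')), (3, -3, (3, 'c'))]   (2)
in the second call. When the tuples are compared, in the first case (1), (3, 'c') ends up before (3, 'd'), while in the second case (2) the reverse is true. The final result.sort(reverse=True) then flips each heap, which gives [(3, 'd'), (3, 'c')] without a key and [(3, 'c'), (3, 'd')] with the key.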
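So the behavior you are seeing comes from the fact that, when key is None, the elements of the iterable are placed in the heap as tuples of the form (elem, order), where order is decremented by 1 with each heapreplace.
In contrast, when key is not None, the tuples have the form (k, order, elem), where k is key(elem). This difference in the form of the tuples leads to the difference in the result.
In the first case, elem ends up controlling the order. In the second case, since the k values are equal, order ends up controlling the order. The purpose of order is to break ties in a stable manner. So ultimately we reach the same conclusion as when we examined sorted(heap_arr, key=lambda a: a[0], reverse=True): when the keys are equal, (3, 'c') and (3, 'd') come out in the same order in which they appear in heap_arr.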
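The effect of the two tuple shapes can be reproduced without any heap at all. The sketch below is a simplification, not the library's actual procedure: it decorates every element with a decreasing counter (the real code only assigns a new order value when an item enters the heap), sorts in reverse, and strips the decoration. The names `no_key` and `with_key` are illustrative.

```python
heap_arr = [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c'), (3, 'd')]

# Simplified decoration: counter 0, -1, -2, ... in input order.
# Shape used when key is None: (elem, order)
no_key = [(elem, -i) for i, elem in enumerate(heap_arr)]
# Shape used when a key is given: (k, order, elem), here with k = elem[0]
with_key = [(elem[0], -i, elem) for i, elem in enumerate(heap_arr)]

no_key.sort(reverse=True)
with_key.sort(reverse=True)

# Strip the decoration and keep the two largest.
top_no_key = [elem for elem, _order in no_key[:2]]
top_with_key = [elem for _k, _order, elem in with_key[:2]]

print(top_no_key)    # elem decides: (3, 'd') sorts above (3, 'c')
print(top_with_key)  # k ties, so the larger (earlier) order value wins
```

In the first list the element itself is compared before the counter, so the counter never matters for distinct elements. In the second list the key ties at 3, and the counter, which is larger for earlier items, puts (3, 'c') first.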
If you want ties in a[0] to be broken by a itself, use
In [53]: heapq.nlargest(2, heap_arr, key=lambda a: (a[0], a))
Out[53]: [(3, 'd'), (3, 'c')]
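As a quick check of this suggestion, the following sketch runs the three variants side by side on the question's data. The variable names `plain`, `first_only` and `composite` are illustrative.

```python
import heapq

heap_arr = [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c'), (3, 'd')]

# No key: whole tuples are compared.
plain = heapq.nlargest(2, heap_arr)

# Key on the first field only: ties keep their input order.
first_only = heapq.nlargest(2, heap_arr, key=lambda a: a[0])

# Composite key: ties on a[0] are broken by the tuple itself.
composite = heapq.nlargest(2, heap_arr, key=lambda a: (a[0], a))

print(plain)
print(first_only)
print(composite)
```

With the composite key the result matches the call without a key, because the whole tuple takes part in the comparison again.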
The tuple comparisons behind cases (1) and (2) can be checked directly:
In [45]: ((3, 'c'), -3) < ((3, 'd'), -4)
Out[45]: True
In [46]: (3, -4, (3, 'd')) < (3, -3, (3, 'c'))
Out[46]: True