Output order of heapq.nlargest changes when a key function is used (Python)

Can someone explain why the output order changes when nlargest is called with a key function that looks only at the first element of each tuple?

import heapq
heap_arr = [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c'), (3, 'd')]

heapq.nlargest(2, heap_arr)
# Perfectly fine - output is [(3, 'd'), (3, 'c')]
# This is similar to heapq.nlargest(2, heap_arr, key=lambda a: (a[0], a[1]))

heapq.nlargest(2, heap_arr, key=lambda a: a[0])
# Output is [(3, 'c'), (3, 'd')]... Why?

Why does (3, 'c') come before (3, 'd') in the second example? I am asking because the order of the tuples in the output list matters to me.

Short answer:

heapq.nlargest(2, heap_arr) returns [(3, 'd'), (3, 'c')] because

In [6]: (3, 'd') > (3, 'c')
Out[6]: True

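heapq.nlargest(2, heap_arr, key=lambda a: a[0]) returns [(3, 'c'), (3, 'd')] because heapq.nlargest, like sorted, orders items the way a stable sort does. Since the keys match (both are 3), a stable sort returns the items in the order in which they appear in heap_arr: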
In [8]: heapq.nlargest(2, [(3, 'c'), (3, 'd')], key=lambda a: a[0])
Out[8]: [(3, 'c'), (3, 'd')]

In [9]: heapq.nlargest(2, [(3, 'd'), (3, 'c')], key=lambda a: a[0])
Out[9]: [(3, 'd'), (3, 'c')]


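Longer answer:
According to the documentation, heapq.nlargest(n, iterable, key) is equivalent to
sorted(iterable, key=key, reverse=True)[:n]

(although heapq.nlargest computes its result in a different way).
Nevertheless, we can use this equivalence to check that heapq.nlargest behaves the way we expect: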
import heapq
heap_arr = [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c'), (3, 'd')]

assert heapq.nlargest(2, heap_arr) == sorted(heap_arr, reverse=True)[:2]

assert heapq.nlargest(2, heap_arr, key=lambda a: a[0]) == sorted(heap_arr, key=lambda a: a[0], reverse=True)[:2]

So if you accept this equivalence, you only need to confirm that
In [47]: sorted(heap_arr, reverse=True)
Out[47]: [(3, 'd'), (3, 'c'), (2, 'b'), (2, 'b'), (1, 'a')]

In [48]: sorted(heap_arr, key=lambda a: a[0], reverse=True)
Out[48]: [(3, 'c'), (3, 'd'), (2, 'b'), (2, 'b'), (1, 'a')]

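When you use key=lambda a: a[0], (3, 'c') and (3, 'd') are sorted according to the same key value, 3. Because Python's sort is stable, two items with equal keys (such as (3, 'c') and (3, 'd')) appear in the result in the same order as they appear in heap_arr.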
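
A minimal sketch of what stability means with reverse=True, using a made-up list (records and by_first are illustrative names, not from the original post). Sorting in descending order keeps items with equal keys in their input order, which is not the same as sorting in ascending order and then reversing the list:

```python
import heapq

# Hypothetical sample data: two items share the key 3.
records = [(3, 'c'), (1, 'a'), (3, 'd'), (2, 'b')]


def by_first(item):
    """Key function that looks only at the first tuple element."""
    return item[0]


# Descending stable sort: ties keep their input order.
stable_desc = sorted(records, key=by_first, reverse=True)

# Ascending sort followed by a reversal: ties end up flipped.
flipped = sorted(records, key=by_first)[::-1]

# nlargest agrees with the stable descending sort.
top_two = heapq.nlargest(2, records, key=by_first)

print(stable_desc)  # [(3, 'c'), (3, 'd'), (2, 'b'), (1, 'a')]
print(flipped)      # [(3, 'd'), (3, 'c'), (2, 'b'), (1, 'a')]
print(top_two)      # [(3, 'c'), (3, 'd')]
```

So the tie order in the key-based call reflects the input order, not the natural ordering of the tuples.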

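Even longer answer:
To understand what is actually going on under the hood, you can use a debugger, or simply copy the code for heapq into a file and use print statements to study how the heap (i.e. the variable result) changes as the elements of iterable are examined and possibly pushed onto the heap. Run this code: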
def heapreplace(heap, item):
    """Pop and return the current smallest value, and add the new item.

    This is more efficient than heappop() followed by heappush(), and can be
    more appropriate when using a fixed-size heap.  Note that the value
    returned may be larger than item!  That constrains reasonable uses of
    this routine unless written as part of a conditional replacement:

        if item > heap[0]:
            item = heapreplace(heap, item)
    """
    returnitem = heap[0]    # raises appropriate IndexError if heap is empty
    heap[0] = item
    _siftup(heap, 0)
    return returnitem

def heapify(x):
    """Transform list into a heap, in-place, in O(len(x)) time."""
    n = len(x)
    # Transform bottom-up.  The largest index there's any point to looking at
    # is the largest with a child index in-range, so must have 2*i + 1 < n,
    # or i < (n-1)/2.  If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
    # j-1 is the largest, which is n//2 - 1.  If n is odd = 2*j+1, this is
    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
    for i in reversed(range(n//2)):
        _siftup(x, i)

# 'heap' is a heap at all indices >= startpos, except possibly for pos.  pos
# is the index of a leaf with a possibly out-of-order value.  Restore the
# heap invariant.
def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem


def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)


def nlargest(n, iterable, key=None):
    """Find the n largest elements in a dataset.

    Equivalent to:  sorted(iterable, key=key, reverse=True)[:n]
    """

    # Short-cut for n==1 is to use max()
    if n == 1:
        it = iter(iterable)
        sentinel = object()
        if key is None:
            result = max(it, default=sentinel)
        else:
            result = max(it, default=sentinel, key=key)
        return [] if result is sentinel else [result]

    # When n>=size, it's faster to use sorted()
    try:
        size = len(iterable)
    except (TypeError, AttributeError):
        pass
    else:
        if n >= size:
            return sorted(iterable, key=key, reverse=True)[:n]

    # When key is none, use simpler decoration
    if key is None:
        it = iter(iterable)
        result = [(elem, i) for i, elem in zip(range(0, -n, -1), it)]
        print('result: {}'.format(result))
        if not result:
            return result
        heapify(result)
        top = result[0][0]
        order = -n
        _heapreplace = heapreplace
        for elem in it:
            print('elem: {}'.format(elem))
            if top < elem:
                _heapreplace(result, (elem, order))
                print('result: {}'.format(result))
                top, _order = result[0]
                order -= 1
        result.sort(reverse=True)
        return [elem for (elem, order) in result]

    # General case, slowest method
    it = iter(iterable)
    result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
    print('result: {}'.format(result))
    if not result:
        return result
    heapify(result)
    top = result[0][0]
    order = -n
    _heapreplace = heapreplace
    for elem in it:
        print('elem: {}'.format(elem))
        k = key(elem)
        if top < k:
            _heapreplace(result, (k, order, elem))
            print('result: {}'.format(result))
            top, _order, _elem = result[0]
            order -= 1
    result.sort(reverse=True)
    return [elem for (k, order, elem) in result]


heap_arr = [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c'), (3, 'd')]

nlargest(2, heap_arr)
print('-'*10)
nlargest(2, heap_arr, key=lambda a: a[0]) 

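The last "result:" line printed by each call shows the final heap; call these lines (1) for the run without a key and (2) for the run with key=lambda a: a[0]. In case (1), (3, 'c') ends up before (3, 'd') in the heap, while in case (2) the reverse happens. The final result.sort(reverse=True) then flips each two-item heap, which gives [(3, 'd'), (3, 'c')] and [(3, 'c'), (3, 'd')] respectively.

So the behavior you are seeing comes from the fact that when key is None, the elements of iterable are placed in the heap as tuples of the form (elem, order), where order decreases by 1 with each heapreplace. In contrast, when key is not None, the tuples have the form (k, order, elem), where k is key(elem). This difference in the form of the tuples leads to the difference in the result. In the first case, elem ends up controlling the order. In the second case, since the k values are equal, order ends up controlling it. The purpose of order is to break ties in a stable way. So in the end we reach the same conclusion as when we examined sorted(heap_arr, key=lambda a: a[0], reverse=True): when the keys are equal, (3, 'c') and (3, 'd') appear in the same order as they do in heap_arr.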
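
The two tuple shapes described above can be tried out in a simplified model. This is only a sketch under a stated assumption: it decorates and sorts every element, whereas the real nlargest keeps a heap of just n candidates. The name decorate_and_take is hypothetical and not part of heapq.

```python
import heapq


def decorate_and_take(n, iterable, key=None):
    """Simplified model of the decoration used by nlargest.

    Assumption: all elements are decorated and sorted, instead of
    maintaining a heap of size n. The tuple shapes match the real code.
    """
    if key is None:
        # (elem, order): elem is compared first, order only on exact ties.
        decorated = [(elem, -i) for i, elem in enumerate(iterable)]
        decorated.sort(reverse=True)
        return [elem for elem, _order in decorated[:n]]
    # (k, order, elem): for equal k, the larger order (earlier item) wins,
    # so elem itself is never compared.
    decorated = [(key(elem), -i, elem) for i, elem in enumerate(iterable)]
    decorated.sort(reverse=True)
    return [elem for _k, _order, elem in decorated[:n]]


heap_arr = [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c'), (3, 'd')]

without_key = decorate_and_take(2, heap_arr)
with_key = decorate_and_take(2, heap_arr, key=lambda a: a[0])

print(without_key)  # [(3, 'd'), (3, 'c')]
print(with_key)     # [(3, 'c'), (3, 'd')]

# Both results agree with heapq.nlargest on this input.
print(without_key == heapq.nlargest(2, heap_arr))
print(with_key == heapq.nlargest(2, heap_arr, key=lambda a: a[0]))
```

Because order is a decreasing counter, earlier items carry larger order values and therefore sort first among equal keys in a descending sort.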
If you want ties in a[0] to be broken by a itself, use
In [53]: heapq.nlargest(2, heap_arr, key=lambda a: (a[0], a))
Out[53]: [(3, 'd'), (3, 'c')]

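The heap orderings in the two cases follow from plain tuple comparison. Remember that in a heap, heap[0] is always the smallest item, and indeed:

In [45]: ((3, 'c'), -3) < ((3, 'd'), -4)
Out[45]: True

In [46]: (3, -4, (3, 'd')) < (3, -3, (3, 'c'))
Out[46]: True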
