python heapq排序列表错误？_Python_Sorting

python heapq排序列表错误？

python sorting

python heapq排序列表错误？,python,sorting,Python,Sorting,我试图将列表排序为一个列表，其中包含章节、子章节和子章节的编号和名称。程序如下所示： import heapq sections = ['1. Section', '2. Section', '3. Section', '4. Section', '5. Section', '6. Section', '7. Section', '8. Section', '9. Section', '10. Section', '11. Section', '12. Section'] subsection

我试图将列表排序为一个列表，其中包含章节、子章节和子章节的编号和名称。程序如下所示：

import heapq

sections = ['1. Section', '2. Section', '3. Section', '4. Section', '5. Section', '6. Section', '7. Section', '8. Section', '9. Section', '10. Section', '11. Section', '12. Section']
subsections = ['1.1 Subsection', '1.2 Subsection', '1.3 Subsection', '1.4 Subsection', '2.1 Subsection', '4.1 My subsection', '7.1 Subsection', '8.1 Subsection', '12.1 Subsection']
subsubsections = ['1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.4.1 Subsubsection', '2.1.1 Subsubsection', '7.1.1 Subsubsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection']

sorted_list = list(heapq.merge(sections, subsections, subsubsections))

print(sorted_list)

我得到的是：

['1. Section', '1.1 Subsection', '1.2 Subsection', '1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.3 Subsection', '1.4 Subsection', '1.4.1 Subsubsection', '2. Section', '2.1 Subsection', '2.1.1 Subsubsection', '3. Section', '4. Section', '4.1 My subsection', '5. Section', '6. Section', '7. Section', '7.1 Subsection', '7.1.1 Subsubsection', '8. Section', '8.1 Subsection', '12.1 Subsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection', '9. Section', '10. Section', '11. Section', '12. Section']

我的第12小节和第8小节位于第8小节，而不是第12小节

为什么会这样？最初的列表被排序，一切都很顺利，显然排到了第10位

我不知道为什么会发生这种情况，有没有办法根据列表中的数字更好地将其分类到“树”中？我正在构建一个目录，它将返回（一旦我过滤掉列表）

请注意8.1小节后面的12.1小节和8.1.1小节后面的12.1.1子小节。

您的列表在人眼看来可能会排序。但是对于Python，您的输入不是完全排序的，因为它按字典顺序对字符串进行排序。这意味着

'12'

按排序顺序排在

'8'

之前，因为只比较第一个字符

因此，合并是完全正确的；在看到

'8.1'

字符串后，会遇到以

'12.1'

开头的字符串，但以

'8.1.1'

开头的字符串随后会被排序

必须使用键函数从字符串中提取整数元组才能正确排序：

section = lambda s: [int(d) for d in s.partition(' ')[0].split('.') if d]
heapq.merge(sections, subsections, subsubsections, key=section))

请注意，

键

参数仅在Python 3.5及更高版本中可用；在早期的版本中，你必须进行手工装饰合并未装饰的舞蹈

演示（使用Python 3.6）：

键控合并很容易后端口到Python 3.3和3.4：

import heapq

def _heappop_max(heap):
    lastelt = heap.pop()
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        heapq._siftup_max(heap, 0)
        return returnitem
    return lastelt

def _heapreplace_max(heap, item):
    returnitem = heap[0]
    heap[0] = item
    heapq._siftup_max(heap, 0)
    return returnitem

def merge(*iterables, key=None, reverse=False):    
    h = []
    h_append = h.append

    if reverse:
        _heapify = heapq._heapify_max
        _heappop = _heappop_max
        _heapreplace = _heapreplace_max
        direction = -1
    else:
        _heapify = heapify
        _heappop = heappop
        _heapreplace = heapreplace
        direction = 1

    if key is None:
        for order, it in enumerate(map(iter, iterables)):
            try:
                next = it.__next__
                h_append([next(), order * direction, next])
            except StopIteration:
                pass
        _heapify(h)
        while len(h) > 1:
            try:
                while True:
                    value, order, next = s = h[0]
                    yield value
                    s[0] = next()           # raises StopIteration when exhausted
                    _heapreplace(h, s)      # restore heap condition
            except StopIteration:
                _heappop(h)                 # remove empty iterator
        if h:
            # fast case when only a single iterator remains
            value, order, next = h[0]
            yield value
            yield from next.__self__
        return

    for order, it in enumerate(map(iter, iterables)):
        try:
            next = it.__next__
            value = next()
            h_append([key(value), order * direction, value, next])
        except StopIteration:
            pass
    _heapify(h)
    while len(h) > 1:
        try:
            while True:
                key_value, order, value, next = s = h[0]
                yield value
                value = next()
                s[0] = key(value)
                s[2] = value
                _heapreplace(h, s)
        except StopIteration:
            _heappop(h)
    if h:
        key_value, order, value, next = h[0]
        yield value
        yield from next.__self__

装饰排序-取消装饰合并非常简单，如下所示：

def decorate(iterable, key):
    for elem in iterable:
        yield key(elem), elem

sorted = [v for k, v in heapq.merge(
    decorate(sections, section), decorate(subsections, section)
    decorate(subsubsections, section))]

因为您的输入已经排序，所以使用合并排序更有效。最后，您可以使用

sorted（）

但是：

from itertools import chain
result = sorted(chain(sections, subsections, subsubsections), key=section)

您的列表在人眼看来可能会排序。但是对于Python，您的输入不是完全排序的，因为它按字典顺序对字符串进行排序。这意味着

'12'

按排序顺序排在

'8'

之前，因为只比较第一个字符

因此，合并是完全正确的；在看到

'8.1'

字符串后，会遇到以

'12.1'

开头的字符串，但以

'8.1.1'

开头的字符串随后会被排序

必须使用键函数从字符串中提取整数元组才能正确排序：

section = lambda s: [int(d) for d in s.partition(' ')[0].split('.') if d]
heapq.merge(sections, subsections, subsubsections, key=section))

请注意，

键

参数仅在Python 3.5及更高版本中可用；在早期的版本中，你必须进行手工装饰合并未装饰的舞蹈

演示（使用Python 3.6）：

键控合并很容易后端口到Python 3.3和3.4：

import heapq

def _heappop_max(heap):
    lastelt = heap.pop()
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        heapq._siftup_max(heap, 0)
        return returnitem
    return lastelt

def _heapreplace_max(heap, item):
    returnitem = heap[0]
    heap[0] = item
    heapq._siftup_max(heap, 0)
    return returnitem

def merge(*iterables, key=None, reverse=False):    
    h = []
    h_append = h.append

    if reverse:
        _heapify = heapq._heapify_max
        _heappop = _heappop_max
        _heapreplace = _heapreplace_max
        direction = -1
    else:
        _heapify = heapify
        _heappop = heappop
        _heapreplace = heapreplace
        direction = 1

    if key is None:
        for order, it in enumerate(map(iter, iterables)):
            try:
                next = it.__next__
                h_append([next(), order * direction, next])
            except StopIteration:
                pass
        _heapify(h)
        while len(h) > 1:
            try:
                while True:
                    value, order, next = s = h[0]
                    yield value
                    s[0] = next()           # raises StopIteration when exhausted
                    _heapreplace(h, s)      # restore heap condition
            except StopIteration:
                _heappop(h)                 # remove empty iterator
        if h:
            # fast case when only a single iterator remains
            value, order, next = h[0]
            yield value
            yield from next.__self__
        return

    for order, it in enumerate(map(iter, iterables)):
        try:
            next = it.__next__
            value = next()
            h_append([key(value), order * direction, value, next])
        except StopIteration:
            pass
    _heapify(h)
    while len(h) > 1:
        try:
            while True:
                key_value, order, value, next = s = h[0]
                yield value
                value = next()
                s[0] = key(value)
                s[2] = value
                _heapreplace(h, s)
        except StopIteration:
            _heappop(h)
    if h:
        key_value, order, value, next = h[0]
        yield value
        yield from next.__self__

装饰排序-取消装饰合并非常简单，如下所示：

def decorate(iterable, key):
    for elem in iterable:
        yield key(elem), elem

sorted = [v for k, v in heapq.merge(
    decorate(sections, section), decorate(subsections, section)
    decorate(subsubsections, section))]

因为您的输入已经排序，所以使用合并排序更有效。最后，您可以使用

sorted（）

但是：

from itertools import chain
result = sorted(chain(sections, subsections, subsubsections), key=section)

正如在其他答案中所解释的，您必须指定一个排序方法，否则python将按字典顺序对字符串进行排序。如果您使用的是python 3.5+，则可以在

merge

函数中使用

key

参数，在python 3.5中，您可以使用

itertools.chain

和

sorted

，作为一种通用方法，您可以使用regex来查找数字并将其转换为int：

In [18]: from itertools import chain
In [19]: import re
In [23]: sorted(chain.from_iterable((sections, subsections, subsubsections)),
                key = lambda x: [int(i) for i in re.findall(r'\d+', x)])
Out[23]: 
['1. Section',
 '1.1 Subsection',
 '1.2 Subsection',
 '1.2.1 Subsubsection',
 '1.2.2 Subsubsection',
 '1.3 Subsection',
 '1.4 Subsection',
 '1.4.1 Subsubsection',
 '2. Section',
 '2.1 Subsection',
 '2.1.1 Subsubsection',
 '3. Section',
 '4. Section',
 '4.1 My subsection',
 '5. Section',
 '6. Section',
 '7. Section',
 '7.1 Subsection',
 '7.1.1 Subsubsection',
 '8. Section',
 '8.1 Subsection',
 '8.1.1 Subsubsection',
 '9. Section',
 '10. Section',
 '11. Section',
 '12. Section',
 '12.1 Subsection',
 '12.1.1 Subsubsection']

正如在其他答案中所解释的，您必须指定一个排序方法，否则python将按字典顺序对字符串进行排序。如果您使用的是python 3.5+，则可以在

merge

函数中使用

key

参数，在python 3.5中，您可以使用

itertools.chain

和

sorted

，作为一种通用方法，您可以使用regex来查找数字并将其转换为int：

In [18]: from itertools import chain
In [19]: import re
In [23]: sorted(chain.from_iterable((sections, subsections, subsubsections)),
                key = lambda x: [int(i) for i in re.findall(r'\d+', x)])
Out[23]: 
['1. Section',
 '1.1 Subsection',
 '1.2 Subsection',
 '1.2.1 Subsubsection',
 '1.2.2 Subsubsection',
 '1.3 Subsection',
 '1.4 Subsection',
 '1.4.1 Subsubsection',
 '2. Section',
 '2.1 Subsection',
 '2.1.1 Subsubsection',
 '3. Section',
 '4. Section',
 '4.1 My subsection',
 '5. Section',
 '6. Section',
 '7. Section',
 '7.1 Subsection',
 '7.1.1 Subsubsection',
 '8. Section',
 '8.1 Subsection',
 '8.1.1 Subsubsection',
 '9. Section',
 '10. Section',
 '11. Section',
 '12. Section',
 '12.1 Subsection',
 '12.1.1 Subsubsection']

因为它是按字符串的字典顺序操作的，而不是你的版本作为“数字”…因为它是按字符串的字典顺序操作的，而不是你的版本作为“数字”…那么有没有其他排序算法可以代替堆来做这件事呢？我正在为sublime text 3做一个插件，所以它使用的是Python 3，但我不确定是哪一个正确尝试了它，肯定是版本<3.5，因为我得到了

TypeError:merge（）得到了一个意外的关键字参数“key”

@dingo\u d sublime是Python 3.3，所以你必须输入元组；第一个元素是section函数的输出，第二个元素是原始字符串。然后您可以合并，然后提取。您也可以使用

sorted（）

函数而不是合并。感谢您的帮助，Kasramvd的答案成功了，所以我接受他的答案+1作为解释：）那么有没有其他排序算法可以代替堆来完成这个任务呢？我正在为sublime text 3做一个插件，所以它使用的是Python 3，但我不确定是哪一个正确尝试了它，肯定是版本<3.5，因为我得到了

TypeError:merge（）得到了一个意外的关键字参数“key”

@dingo\u d sublime是Python 3.3，所以你必须输入元组；第一个元素是section函数的输出，第二个元素是原始字符串。然后您可以合并，然后提取。您也可以使用

sorted（）

函数而不是合并。感谢您的帮助，Kasramvd的答案成功了，所以我接受他的答案+1解释如下：）