Python heapq sorts a list incorrectly?
I am trying to merge several lists into one sorted list that contains the numbers and names of sections, subsections, and subsubsections. The program looks like this:
import heapq
sections = ['1. Section', '2. Section', '3. Section', '4. Section', '5. Section', '6. Section', '7. Section', '8. Section', '9. Section', '10. Section', '11. Section', '12. Section']
subsections = ['1.1 Subsection', '1.2 Subsection', '1.3 Subsection', '1.4 Subsection', '2.1 Subsection', '4.1 My subsection', '7.1 Subsection', '8.1 Subsection', '12.1 Subsection']
subsubsections = ['1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.4.1 Subsubsection', '2.1.1 Subsubsection', '7.1.1 Subsubsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection']
sorted_list = list(heapq.merge(sections, subsections, subsubsections))
print(sorted_list)
What I get is:
['1. Section', '1.1 Subsection', '1.2 Subsection', '1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.3 Subsection', '1.4 Subsection', '1.4.1 Subsubsection', '2. Section', '2.1 Subsection', '2.1.1 Subsubsection', '3. Section', '4. Section', '4.1 My subsection', '5. Section', '6. Section', '7. Section', '7.1 Subsection', '7.1.1 Subsubsection', '8. Section', '8.1 Subsection', '12.1 Subsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection', '9. Section', '10. Section', '11. Section', '12. Section']
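My 12th subsection and subsubsection end up under the 8th section, not the 12th.
Why does this happen? The original lists are sorted, and everything goes well, apparently up to number 10.
I don't know why this happens. Is there a better way to sort this into a "tree" based on the numbers in the lists? I'm building a table of contents, and this will return (once I filter the list)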
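Note the 12.1 subsection after the 8.1 subsection, and the 12.1.1 subsubsection after the 8.1.1 subsubsection.
Your lists may look sorted to the human eye, but to Python your inputs are not fully sorted, because it compares strings lexicographically. That means '12' sorts before '8': the comparison is decided at the first differing character, and '1' is less than '8'.
The merge output is therefore completely correct: after the '8.1' string has been yielded, the string starting with '12.1' is the smallest remaining candidate and comes next, and the string starting with '8.1.1' is only yielded after it.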
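To make the string comparison concrete, here is a small sketch. The three-element list is a shortened stand-in for the question's data, chosen only for illustration:

```python
# String comparison is decided at the first differing character.
string_result = '12' < '8'   # True, because '1' < '8'
int_result = 12 < 8          # False: integers compare by value

subsections = ['1.1 Subsection', '8.1 Subsection', '12.1 Subsection']

# Sorting as plain strings moves '12.1 ...' ahead of '8.1 ...',
# so this list is not sorted from Python's point of view.
as_strings = sorted(subsections)
print(string_result, int_result)
print(as_strings)
```

Since heapq.merge assumes each input is already sorted under the comparison it uses, inputs that only look sorted numerically produce the interleaving shown in the question.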
You have to use a key function that extracts the integers from each string in order to sort correctly:
section = lambda s: [int(d) for d in s.partition(' ')[0].split('.') if d]
sorted_list = list(heapq.merge(sections, subsections, subsubsections, key=section))
Note that the key argument is only available in Python 3.5 and later; in earlier versions you have to do a manual decorate-merge-undecorate dance.
Demo (using Python 3.6):
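A minimal sketch of such a run, on a shortened input chosen here for illustration (it is not the original session's output). It assumes Python 3.5 or later, where heapq.merge accepts key, and it writes the key function as a def rather than a lambda:

```python
import heapq

# Shortened, illustrative inputs; each list is sorted by its numeric prefix.
sections = ['8. Section', '9. Section', '10. Section', '12. Section']
subsections = ['8.1 Subsection', '12.1 Subsection']
subsubsections = ['8.1.1 Subsubsection', '12.1.1 Subsubsection']

def section(s):
    # '12.1.1 Subsubsection' -> [12, 1, 1]; '8. Section' -> [8]
    return [int(d) for d in s.partition(' ')[0].split('.') if d]

merged = list(heapq.merge(sections, subsections, subsubsections, key=section))
for line in merged:
    print(line)
```

The list [8] compares as smaller than [8, 1], which is why each section heading lands directly before its own subsections.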
The keyed merge is easy to backport to Python 3.3 and 3.4:
import heapq

# Max-heap helpers for reverse=True; they rely on private heapq functions.
def _heappop_max(heap):
    lastelt = heap.pop()
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        heapq._siftup_max(heap, 0)
        return returnitem
    return lastelt

def _heapreplace_max(heap, item):
    returnitem = heap[0]
    heap[0] = item
    heapq._siftup_max(heap, 0)
    return returnitem

def merge(*iterables, key=None, reverse=False):
    h = []
    h_append = h.append

    if reverse:
        _heapify = heapq._heapify_max
        _heappop = _heappop_max
        _heapreplace = _heapreplace_max
        direction = -1
    else:
        _heapify = heapq.heapify
        _heappop = heapq.heappop
        _heapreplace = heapq.heapreplace
        direction = 1

    if key is None:
        # Heap entries are [value, order, next]; order keeps ties stable.
        for order, it in enumerate(map(iter, iterables)):
            try:
                next = it.__next__
                h_append([next(), order * direction, next])
            except StopIteration:
                pass
        _heapify(h)
        while len(h) > 1:
            try:
                while True:
                    value, order, next = s = h[0]
                    yield value
                    s[0] = next()           # raises StopIteration when exhausted
                    _heapreplace(h, s)      # restore heap condition
            except StopIteration:
                _heappop(h)                 # remove empty iterator
        if h:
            # fast case when only a single iterator remains
            value, order, next = h[0]
            yield value
            yield from next.__self__
        return

    # Keyed case: heap entries are [key(value), order, value, next].
    for order, it in enumerate(map(iter, iterables)):
        try:
            next = it.__next__
            value = next()
            h_append([key(value), order * direction, value, next])
        except StopIteration:
            pass
    _heapify(h)
    while len(h) > 1:
        try:
            while True:
                key_value, order, value, next = s = h[0]
                yield value
                value = next()
                s[0] = key(value)
                s[2] = value
                _heapreplace(h, s)
        except StopIteration:
            _heappop(h)
    if h:
        key_value, order, value, next = h[0]
        yield value
        yield from next.__self__
A decorate-merge-undecorate approach is as simple as:
def decorate(iterable, key):
    # Pair each element with its key so that the merge compares the keys first.
    for elem in iterable:
        yield key(elem), elem

merged = [v for k, v in heapq.merge(
    decorate(sections, section), decorate(subsections, section),
    decorate(subsubsections, section))]
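Because your inputs are already sorted, merging them is more efficient than a full sort. As a last resort, however, you could simply use sorted():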
from itertools import chain
result = sorted(chain(sections, subsections, subsubsections), key=section)
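As explained in the other answer, you have to specify how to sort, otherwise Python sorts the strings lexicographically. If you are using Python 3.5+, you can use the key argument of the merge function. In versions before 3.5 you can use itertools.chain and sorted. As a general approach, you can use a regex to find the numbers and convert them to int: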
In [18]: from itertools import chain
In [19]: import re
In [23]: sorted(chain.from_iterable((sections, subsections, subsubsections)),
key = lambda x: [int(i) for i in re.findall(r'\d+', x)])
Out[23]:
['1. Section',
'1.1 Subsection',
'1.2 Subsection',
'1.2.1 Subsubsection',
'1.2.2 Subsubsection',
'1.3 Subsection',
'1.4 Subsection',
'1.4.1 Subsubsection',
'2. Section',
'2.1 Subsection',
'2.1.1 Subsubsection',
'3. Section',
'4. Section',
'4.1 My subsection',
'5. Section',
'6. Section',
'7. Section',
'7.1 Subsection',
'7.1.1 Subsubsection',
'8. Section',
'8.1 Subsection',
'8.1.1 Subsubsection',
'9. Section',
'10. Section',
'11. Section',
'12. Section',
'12.1 Subsection',
'12.1.1 Subsubsection']
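One caveat with a pattern like \d+ applied to the whole string is that digits inside a title also end up in the key. A hedged variant, assuming every entry starts with its dotted number, matches only that leading number. The name numeric_key, the sample titles, and the empty-list fallback are hypothetical choices for this sketch:

```python
import re

def numeric_key(title):
    # Match only the leading dotted number, e.g. '2.1' in '2.1 Part 7 details'.
    match = re.match(r'\d+(?:\.\d+)*', title)
    if match is None:
        return []  # hypothetical fallback: unnumbered titles sort first
    return [int(part) for part in match.group().split('.')]

# Illustrative titles that contain digits after the section number.
items = ['10. Appendix 3', '2. Part 7', '2.1 Part 7 details']

by_prefix = sorted(items, key=numeric_key)
by_all_digits = sorted(items, key=lambda x: [int(i) for i in re.findall(r'\d+', x)])
print(by_prefix)
print(by_all_digits)
```

With these titles the whole-string key places '2.1 Part 7 details' before '2. Part 7', because [2, 1, 7] compares as smaller than [2, 7]; the anchored key keeps the heading first. For the lists in the question both keys give the same order.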
Because it operates on the lexicographic order of the strings, rather than treating your version numbers as "numbers"…
So is there any other sorting algorithm that could do this instead of the heap? I'm making a plugin for Sublime Text 3, so it uses Python 3, but I'm not sure which version.
Right, I tried it, and it is definitely a version < 3.5, because I got TypeError: merge() got an unexpected keyword argument 'key'
@dingo_d Sublime is Python 3.3, so you have to feed in tuples; the first element is the output of the section function and the second is the original string. You can then merge, and then extract the strings again. You could also just use the sorted() function instead of merging.
Thanks for the help. Kasramvd's answer worked, so I accepted his; +1 for the explanation :)