Python: naturally sort a list, moving alphanumeric values to the end




I have a list of strings that I want to sort naturally:

c = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y', '5', '5Y', '6', '7', '8', '9', '9Y']

In addition to the natural sort, I want to move every entry that is not a pure digit string to the end. My expected output is:

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '2Y', '3Y', '4Y', '5Y', '9Y']

Note that everything must be sorted, even the alphanumeric strings.

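I know I can use the natsort package to get what I want, but natsort alone doesn't do it for me. I need two sort calls to achieve this: one for the natural sort and another to move the non-pure-digit strings to the end: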

import natsort as ns
r = sorted(ns.natsorted(c), key=lambda x: not x.isdigit())

print(r)
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '2Y', '3Y', '4Y', '5Y', '9Y']
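For readers without natsort installed, the same two-pass idea can be sketched with only the standard library. The `natural_key` helper below is a hypothetical, simplified stand-in for natsort's key (it splits a string into digit and non-digit runs and compares the digit runs as integers); it is not natsort's actual algorithm. The second pass works because `sorted` is stable, so the natural order from the first pass survives inside each group. The sketch also shows PM 2Ring's lambda-free variant, `key=str.isdigit, reverse=True`, which relies on `reverse=True` preserving stability.

```python
import re

_DIGIT_RUNS = re.compile(r'(\d+)')


def natural_key(s):
    """Hypothetical simplified natural-sort key (not natsort's implementation).

    re.split with a capturing group alternates non-digit and digit runs,
    so odd-indexed parts are digit runs; they are compared as integers.
    """
    parts = _DIGIT_RUNS.split(s)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


c = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
     '5', '5Y', '6', '7', '8', '9', '9Y']

# Pass 1: natural sort. Pass 2: stable partition with pure digits first
# (False sorts before True, and equal keys keep their pass-1 order).
two_pass = sorted(sorted(c, key=natural_key), key=lambda x: not x.isdigit())

# Lambda-free variant: str.isdigit is True for pure digits, reverse=True
# puts True first, and reverse=True still keeps the sort stable.
two_pass_rev = sorted(sorted(c, key=natural_key), key=str.isdigit, reverse=True)

print(two_pass)
# prints ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '2Y', '3Y', '4Y', '5Y', '9Y']
```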

I'm wondering whether natsort can be used more cleverly to reduce this to a single sort call.

natsort has a function, natsort_key, that converts an item into a tuple; sorting is then done on that tuple.

So you can use it like this:

sorted(c, key=lambda x: (not x.isdigit(), *ns.natsort_key(x)))

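You can also use it without iterable unpacking: in that case each key is a 2-tuple, and when the first items tie, the comparison falls through to the results of the natsort_key call: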

sorted(c, key=lambda x: (not x.isdigit(), ns.natsort_key(x)))
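The same single-call idea can be tried without natsort. The sketch below uses a hypothetical, simplified `natural_key` (not natsort's implementation) and builds both the unpacked key and the nested 2-tuple key. Because tuples compare element by element, both keys should order the list identically: the boolean decides the group, and ties fall through to the natural key. Unpacking inside a tuple display needs Python 3.5+.

```python
import re

_DIGIT_RUNS = re.compile(r'(\d+)')


def natural_key(s):
    """Hypothetical simplified natural-sort key: digit runs compare as ints."""
    parts = _DIGIT_RUNS.split(s)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


c = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
     '5', '5Y', '6', '7', '8', '9', '9Y']

# Flattened key, e.g. (False, '', 0, '') for '0' and (True, '', 2, 'Y') for '2Y'.
flat = sorted(c, key=lambda x: (not x.isdigit(), *natural_key(x)))

# Nested 2-tuple key, e.g. (False, ('', 0, '')) for '0'.
nested = sorted(c, key=lambda x: (not x.isdigit(), natural_key(x)))

print(nested)
# prints ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '2Y', '3Y', '4Y', '5Y', '9Y']
```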

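I appreciate Willem Van Onsem posting his answer. However, I should note that my original two-sort approach is an order of magnitude faster. Taking PM 2Ring's suggestion into account, here are some benchmarks comparing the two approaches: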

Setup

c = \
['0',
 '1',
 '10',
 '11',
 '2',
 '2Y',
 '3',
 '3Y',
 '4',
 '4Y',
 '5',
 '5Y',
 '6',
 '7',
 '8',
 '9',
 '9Y']
d = c * (1000000 // len(c) + 1)  # approximately 1M elements

The original's speed appears to be explained by Timsort being highly optimized for nearly sorted lists.
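To illustrate why existing order helps, the sketch below counts how many `<` comparisons `sorted` performs on an already sorted list versus a shuffled one. The `Counted` wrapper and `count_comparisons` helper are hypothetical instrumentation, not part of any library. This does not reproduce the benchmark above; it only shows Timsort taking advantage of runs that are already in order.

```python
import random


class Counted:
    """Hypothetical wrapper that counts every < comparison sorted() makes."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Sort wrapped copies of values and return how many comparisons it took."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons


n = 10_000
already_sorted = list(range(n))
shuffled = random.Random(0).sample(range(n), n)  # fixed seed for repeatability

sorted_count = count_comparisons(already_sorted)
shuffled_count = count_comparisons(shuffled)
print(sorted_count, shuffled_count)
```

On CPython the already sorted input should need roughly n - 1 comparisons (a single scan that finds one long run), while the shuffled input needs on the order of n log n.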

Sanity check

x = sorted(d, key=lambda x: (not x.isdigit(), ns.natsort_key(x)))
y = sorted(ns.natsorted(d), key=str.isdigit, reverse=True)

all(i == j for i, j in zip(x, y))
True

You can actually do this with natsorted and the right choice of key:
>>> ns.natsorted(d, key=lambda x: (not x.isdigit(), x))
['0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 '10',
 '11',
 '2Y',
 '3Y',
 '4Y',
 '5Y',
 '9Y']

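The key returns a tuple with the original input as its second element. Digit strings are placed at the front and all other strings at the back, and then each subset is sorted naturally on its own.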
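Roughly speaking, natsorted applies the user's key first and then natural-sorts the result, handling tuples element by element. The sketch below imitates that composition with a hypothetical `natural_key_any` helper (strings are split into digit runs, tuples are processed item by item, other values such as bools pass through) and a hypothetical `natsorted_sketch` wrapper. It is an approximation for illustration, not natsort's implementation.

```python
import re

_DIGIT_RUNS = re.compile(r'(\d+)')


def natural_key_any(obj):
    """Hypothetical natural key that also descends into tuples."""
    if isinstance(obj, str):
        parts = _DIGIT_RUNS.split(obj)
        # odd-indexed parts are digit runs; compare them as integers
        return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))
    if isinstance(obj, tuple):
        return tuple(natural_key_any(item) for item in obj)
    return obj  # numbers and bools are compared as-is


def natsorted_sketch(seq, key=lambda x: x):
    """Approximate natsorted(seq, key=key): user key first, then natural key."""
    return sorted(seq, key=lambda x: natural_key_any(key(x)))


c = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
     '5', '5Y', '6', '7', '8', '9', '9Y']

# The user key yields (False, '0') or (True, '2Y'); the natural key then
# turns the string part into ('', 0, '') or ('', 2, 'Y').
result = natsorted_sketch(c, key=lambda x: (not x.isdigit(), x))
print(result)
# prints ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '2Y', '3Y', '4Y', '5Y', '9Y']
```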
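As a side note, natsort_key has been deprecated since natsort version 3.0.4 (you will see a deprecation warning if you turn those on in your interpreter, and the function is now undocumented), and it is actually quite inefficient. It is better to use natsort_keygen, which returns a natural sort key function. natsort_key calls natsort_keygen under the hood, so for every input you create a new function and then call it only once.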
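The difference is about where the key function gets built. The sketch below uses a hypothetical `make_natural_key` factory as a stand-in for a keygen (it is not natsort's code) and counts how many key functions are created. Building the key once and reusing it creates one; wrapping the factory inside a per-element helper, as the paragraph above describes for natsort_key, creates one per element. Both orderings are the same; only the repeated setup work differs.

```python
import re

factory_calls = 0  # how many key functions have been built


def make_natural_key():
    """Hypothetical keygen stand-in: do the setup once, return a key function."""
    global factory_calls
    factory_calls += 1
    digits = re.compile(r'(\d+)')  # setup done once per factory call

    def key(s):
        parts = digits.split(s)
        # odd-indexed parts are digit runs; compare them as integers
        return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

    return key


def natural_key_per_call(s):
    """Anti-pattern: builds a brand-new key function for every input."""
    return make_natural_key()(s)


c = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
     '5', '5Y', '6', '7', '8', '9', '9Y']

# Build the key once and reuse it for every element.
factory_calls = 0
natural_key = make_natural_key()
fast = sorted(c, key=lambda x: (not x.isdigit(), natural_key(x)))
calls_fast = factory_calls  # 1

# Rebuild the key for every element (sorted calls the key once per item).
factory_calls = 0
slow = sorted(c, key=lambda x: (not x.isdigit(), natural_key_per_call(x)))
calls_slow = factory_calls  # 17, i.e. len(c)

print(calls_fast, calls_slow)
# prints 1 17
```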
Below I repeat the tests shown, adding my solution using the natsorted method, as well as timings for the other solutions using natsort_keygen instead of natsort_key:

In [13]: %timeit sorted(d, key=lambda x: (not x.isdigit(), ns.natsort_key(x)))
1 loop, best of 3: 33.3 s per loop

In [14]: natsort_key = ns.natsort_keygen()

In [15]: %timeit sorted(d, key=lambda x: (not x.isdigit(), natsort_key(x)))
1 loop, best of 3: 11.2 s per loop

In [16]: %timeit sorted(ns.natsorted(d), key=str.isdigit, reverse=True)
1 loop, best of 3: 9.77 s per loop

In [17]: %timeit ns.natsorted(d, key=lambda x: (not x.isdigit(), x))
1 loop, best of 3: 23.8 s per loop

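Why worry about doing a second sort? TimSort is optimized for sequences that contain already-sorted subsequences, so the second sort will be very fast. You can improve performance by eliminating the lambda: key=str.isdigit, reverse=True
@PM2Ring I'd appreciate it if you could elaborate and turn that into an answer!
Hmm, is this a 3.6 feature?
@cᴏʟᴅsᴘᴇᴇᴅ: No, I got it working in Python 3.5.3, with natsort 5.1.0.
My bad, iterable unpacking is Python 3.5+ (I'm on 4). That's not a problem.
@cᴏʟᴅsᴘᴇᴇᴅ: Actually, I don't think the unpacking is necessary, since it will sort the elements recursively.
True, but unpacking may be slightly faster if you flatten the tuple in the key function.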