How do I sort a list numerically rather than as strings in Python?

python regex list sorting

I have the following code:

import glob, os
outdir = './output/'
nstring = 'testdat_2014-12-31'
nfilelist = sorted(glob.glob((outdir+'/*{}*.nc').format(nstring)))

From this I get nfilelist:

['testdat_2014-12-31-21_H1.nc',
 'testdat_2014-12-31-21_H10.nc',
 'testdat_2014-12-31-21_H11.nc',
 'testdat_2014-12-31-21_H12.nc',
 'testdat_2014-12-31-21_H2.nc',
 'testdat_2014-12-31-21_H3.nc',
 'testdat_2014-12-31-21_H4.nc',
 'testdat_2014-12-31-21_H5.nc',
 'testdat_2014-12-31-21_H6.nc',
 'testdat_2014-12-31-21_H7.nc',
 'testdat_2014-12-31-21_H8.nc',
 'testdat_2014-12-31-21_H9.nc']

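The H1-H12 numbers at the end reflect the order I want. Right now, however, H10-H12 are sandwiched in the middle. How can I sort the list from H1 to H12?

Regex is not my strong suit, and I am stuck.

I tried splitting the names and got this far: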

nfilelist[0].split('_')[-1].split('.')
['H1', 'nc']
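
For context, the default sort compares strings character by character, which is why H10 lands before H2. The following is a minimal illustration using shortened, hypothetical names (not the real file names from the question):

```python
# Shortened, hypothetical names used only to illustrate the behaviour.
names = ['H2.nc', 'H10.nc', 'H1.nc']

# Strings are compared one character at a time, so '1' < '2' decides
# the order of 'H10.nc' and 'H2.nc' before the full number is considered.
default_order = sorted(names)
print(default_order)

# The same effect on bare digit strings versus real integers.
print('10' < '2')   # string comparison
print(10 < 2)       # integer comparison
```

Any fix therefore has to turn the trailing digits into an int (or an equivalent numeric key) before comparing.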

Assuming you want them sorted by their int value, you can use re.search as follows:

import re

nfiles  = ['testdat_2014-12-31-21_H1.nc',
 'testdat_2014-12-31-21_H10.nc',
 'testdat_2014-12-31-21_H11.nc',
 'testdat_2014-12-31-21_H12.nc',
 'testdat_2014-12-31-21_H2.nc',
 'testdat_2014-12-31-21_H3.nc',
 'testdat_2014-12-31-21_H4.nc',
 'testdat_2014-12-31-21_H5.nc',
 'testdat_2014-12-31-21_H6.nc',
 'testdat_2014-12-31-21_H7.nc',
 'testdat_2014-12-31-21_H8.nc',
 'testdat_2014-12-31-21_H9.nc']

# A raw string keeps \d and \. from being treated as string escape sequences.
result = sorted(nfiles, key=lambda x: int(re.search(r'H(\d+)\.nc', x).group(1)))

print(result)

Output

['testdat_2014-12-31-21_H1.nc', 'testdat_2014-12-31-21_H2.nc', 'testdat_2014-12-31-21_H3.nc', 'testdat_2014-12-31-21_H4.nc', 'testdat_2014-12-31-21_H5.nc', 'testdat_2014-12-31-21_H6.nc', 'testdat_2014-12-31-21_H7.nc', 'testdat_2014-12-31-21_H8.nc', 'testdat_2014-12-31-21_H9.nc', 'testdat_2014-12-31-21_H10.nc', 'testdat_2014-12-31-21_H11.nc', 'testdat_2014-12-31-21_H12.nc']

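Explanation

The pattern 'H(\d+)\.nc' matches any group of digits (\d+) that is preceded by H and followed by .nc. Calling .group(1) retrieves the digit group. Each digit group is then converted to int and used as the sort key.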
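
One thing the one-liner does not cover is a name that does not match the pattern: re.search returns None in that case, and calling .group(1) on it raises AttributeError. A minimal sketch of a more defensive key follows; the names PATTERN and hour_key are made up for this example, and the trailing $ anchor is an added assumption that the number sits at the very end of the name.

```python
import re

# Precompiled pattern; the raw string keeps \d and \. intact.
# The $ anchor is an assumption: the H<number>.nc part ends the name.
PATTERN = re.compile(r'H(\d+)\.nc$')

def hour_key(name):
    """Return the integer after 'H', or raise a clear error if it is absent."""
    match = PATTERN.search(name)
    if match is None:
        # Without this check, match.group(1) would raise AttributeError.
        raise ValueError('no H<number>.nc suffix in {!r}'.format(name))
    return int(match.group(1))

files = ['testdat_2014-12-31-21_H10.nc',
         'testdat_2014-12-31-21_H2.nc',
         'testdat_2014-12-31-21_H1.nc']

result = sorted(files, key=hour_key)
print(result)
```

Whether to raise, skip, or sort unmatched names last depends on the data; raising is simply the most visible choice.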

Without regex

If you want to avoid regular expressions entirely, use the following function as the key:

def key(element):
    digits = (ix for ix in element.split('_')[-1] if ix.isdigit())
    return int(''.join(digits))

result = sorted(nfiles, key=key)

print(result)

Note

Finally, if you want to sort by string value instead, simply remove the call to int.
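
To make the difference concrete, here is a small sketch comparing the two keys on three of the names. The helper name digits is made up for this example. Without int, the extracted digits are compared as text, which brings back the ordering from the question.

```python
import re

files = ['testdat_2014-12-31-21_H10.nc',
         'testdat_2014-12-31-21_H2.nc',
         'testdat_2014-12-31-21_H1.nc']

def digits(name):
    # Returns the digits after 'H' as a string, e.g. '10'.
    return re.search(r'H(\d+)\.nc', name).group(1)

# Keys are '10', '2', '1': compared as text, '10' sorts before '2'.
by_string = sorted(files, key=digits)

# Keys are 10, 2, 1: compared as numbers.
by_int = sorted(files, key=lambda name: int(digits(name)))

print(by_string)
print(by_int)
```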

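Instead of the sorted() function, use the natsorted() function from the natsort module, passing it the same list.

(The name natsort stands for natural sort, as opposed to lexicographic sort.)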
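
If installing a third-party package is not an option, natural ordering can be approximated with the standard library. The sketch below is a hand-written approximation, not natsort's implementation, and the function name natural_key is made up for this example. It assumes plain runs of decimal digits (no signs, decimals, or exponents), which natsort handles more thoroughly.

```python
import re

def natural_key(text):
    """Split text into alternating non-digit and digit runs.

    re.split with a capturing group returns the digit runs at the odd
    indices, so those are converted to int and the rest stay strings.
    The alternation keeps each position the same type across keys.
    """
    parts = re.split(r'(\d+)', text)
    return [int(part) if index % 2 else part
            for index, part in enumerate(parts)]

files = ['testdat_2014-12-31-21_H10.nc',
         'testdat_2014-12-31-21_H2.nc',
         'testdat_2014-12-31-21_H1.nc']

result = sorted(files, key=natural_key)
print(result)

# With the natsort package installed, the equivalent call would be:
#   from natsort import natsorted
#   result = natsorted(files)
```

Unlike the keys that look only at the H number, this one also orders by the date fields in the name.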

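The names you are sorting have a simple, regular structure, so you can get by without regular expressions. Sort the names by taking the part after "_H", then the part of that before ".", and converting the result to an integer: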

sorted(nfilelist, 
       key=lambda x: int(x.split("_H")[1].split(".")[0]))
#['testdat_2014-12-31-21_H1.nc', 'testdat_2014-12-31-21_H2.nc', 
# 'testdat_2014-12-31-21_H3.nc', 'testdat_2014-12-31-21_H4.nc', 
# 'testdat_2014-12-31-21_H5.nc', 'testdat_2014-12-31-21_H6.nc', 
# 'testdat_2014-12-31-21_H7.nc', 'testdat_2014-12-31-21_H8.nc', 
# 'testdat_2014-12-31-21_H9.nc', 'testdat_2014-12-31-21_H10.nc', 
# 'testdat_2014-12-31-21_H11.nc', 'testdat_2014-12-31-21_H12.nc']

You can do this without regular expressions:

result = sorted(nfilelist, key=lambda x: (len(x), x))

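The key first compares the file names by length (the longer the number, the larger its value) and, when the lengths are equal, compares the strings themselves. This relies on all names sharing the same prefix, so that a longer name always means a larger number.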

A speed comparison with the other answers:

| Method            | Timing                       |
+-------------------+------------------------------+
| Using natsort     | 219 µs  ± 1.13 µs per loop   |
| Daniel's regex    | 14.2 µs ± 434  ns per loop   |
| Daniel's no-regex | 14.2 µs ± 101  ns per loop   |
| DYZ's split based | 7.50 µs ± 240  ns per loop   |
| This answer       | 2.77 µs ± 46.6 ns per loop   |

Timings were obtained using %timeit in IPython 3.7 running on a 2.7 GHz Intel Core i7.
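
For anyone who wants to repeat the comparison outside IPython, a rough sketch with the standard timeit module is given below. The list of twelve names is rebuilt from the question, the loop count of 2000 is an arbitrary choice, and the absolute numbers will differ from the table depending on machine and Python version.

```python
import re
import timeit

# Rebuild the twelve names from the question, in default string order.
files = sorted('testdat_2014-12-31-21_H{}.nc'.format(i) for i in range(1, 13))

# Three of the key functions discussed in the answers.
keys = {
    'regex': lambda x: int(re.search(r'H(\d+)\.nc', x).group(1)),
    'split': lambda x: int(x.split('_H')[1].split('.')[0]),
    'length': lambda x: (len(x), x),
}

# Check first that all keys produce the same order on this input.
results = {name: sorted(files, key=key) for name, key in keys.items()}

loops = 2000  # arbitrary; raise it for steadier numbers
for name, key in keys.items():
    seconds = timeit.timeit(lambda: sorted(files, key=key), number=loops)
    print('{:<8}{:.2f} µs per loop'.format(name, seconds / loops * 1e6))
```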

@maximusdooku Do you want to sort by int value or by string value?

This solution will fail if you have a file named testdat_2018-12-31-21_H0.nc ;)

@LakshayGarg According to the OP, that is not possible: nstring='testdat_2014-12-31'.
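
The case raised in the comments can be reproduced with a short sketch. The mixed list below is hypothetical: it combines two names from the question with the 2018 name from the comment. With differing prefixes, the length-based key falls back to comparing the whole string, so the date decides the order rather than the H number.

```python
# Hypothetical mix of prefixes; the 2018 name comes from the comment above.
mixed = ['testdat_2014-12-31-21_H10.nc',
         'testdat_2018-12-31-21_H0.nc',
         'testdat_2014-12-31-21_H1.nc']

# Length first, then the full string: '2014...' sorts before '2018...',
# so H1 ends up ahead of H0.
by_length = sorted(mixed, key=lambda x: (len(x), x))

# Extracting the number after '_H' ignores the prefix entirely.
by_hour = sorted(mixed, key=lambda x: int(x.split('_H')[1].split('.')[0]))

print(by_length)
print(by_hour)
```

As the reply notes, the glob pattern in the question restricts the list to a single date, so this situation does not arise there.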