Python: filter a list of files by the values in an xarray, where the file names contain zero-padded numbers


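I have a long list (ca. 3000 elements) of file names in the following format:

'path/00001_type.png'

Each file's ID is zero-padded and goes up to 1000 (i.e. 01000_type.png), and the type can take one of 3 values (circle, ellipse, cube).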

I also have an xarray, exclude, whose values identify the files I want to exclude from that list. To access those values I use:

exclude = exclude.values
exclude = [5, 8, 10, 20, ..., 204]

Goal

Produce a single list covering all types that leaves out every file whose ID is in the exclude list:

files = ['path/00001_type.png', 'path/00002_type.png', 'path/00003_type.png', 'path/00004_type.png', 'path/00006_type.png', 'path/00007_type.png', 'path/00009_type.png', 'path/00011_type.png']

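I have tried using regular expressions and the glob module to select the files, but I could not find the right way to search the list while accounting for the zero padding and the rest of the file path.

I would also like to know whether there is a more efficient way to do this.

Example of what I tried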

files = []
for file in filenames:
    for ID not in exclude:
        if file.glob("*{:05d}_type.png".format(ID)) in item_list2[1]:
             files.append(e) 
files

Use a regular expression.

Demo:

import re
import os

filenames = ['path/00001_type.png','path/00002_type.png','path/00003_type.png','path/00004_type.png', 'path/00005_type.png', 'path/00006_type.png','path/00007_type.png','path/00008_type.png','path/00009_type.png', 'path/00011_type.png']
exclude = [5, 8]
files = []

for file in filenames:
    # Extract the first run of digits from the base name (the zero-padded ID)
    m = re.search(r"(\d+)", os.path.basename(file))
    if m:
        # int() drops the zero padding, so '00005' compares equal to 5
        if int(m.group(1)) not in exclude:
            files.append(file)
print(files)

Output:

['path/00001_type.png',
 'path/00002_type.png',
 'path/00003_type.png',
 'path/00004_type.png',
 'path/00006_type.png',
 'path/00007_type.png',
 'path/00009_type.png',
 'path/00011_type.png']
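
On the efficiency question, one possible variant of the loop above is sketched here. It assumes the ID always sits at the start of the base name and is followed by an underscore, as in the question's examples. The sample data and the helper names file_id and keep are made up for illustration. Storing the excluded IDs in a set makes each membership test constant-time instead of a scan through a list.

```python
import os
import re

# Hypothetical sample data in the format described in the question.
filenames = [f"path/{i:05d}_type.png" for i in range(1, 12)]
exclude = {5, 8, 10}  # a set gives constant-time membership tests

# Match the zero-padded ID only at the very start of the base name.
id_pattern = re.compile(r"^(\d+)_")


def file_id(path):
    """Return the numeric ID at the start of the file name, or None."""
    m = id_pattern.match(os.path.basename(path))
    return int(m.group(1)) if m else None


def keep(path, excluded):
    """True if the path has an ID and that ID is not excluded."""
    fid = file_id(path)
    return fid is not None and fid not in excluded


files = [f for f in filenames if keep(f, exclude)]
print(files)
```

If exclude comes from exclude.values (a NumPy array), set(exclude.tolist()) should yield a set of plain Python numbers that can be used in the same way.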

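What is item_list2? And what is "for ID not in exclude" supposed to do?

item_list2 should be filenames; it was just a temporary variable from my own attempts. "for ID not in exclude" was meant to pick out the files in "filenames" (a list) whose zero-padded ID is not in exclude.

This doesn't work for me. I think it is because my path contains a lot of digits? The real path is: /home/home02/mm18lhf/.fastai/data/Nx256_s200000.0_N500study_N2000train/train/

Try using os.path.basename. Updated snippet.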
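
A small sketch of why the base name matters when the directory names contain digits. The path below is a shortened, hypothetical stand-in for the real one, not the asker's actual path.

```python
import os
import re

# Hypothetical path whose directory names contain digits.
path = "/data/N500study_N2000train/train/00005_type.png"

# Searching the full path returns the first digit run, which is in a directory name.
full_match = re.search(r"(\d+)", path).group(1)

# Searching only the base name returns the zero-padded file ID.
base_match = re.search(r"(\d+)", os.path.basename(path)).group(1)

print(full_match)  # 500
print(base_match)  # 00005
```

With the full path, int(full_match) would be compared against the exclude list instead of the file ID, so the filter would silently keep or drop the wrong files.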