How to get information from a file name in Python and split it into parts stored in a list


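I need to turn this code into a function that gets information from the file names of the .h5 files in a directory. I am very new to Python, so I hope my explanation makes sense. The code is below, followed by an example of the data file name that needs to be parsed.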

import glob
import os

atl06_dir = 'ATL06 files'
filenames = glob.glob(atl06_dir + '/*h5')
year_selected = 2019
filenames_selected = list()
for filename in filenames:
    # need to replace this line with a function that grabs the parts from the filename; this one does not work
    product, year, month, day, hour, minute, second, track, cycle, granule, release, version = icesat2_data_utils.h5FilenameParts(os.path.basename(filename))
    if int(year) == year_selected:
        filenames_selected.append(filename)

How can I make this part read the name of each .h5 file and split it into its separate parts, following the file name pattern shown here:

ATL06[yyyymmdd][hhmmss][ttttccss][vvv_rr].h5

I feel I would be on the right track if I had it read specific characters of the name, for example:

# product ATL06 = 0 to 5
# year yyyy = indexes 8 to 12
# month mm = 12 to 14
# day dd = 14 to 16
# hour hh = 18 to 20
# minute mm = 20 to 22
# second ss = 22 to 24
# Reference ground track tttt = 27 to 31
# cycle cc = 31 to 33
# orbital segment ss = 33 to 35
# version vvv = 38 to 44

You can slice the string to get whatever parts you need, for example:

string ="ATL06_[yyyymmdd][hhmmss][ttttccss][vvv_rr].h5"
product = string[0:5]
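As a rough sketch, the same slicing idea can be applied to every field. The file name below is a made-up example, and the offsets assume that real names put underscores between the bracketed groups (ATL06_yyyymmddhhmmss_ttttccss_vvv_rr.h5); if your files are laid out differently, the offsets need adjusting.

```python
# Hypothetical file name; assumes underscores separate the bracketed groups.
name = "ATL06_20190101001723_00540210_209_01.h5"

product = name[0:5]      # 'ATL06'
year = name[6:10]        # 'yyyy'
month = name[10:12]      # 'mm'
day = name[12:14]        # 'dd'
hour = name[14:16]       # 'hh'
minute = name[16:18]     # 'mm'
second = name[18:20]     # 'ss'
track = name[21:25]      # 'tttt', reference ground track
cycle = name[25:27]      # 'cc'
segment = name[27:29]    # 'ss', orbital segment
version = name[30:33]    # 'vvv'
revision = name[34:36]   # 'rr'

# Collect the pieces in a list, as asked in the question.
parts = [product, year, month, day, hour, minute, second,
         track, cycle, segment, version, revision]
```

Fixed offsets are simple but fragile: a single extra character in the name shifts every field.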

To extract what is inside the [] brackets, you can use the re (regular expression) module:

import re
string ="ATL06_[yyyymmdd][hhmmss][ttttccss][vvv_rr].h5"
re.findall(r'\[(.*?)\]', string)

This returns a list of the matched strings:

['yyyymmdd', 'hhmmss', 'ttttccss', 'vvv_rr']

Each element of this list is a string, which you can slice individually if needed.
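Since the question asks for a function, here is one possible sketch that uses a regular expression with named groups instead of literal brackets. The function name h5_filename_parts and the pattern are illustrative, not part of any existing library, and the pattern assumes names of the form ATL06_yyyymmddhhmmss_ttttccss_vvv_rr.h5 with underscores between the groups.

```python
import re

# Assumed naming convention: ATL06_yyyymmddhhmmss_ttttccss_vvv_rr.h5
ATL_PATTERN = re.compile(
    r"(?P<product>ATL\d{2})_"
    r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
    r"(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2})_"
    r"(?P<track>\d{4})(?P<cycle>\d{2})(?P<segment>\d{2})_"
    r"(?P<version>\d{3})_(?P<revision>\d{2})\.h5"
)


def h5_filename_parts(basename):
    """Return the twelve parts of the file name as a list of strings.

    Returns None when the name does not follow the assumed pattern.
    """
    match = ATL_PATTERN.fullmatch(basename)
    if match is None:
        return None
    # groups() yields the captured parts in the order they appear in the name.
    return list(match.groups())
```

The list has twelve elements in the same order as the variables unpacked in the question's loop, so it could stand in for the call that does not work, provided names that return None are skipped first.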


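Note that glob.glob() builds a complete list of file names. You can avoid that by using the iterator version, glob.iglob(). Your modified for loop would then look like this:

for filename in glob.iglob(atl06_dir + '/*h5'):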
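Putting the pieces together, the year filter from the question could be wrapped in a function along these lines. This is only a sketch: select_files_by_year is a made-up name, and reading the year from characters 6 to 10 assumes the file name starts with a five-character product code and an underscore (for example ATL06_2019...).

```python
import glob
import os


def select_files_by_year(directory, year_selected):
    """Return the paths of the .h5 files in directory whose name carries the given year."""
    selected = []
    # iglob yields one path at a time instead of building the full list first.
    for filename in glob.iglob(os.path.join(directory, "*.h5")):
        basename = os.path.basename(filename)
        year = basename[6:10]  # assumes the year follows the 'ATL06_' prefix
        if year.isdigit() and int(year) == year_selected:
            selected.append(filename)
    return selected
```

With the variables from the question, it would be called as select_files_by_year(atl06_dir, 2019).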
