How do I extract numbers from a filename in Python?
I only need to extract the number from each filename, for example:
间隙点1.shp
间隙点23.shp
间隙点109.shp
How can I extract the numbers from these files with Python? I need to incorporate this into a for loop.
You haven't described where these files are or how you obtain them, but I assume you can get the filenames with the os module.
As for extracting the number from the name, a regular expression is the best fit, like this:
import re
def get_numbers_from_filename(filename):
    return re.search(r'\d+', filename).group(0)
Then, to use it in a for loop, run the function on each filename:
import os
for filename in os.listdir(myfiledirectory):
    print(get_numbers_from_filename(filename))
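The helper above raises an AttributeError when a filename contains no digits, because re.search returns None in that case. A minimal sketch of a more defensive variant follows. The name extract_number, the choice to return None for names without digits, and the sample file readme.txt are assumptions made for illustration, not part of the answer above.

```python
import os
import re

def extract_number(filename):
    """Return the first run of digits in the base name as an int, or None."""
    # Drop the extension so digits inside it (e.g. ".mp3") are not picked up.
    stem, _ext = os.path.splitext(filename)
    match = re.search(r'\d+', stem)
    return int(match.group(0)) if match else None

# Hypothetical list of names; in practice this could come from os.listdir().
names = ['间隙点1.shp', '间隙点23.shp', '间隙点109.shp', 'readme.txt']
for name in names:
    print(name, extract_number(name))
```

Returning None rather than raising lets the loop skip files that carry no number, which is often what a directory listing needs.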
Or something along those lines.
You can use a regular expression:
regex = re.compile(r'\d+')
Then, to get the matching strings:
regex.findall(filename)
This returns a list of strings, one for each run of digits. If you actually need integers, use int:
[int(x) for x in regex.findall(filename)]
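To make the behaviour concrete, here is a small sketch with a hypothetical filename that contains more than one number. findall returns every run of digits, while search stops at the first one.

```python
import re

regex = re.compile(r'\d+')
filename = 'survey2021_plot7.shp'  # hypothetical name with two numbers

# Every run of digits, converted to integers.
all_numbers = [int(x) for x in regex.findall(filename)]

# Only the first run of digits, guarded against a missing match.
first = regex.search(filename)
first_number = int(first.group(0)) if first else None

print(all_numbers, first_number)
```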
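If each filename contains only one number, you can use
regex.search(filename).group(0)
(if you are sure it will produce a match). If no match is found, the line above raises an AttributeError saying that NoneType has no attribute group.
If there is only one number: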
''.join(filter(lambda x: x.isdigit(), filename))
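Note that this approach keeps every digit character in the string, so digits from separate groups, or from the extension, end up concatenated. In Python 3, filter also returns an iterator, so the result has to be joined back into a string. A brief sketch follows; digits_only and the second filename are hypothetical.

```python
def digits_only(filename):
    # Keep every digit character, in order, and join them into one string.
    return ''.join(filter(str.isdigit, filename))

print(digits_only('间隙点109.shp'))
print(digits_only('take2_song.mp3'))  # digits from the name and the extension are merged
```

For names like the second one, a regular expression that targets a single run of digits is usually the safer choice.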
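Here is my code. After downloading papers from Google Scholar, I use it to move the publication year to the front of the filename. The original files are usually named Author+publishedYear.pdf; after running this code, the names become publishedYear+Author.pdf.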
# Renaming PDFs according to number extraction
# You want to rename a PDF file so that the digits of the publication year come first.
# Uses regular expressions.
# After running this file, matching filenames follow the digits-first pattern.
# Import libraries
import re
import os
# Work in the current working directory
address = os.getcwd()
# Define a class with two functions
class file_name:
    def __init__(self, filename):
        self.filename = filename
    # Because we have two patterns, we must define two functions.
    # First function, for names such as: schrodinger1990.pdf
    @staticmethod
    def number_extraction_pattern_non_digits_first(filename):
        pattern = r'(\D+)(\d+)(\.pdf)'
        match = re.search(pattern, filename, re.IGNORECASE)
        # Return (digits, non_digits)
        return match.group(2), match.group(1)
    # Second function, for names such as: 1993schrodinger.pdf
    @staticmethod
    def number_extraction_pattern_digits_first(filename):
        pattern = r'(\d+)(\D+)(\.pdf)'
        match = re.search(pattern, filename, re.IGNORECASE)
        # Return (digits, non_digits)
        return match.group(1), match.group(2)
if __name__ == '__main__':
    # Patterns used to check which form a filename has
    pattern_check1 = r'(\D+)(\d+)(\.pdf)'
    pattern_check2 = r'(\d+)(\D+)(\.pdf)'
    # Go through each file in the folder.
    for filename in os.listdir(address):
        if filename.endswith('.pdf'):
            if re.search(pattern_check1, filename, re.IGNORECASE):
                digits, non_digits = file_name.number_extraction_pattern_non_digits_first(filename)
                os.rename(filename, digits + non_digits + '.pdf')
            # Otherwise, try the digits-first pattern; names matching neither are skipped.
            elif re.search(pattern_check2, filename, re.IGNORECASE):
                digits, non_digits = file_name.number_extraction_pattern_digits_first(filename)
                os.rename(filename, digits + non_digits + '.pdf')
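Because os.rename changes files immediately, it can help to compute the new names first and inspect them before touching the disk. The sketch below is one way to do that, not the code above. year_first_name is a hypothetical helper, and it assumes names of the form Author+Year.pdf in which the year is the only run of digits.

```python
import re

# Non-digits followed by digits, anchored to the whole name (assumed layout).
YEAR_LAST = re.compile(r'^(\D+)(\d+)\.pdf$', re.IGNORECASE)

def year_first_name(filename):
    """Return the year-first name, or None if the name is not Author+Year.pdf."""
    match = YEAR_LAST.match(filename)
    if match is None:
        return None
    author, year = match.groups()
    return year + author + '.pdf'

# Dry run: print the planned renames instead of calling os.rename.
for name in ['schrodinger1990.pdf', '1993schrodinger.pdf', 'notes.txt']:
    print(name, '->', year_first_name(name))
```

Anchoring the pattern with ^ and $ means a name that already starts with the year, or that has extra text around the digits, is left alone rather than partially matched.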