Python: import multiple Excel files into pandas and create a column based on the file name

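I have multiple Excel files in a folder that I want to read and combine, and while combining them I want to add a column based on each file's name.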

'D:\\156667_Report.xls',
'D:\\192059_Report.xls',
'D:\\254787_Report.xls',
'D:\\263421_Report.xls',
'D:\\273554_Report.xls',
'D:\\280163_Report.xls',
'D:\\307928_Report.xls'

I can read these files into pandas with the script below:

import glob
import os

import pandas as pd

path = 'D:\\'  # use your path (a raw string cannot end with a single backslash)
allFiles = glob.glob(os.path.join(path, "*.xls"))
list_ = []
for file_ in allFiles:
    df = pd.read_excel(file_, index_col=None, header=0)
    list_.append(df)

frame = pd.concat(list_)

I want to add a column named Code to every file I read. The code is the number in the file name, e.g. 156667, 192059.

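Why not match the number with a regular expression inside the loop?

# requires "import re" at the top of the script
foo = re.search(r'(\d+)_Report', file_)  # capture the digits before "_Report"
num = foo.group(1)
df['Code'] = num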
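As a rough sketch of the regex idea above, the extraction can be pulled into a small helper. The name extract_code is hypothetical, and the sketch assumes every report is named like `<digits>_Report.xls`; anything else returns None so the caller can decide what to do with it. PureWindowsPath is used only so the Windows-style paths from the question parse the same way on any OS.

```python
import re
from pathlib import PureWindowsPath


def extract_code(file_path):
    """Return the digits before '_Report' in the file name, or None if absent.

    Hypothetical helper; assumes names of the form '<digits>_Report.<ext>'.
    """
    # Drop the directory and the extension: 'D:\\156667_Report.xls' -> '156667_Report'
    stem = PureWindowsPath(file_path).stem
    match = re.match(r'(\d+)_Report$', stem)
    return match.group(1) if match else None


print(extract_code('D:\\156667_Report.xls'))
print(extract_code('D:\\notes.xls'))
```

Inside the loop this would be used as df['Code'] = extract_code(file_). The code is returned as a string, so wrap it in int(...) if a numeric column is wanted.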

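One way to do this is to use join and isdigit in a comprehension. isdigit keeps only the digit characters of the file name, and join merges them back into a single string. For clarity, you can change your for loop to: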
for file_ in allFiles:
    df = pd.read_excel(file_, index_col=None, header=0)
    # keep only the digit characters of the path and join them into one string
    df['Code'] = ''.join(i for i in file_ if i.isdigit())
    list_.append(df)

This adds a column named Code to each df.
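One caveat with the isdigit approach: it scans the whole path string, so any digits in a folder name end up in the code as well. A minimal sketch of the difference follows; the path with a 2019 folder is made up for illustration and digits_in is a hypothetical helper name.

```python
from pathlib import PureWindowsPath


def digits_in(text):
    """Join every digit character found in text into one string."""
    return ''.join(ch for ch in text if ch.isdigit())


# Hypothetical path: the folder name contains digits too
full_path = 'D:\\2019\\156667_Report.xls'

# Scanning the whole path also picks up the folder digits
print(digits_in(full_path))  # 2019156667

# Scanning only the file name gives the intended code
file_name = PureWindowsPath(full_path).name  # '156667_Report.xls'
print(digits_in(file_name))  # 156667
```

For the files in the question, which sit directly under D:\, both variants give the same result. Restricting the scan to the file name only matters once the folder path contains digits.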