How to read thousands of Excel files in Python?


python python-3.x excel pandas performance


I have more than 1000 Excel files (.xlsx), each between 30 MB and 70 MB in size.

I need to read all of them and combine them into one big DataFrame.

glob plus pd.read_excel is the obvious choice, but because pd.read_excel is very slow, the code ran for an entire day.

Is there a faster way to do this in Python? Is it possible to parallelize it?

There are several answers on this topic here:

You can replace pd.read_csv with pd.read_excel in an answer like this one:

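import glob
import os
import pandas as pd

# read every .xlsx file in the current directory and stack them into one DataFrame
df = pd.concat(map(pd.read_excel, glob.glob(os.path.join('', '*.xlsx'))), ignore_index=True)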
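On the parallelization question, here is a minimal sketch (not taken from the question or the linked answers) that spreads the same glob-and-concat pattern over a standard-library concurrent.futures executor. The names read_all_excel, executor, pattern and reader are hypothetical. It assumes every file has the same columns, an .xlsx engine such as openpyxl is installed, and the combined frames still fit in memory. Parsing .xlsx is largely CPU-bound Python work, so a ProcessPoolExecutor, rather than threads, is what lets several cores work at once. The actual speedup depends on core count and disk throughput.

```python
import glob
import os
from concurrent.futures import Executor

import pandas as pd


def read_all_excel(folder, executor: Executor, pattern="*.xlsx", reader=pd.read_excel):
    """Load every file in `folder` matching `pattern` with `reader`, spreading the
    per-file reads across `executor`, and stack the results into one DataFrame.

    Hypothetical helper: assumes all files share the same columns.
    """
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    # Executor.map yields results in input order, so rows follow the sorted file order
    frames = list(executor.map(reader, paths))
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame instead
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Call it from inside an `if __name__ == "__main__":` guard, for example `with ProcessPoolExecutor() as pool: df = read_all_excel("path_to/your_folder", pool)`. With a process pool, reader must be picklable, so pass a module-level function (or a functools.partial of one, e.g. with usecols) rather than a lambda.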

This is like saving the DataFrame in chunks. Append each chunk to the output file:

import os
import pandas as pd

folder = 'path_to/your_folder'
# to_csv writes CSV text, so give the output a .csv extension
out_file = 'path_to/your_output.csv'

# write the header only for the first chunk
header = True

# find the Excel files in your_folder
for filename in sorted(os.listdir(folder)):
    if not filename.endswith('.xlsx'):
        continue

    # get the full path of the Excel file
    file_path = os.path.join(folder, filename)

    # create a DataFrame from the Excel file
    df = pd.read_excel(file_path)

    # append df to the output file as one chunk (no index column)
    df.to_csv(out_file, header=header, index=False, mode='a')

    header = False

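So the result would be around 50 GB? Even assuming you can build it, what tool would you use to read or work with it?

One big DataFrame would be very expensive in terms of memory; I doubt pandas can handle it. You could keep reading and writing the DataFrames, continuing from where the last row of the previous iteration ended, which helps with copying.

What do you mean by "combine"? Just concatenating these files without further processing, or a more complex requirement such as having all of them in memory?

@SyKer - if all the Excel files have the same format...

No, you can't append them all to the same file; a file that large would quickly run out of rows (which raises the question of why they are so big, since Excel can't hold that much data).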
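The memory and row-limit concerns raised in the comments can be checked without ever holding the combined data in memory. Here is a minimal sketch, assuming the files have already been appended into one CSV with DataFrame.to_csv as in the chunked loop. It streams that CSV back with pd.read_csv(..., chunksize=...) and compares the row count with the 1,048,576-row limit of a single .xlsx worksheet. The helper names count_csv_rows and fits_in_one_sheet, and the default chunk size, are hypothetical choices.

```python
import pandas as pd

# Row limit of a single .xlsx worksheet (Excel 2007 and later)
XLSX_MAX_ROWS = 1_048_576


def count_csv_rows(csv_path, chunksize=1_000_000):
    """Count the data rows of a large CSV without loading it all into memory."""
    total = 0
    # With chunksize set, read_csv returns an iterator of DataFrames of at most
    # `chunksize` rows each, so memory use is roughly bounded by one chunk.
    with pd.read_csv(csv_path, chunksize=chunksize) as chunks:
        for chunk in chunks:
            total += len(chunk)
    return total


def fits_in_one_sheet(csv_path, chunksize=1_000_000):
    """True if the CSV's data rows plus one header row fit in one .xlsx worksheet."""
    return count_csv_rows(csv_path, chunksize) + 1 <= XLSX_MAX_ROWS
```

The same chunk loop is the natural place for any per-chunk aggregation (sums, filters, group counts) when the full 50 GB result cannot be materialized as one DataFrame.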