Excel: Storing output data in a CSV using Python

excel pandas csv

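I have extracted data from different Excel sheets spread across different folders. The folders are organized numerically from 2015 to 2019, and each folder has 12 subfolders, numbered 1 to 12. Here is my code: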

import os
from os import walk
import pandas as pd

path = r'C:\Users\Sarah\Desktop\IOMTest'
my_files = []
for (dirpath, dirnames, filenames) in walk(path):
    my_files.extend([os.path.join(dirpath, fname) for fname in filenames])


all_sheets = []
for file_name in my_files:

    #Display sheets names using pandas
    pd.set_option('display.width', 300)
    mosul_file = file_name
    xl = pd.ExcelFile(mosul_file)
    mosul_df = xl.parse(0, header=[1], index_col=[0, 1, 2])

    #Read Excel and Select columns

    mosul_file = pd.read_excel(file_name, sheet_name=0,
                               index_col=None, na_values=['NA'],
                               usecols="A, E, G, H, L, M")

    #Remove NaN values

    data_mosul_df = mosul_file.apply(pd.to_numeric, errors='coerce')
    data_mosul_df = mosul_file.dropna()
    print(data_mosul_df)

Then I save the extracted columns to a CSV file:

def save_frames(frames, output_path):

    for frame in frames:
        frame.to_csv(output_path, mode='a+', header=False)

if __name__ == '__main__':
    frames = [pd.DataFrame(data_mosul_df)]
    save_frames(frames, r'C:\Users\Sarah\Desktop\tt\c.csv')

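My problem is that when I open the CSV file, it does not seem to store all the data. It contains only the last Excel sheet that was read, or sometimes the last two. However, when I print the data in the Spyder console, I can see that all of it is processed: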

    data_mosul_df = mosul_file.apply (pd.to_numeric, errors='coerce')
    data_mosul_df = mosul_file.dropna()
    print(data_mosul_df)

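The image below shows the output CSV that was created. I wonder whether this happens because the information in columns A through E is the same. Is that why it gets overwritten?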

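I would like to know how to modify the code so that it extracts and stores the data in chronological order, from folder 2015 to folder 2019, taking into account subfolders 1 to 12 in each folder, and how to create a CSV that stores all of the data. Thank you.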

Rewrite your loop:

# Collect one cleaned DataFrame per file instead of keeping only the last one
all_sheets = []
for file_name in my_files:

    # Widen the console display (affects printing only)
    pd.set_option('display.width', 300)
    mosul_file = file_name
    xl = pd.ExcelFile(mosul_file)
    mosul_df = xl.parse(0, header=[1], index_col=[0, 1, 2])

    # Read the first sheet and select columns
    mosul_file = pd.read_excel(file_name, sheet_name=0,
                               index_col=None, na_values=['NA'],
                               usecols="A, E, G, H, L, M")

    # Remove NaN values
    # Note: the to_numeric result is overwritten by the next line,
    # so only dropna() on the raw data takes effect
    data_mosul_df = mosul_file.apply(pd.to_numeric, errors='coerce')
    data_mosul_df = mosul_file.dropna()

    # Make a list of df's
    all_sheets.append(data_mosul_df)
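
The loop visits my_files in whatever order os.walk returns them, and sorting the paths as plain strings would put month 10 before month 2. Since the question asks for chronological order, here is a minimal sketch of one way to build the file list. It assumes the layout is exactly root/year/month/file with purely numeric folder names; the helper name excel_files_in_order is hypothetical and not part of the original code.

```python
import os


def excel_files_in_order(root):
    """Return Excel file paths under root, ordered by year folder, then month folder.

    Assumes the layout <root>/<year>/<month>/<file>, with numeric folder names.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        # Only look inside <year>/<month> folders; skip everything else
        if len(parts) != 2 or not all(p.isdecimal() for p in parts):
            continue
        year, month = int(parts[0]), int(parts[1])
        for fname in filenames:
            if fname.lower().endswith(('.xls', '.xlsx')):
                found.append((year, month, fname, os.path.join(dirpath, fname)))
    # Tuples sort by year, then month (as integers), then file name
    found.sort()
    return [path for _, _, _, path in found]
```

With this helper, my_files = excel_files_in_order(path) could replace the os.walk block at the top of the script, so that the frames are appended to all_sheets in year and month order.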

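Rewrite save_frames:

def save_frames(frames, output_path):
    frames.to_csv(output_path, mode='a+', header=False)

Rewrite main:

if __name__ == '__main__':
    frames = pd.concat(all_sheets)
    save_frames(frames, r'C:\Users\Sarah\Desktop\tt\c.csv')

Comments:

You are overwriting data_mosul_df inside the loop; you need to collect all of the data_mosul_df results... First, initialize df = pd.DataFrame(). Second, read df_ = pd.read_excel. Last, df = pd.concat([df, df_]). Do the second and last steps in a loop. Return the result from a function.

@SergeyBushmanov Thanks, and sorry, I am a bit confused: should these steps be done after data_mosul_df?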
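
The collect-then-concatenate idea from the first comment can be sketched as follows. This is a minimal illustration, not the commenter's exact code: it gathers the frames in a list and calls pd.concat once at the end, rather than concatenating inside the loop, which avoids copying the growing frame on every iteration. The function name combine_frames and the small jan and feb frames are hypothetical stand-ins for the per-file results of pd.read_excel.

```python
import pandas as pd


def combine_frames(frames):
    """Concatenate an iterable of DataFrames into one, renumbering the rows."""
    collected = []
    for frame in frames:
        # In the real script, each frame would be one file's cleaned data
        collected.append(frame)
    return pd.concat(collected, ignore_index=True)


# Hypothetical stand-ins for two per-file DataFrames
jan = pd.DataFrame({"A": [1, 2], "E": [10.0, 20.0]})
feb = pd.DataFrame({"A": [3], "E": [30.0]})

combined = combine_frames([jan, feb])

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = combined.to_csv(index=False)
```

One design point to consider: to_csv defaults to mode='w', which replaces the file on each run, whereas mode='a+' keeps appending, so running the script twice with 'a+' would store every row twice.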