Python: creating new data from a DataFrame

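The original spreadsheets have two columns. I want to select rows that match a given condition (the month) and put them into new files.

The original files look like this: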

The code I am using:

import os
import pandas as pd

working_folder = "C:\\My Documents\\"

file_list = ["Jan.xlsx", "Feb.xlsx", "Mar.xlsx"]

with open(working_folder + '201703-1.csv', 'a') as f03:
    for fl in file_list:
        df = pd.read_excel(working_folder + fl)
        df_201703 = df[df.ARRIVAL.between(20170301, 20170331)] 
        df_201703.to_csv(f03, header = True)

with open(working_folder + '201702-1.csv', 'a') as f02:
    for fl in file_list:
        df = pd.read_excel(working_folder + fl)
        df_201702 = df[df.ARRIVAL.between(20170201, 20170231)] 
        df_201702.to_csv(f02, header = True)

with open(working_folder + '201701-1.csv', 'a') as f01:
    for fl in file_list:
        df = pd.read_excel(working_folder + fl)
        df_201701 = df[df.ARRIVAL.between(20170101, 20170131)] 
        df_201701.to_csv(f01, header = True)
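The result looks like this:

Improvements I would like to make:

  • Save the output as .xlsx files instead of .csv files
  • Drop the leading index column
  • Keep only one header row at the top (each CSV currently has three header rows)

How can I do this? Thanks.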

    I think you need to build a list of DataFrames, concat them together, and then write the result to a file:

    dfs1 = []
    
    for fl in file_list:
        df = pd.read_excel(working_folder + fl)
        dfs1.append(df[df.ARRIVAL.between(20170101, 20170131)] )
    
    pd.concat(dfs1).to_excel('201701-1.xlsx', index = False)
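
To see why the repeated headers appear, here is a small in-memory sketch. It assumes made-up one-column frames (only the ARRIVAL column name comes from the question) and writes to io.StringIO instead of real files. Each separate to_csv call with header=True emits its own header row and index column, while concatenating first and writing once with index=False does not.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the filtered rows of two input files.
parts = [
    pd.DataFrame({"ARRIVAL": [20170105]}),
    pd.DataFrame({"ARRIVAL": [20170120]}),
]

# Writing each part separately repeats the header and includes the index column.
repeated = io.StringIO()
for part in parts:
    part.to_csv(repeated, header=True)

# Concatenating first and writing once gives a single header and no index column.
single = io.StringIO()
pd.concat(parts).to_csv(single, index=False)

repeated_text = repeated.getvalue()
single_text = single.getvalue()
```

The same reasoning applies to to_excel: one write of the concatenated frame produces one header row.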
    
    This can be simplified with a list comprehension:

    file_list = ["Jan.xlsx", "Feb.xlsx", "Mar.xlsx"]
    # Keep rows whose ARRIVAL lies within January 2017 (both bounds inclusive)
    dfs1 = [pd.read_excel(working_folder + fl).query('20170101 <= ARRIVAL <= 20170131') for fl in file_list]
    
    pd.concat(dfs1).to_excel('201701-1.xlsx', index = False)
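
If all three monthly files are needed, one possible extension is to read each workbook once and split the combined rows by month. The sketch below is only an illustration: split_by_month and write_monthly_files are hypothetical helper names, and it assumes ARRIVAL holds integers of the form YYYYMMDD (as the between() calls in the question suggest) and that an Excel engine such as openpyxl is installed for to_excel.

```python
import pandas as pd


def split_by_month(frames, months):
    """Return {yyyymm: DataFrame} holding the rows of all frames for each month.

    Assumes ARRIVAL is an integer column of the form YYYYMMDD.
    """
    combined = pd.concat(frames, ignore_index=True)
    # Integer division by 100 drops the day part: 20170315 -> 201703
    month_key = combined["ARRIVAL"] // 100
    return {m: combined[month_key == m].reset_index(drop=True) for m in months}


def write_monthly_files(working_folder, file_list, months):
    """Read every workbook once, then write one .xlsx file per month."""
    frames = [pd.read_excel(working_folder + fl) for fl in file_list]
    for month, df in split_by_month(frames, months).items():
        # index=False omits the index column; a single write gives a single header row
        df.to_excel(working_folder + f"{month}-1.xlsx", index=False)
```

With the names from the question, this would be called as write_monthly_files(working_folder, file_list, [201701, 201702, 201703]).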