Python: Combine CSV files with different columns and align the columns


python pandas


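Over the past 3 years we have received a file from a counterparty every day, so we now have more than 1,000 files. Depending on the day, each file has between 5,000 and 15,000 rows.

I have been trying to combine them with Python, using Google and Visual Studio Code. For testing, I took only the file from the last day of each month, which gives 33 files in total.

The files look like this: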

File 1:

Header_1  Header_2      Header 3 
0         2             1
2         3             2 
4                       3  


File 2:     

Header_1   Header_4      Header_3  Header_2
6          4             3         1
8          5             4 
10

Desired output:

Header_1   Header_2   Header_3   Header_4   File_Name
0          2          1                     File 1
2          3          2                     File 1
4                     3                     File 1
6          1          3          4          File 2
8                     4          5          File 2
10                                          File 2

The code I used to try this:

import os
import pandas as pd
import glob

#set working directory
os.chdir("/filepath/")

globbed_files = glob.glob("*.csv") #creates a list of all csv files
print(globbed_files)
data = [] # pd.concat takes a list of dataframes as an argument
for csv in globbed_files:
    frame = pd.read_csv(csv)
    data.append(frame)
    print(frame) # to check while running whether the frame was correct


bigframe = pd.concat(data, ignore_index=True, keys=globbed_files) 
bigframe.to_csv("output.csv")

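I can drop the file name column if necessary, and empty cells can be either NaN or blank. The problem is that my headers are not aligned, so I end up with completely mismatched columns.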
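For reference, pd.concat lines rows up by column label, not by column position, so the loop above should already align columns whose headers are spelled identically. The following is a minimal sketch using the two sample files inlined with io.StringIO. It assumes the real files are comma-separated and that File 1's third header is spelled Header_3, as in File 2. Empty fields stand for the blank cells in the tables.

```python
import io

import pandas as pd

# The two sample files, written as comma-separated text (assumed format).
# Empty fields (",,") stand for the blank cells in the tables above.
file_1 = io.StringIO("Header_1,Header_2,Header_3\n0,2,1\n2,3,2\n4,,3\n")
file_2 = io.StringIO("Header_1,Header_4,Header_3,Header_2\n6,4,3,1\n8,5,4,\n10,,,\n")

frames = []
for name, source in [("File 1", file_1), ("File 2", file_2)]:
    frame = pd.read_csv(source)
    frame["File_Name"] = name  # tag every row with its source file
    frames.append(frame)

# concat matches columns by label; a label missing from a frame becomes NaN
combined = pd.concat(frames, ignore_index=True)
combined = combined[["Header_1", "Header_2", "Header_3", "Header_4", "File_Name"]]
print(combined)
```

If the real files produce mismatched columns with this approach, the header labels themselves probably differ between files.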

Your code seems to work. I only added the file name column and rearranged the columns:

import pandas as pd

# you can use your own files here; these two frames are only test data
df1 = pd.DataFrame({"header_1": [1, 2, 3, 4], "header_2": [2, 3, 4, 6]})
df2 = pd.DataFrame({"header_4": [1, 2, 3, 4], "header_3": [2, 3, 4, 5]})

frames = [df1, df2]  # list of DataFrames standing in for the CSV files
data = []  # pd.concat takes a list of dataframes as an argument
for i, frame in enumerate(frames, start=1):  # i is the file name counter
    frame = frame.copy()  # avoid modifying the original test frames
    frame["File_Name"] = "File " + str(i)  # File_Name values are set here
    data.append(frame)

bigframe = pd.concat(data, ignore_index=True)
bigframe = bigframe.reindex(sorted(bigframe.columns), axis=1)  # arrange the columns alphabetically
bigframe = bigframe[[col for col in bigframe.columns if col != "File_Name"] + ["File_Name"]]  # move File_Name to the end
print(bigframe)
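With the real files instead of in-memory test frames, the actual file name can fill File_Name in place of a counter, and glob can be given the folder path directly so os.chdir is not needed. Here is a sketch under those assumptions. The helper name combine_csv_folder is hypothetical, and the demo writes two small made-up files to a temporary folder so the block runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_csv_folder(folder):
    """Hypothetical helper: stack every *.csv in folder, aligning columns by name."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        frame = pd.read_csv(path)
        frame["File_Name"] = os.path.basename(path)  # real file name, not a counter
        frames.append(frame)
    # pd.concat raises ValueError if the folder contains no CSV files
    combined = pd.concat(frames, ignore_index=True)
    # alphabetical data columns first, File_Name last
    data_cols = sorted(c for c in combined.columns if c != "File_Name")
    return combined[data_cols + ["File_Name"]]


# Demo with two made-up files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "file_1.csv"), "w") as f:
        f.write("Header_1,Header_2\n0,2\n2,3\n")
    with open(os.path.join(tmp, "file_2.csv"), "w") as f:
        f.write("Header_1,Header_4\n6,4\n")
    result = combine_csv_folder(tmp)
    print(result)
```

Passing the folder path to glob keeps the script independent of the current working directory, which makes it easier to rerun against the full set of 1,000+ files later.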

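This looks like a data-related problem. Are the DataFrames correct right after frame = pd.read_csv(csv) runs?

From what I can see in the VS Code terminal output, the frames appear to be created correctly.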
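If each frame looks correct but the combined columns still don't line up, one likely cause is header names that differ only in spacing. This is an assumption because the thread doesn't confirm the root cause. An example is Header 3 in File 1 versus Header_3 in File 2 of the sample, which pd.concat treats as two different columns. Printing frame.columns.tolist() for each file shows such differences quickly. Below is a sketch of normalizing the headers before concatenating. The helper normalize_headers and its cleanup rule are hypothetical.

```python
import io

import pandas as pd


def normalize_headers(frame):
    """Return the frame with cleaned column names (hypothetical cleanup rule).

    Surrounding whitespace is trimmed and any internal run of whitespace
    becomes a single underscore, so "Header 3" turns into "Header_3".
    """
    cleaned = frame.columns.str.strip().str.replace(r"\s+", "_", regex=True)
    return frame.set_axis(cleaned, axis=1)


# Two tiny made-up files whose headers differ only in spacing
text_1 = "Header_1,Header 3\n0,1\n"
text_2 = "Header_1,Header_3\n6,3\n"

# Without cleanup, "Header 3" and "Header_3" end up as separate columns
raw = pd.concat([pd.read_csv(io.StringIO(t)) for t in (text_1, text_2)], ignore_index=True)

# With cleanup, both files share the same two columns
fixed = pd.concat(
    [normalize_headers(pd.read_csv(io.StringIO(t))) for t in (text_1, text_2)],
    ignore_index=True,
)
print(raw.columns.tolist())
print(fixed)
```

In the original loop, the equivalent change would be calling normalize_headers on each frame right after pd.read_csv(csv).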