Pandas: how to merge multiple CSV files by joining on a column inside a loop
Here is my problem. I have 100 files, and each of them has two columns: "time_slopes" and "slope". I want to create a single file that contains everything. Here is an example:
-----file 1----
2001.1 10
2001.2 20
2001.3 12
2001.4 4
2001.5 1
2001.6 13
-----file 2----
2001.3 20
2001.4 15
2001.5 6
-----file 3----
2001.6 15
2001.7 15
2001.8 15
2001.9 20
2002.0 23
**The expected result is:**
------- output file ---------
date file1 file2 file3
2001.1 10 NAN NAN
2001.2 20 NAN NAN
2001.3 12 20 NAN
2001.4 4 15 NAN
2001.5 1 6 NAN
2001.6 13 NAN 15
2001.7 NAN NAN 15
2001.8 NAN NAN 15
2001.9 NAN NAN 20
2002.0 NAN NAN 23
Here is what I tried:
import pandas as pd
import os, glob
import numpy as np
from numpy import genfromtxt

filename_list = []
file_path = r"C:\Users\Path"
for file in glob.glob(file_path + "/*.csv"):
    filename_list.append(file)

df_ini = pd.read_csv('output.csv')  # output.csv already contains two columns with values
df_ini.columns = ['time_slopes', 'slope']

for filename in filename_list:
    with open(filename, 'r') as f:
        # convert numpy array into DataFrame
        numpyarray = genfromtxt(f, delimiter=',')
    df = pd.DataFrame({'time_slopes': numpyarray[:, 0], 'slope': numpyarray[:, 1]})
    # remove NaN values:
    df = df.dropna(how='all')
    # re-index file:
    df.reset_index(drop=True, inplace=True)
    # merge file:
    dfmerge = df_ini.merge(df, on='time_slopes', how='left')
    dfmerge.to_csv("output.csv", sep=',', index=False)
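This code returns only two columns: the first one (from df_ini) and the last one (from file number 100). On each iteration the last column is overwritten instead of being appended.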
date file1 file100
2001.1 10 NaN
Does anyone know how to solve this?
Thanks.
This may help you:
import pandas as pd

file_1 = pd.DataFrame({'date': [2001.1, 2001.2, 2001.3], 'slope': [10, 20, 12]})
file_2 = pd.DataFrame({'date': [2001.4, 2001.5, 2001.6], 'slope': [20, 15, 6]})
file_3 = pd.DataFrame({'date': [2001.6, 2001.7, 2001.8], 'slope': [30, 40, 90]})
df_list = [file_1, file_2, file_3]
for df in df_list:
    # use the date as the index so that concat can align rows on it
    df.index = df['date']
    df.drop(['date'], axis=1, inplace=True)
# axis=1 places the frames side by side; ignore_index=True renames the columns to 0, 1, 2
final_df = pd.concat(df_list, axis=1, ignore_index=True)
final_df = final_df.reset_index()
print(final_df)
Output:
     date     0     1     2
0  2001.1  10.0   NaN   NaN
1  2001.2  20.0   NaN   NaN
2  2001.3  12.0   NaN   NaN
3  2001.4   NaN  20.0   NaN
4  2001.5   NaN  15.0   NaN
5  2001.6   NaN   6.0  30.0
6  2001.7   NaN   NaN  40.0
7  2001.8   NaN   NaN  90.0
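The loop in the question merges every file against the unchanged df_ini, so each iteration throws away the previous merge and only the last file survives. One way to apply the concat approach above to files on disk is sketched below. The function name combine_slope_files is hypothetical, and the sketch assumes that each CSV is comma-separated with no header row, and that dates are unique within a file (concat along axis=1 cannot align a duplicated index).

```python
import glob
import os

import pandas as pd


def combine_slope_files(folder, pattern="*.csv"):
    """Combine two-column CSV files into one wide table keyed on the date column.

    Hypothetical helper: assumes no header row and columns (date, slope).
    """
    series_list = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        df = pd.read_csv(path, header=None, names=["date", "slope"])
        # drop rows that are completely empty
        df = df.dropna(how="all")
        # name each column after its source file, e.g. "file1"
        name = os.path.splitext(os.path.basename(path))[0]
        series_list.append(df.set_index("date")["slope"].rename(name))
    # axis=1 aligns the series on their index (outer join),
    # so dates missing from a file become NaN in that file's column
    combined = pd.concat(series_list, axis=1).sort_index()
    return combined.rename_axis("date").reset_index()
```

A call such as combine_slope_files(r"C:\Users\Path").to_csv("output.csv", index=False) would then write the wide table. If output.csv is written into the same folder as the inputs, it would be picked up as an input on the next run, so a separate output location is safer. Note also that sorted() orders file names lexicographically, so "file10" comes before "file2".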