Python: Merging two CSV files using Pandas


I am trying to merge two different CSV files using Pandas, but I run into an error during the merge.

The first file is aapl.csv, which looks like this:

          Date   Close      High       Low      Open    Volume
Symbol                                                            
AAPL    2017-05-25  153.87  154.3500  153.0300  153.7300  19235598
AAPL    2017-05-26  153.61  154.2400  153.3100  154.0000  21927637

The second file is corr_column.csv, which looks like this:

Corr
0.01
0.02

I want to merge them so that "Corr" appears as a column after "Volume".

I have tried using pd.concat, as described in the documentation.

Here is my code:

import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
import pandas_datareader.data as web
from mpl_finance import candlestick_ohlc
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter, MonthLocator, YearLocator, DayLocator
style.use( 'ggplot' )


##start = dt.datetime( 2017, 5, 29 )
##end = dt.datetime( 2018, 5, 29 )
##
##
##df = web.DataReader( AAPL, 'morningstar', start, end )
##
##df.to_csv( aapl.csv )

df = pd.read_csv( '/Users/zubairjohal/Documents/aapl.csv' ,         parse_dates=True, index_col=0 )
df_ohlc = df


corr_data = pd.read_csv( '/Users/zubairjohal/Documents/corr_column.csv', parse_dates=True, index_col=0 )


corr_data.dropna( inplace=True )


df.dropna( inplace=True )


merged = pd.concat( [ df, corr_data ], axis=1 )

merged.to_csv( 'combine2.csv', index=False )

print( merged )

However, when I run it, I get the following error:

Traceback (most recent call last):
File "/Users/zubairjohal/Documents/nw5.py", line 34, in <module>
merged = pd.concat( [ df, corr_data ], axis=1 )
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 226, in concat
return op.get_result()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 423, in get_result
copy=self.copy)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/internals.py", line 5425, in concatenate_block_managers
return BlockManager(blocks, axes)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/internals.py", line 3282, in __init__
self._verify_integrity()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/internals.py", line 3493, in _verify_integrity
construction_error(tot_items, block.shape[1:], self.axes)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/internals.py", line 4843, in construction_error
passed, implied))
ValueError: Shape of passed values is (6, 68896), indices imply (6, 514)

Any suggestions, references, or alternatives would be greatly appreciated.

You can try the following:

pd.concat([df_ohlc.reset_index(), corr_data], axis=1).set_index("Symbol")

Output:

        Close        Date    High     Low    Open      Volume  Corr
Symbol                                                              
AAPL   153.87  2017-05-25  154.35  153.03  153.73  19235598.0  0.01
AAPL   153.61  2017-05-26  154.24  153.31  154.00  21927637.0  0.02

This works if your DataFrames print the way you showed them: the first with AAPL as the index, and Corr with no index.
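
To make the answer easier to try out, here is a minimal, self-contained sketch of the same approach. It makes several assumptions that are not in the original post: the two CSV files are replaced by small in-memory stand-ins (`aapl_csv`, `corr_csv`) built from the sample rows in the question, and `corr_column.csv` is read without `index_col`, so that `Corr` stays a regular column rather than becoming the index. With `axis=1`, `pd.concat` aligns rows on the index, so the idea is to give both frames the same default 0..n-1 index before concatenating.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for aapl.csv and corr_column.csv,
# built from the sample rows shown in the question.
aapl_csv = io.StringIO(
    "Symbol,Date,Close,High,Low,Open,Volume\n"
    "AAPL,2017-05-25,153.87,154.35,153.03,153.73,19235598\n"
    "AAPL,2017-05-26,153.61,154.24,153.31,154.00,21927637\n"
)
corr_csv = io.StringIO("Corr\n0.01\n0.02\n")

# Symbol becomes the index, as in the question's printed frame.
df = pd.read_csv(aapl_csv, index_col="Symbol")

# Assumption: no index_col here, so Corr is an ordinary column
# and the frame gets a default 0..n-1 index.
corr_data = pd.read_csv(corr_csv)

# reset_index() moves Symbol back into a column and gives df the same
# 0..n-1 index as corr_data, so the rows line up by position.
merged = pd.concat([df.reset_index(), corr_data], axis=1).set_index("Symbol")

print(merged)
```

Note that writing the result with `to_csv(..., index=False)`, as in the question's code, would leave the `Symbol` index out of the output file; dropping `index=False` keeps it.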

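Thanks for your input. I tried your change but got a new error, "KeyError: 'Index'". I have checked, and both files have the same number of rows, i.e. 262. Any suggestions?

Actually, my apologies. I left out a column header when pasting the data. I have updated my post with what the df_ohlc data looks like. Please take a look @MohammedALANI

@zcdp I made an edit; just replace Index with Symbol.

@MohammedALANI, that works. Thanks! Also, if I have a CSV file similar to corr_column, but with the data as a list instead of a column (e.g. [0.01, 0.02 …. 1.00]), how would I do the merge in that case?

@zcdp No worries! Please accept my answer so that people with the same problem can easily find the solution.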
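
The follow-up question about list-shaped data was not answered in the thread. One possible approach, sketched here under assumptions, is to attach the values directly as a new column. The frame and the `corr_values` list below are hypothetical stand-ins, and the sketch assumes the list is already a Python list with one value per row; a list stored as text inside a CSV file would need to be parsed first.

```python
import pandas as pd

# Hypothetical stand-in for the price data, indexed by Symbol.
df = pd.DataFrame(
    {
        "Date": ["2017-05-25", "2017-05-26"],
        "Volume": [19235598, 21927637],
    },
    index=pd.Index(["AAPL", "AAPL"], name="Symbol"),
)

# Hypothetical correlation values, one per row of df.
corr_values = [0.01, 0.02]

# A plain list is assigned by position, so its length must match the frame.
if len(corr_values) != len(df):
    raise ValueError("corr_values must have one entry per row of df")

# assign() returns a new frame with Corr appended as the last column.
merged = df.assign(Corr=corr_values)

print(merged)
```

Because a plain list carries no index, the values are matched to rows purely by position, which sidesteps the repeated `AAPL` index labels.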