Python: error when joining DataFrames
I'm trying to build a DataFrame that holds a large number of stocks, which will eventually be sent to a MySQL database. I need to combine all the individual DataFrames while keeping their names and dates unique. The problem right now is that the joining part of the code throws an error. I tried merge, but that loses the name value of each DataFrame, so it doesn't suit my needs. I also looked into using a Panel, but I read that the .to_sql function only works on DataFrames. Any help would be appreciated.
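import datetime

import pandas as pd
import pandas_datareader.data as web  # assumed import; the original post does not show it

exchList = ['A','AA','AAL','AAP','AAPL','ABBV','ABC','ABT','ACN','ADBE','ADI','ADM','ADP','ADS','ADSK','AEE','AEP']
main_df = pd.DataFrame()
start = datetime.datetime(2000, 1, 1)
end = datetime.date.today()
for ticker in exchList:
    df = web.DataReader(ticker, "yahoo", start, end)
    # Turn the Date index into an ordinary column
    df.reset_index(level=df.index.names, inplace=True)
    if main_df.empty:
        main_df = df
    else:
        main_df = main_df.join(df)  # this line raises the ValueError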
The error is as follows:
ValueError: columns overlap but no suffix specified: Index(['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'], dtype='object')
There is a more elegant way to do this: read the data for all tickers into a pandas.Panel in one step, then convert the Panel to a DataFrame:
In [126]: p = web.DataReader(exchList, "yahoo",start, end)
In [129]: p.to_frame()
Out[129]:
                        Open        High         Low       Close       Volume   Adj Close
Date       minor
2000-01-03 A       78.749999   78.937500   67.374999   72.000003    4674300.0   46.106304
           AAPL   104.874997  112.499998  101.687501  111.937502  133949200.0    3.625643
           ABC     15.500000   15.750000   15.250000   15.562500    2784800.0    3.297376
           ABT     35.249948   35.999945   34.749947   34.999948   10635000.0    9.517434
           ADBE    67.250000   67.500000   64.250000   65.562500    7384400.0   16.274673
           ADI     93.500000   93.875000   88.000000   90.187500    3655600.0   32.584012
           ADM     11.999999   12.062499   11.875000   11.999999     984600.0    7.798824
           ADP     53.499906   53.937406   51.937409   51.999911    2698800.0   28.858381
           ADSK    34.000000   34.625000   32.125000   33.375000    2845600.0    8.052905
           AEE     32.562500   32.625000   31.562500   32.312500     700800.0   13.102718
...                      ...         ...         ...         ...          ...         ...
2017-02-23 ABT     45.029999   45.509998   44.849998   45.400002    9389100.0   45.400002
           ACN    122.589996  122.709999  121.730003  122.480003    1428000.0  122.480003
           ADBE   120.099998  120.150002  118.029999  118.830002    2381700.0  118.830002
           ADI     82.150002   82.160004   81.029999   81.610001    2277500.0   81.610001
           ADM     44.799999   45.270000   44.490002   45.090000    3256200.0   45.090000
           ADP    100.790001  101.779999  100.489998  101.639999    1459300.0  101.639999
           ADS    240.589996  243.520004  239.279999  242.419998     650800.0  242.419998
           ADSK    86.690002   87.370003   85.919998   87.099998    1368000.0   87.099998
           AEE     54.230000   54.270000   53.689999   54.070000    1438100.0   54.070000
           AEP     65.550003   66.089996   65.309998   66.010002    2272900.0   66.010002
[63153 rows x 6 columns]
You may also want to reset the MultiIndex:
In [130]: p.to_frame().reset_index()
Out[130]:
      Date       minor        Open        High         Low       Close       Volume   Adj Close
    0 2000-01-03 A       78.749999   78.937500   67.374999   72.000003    4674300.0   46.106304
    1 2000-01-03 AAPL   104.874997  112.499998  101.687501  111.937502  133949200.0    3.625643
    2 2000-01-03 ABC     15.500000   15.750000   15.250000   15.562500    2784800.0    3.297376
    3 2000-01-03 ABT     35.249948   35.999945   34.749947   34.999948   10635000.0    9.517434
    4 2000-01-03 ADBE    67.250000   67.500000   64.250000   65.562500    7384400.0   16.274673
    5 2000-01-03 ADI     93.500000   93.875000   88.000000   90.187500    3655600.0   32.584012
    6 2000-01-03 ADM     11.999999   12.062499   11.875000   11.999999     984600.0    7.798824
    7 2000-01-03 ADP     53.499906   53.937406   51.937409   51.999911    2698800.0   28.858381
    8 2000-01-03 ADSK    34.000000   34.625000   32.125000   33.375000    2845600.0    8.052905
    9 2000-01-03 AEE     32.562500   32.625000   31.562500   32.312500     700800.0   13.102718
  ... ...        ...          ...         ...         ...         ...          ...         ...
63143 2017-02-23 ABT     45.029999   45.509998   44.849998   45.400002    9389100.0   45.400002
63144 2017-02-23 ACN    122.589996  122.709999  121.730003  122.480003    1428000.0  122.480003
63145 2017-02-23 ADBE   120.099998  120.150002  118.029999  118.830002    2381700.0  118.830002
63146 2017-02-23 ADI     82.150002   82.160004   81.029999   81.610001    2277500.0   81.610001
63147 2017-02-23 ADM     44.799999   45.270000   44.490002   45.090000    3256200.0   45.090000
63148 2017-02-23 ADP    100.790001  101.779999  100.489998  101.639999    1459300.0  101.639999
63149 2017-02-23 ADS    240.589996  243.520004  239.279999  242.419998     650800.0  242.419998
63150 2017-02-23 ADSK    86.690002   87.370003   85.919998   87.099998    1368000.0   87.099998
63151 2017-02-23 AEE     54.230000   54.270000   53.689999   54.070000    1438100.0   54.070000
63152 2017-02-23 AEP     65.550003   66.089996   65.309998   66.010002    2272900.0   66.010002
[63153 rows x 8 columns]
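Note that Panel was removed from pandas in version 0.25, so the Panel route only works on older installations. A similar long layout can probably be built without it by concatenating per-ticker frames with keys. The following is only a sketch under stated assumptions: the two small frames and their values are made up for illustration, and the level name "minor" is chosen just to mirror the output above.

```python
import pandas as pd

# Hypothetical per-ticker frames standing in for what a data reader would return.
dates = pd.to_datetime(["2000-01-03", "2000-01-04"])
frames = {
    "A": pd.DataFrame({"Open": [78.75, 68.0], "Close": [72.0, 66.5]}, index=dates),
    "AAPL": pd.DataFrame({"Open": [104.87, 108.25], "Close": [111.94, 102.5]}, index=dates),
}

# The dict keys become an outer index level; name the levels explicitly.
stacked = pd.concat(frames, names=["minor", "Date"])

# Reorder to (Date, minor) and sort so rows are grouped by date.
long_df = stacked.swaplevel().sort_index()

# Flatten the MultiIndex into ordinary columns.
flat = long_df.reset_index()
print(flat)
```

Because each ticker ends up as a value in the "minor" column, the name of every original frame is kept alongside its dates.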
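This error means that the two DataFrames you are trying to join have the same column names. To join them correctly, look at the full signature of the join method:
DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
So you need to specify either a left suffix or a right suffix. For example: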
df1.columns = ['A','B']
df2.columns = ['B','C']
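When joining these two, if you are not joining on 'B', you need to specify a suffix to append to the column names of one of the DataFrames so that the joined DataFrame has no duplicate column names.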
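A small sketch of that behaviour, assuming two made-up frames that share the column 'B'. The suffix strings "_left" and "_right" are arbitrary choices for illustration:

```python
import pandas as pd

# Two hypothetical frames sharing the column name "B".
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# Without suffixes, join() refuses to produce duplicate column names.
try:
    df1.join(df2)
    overlap_error = None
except ValueError as exc:
    overlap_error = str(exc)

# With suffixes, the overlapping column is renamed on each side.
joined = df1.join(df2, lsuffix="_left", rsuffix="_right")

print(overlap_error)
print(list(joined.columns))
```

The joined frame is wider, not longer: each ticker would get its own set of suffixed columns, which is why a join is a poor fit for stacking many stocks.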
I think you want to concatenate these tables rather than join them. To do that, change the code to the following:
if main_df.empty:
    main_df = df
else:
    main_df = pd.concat([main_df, df])
A better approach is to collect all the frames in a list and then concatenate them once at the end:
import pandas as pd

l_dfs = []
for ticker in exchList:
    df = web.DataReader(ticker, "yahoo", start, end)
    # Turn the Date index into an ordinary column
    df.reset_index(level=df.index.names, inplace=True)
    l_dfs.append(df)
# Concatenate once, after the loop
df = pd.concat(l_dfs)
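The concatenated frame above does not record which ticker each row came from. One possible way to keep the names, and to check that the result is accepted by .to_sql, is sketched below. Everything here is an assumption for illustration: the sample frames are made up, the column name "Ticker" and table name "prices" are hypothetical, dates are kept as ISO strings for simplicity, and an in-memory SQLite database stands in for MySQL.

```python
import sqlite3

import pandas as pd

# Hypothetical per-ticker frames; in the question these come from web.DataReader.
raw = {
    "A": pd.DataFrame({"Date": ["2000-01-03"], "Close": [72.0]}),
    "AAPL": pd.DataFrame({"Date": ["2000-01-03"], "Close": [111.94]}),
}

frames = []
for ticker, df in raw.items():
    # Tag every row with its ticker so the name survives concatenation.
    frames.append(df.assign(Ticker=ticker))

# One concat at the end; ignore_index gives a clean 0..n-1 row index.
combined = pd.concat(frames, ignore_index=True)

# In-memory SQLite stands in for MySQL; "prices" is a made-up table name.
conn = sqlite3.connect(":memory:")
combined.to_sql("prices", conn, index=False)
rows = conn.execute("SELECT Ticker, Close FROM prices ORDER BY Ticker").fetchall()
conn.close()
print(rows)
```

With a (Date, Ticker) pair on every row, uniqueness of name and date can be enforced on the database side if needed.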
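Providing an example would make it easier for us to help you.
Thank you, this is exactly what I wanted.
@user3170242, glad I could help :-) Thanks for accepting my answer!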