Error when joining DataFrames in Python


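I am trying to build a DataFrame containing a large number of stocks, which will eventually be sent to a MySQL database. I need to join all the individual DataFrames together while keeping each one's name and dates unique. My current problem is that the joining part of the code throws an error. I tried merge, but that loses the name value of each DataFrame, so it doesn't suit my needs. I also looked into using a Panel, but I read that the .to_sql function only works on DataFrames. Any help would be greatly appreciated.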

import datetime
import pandas as pd
import pandas_datareader.data as web

exchList = ['A','AA','AAL','AAP','AAPL','ABBV','ABC','ABT','ACN','ADBE','ADI','ADM','ADP','ADS','ADSK','AEE','AEP']
main_df = pd.DataFrame()
start = datetime.datetime(2000, 1, 1)
end = datetime.date.today()

for ticker in exchList:
    df = web.DataReader(ticker, "yahoo", start, end)
    # Move the Date index into an ordinary column
    df.reset_index(level=df.index.names, inplace=True)
    if main_df.empty:
        main_df = df
    else:
        main_df = main_df.join(df)  # this line raises the ValueError below
The error is:

ValueError: columns overlap but no suffix specified: Index(['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'], dtype='object')

There is a more elegant way to do this: read the data for all tickers into a pandas.Panel in one step, then convert the Panel to a DataFrame:

In [126]: p = web.DataReader(exchList, "yahoo",start, end)

In [129]: p.to_frame()
Out[129]:
                        Open        High         Low       Close       Volume   Adj Close
Date       minor
2000-01-03 A       78.749999   78.937500   67.374999   72.000003    4674300.0   46.106304
           AAPL   104.874997  112.499998  101.687501  111.937502  133949200.0    3.625643
           ABC     15.500000   15.750000   15.250000   15.562500    2784800.0    3.297376
           ABT     35.249948   35.999945   34.749947   34.999948   10635000.0    9.517434
           ADBE    67.250000   67.500000   64.250000   65.562500    7384400.0   16.274673
           ADI     93.500000   93.875000   88.000000   90.187500    3655600.0   32.584012
           ADM     11.999999   12.062499   11.875000   11.999999     984600.0    7.798824
           ADP     53.499906   53.937406   51.937409   51.999911    2698800.0   28.858381
           ADSK    34.000000   34.625000   32.125000   33.375000    2845600.0    8.052905
           AEE     32.562500   32.625000   31.562500   32.312500     700800.0   13.102718
...                      ...         ...         ...         ...          ...         ...
2017-02-23 ABT     45.029999   45.509998   44.849998   45.400002    9389100.0   45.400002
           ACN    122.589996  122.709999  121.730003  122.480003    1428000.0  122.480003
           ADBE   120.099998  120.150002  118.029999  118.830002    2381700.0  118.830002
           ADI     82.150002   82.160004   81.029999   81.610001    2277500.0   81.610001
           ADM     44.799999   45.270000   44.490002   45.090000    3256200.0   45.090000
           ADP    100.790001  101.779999  100.489998  101.639999    1459300.0  101.639999
           ADS    240.589996  243.520004  239.279999  242.419998     650800.0  242.419998
           ADSK    86.690002   87.370003   85.919998   87.099998    1368000.0   87.099998
           AEE     54.230000   54.270000   53.689999   54.070000    1438100.0   54.070000
           AEP     65.550003   66.089996   65.309998   66.010002    2272900.0   66.010002

[63153 rows x 6 columns]
You may also want to reset the MultiIndex:

In [130]: p.to_frame().reset_index()
Out[130]:
            Date minor        Open        High         Low       Close       Volume   Adj Close
0     2000-01-03     A   78.749999   78.937500   67.374999   72.000003    4674300.0   46.106304
1     2000-01-03  AAPL  104.874997  112.499998  101.687501  111.937502  133949200.0    3.625643
2     2000-01-03   ABC   15.500000   15.750000   15.250000   15.562500    2784800.0    3.297376
3     2000-01-03   ABT   35.249948   35.999945   34.749947   34.999948   10635000.0    9.517434
4     2000-01-03  ADBE   67.250000   67.500000   64.250000   65.562500    7384400.0   16.274673
5     2000-01-03   ADI   93.500000   93.875000   88.000000   90.187500    3655600.0   32.584012
6     2000-01-03   ADM   11.999999   12.062499   11.875000   11.999999     984600.0    7.798824
7     2000-01-03   ADP   53.499906   53.937406   51.937409   51.999911    2698800.0   28.858381
8     2000-01-03  ADSK   34.000000   34.625000   32.125000   33.375000    2845600.0    8.052905
9     2000-01-03   AEE   32.562500   32.625000   31.562500   32.312500     700800.0   13.102718
...          ...   ...         ...         ...         ...         ...          ...         ...
63143 2017-02-23   ABT   45.029999   45.509998   44.849998   45.400002    9389100.0   45.400002
63144 2017-02-23   ACN  122.589996  122.709999  121.730003  122.480003    1428000.0  122.480003
63145 2017-02-23  ADBE  120.099998  120.150002  118.029999  118.830002    2381700.0  118.830002
63146 2017-02-23   ADI   82.150002   82.160004   81.029999   81.610001    2277500.0   81.610001
63147 2017-02-23   ADM   44.799999   45.270000   44.490002   45.090000    3256200.0   45.090000
63148 2017-02-23   ADP  100.790001  101.779999  100.489998  101.639999    1459300.0  101.639999
63149 2017-02-23   ADS  240.589996  243.520004  239.279999  242.419998     650800.0  242.419998
63150 2017-02-23  ADSK   86.690002   87.370003   85.919998   87.099998    1368000.0   87.099998
63151 2017-02-23   AEE   54.230000   54.270000   53.689999   54.070000    1438100.0   54.070000
63152 2017-02-23   AEP   65.550003   66.089996   65.309998   66.010002    2272900.0   66.010002

[63153 rows x 8 columns]
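Note that pd.Panel was deprecated in pandas 0.20 and has since been removed, so p.to_frame() will not work on a current pandas install. A minimal sketch of producing the same long Date/minor layout with pd.concat on a dict of per-ticker frames follows. The two tiny frames and their numbers are made-up stand-ins for web.DataReader output (which needs network access), and the reduced column set is an assumption for brevity.

```python
import pandas as pd

# Hypothetical stand-ins for per-ticker web.DataReader results (Date index, price columns).
dates = pd.Index(pd.to_datetime(["2000-01-03", "2000-01-04"]), name="Date")
frames = {
    "A": pd.DataFrame({"Open": [1.0, 2.0], "Close": [1.5, 2.5]}, index=dates),
    "AAPL": pd.DataFrame({"Open": [3.0, 4.0], "Close": [3.5, 4.5]}, index=dates),
}

# Concatenating a dict uses its keys as the outer index level.
long_df = pd.concat(frames, names=["minor", "Date"])

# Reorder to (Date, minor), the layout Panel.to_frame() produced, and sort by date.
long_df = long_df.swaplevel("minor", "Date").sort_index()

# Flatten the MultiIndex into ordinary columns, like the reset_index() step above.
flat = long_df.reset_index()
print(flat)
```

The flattened frame has Date and minor as regular columns, which is the shape .to_sql expects.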

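This error means that the two DataFrames you are trying to join share column names. To join them correctly, you need to use the full signature of the join method:
DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)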

So you need to specify a left or right suffix. For example, with

df1.columns = ['A','B']
df2.columns = ['B','C']
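when joining these two frames on anything other than 'B', you must specify a suffix to append to the overlapping column names of one or both frames, so that the joined DataFrame contains no duplicate column names.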
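A small sketch of the df1/df2 case described above, showing the error and the suffix fix. The column values and the suffixes "_left" and "_right" are arbitrary choices for illustration.

```python
import pandas as pd

# Two frames with an overlapping column "B", mirroring the df1/df2 example above.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# Without suffixes, join() refuses because "B" would appear twice.
try:
    df1.join(df2)
except ValueError as exc:
    print("join failed:", exc)

# With suffixes, the overlapping column is renamed on each side.
joined = df1.join(df2, lsuffix="_left", rsuffix="_right")
print(joined.columns.tolist())  # ['A', 'B_left', 'B_right', 'C']
```

Keep in mind that join aligns rows on the index. In the question's loop every frame has a fresh 0..n-1 index after reset_index, so even with suffixes the tickers would be placed side by side by row position rather than stacked, which is why stacking with concat is the better fit here.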


I think you want to concatenate these tables rather than join them. To do that, change your code to the following:

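for ticker in exchList:
    df = web.DataReader(ticker, "yahoo", start, end)
    df.reset_index(level=df.index.names, inplace=True)
    if main_df.empty:
        main_df = df
    else:
        main_df = pd.concat([main_df, df])  # stack rows instead of joining columns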
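Because pd.concat stacks rows, frames with identical column names line up instead of clashing. However, DataReader output has no column holding the stock symbol, so the name the question wants to keep would be lost. One way to keep it is to tag each frame before concatenating. The column name Ticker and the toy data below are our own assumptions.

```python
import pandas as pd

# Hypothetical per-ticker frames with identical columns, as after reset_index().
a = pd.DataFrame({"Date": pd.to_datetime(["2000-01-03", "2000-01-04"]), "Close": [1.0, 2.0]})
b = pd.DataFrame({"Date": pd.to_datetime(["2000-01-03", "2000-01-04"]), "Close": [3.0, 4.0]})

# Tag each frame with its ticker so the name survives concatenation.
a["Ticker"] = "A"
b["Ticker"] = "AA"

# concat stacks rows; ignore_index gives the result a fresh 0..n-1 index.
stacked = pd.concat([a, b], ignore_index=True)
print(stacked.shape)  # (4, 3)
```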
A better approach is to collect all the frames in a list and concatenate them once at the end:

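import pandas as pd
import pandas_datareader.data as web

# exchList, start and end are defined as in the question
l_dfs = list()
for ticker in exchList:
    df = web.DataReader(ticker, "yahoo", start, end)
    df.reset_index(level=df.index.names, inplace=True)
    l_dfs.append(df)
# A single concat at the end avoids re-copying main_df on every iteration
df = pd.concat(l_dfs, ignore_index=True)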
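The list-then-concat pattern also accepts keys=, which labels each frame with its ticker without modifying the frame. Below is a sketch that then writes the result with DataFrame.to_sql. It uses an in-memory SQLite connection from the standard library so it runs without a server. For MySQL you would pass a SQLAlchemy engine instead. The table name prices, the level name row and the toy data are hypothetical.

```python
import sqlite3

import pandas as pd

tickers = ["A", "AA"]
l_dfs = []
for i, ticker in enumerate(tickers):
    # Hypothetical stand-in for web.DataReader(ticker, "yahoo", start, end).reset_index()
    df = pd.DataFrame({"Date": pd.to_datetime(["2000-01-03", "2000-01-04"]),
                       "Close": [10.0 * (i + 1), 10.0 * (i + 1) + 1]})
    l_dfs.append(df)

# keys= adds the ticker as an outer index level; names= labels both levels.
combined = pd.concat(l_dfs, keys=tickers, names=["Ticker", "row"])
# Move the ticker into a regular column and drop the per-frame row counter.
combined = combined.reset_index(level="Ticker").reset_index(drop=True)

# to_sql accepts a sqlite3 connection directly; a MySQL target needs a SQLAlchemy engine.
conn = sqlite3.connect(":memory:")
combined.to_sql("prices", conn, index=False)
print(pd.read_sql("SELECT Ticker, COUNT(*) AS n FROM prices GROUP BY Ticker", conn))
```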

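A reproducible example would make it easier for us to help you.
Thanks, this is exactly what I was looking for.
@user3170242, glad I could help :-) Thanks for accepting my answer!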