Python: merging DataFrames with overlapping columns


I have two DataFrames, which can be created with the following code:

import yfinance as yf

symbols = ['QQQ', 'GBTC']
df1 = yf.download(symbols, start="2019-01-01", end="2019-01-07")

symbols = ['GBTC', 'TLT']
df2 = yf.download(symbols, start="2019-01-01", end="2019-01-07")

The contents of df1 and df2 are as follows:

> df1
           Adj Close              Close              High               Low  \
                GBTC         QQQ   GBTC         QQQ  GBTC         QQQ  GBTC   
Date                                                                          
2018-12-31     3.965  152.132996  3.965  154.259995  4.15  154.979996  3.95   
2019-01-02     4.620  152.744461  4.620  154.880005  4.65  155.750000  4.13   
2019-01-03     4.520  147.754257  4.520  149.820007  4.62  153.259995  4.32   
2019-01-04     4.530  154.075851  4.530  156.229996  4.65  157.000000  4.41   

                         Open               Volume            
                   QQQ   GBTC         QQQ     GBTC       QQQ  
Date                                                          
2018-12-31  152.710007  4.140  154.470001  3829000  53015300  
2019-01-02  150.880005  4.155  150.990005  2948200  58576700  
2019-01-03  149.490005  4.325  152.600006  1503000  74820200  
2019-01-04  151.740005  4.585  152.339996  2020700  74709300

> df2
           Adj Close              Close              High               Low  \
                GBTC         TLT   GBTC         TLT  GBTC         TLT  GBTC   
Date                                                                          
2018-12-31     3.965  116.845848  3.965  121.510002  4.15  121.559998  3.95   
2019-01-02     4.620  117.461304  4.620  122.150002  4.65  122.160004  4.13   
2019-01-03     4.520  118.797966  4.520  123.540001  4.62  123.860001  4.32   
2019-01-04     4.530  117.422844  4.530  122.110001  4.65  122.559998  4.41   

                         Open               Volume            
                   TLT   GBTC         TLT     GBTC       TLT  
Date                                                          
2018-12-31  120.459999  4.140  120.650002  3829000  17409000  
2019-01-02  121.339996  4.155  121.660004  2948200  19841500  
2019-01-03  122.230003  4.325  122.290001  1503000  21187000  
2019-01-04  121.650002  4.585  122.339996  2020700  12970200  

Both df1 and df2 contain GBTC.

How can I merge df1 and df2 into a new DataFrame with the following contents?

> df3
           Adj Close                          Close                          \
                GBTC         QQQ         TLT   GBTC         QQQ         TLT   
Date                                                                          
2018-12-31     3.965  152.132996  116.845848  3.965  154.259995  121.510002   
2019-01-02     4.620  152.744461  117.461304  4.620  154.880005  122.150002   
2019-01-03     4.520  147.754257  118.797966  4.520  149.820007  123.540001   
2019-01-04     4.530  154.075851  117.422844  4.530  156.229996  122.110001   

            High                           Low                           Open  \
            GBTC         QQQ         TLT  GBTC         QQQ         TLT   GBTC   
Date                                                                            
2018-12-31  4.15  154.979996  121.559998  3.95  152.710007  120.459999  4.140   
2019-01-02  4.65  155.750000  122.160004  4.13  150.880005  121.339996  4.155   
2019-01-03  4.62  153.259995  123.860001  4.32  149.490005  122.230003  4.325   
2019-01-04  4.65  157.000000  122.559998  4.41  151.740005  121.650002  4.585   

                                     Volume                      
                   QQQ         TLT     GBTC       QQQ       TLT  
Date                                                             
2018-12-31  154.470001  120.650002  3829000  53015300  17409000  
2019-01-02  150.990005  121.660004  2948200  58576700  19841500  
2019-01-03  152.600006  122.290001  1503000  74820200  21187000  
2019-01-04  152.339996  122.339996  2020700  74709300  12970200 

I may have more than one overlapping column.

This does not seem to achieve my goal.

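  • Flatten both DataFrames with unstack()
  • Prefer the new values from df2 where both DataFrames have a value
  • Use pivot() to restore the wide layout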

                     Adj Close                  Close
Date                 GBTC     QQQ      TLT      GBTC     QQQ      TLT
2019-01-02 00:00:00  4.62     152.744  117.461  4.62     154.88   122.15
2019-01-03 00:00:00  4.52     147.754  118.798  4.52     149.82   123.54
2019-01-04 00:00:00  4.53     154.076  117.423  4.53     156.23   122.11

                     High                       Low
Date                 GBTC     QQQ      TLT      GBTC     QQQ      TLT
2019-01-02 00:00:00  4.65     155.75   122.16   4.13     150.88   121.34
2019-01-03 00:00:00  4.62     153.26   123.86   4.32     149.49   122.23
2019-01-04 00:00:00  4.65     157      122.56   4.41     151.74   121.65

                     Open                       Volume
Date                 GBTC     QQQ      TLT      GBTC         QQQ          TLT
2019-01-02 00:00:00  4.155    150.99   121.66   2.9482e+06   5.85767e+07  1.98415e+07
2019-01-03 00:00:00  4.325    152.6    122.29   1.503e+06    7.48202e+07  2.1187e+07
2019-01-04 00:00:00  4.585    152.34   122.34   2.0207e+06   7.47093e+07  1.29702e+07

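Why do you want to merge on the GBTC columns rather than on the index? Do the two DataFrames have the same index, as you show?

No, sometimes the DataFrames may have different indexes.

Hmm, could you try df1.merge(df2).sort_index(axis=1)?

@arhr, because I want to use yfinance to download prices for some new stocks and then merge the downloaded DataFrame into my existing one. I have already downloaded prices for some stocks, and now I want to merge the new DataFrame into the old one.
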
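
The comments above ask about aligning on the index rather than on the GBTC columns. A minimal sketch of that idea follows, assuming the existing values should be kept as they are and only the columns that are new in df2 should be added. The two small frames are synthetic stand-ins for the yfinance downloads, with illustrative values and deliberately different dates.

```python
import pandas as pd

# Synthetic stand-ins for the yfinance downloads (illustrative values only).
cols1 = pd.MultiIndex.from_product([["Close"], ["GBTC", "QQQ"]])
cols2 = pd.MultiIndex.from_product([["Close"], ["GBTC", "TLT"]])
df1 = pd.DataFrame([[4.62, 154.88], [4.52, 149.82]],
                   index=pd.to_datetime(["2019-01-02", "2019-01-03"]),
                   columns=cols1)
df2 = pd.DataFrame([[4.52, 123.54], [4.53, 122.11]],
                   index=pd.to_datetime(["2019-01-03", "2019-01-04"]),
                   columns=cols2)

# Columns that exist in df2 but not yet in df1.
extra_cols = df2.columns.difference(df1.columns)

# Align on the Date index (outer join by default), then sort rows and columns.
joined = pd.concat([df1, df2.reindex(columns=extra_cols)], axis=1)
joined = joined.sort_index().sort_index(axis=1)
print(joined)
```

Note the trade-off: overlapping columns such as GBTC are taken from df1 only, so dates that appear only in df2 are left as NaN in those columns.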
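import pandas as pd

# Flatten both frames to long form; the column levels are unnamed, so
# reset_index() calls them level_0 and level_1, and the values land in column 0.
dfm = pd.merge(df1.unstack().to_frame().reset_index(), df2.unstack().to_frame().reset_index(), on=["level_0","level_1","Date"], how="outer")
# Prefer df2's value ("0_y"), fall back to df1's ("0_x"), then pivot back to wide form.
(dfm.assign(**{"0_y":dfm["0_y"].fillna(dfm["0_x"])})
 .drop(columns="0_x")
 .rename(columns={"0_y":0})
 .pivot(index=["level_0","level_1"], columns="Date", values=0).T
)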
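
A shorter alternative that may be worth trying is DataFrame.combine_first, which takes the union of both indexes and both sets of columns. This is not the method used in the answer above, only a sketch under the same "prefer df2, fall back to df1" assumption. The frames below are synthetic stand-ins for the yfinance downloads, with illustrative values.

```python
import pandas as pd

# Synthetic stand-ins for the yfinance downloads (illustrative values only).
cols1 = pd.MultiIndex.from_product([["Close"], ["GBTC", "QQQ"]])
cols2 = pd.MultiIndex.from_product([["Close"], ["GBTC", "TLT"]])
df1 = pd.DataFrame([[4.62, 154.88], [4.52, 149.82]],
                   index=pd.to_datetime(["2019-01-02", "2019-01-03"]),
                   columns=cols1)
df2 = pd.DataFrame([[4.52, 123.54], [4.53, 122.11]],
                   index=pd.to_datetime(["2019-01-03", "2019-01-04"]),
                   columns=cols2)

# Take df2's values where present and fill the gaps from df1;
# rows and columns missing from one frame are added from the other.
df3 = df2.combine_first(df1).sort_index(axis=1)
print(df3)
```

Because the caller (df2) wins wherever both frames have a value, swapping the operands to df1.combine_first(df2) would keep the old values instead.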