Python: merging DataFrames with overlapping columns
I have two DataFrames that can be created with the following code:
import yfinance as yf
symbols = ['QQQ', 'GBTC']
df1 = yf.download(symbols, start="2019-01-01", end="2019-01-07")
symbols = ['GBTC', 'TLT']
df2 = yf.download(symbols, start="2019-01-01", end="2019-01-07")
The contents of df1 and df2 are as follows:
> df1
Adj Close Close High Low \
GBTC QQQ GBTC QQQ GBTC QQQ GBTC
Date
2018-12-31 3.965 152.132996 3.965 154.259995 4.15 154.979996 3.95
2019-01-02 4.620 152.744461 4.620 154.880005 4.65 155.750000 4.13
2019-01-03 4.520 147.754257 4.520 149.820007 4.62 153.259995 4.32
2019-01-04 4.530 154.075851 4.530 156.229996 4.65 157.000000 4.41
Open Volume
QQQ GBTC QQQ GBTC QQQ
Date
2018-12-31 152.710007 4.140 154.470001 3829000 53015300
2019-01-02 150.880005 4.155 150.990005 2948200 58576700
2019-01-03 149.490005 4.325 152.600006 1503000 74820200
2019-01-04 151.740005 4.585 152.339996 2020700 74709300
> df2
Adj Close Close High Low \
GBTC TLT GBTC TLT GBTC TLT GBTC
Date
2018-12-31 3.965 116.845848 3.965 121.510002 4.15 121.559998 3.95
2019-01-02 4.620 117.461304 4.620 122.150002 4.65 122.160004 4.13
2019-01-03 4.520 118.797966 4.520 123.540001 4.62 123.860001 4.32
2019-01-04 4.530 117.422844 4.530 122.110001 4.65 122.559998 4.41
Open Volume
TLT GBTC TLT GBTC TLT
Date
2018-12-31 120.459999 4.140 120.650002 3829000 17409000
2019-01-02 121.339996 4.155 121.660004 2948200 19841500
2019-01-03 122.230003 4.325 122.290001 1503000 21187000
2019-01-04 121.650002 4.585 122.339996 2020700 12970200
Both df1 and df2 contain the GBTC columns.
How can I merge df1 and df2 into a new DataFrame with the following contents?
> df3
Adj Close Close \
GBTC QQQ TLT GBTC QQQ TLT
Date
2018-12-31 3.965 152.132996 116.845848 3.965 154.259995 121.510002
2019-01-02 4.620 152.744461 117.461304 4.620 154.880005 122.150002
2019-01-03 4.520 147.754257 118.797966 4.520 149.820007 123.540001
2019-01-04 4.530 154.075851 117.422844 4.530 156.229996 122.110001
High Low Open \
GBTC QQQ TLT GBTC QQQ TLT GBTC
Date
2018-12-31 4.15 154.979996 121.559998 3.95 152.710007 120.459999 4.140
2019-01-02 4.65 155.750000 122.160004 4.13 150.880005 121.339996 4.155
2019-01-03 4.62 153.259995 123.860001 4.32 149.490005 122.230003 4.325
2019-01-04 4.65 157.000000 122.559998 4.41 151.740005 121.650002 4.585
Volume
QQQ TLT GBTC QQQ TLT
Date
2018-12-31 154.470001 120.650002 3829000 53015300 17409000
2019-01-02 150.990005 121.660004 2948200 58576700 19841500
2019-01-03 152.600006 122.290001 1503000 74820200 21187000
2019-01-04 152.339996 122.339996 2020700 74709300 12970200
I may have more than one overlapping column.
This does not seem to achieve my goal.
- use unstack()
- prefer the new values from df2
- use pivot()
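Why do you want to merge on the GBTC column rather than on the index? Do the two DataFrames have the same index, as in what you show?
No, sometimes the DataFrames may have different indexes.
Hmm, could you try df1.merge(df2).sort_index(axis=1)?
@arhr, because I want to download prices for some new stocks with yfinance and then merge the downloaded DataFrame into my existing one. I have already downloaded prices for some stocks, and now I want to merge the new DataFrame into the old one.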
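import pandas as pd

# Flatten both frames to long form, then outer-merge on (field, ticker, date)
dfm = pd.merge(df1.unstack().to_frame().reset_index(), df2.unstack().to_frame().reset_index(), on=["level_0","level_1","Date"], how="outer")
# Prefer df2's value ("0_y"), fall back to df1's ("0_x"), then pivot back to wide form
df3 = (dfm.assign(**{"0_y": dfm["0_y"].fillna(dfm["0_x"])})
    .drop(columns="0_x")
    .rename(columns={"0_y": 0})
    .pivot(index=["level_0", "level_1"], columns="Date", values=0).T
)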
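A possible alternative to the unstack()/merge/pivot() pipeline above is DataFrame.combine_first, which takes the union of both frames' rows and columns and fills gaps in one frame from the other. This is a different technique from the answer's, shown only as a minimal sketch. The two small frames below are hypothetical stand-ins with made-up values (not real prices), so it runs without yfinance; df2 deliberately has an extra date to mimic frames with different indexes.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 (made-up values, not real prices)
df1 = pd.DataFrame(
    [[3.9, 154.2], [4.6, 154.8]],
    index=pd.DatetimeIndex(["2018-12-31", "2019-01-02"], name="Date"),
    columns=pd.MultiIndex.from_product([["Close"], ["GBTC", "QQQ"]]),
)
df2 = pd.DataFrame(
    [[4.0, 121.5], [4.7, 122.1], [4.5, 123.5]],
    index=pd.DatetimeIndex(["2018-12-31", "2019-01-02", "2019-01-03"], name="Date"),
    columns=pd.MultiIndex.from_product([["Close"], ["GBTC", "TLT"]]),
)

# Values from df2 win where both frames have data; df1 fills the remaining gaps
df3 = df2.combine_first(df1).sort_index(axis=1)
print(df3)
```

Calling combine_first on df2 (rather than df1) is what makes the newer download take precedence for overlapping columns such as GBTC. Dates that exist in only one frame are kept, with NaN where the other frame has no data.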