Python pandas: merging two dataframes with different dates and columns
I need to merge two dataframes as shown below. I tried inner and left joins, but I got duplicate values. My keys are date and categories.
df1
date      categories    cost     clicks  impression  conversion
02-11-20  categories 5  153999   12      80          2
03-11-20  categories 1  9366463  31      135         4
03-11-20  categories 2  2738528  21      167         2
03-11-20  Others        4177461  19      94          1
03-11-20  categories 3  1747084  4       21          2
04-11-20  categories 4  5812003  35      220         1
04-11-20  categories 5  8490241  41      225         2
df2
date      categories    sales    deal
02-11-20  categories 5  117810   1
04-11-20  categories 4  1487500  3
04-11-20  categories 6  299999   1
04-11-20  Others        79106    1
Desired output:
date      categories    cost     clicks  impression  conversion  sales    deal
02-11-20  categories 5  153999   12      80          2           117810   1
03-11-20  categories 1  9366463  31      135         4           na       na
03-11-20  categories 2  2738528  21      167         2           na       na
03-11-20  Others        4177461  19      94          1           na       na
03-11-20  categories 3  1747084  4       21          2           na       na
04-11-20  categories 4  5812003  35      220         1           1487500  3
04-11-20  categories 5  8490241  41      225         2           na       na
04-11-20  Others        na       na      na          na          79106    1
04-11-20  categories 6  na       na      na          na          299999   1
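Thanks.
You should use an outer join and specify both columns the merge should be based on. Note that the columns must be passed as a list. An outer join uses the keys from both frames and inserts NaN wherever a row is missing from one of the two dataframes: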
new = df1.merge(df2, on=['date','categories'], how='outer')
This prints:
date categories cost ... conversion sales deal
0 2020-02-11 categories 5 153999.0 ... 2.0 117810.0 1.0
1 2020-03-11 categories 1 9366463.0 ... 4.0 NaN NaN
2 2020-03-11 categories 2 2738528.0 ... 2.0 NaN NaN
3 2020-03-11 Others 4177461.0 ... 1.0 NaN NaN
4 2020-03-11 categories 3 1747084.0 ... 2.0 NaN NaN
5 2020-04-11 categories 4 5812003.0 ... 1.0 1487500.0 3.0
6 2020-04-11 categories 5 8490241.0 ... 2.0 NaN NaN
7 2020-04-11 categories 6 NaN ... NaN 299999.0 1.0
8 2020-04-11 Others NaN ... NaN 79106.0 1.0
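A self-contained sketch of the outer merge above, with df1 and df2 rebuilt from the question's tables. It assumes the dates are day-month-year (2 to 4 November 2020); the printed output above appears to have read them month-first (2020-02-11 and so on), so the sketch passes an explicit format to pd.to_datetime. It also compares the row counts of inner, left and outer joins, which suggests why the first two attempts could not produce the desired output, and uses merge's indicator=True option to label where each row came from.

```python
import pandas as pd

# df1 and df2 rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "date": ["02-11-20", "03-11-20", "03-11-20", "03-11-20",
             "03-11-20", "04-11-20", "04-11-20"],
    "categories": ["categories 5", "categories 1", "categories 2", "Others",
                   "categories 3", "categories 4", "categories 5"],
    "cost": [153999, 9366463, 2738528, 4177461, 1747084, 5812003, 8490241],
    "clicks": [12, 31, 21, 19, 4, 35, 41],
    "impression": [80, 135, 167, 94, 21, 220, 225],
    "conversion": [2, 4, 2, 1, 2, 1, 2],
})
df2 = pd.DataFrame({
    "date": ["02-11-20", "04-11-20", "04-11-20", "04-11-20"],
    "categories": ["categories 5", "categories 4", "categories 6", "Others"],
    "sales": [117810, 1487500, 299999, 79106],
    "deal": [1, 3, 1, 1],
})

# Assumption: the dates are day-month-year (2 to 4 November 2020).
# Parsing both columns with one explicit format gives the keys the same dtype.
for frame in (df1, df2):
    frame["date"] = pd.to_datetime(frame["date"], format="%d-%m-%y")

keys = ["date", "categories"]

# Row counts per join type: inner keeps only keys found in both frames,
# left keeps every df1 row, outer keeps every key from either frame.
for how in ("inner", "left", "outer"):
    print(how, len(df1.merge(df2, on=keys, how=how)))
# inner 2
# left 7
# outer 9

# indicator=True adds a "_merge" column: "left_only", "right_only" or "both".
merged = df1.merge(df2, on=keys, how="outer", indicator=True)
print(merged["_merge"].value_counts())
```

With unique keys on both sides, the outer join yields the nine rows of the desired output: two matched rows, five found only in df1 and two found only in df2. Once the origin labels are no longer needed, merged.drop(columns="_merge") removes them.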
Use pd.merge, passing the keys with on= and setting how="outer". For example:
import pandas as pd
# df1 and df2 are the two frames from the question
pd.merge(df1, df2, on=["date", "categories"], how="outer")
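The question mentions duplicate values after inner and left joins. The asker's exact code is not shown, so this is only an assumption about the cause: merge emits one output row for every matching pair of keys, so joining on a column that does not uniquely identify rows (for example date alone) multiplies rows. The sketch below uses small hypothetical frames (the sales values are invented) together with merge's validate="one_to_one" option, which raises pandas.errors.MergeError when the merge keys are not unique on either side.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical toy frames: two categories share the same date.
left = pd.DataFrame({
    "date": ["03-11-20", "03-11-20"],
    "categories": ["categories 1", "categories 2"],
    "cost": [9366463, 2738528],
})
right = pd.DataFrame({
    "date": ["03-11-20", "03-11-20"],
    "categories": ["categories 1", "categories 2"],
    "sales": [100, 200],  # invented values for illustration
})

# Keying on "date" alone pairs every left row with every right row that
# shares the date: 2 x 2 = 4 rows instead of 2. The non-key "categories"
# columns are kept twice, as categories_x and categories_y.
by_date = pd.merge(left, right, on="date", how="outer")
print(len(by_date))  # 4

# validate="one_to_one" refuses the merge because "date" is not unique.
try:
    pd.merge(left, right, on="date", how="outer", validate="one_to_one")
except MergeError as exc:
    print("MergeError:", exc)

# Keying on both columns pairs each row with exactly one partner.
by_both = pd.merge(left, right, on=["date", "categories"], how="outer",
                   validate="one_to_one")
print(len(by_both))  # 2
```

Adding validate= to the real merge is a cheap way to catch duplicated keys in df1 or df2 early, instead of discovering inflated row counts afterwards.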