Python: merging dataframes with some matching column names results in duplicate columns
I have two dataframes with some overlapping columns, and I'm trying to merge them on a given Symbol and Date. But when I do, instead of filling in the missing data, new columns with suffixes are added.
df1
Investor Date Name Symbol Price Amount Income
0 Mike 2019 Q4 A Inc AAA NaN 100 NaN
1 Bill 2019 Q4 C Inc CCC NaN 200 NaN
2 John 2018 Q4 A Inc AAA NaN 200 NaN
3 Faye 2018 Q4 D Inc DDD NaN 300 NaN
4 Joe 2019 Q2 A Inc AAA NaN 300 NaN
5 Hank 2019 Q2 S Inc SSS NaN 100 NaN
df2
Date Name Symbol Price Income
0 2019 Q4 A Inc AAA 5 10
1 2019 Q4 B Inc BBB 3 20
2 2019 Q4 C Inc CCC 33 30
3 2019 Q4 D Inc DDD 30 40
4 2018 Q4 A Inc AAA 23 20
5 2018 Q4 B Inc BBB 4 30
6 2018 Q4 C Inc CCC 136 40
7 2018 Q4 D Inc DDD 6 50
8 2018 Q4 E Inc EEE 1 90
I want my output to look like this:
Investor Date Name Symbol Price Amount Income
0 Mike 2019 Q4 A Inc AAA 5.0 100 10.0
1 Bill 2019 Q4 C Inc CCC 33.0 200 30.0
2 John 2018 Q4 A Inc AAA 23.0 200 20.0
3 Faye 2018 Q4 D Inc DDD 6.0 300 50.0
4 Joe 2019 Q2 A Inc AAA NaN 300 NaN
5 Hank 2019 Q2 S Inc SSS NaN 100 NaN
But when I run df3 = pd.merge(df1, df2, on=['Date', 'Symbol'], how='left'), I get:
Investor Date Name_x Symbol ... Income_x Name_y Price_y Income_y
0 Mike 2019 Q4 A Inc AAA ... NaN A Inc 5.0 10.0
1 Bill 2019 Q4 C Inc CCC ... NaN C Inc 33.0 30.0
2 John 2018 Q4 A Inc AAA ... NaN A Inc 23.0 20.0
3 Faye 2018 Q4 D Inc DDD ... NaN D Inc 6.0 50.0
4 Joe 2019 Q2 A Inc AAA ... NaN NaN NaN NaN
5 Hank 2019 Q2 S Inc SSS ... NaN NaN NaN NaN
What am I doing wrong?
from numpy import nan
import pandas as pd
df1 = pd.DataFrame({'Investor': {0: 'Mike', 1: 'Bill', 2: 'John', 3: 'Faye', 4: 'Joe', 5: 'Hank'}, 'Date': {0: '2019 Q4', 1: '2019 Q4', 2: '2018 Q4', 3: '2018 Q4', 4: '2019 Q2', 5: '2019 Q2'}, 'Name': {0: 'A Inc', 1: 'C Inc', 2: 'A Inc', 3: 'D Inc', 4: 'A Inc', 5: 'S Inc'}, 'Symbol': {0: 'AAA', 1: 'CCC', 2: 'AAA', 3: 'DDD', 4: 'AAA', 5: 'SSS'}, 'Price': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'Amount': {0: 100, 1: 200, 2: 200, 3: 300, 4: 300, 5: 100}, 'Income': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}})
df2 = pd.DataFrame({'Date': {0: '2019 Q4', 1: '2019 Q4', 2: '2019 Q4', 3: '2019 Q4', 4: '2018 Q4', 5: '2018 Q4', 6: '2018 Q4', 7: '2018 Q4', 8: '2018 Q4'}, 'Name': {0: 'A Inc', 1: 'B Inc', 2: 'C Inc', 3: 'D Inc', 4: 'A Inc', 5: 'B Inc', 6: 'C Inc', 7: 'D Inc', 8: 'E Inc'}, 'Symbol': {0: 'AAA', 1: 'BBB', 2: 'CCC', 3: 'DDD', 4: 'AAA', 5: 'BBB', 6: 'CCC', 7: 'DDD', 8: 'EEE'}, 'Price': {0: 5, 1: 3, 2: 33, 3: 30, 4: 23, 5: 4, 6: 136, 7: 6, 8: 1}, 'Income': {0: 10, 1: 20, 2: 30, 3: 40, 4: 20, 5: 30, 6: 40, 7: 50, 8: 90}})
df3 = pd.merge(df1, df2, on=['Date', 'Symbol'], how='left')
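The suffixes come from how merge treats non-key columns that exist in both frames: it never fills one frame's values from the other, it keeps both copies and renames them using the suffixes parameter (default ('_x', '_y')). A minimal reproduction, using hypothetical two-row frames where 'k' is the join key and 'v' appears on both sides:

```python
import pandas as pd

# Hypothetical frames: 'k' is the join key, 'v' exists in both frames
left = pd.DataFrame({'k': [1, 2], 'v': [None, None]})
right = pd.DataFrame({'k': [1, 2], 'v': [10, 20]})

merged = pd.merge(left, right, on='k', how='left')
# left's empty 'v' is not filled; both copies are kept, renamed with suffixes
print(merged.columns.tolist())  # ['k', 'v_x', 'v_y']
```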
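This happens because Name, Income and Price exist in both dataframes, so merge keeps both copies (with the _x and _y suffixes) instead of filling one from the other. If you don't want the duplicates, select only the columns you need: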
(df1[['Investor', 'Name', 'Date','Symbol','Amount']]
.merge(df2.drop('Name', axis=1),
on=['Date','Symbol'],
how='left')
)
Output:
Investor Name Date Symbol Amount Price Income
0 Mike A Inc 2019 Q4 AAA 100 5.0 10.0
1 Bill C Inc 2019 Q4 CCC 200 33.0 30.0
2 John A Inc 2018 Q4 AAA 200 23.0 20.0
3 Faye D Inc 2018 Q4 DDD 300 6.0 50.0
4 Joe A Inc 2019 Q2 AAA 300 NaN NaN
5 Hank S Inc 2019 Q2 SSS 100 NaN NaN
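The answer above hardcodes the list of df1 columns to keep. If, as in this example, the df1 columns that df2 should supply are entirely NaN, those columns can be found automatically instead. A minimal sketch: the helper name merge_keep_left_columns is hypothetical, and treating "all-NaN in df1" as "to be taken from df2" is an assumption of this sketch, not part of the answer. An all-NaN df1 column that df2 does not have would be missing from the result.

```python
import pandas as pd

def merge_keep_left_columns(df1, df2, keys):
    """Hypothetical helper: left-join df2 onto df1 without listing df1's columns.

    Assumes the df1 columns that df2 should supply are entirely NaN in df1.
    """
    # Drop df1 columns that contain only NaN (Price and Income in the question)
    left = df1.dropna(axis=1, how='all')
    # Drop non-key df2 columns that df1 still has (Name in the question)
    overlap = [c for c in df2.columns if c in left.columns and c not in keys]
    right = df2.drop(columns=overlap)
    out = left.merge(right, on=keys, how='left')
    # Restore df1's column order; all-NaN df1 columns absent from df2 are gone
    return out[[c for c in df1.columns if c in out.columns]]
```

With the question's frames, merge_keep_left_columns(df1, df2, ['Date', 'Symbol']) should produce the desired table with the columns in df1's original order.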
So I need to specify every column I want to keep from df1 by listing it in df1[['Investor', 'Name', 'Date', 'Symbol', 'Amount']]?
Yes. You can also drop columns instead, similar to df2.drop(['Name'], axis=1).
Is there no way to keep every column without listing them explicitly? Basically, df2 has no columns that don't also exist in df1. I want to keep all the columns in df1 and just fill in the missing values where df2 has them.
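For the follow-up request (keep every df1 column and fill only its missing cells from df2), one option is to merge with a distinct suffix on df2's copies of the shared columns, then fillna column by column and drop the helper columns. A minimal sketch: fill_missing_from and the '_new' suffix are hypothetical names, and it assumes each (Date, Symbol) pair appears at most once in df2, because duplicate keys in a left merge would multiply df1's rows.

```python
import pandas as pd

def fill_missing_from(left, right, keys):
    """Hypothetical helper: left-join `right` onto `left` on `keys`, filling
    only the NaN cells of columns both frames share; keeps left's columns."""
    # Non-key columns present in both frames (Name, Price, Income in the question)
    shared = [c for c in left.columns if c in right.columns and c not in keys]
    # Left's copies keep their names; right's copies get the '_new' suffix
    merged = left.merge(right[keys + shared], on=keys, how='left',
                        suffixes=('', '_new'))
    for col in shared:
        # Keep existing values; take right's value only where left is NaN
        merged[col] = merged[col].fillna(merged[col + '_new'])
    # Drop the helper '_new' columns and restore left's column order
    return merged[list(left.columns)]
```

Usage: df3 = fill_missing_from(df1, df2, ['Date', 'Symbol']). Unlike the column-selection approach, existing non-NaN values in df1 are never overwritten. pandas also offers DataFrame.update(other, overwrite=False), which fills only NA cells, but it aligns on the index, so both frames would first need set_index(['Date', 'Symbol']).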