Python: how to add a column from one DataFrame to another based on a specific rule
I am trying to combine two tables. To create the two DataFrames, I do the following:
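import pandas as pd
# Source pages from the CIA World Factbook
url = 'https://www.cia.gov/library/publications/the-world-factbook/fields/2127.html'
url2 = 'https://www.cia.gov/library/publications/the-world-factbook/rankorder/2004rank.html'
# Map the long column headers to short names
d = {'TOTAL FERTILITY RATE(CHILDREN BORN/WOMAN)':'TFR'}
d2 = {'GDP - PER CAPITA (PPP)':'GDP (PPP)'}
# Read the first table on each page and rename its columns
df = pd.read_html(url, header=0)[0].rename(columns=d)
df2 = pd.read_html(url2, header=0)[0].rename(columns=d2)
# Drop the trailing 31 characters of text after each rate, then convert to numbers
df['TFR'] = pd.to_numeric(df['TFR'].str[:-31])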
Now I create a sub-DataFrame from df2:
df21 = df2[['Country','GDP (PPP)']]
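So I end up with df21, which holds each country's name and its GDP. Now I want to compare the two DataFrames and assign a GDP (PPP) value to every country in df, matching on the country name (both df and df2 have a column of country names). Is there a way to do this?
Use map, or merge with a left join. Any country in df['Country'] that has no match in df2['Country'] gets NaN: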
df['GDP (PPP)'] = df['Country'].map(df2.set_index('Country')['GDP (PPP)'])
print (df.head())
          Country   TFR  GDP (PPP)
0     Afghanistan  5.12     $2,000
1         Albania  1.51    $12,500
2         Algeria  2.70    $15,200
3  American Samoa  2.68    $11,200
4         Andorra  1.40    $49,900
print (df[df['GDP (PPP)'].isna()])
                     Country   TFR  GDP (PPP)
43          Christmas Island   NaN        NaN
44   Cocos (Keeling) Islands   NaN        NaN
78                Gaza Strip  4.13        NaN
154           Norfolk Island   NaN        NaN
165         Pitcairn Islands   NaN        NaN
191                  Somalia  5.80        NaN
198                 Svalbard   NaN        NaN
230                    World  2.42        NaN
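The answer mentions merge with a left join but only shows map. A minimal sketch of the merge variant follows. It assumes small hand-built stand-ins for df and df21 (a few rows copied from the output above) instead of the live pages, and the names merged and unmatched are illustrative only.

```python
import pandas as pd

# Hypothetical stand-ins for df and df21, using a few rows from the output above.
df = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Somalia"],
    "TFR": [5.12, 1.51, 5.80],
})
df21 = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria"],
    "GDP (PPP)": ["$2,000", "$12,500", "$15,200"],
})

# how="left" keeps every row of df; countries missing from df21 get NaN.
# indicator=True adds a "_merge" column recording where each row was found.
merged = df.merge(df21, on="Country", how="left", indicator=True)

# Countries that appear only in df, i.e. those left without a GDP value.
unmatched = merged.loc[merged["_merge"] == "left_only", "Country"].tolist()

print(merged.drop(columns="_merge"))
print(unmatched)
```

One difference worth keeping in mind: if df21 held the same country more than once, the merge would return one row per match, so the result could have more rows than df.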