Python: create a new column in the original DataFrame when a column from another DataFrame has matching values
I have two pandas DataFrames. One of them has more than 90,000 rows. Wherever a column value in the second DataFrame matches a value in the original DataFrame, I want to create a new column in the original DataFrame from the second one. For example, given two DataFrames like these:
import pandas as pd

countries = {
    'Country': ['India', 'South Korea', 'France', 'Austria', 'India', 'Spain',
                'France', 'Algeria', 'Angola', 'Spain', 'Belgium', 'Austria'],
    'Capital': ['Delhi', 'Seoul', 'Paris', 'Vienna', 'Delhi', 'Madrid', 'Paris',
                'Algiers', 'Luanda', 'Madrid', 'Brussels', 'Vienna'],
    'Landmark': ['TajMahal', 'Seoul Tower', 'EiffelTower', 'Belvedere Palace', 'TajMahal',
                 'La Sagrada', 'EiffelTower', 'Algiers Memorial', 'Ruacana Falls',
                 'La Sagrada', 'Grand Place', 'Belvedere Palace'],
}
language = {
    'Country': ['India', 'South Korea', 'France', 'Algeria', 'Angola', 'Spain',
                'Belgium', 'Austria'],
    'Language': ['Hindi', 'Korean', 'French', 'Arabic', 'Portuguese', 'Spanish',
                 'Dutch', 'German'],
}

# Build the two DataFrames displayed below
df1 = pd.DataFrame(countries)
df2 = pd.DataFrame(language)
>>df1
        Country   Capital          Landmark
0         India     Delhi          TajMahal
1   South Korea     Seoul       Seoul Tower
2        France     Paris       EiffelTower
3       Austria    Vienna  Belvedere Palace
4         India     Delhi          TajMahal
5         Spain    Madrid        La Sagrada
6        France     Paris       EiffelTower
7       Algeria   Algiers  Algiers Memorial
8        Angola    Luanda     Ruacana Falls
9         Spain    Madrid        La Sagrada
10      Belgium  Brussels       Grand Place
11      Austria    Vienna  Belvedere Palace
>>df2
       Country    Language
0        India       Hindi
1  South Korea      Korean
2       France      French
3      Algeria      Arabic
4       Angola  Portuguese
5        Spain     Spanish
6      Belgium       Dutch
7      Austria      German
I would like to get a result like this:
>>df1
        Country   Capital          Landmark    Language
0         India     Delhi          TajMahal       Hindi
1   South Korea     Seoul       Seoul Tower      Korean
2        France     Paris       EiffelTower      French
3       Austria    Vienna  Belvedere Palace      German
4         India     Delhi          TajMahal       Hindi
5         Spain    Madrid        La Sagrada     Spanish
6        France     Paris       EiffelTower      French
7       Algeria   Algiers  Algiers Memorial      Arabic
8        Angola    Luanda     Ruacana Falls  Portuguese
9         Spain    Madrid        La Sagrada     Spanish
10      Belgium  Brussels       Grand Place       Dutch
11      Austria    Vienna  Belvedere Palace      German
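I tried using nested for loops, but my Python code seemed to go into an infinite loop and I had to kill the program to get out of it. This is the error message I received: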
ValueError                                Traceback (most recent call last)
in <module>
----> 1 df2['Countrylanguage'] = language

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3368         else:
   3369             # set column
-> 3370             self._set_item(key, value)
   3371
   3372     def _setitem_slice(self, key, value):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   3443
   3444         self._ensure_valid_index(value)
-> 3445         value = self._sanitize_column(key, value)
   3446         NDFrame._set_item(self, key, value)
   3447

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
   3628
   3629             # turn me into an ndarray
-> 3630             value = sanitize_index(value, self.index, copy=False)
   3631             if not isinstance(value, (np.ndarray, Index)):
   3632                 if isinstance(value, list) and len(value) > 0:

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/internals/construction.py in sanitize_index(data, index, copy)
    517
    518     if len(data) != len(index):
--> 519         raise ValueError('Length of values does not match length of index')
    520
    521     if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index
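What is the correct way to add a new column to the original DataFrame?
Thanks for your help.

There are many ways to do this, including merge, join, and map. Here is one of them: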
df1.merge(df2)
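
With no arguments, merge performs an inner join on every column name the two frames share, so rows of df1 without a match are dropped. If the goal is to keep all rows of df1 and only add Language, a left join on an explicit key may be closer to what the question asks for. The following is a minimal sketch on cut-down stand-ins for the question's frames; the sample rows are illustrative only.

```python
import pandas as pd

# Cut-down stand-ins for the question's df1 and df2 (illustrative rows only).
df1 = pd.DataFrame({
    "Country": ["India", "France", "India", "Spain"],
    "Capital": ["Delhi", "Paris", "Delhi", "Madrid"],
})
df2 = pd.DataFrame({
    "Country": ["India", "France", "Spain"],
    "Language": ["Hindi", "French", "Spanish"],
})

# how="left" keeps every row of df1 in its original order; rows with no
# match in df2 get NaN in the new column instead of being dropped.
# validate="many_to_one" raises a MergeError if df2 contains duplicate
# Country values, which would otherwise silently multiply rows.
merged = df1.merge(df2, on="Country", how="left", validate="many_to_one")
print(merged)
```

Note that merge returns a new DataFrame, so the result has to be assigned back (for example `df1 = df1.merge(...)`) if df1 itself should carry the new column.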
Alternatively, I suggest creating the following dictionary and using map:
language = {'India': 'Hindi',
'South Korea': 'Korean',
'France': 'French',
'Algeria': 'Arabic',
'Angola': 'Portuguese',
'Spain': 'Spanish',
'Belgium': 'Dutch',
'Austria': 'German'}
df1['Language'] = df1['Country'].map(language)
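
Typing the dictionary by hand does not scale to a lookup table with thousands of rows. One option, sketched below, is to build the lookup directly from df2 as a Series indexed by Country and pass that to map. The sample rows (including the unmatched "Peru") are illustrative only, and the `drop_duplicates` call assumes that keeping the first language per country is acceptable.

```python
import pandas as pd

# Illustrative stand-ins; "Peru" has no entry in df2 on purpose.
df1 = pd.DataFrame({"Country": ["India", "France", "India", "Peru"]})
df2 = pd.DataFrame({
    "Country": ["India", "France", "Spain"],
    "Language": ["Hindi", "French", "Spanish"],
})

# Build a Country -> Language lookup from df2. map() needs unique index
# values, so duplicates are dropped first (keeping the first occurrence).
lookup = df2.drop_duplicates("Country").set_index("Country")["Language"]

# Each Country in df1 is looked up; countries missing from df2 become NaN.
df1["Language"] = df1["Country"].map(lookup)
print(df1)
```

Unlike merge, this assigns the new column in place and never changes the number of rows in df1.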
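
Thanks. The merge works on my sample DataFrames, but when I try to merge a DataFrame with more than 90k rows and 19 columns with a second DataFrame of shape (38233, 2), I get an empty DataFrame. The resulting DataFrame has shape (0, 19) instead of (975197, 19).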
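
An empty inner merge usually means that no key values compare equal. Common causes include stray whitespace, different capitalisation, different dtypes in the key column, or the two frames sharing more than one column name, in which case a merge without `on` requires all shared columns to match. The real data is not shown, so the sketch below only illustrates one way to check, using hypothetical frames `big` and `small` whose keys differ in whitespace and case.

```python
import pandas as pd

# Hypothetical data: the keys look alike but differ in whitespace and case.
big = pd.DataFrame({
    "Country": ["India ", "france", "Spain"],
    "Capital": ["Delhi", "Paris", "Madrid"],
})
small = pd.DataFrame({
    "Country": ["India", "France", "Spain"],
    "Language": ["Hindi", "French", "Spanish"],
})

# A left merge with indicator=True keeps all rows of `big` and adds a
# _merge column showing which rows found a partner ("both") and which
# did not ("left_only").
check = big.merge(small, on="Country", how="left", indicator=True)
print(check["_merge"].value_counts())

# Normalise the keys on both sides into a temporary column, then merge on it.
big["key"] = big["Country"].astype(str).str.strip().str.lower()
small["key"] = small["Country"].astype(str).str.strip().str.lower()
fixed = big.merge(small[["key", "Language"]], on="key", how="left").drop(columns="key")
print(fixed)
```

If the `_merge` column shows only "left_only", comparing a few raw key values from each frame with `repr()` often reveals the difference.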