Python: compare columns in two different DataFrames and, if a match is found, copy the email from df2 to df1


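I have two DataFrames with different column names, each with 10 rows. What I want to do is compare the column values and, if they match, copy the email address from df2 to df1. I have looked at this example, but my column names are different. I have also seen np.where used with multiple conditions, but when I try that I get the following error: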

ValueError: Wrong number of items passed 2, placement implies 1
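What I am trying to do:

I want to compare two columns of df1 (first, last_huge) against all rows of the corresponding df2 columns (first_small, last_small). If a match is found, I want to take the email address from a specific column in df2 and assign it to a new column in df1. Can anyone help? I have copied only the relevant code below. It adds only 5 records to new_email and does not fully work yet.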

Initially, what I did was compare df1['first'] with df2['first_small']:

import numpy as np
import pandas as pd

data1 = {"first":["alice", "bob", "carol"],
         "last_huge":["foo", "bar", "baz"],
         "street_huge": ["Jaifo Road", "Wetib Ridge", "Ucagi View"],
         "city_huge": ["Egviniw", "Manbaali", "Ismazdan"],
         "age_huge": ["23", "30", "36"],
         "state_huge": ["MA", "LA", "CA"],
         "zip_huge": ["89899", "78788", "58999"]}

df1 = pd.DataFrame(data1)

data2 = {"first_small":["alice", "bob", "carol"],
         "last_small":["foo", "bar", "baz"],
         "street_small": ["Jsdffo Road", "sdf Ridge", "sdfff View"],
         "city_huge": ["paris", "london", "rome"],
         "age_huge": ["28", "40", "56"],
         "state_huge": ["GA", "EA", "BA"],
         "zip_huge": ["89859", "78728", "56999"],
         "email_small":["alice@xyz.com", "bob@abc.com", "carol@jkl.com"],
         "dob": ["31051989", "31051980", "31051981"],
         "country": ["UK", "US", "IT"],
         "company": ["microsoft", "apple", "google"],
         "source": ["bing", "yahoo", "google"]}

df2 = pd.DataFrame(data2)

df1['new_email'] = np.where((df1[['first']] == df2[['first_small']]), df2[['email_small']], np.nan)
Right now it adds only 5 records to new_email and the rest are NaN. It also gives me this error:

ValueError: Can only compare identically-labeled Series objects

Try a merge:

# Left-merge on the name pair, then drop the duplicate key columns from df2.
(df1.merge(df2[["first_small", "last_small", "email_small"]],
           how="left",
           left_on=["first", "last_huge"],
           right_on=["first_small", "last_small"])
    .drop(columns=["first_small", "last_small"]))
For example:

import pandas as pd

data1 = {"first":["alice", "bob", "carol"],
         "last_huge":["foo", "bar", "baz"]}
df1 = pd.DataFrame(data1)

data2 = {"first_small":["alice", "bob", "carol"],
         "last_small":["foo", "bar", "baz"],
         "email_small":["alice@xyz.com", "bob@abc.com", "carol@jkl.com"]}
df2 = pd.DataFrame(data2)

# how="left" keeps every row of df1, matched or not.
(df1.merge(df2[["first_small", "last_small", "email_small"]],
           how="left",
           left_on=["first", "last_huge"],
           right_on=["first_small", "last_small"])
    .drop(columns=["first_small", "last_small"]))

Output:

   first last_huge    email_small
0  alice       foo  alice@xyz.com
1    bob       bar    bob@abc.com
2  carol       baz  carol@jkl.com
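
The sample data above matches row for row, so it does not show what a left merge does with rows that have no partner. The following is a minimal sketch under assumed data: the people "dave" and "erin" and their values are made up for illustration and are not from the question. It uses the `indicator=True` option of `merge` to flag which rows of df1 found no email.

```python
import pandas as pd

# Hypothetical data: "dave" exists only in df1, "erin" only in df2,
# and df2 is longer than df1 and in a different order.
df1 = pd.DataFrame({"first": ["alice", "bob", "dave"],
                    "last_huge": ["foo", "bar", "qux"]})
df2 = pd.DataFrame({"first_small": ["bob", "alice", "erin", "carol"],
                    "last_small": ["bar", "foo", "zap", "baz"],
                    "email_small": ["bob@abc.com", "alice@xyz.com",
                                    "erin@def.com", "carol@jkl.com"]})

merged = (df1.merge(df2[["first_small", "last_small", "email_small"]],
                    how="left",
                    left_on=["first", "last_huge"],
                    right_on=["first_small", "last_small"],
                    indicator=True)  # adds a "_merge" column
             .drop(columns=["first_small", "last_small"]))

# Rows present only in df1 are flagged "left_only" and get NaN as email.
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(unmatched)
```

Because the merge is keyed on values rather than on row position, the differing lengths and ordering of the two frames do not matter. Duplicate name pairs in df2 would, however, produce extra rows in the result.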

Using andrew_reece's sample data :-), with pd.concat:

pd.concat([df1.set_index(["first", "last_huge"]),df2.set_index(["first_small", "last_small"])['email_small']],axis=1).reset_index().dropna()
Out[23]: 
   first last_huge    email_small
0  alice       foo  alice@xyz.com
1    bob       bar    bob@abc.com
2  carol       baz  carol@jkl.com
Using your data:

pd.concat([df1.set_index(["first", "last_huge"]),df2.set_index(["first_small", "last_small"])['email_small']],axis=1).reset_index()
Out[97]: 
   first last_huge age_huge city_huge state_huge  street_huge zip_huge  \
0  alice       foo       23   Egviniw         MA   Jaifo Road    89899   
1    bob       bar       30  Manbaali         LA  Wetib Ridge    78788   
2  carol       baz       36  Ismazdan         CA   Ucagi View    58999   
     email_small  
0  alice@xyz.com  
1    bob@abc.com  
2  carol@jkl.com  
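
If the index names of the two frames differ, the concat approach may hand back generically named index levels after `reset_index()`. A possible alternative, sketched here as an assumption rather than as part of the original answer, is `DataFrame.join` with `on=`, which looks up df1's key columns in the index of the email Series and leaves df1's own columns and index untouched. The data is a trimmed copy of the question's sample.

```python
import pandas as pd

df1 = pd.DataFrame({"first": ["alice", "bob", "carol"],
                    "last_huge": ["foo", "bar", "baz"],
                    "age_huge": ["23", "30", "36"]})
df2 = pd.DataFrame({"first_small": ["alice", "bob", "carol"],
                    "last_small": ["foo", "bar", "baz"],
                    "email_small": ["alice@xyz.com", "bob@abc.com",
                                    "carol@jkl.com"]})

# A named Series of emails, indexed by the (first, last) pair from df2.
emails = df2.set_index(["first_small", "last_small"])["email_small"]

# Match df1's two key columns against the two index levels of `emails`.
# join defaults to how="left", so every row of df1 is kept.
result = df1.join(emails, on=["first", "last_huge"])

print(result)
```

The key columns keep their df1 names, so no renaming or dropping is needed afterwards.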
Using map:

df1['email_small']=(df1['first']+df1['last_huge']).map(df2.set_index(df2['first_small']+df2['last_small'])['email_small'])
df1
Out[115]: 
  age_huge city_huge  first last_huge state_huge  street_huge zip_huge  \
0       23   Egviniw  alice       foo         MA   Jaifo Road    89899   
1       30  Manbaali    bob       bar         LA  Wetib Ridge    78788   
2       36  Ismazdan  carol       baz         CA   Ucagi View    58999   
     email_small  
0  alice@xyz.com  
1    bob@abc.com  
2  carol@jkl.com  
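
One caveat with joining first and last names into a single string key: different name pairs can collapse into the same string (for example "a" + "bc" and "ab" + "c"). The sketch below is a variation, not the original answer's code, that keys the lookup on (first, last) tuples instead. The extra rows ("a", "bc") and ("ab", "c") are made up to show the collision case.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ("a", "bc") in df1 and ("ab", "c") in df2 are different
# people, but both concatenate to the string "abc".
df1 = pd.DataFrame({"first": ["alice", "a"],
                    "last_huge": ["foo", "bc"]})
df2 = pd.DataFrame({"first_small": ["alice", "ab"],
                    "last_small": ["foo", "c"],
                    "email_small": ["alice@xyz.com", "abc@xyz.com"]})

# Build a dict keyed on (first, last) tuples. If df2 repeats a name pair,
# the last email seen for that pair wins.
lookup = dict(zip(zip(df2["first_small"], df2["last_small"]),
                  df2["email_small"]))

# Look up each df1 row by its tuple key; missing pairs become NaN.
df1["email_small"] = [lookup.get(key, np.nan)
                      for key in zip(df1["first"], df1["last_huge"])]

print(df1)
```

With string concatenation, the second row would have picked up abc@xyz.com by mistake. With tuple keys it stays NaN.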

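Please don't post screenshots of data when you can provide representative sample data inline. It makes it easier for others to help you.

OK, I'll fix it.

Moving the common fields into the index and then using concat is a neat trick.

@andrew_reece The only limitation here is non-unique keys.

Thanks! But the first and last columns get renamed to level_0 and level_1, and apart from level_0, level_1 and email_small, all the other columns are NaN.

@Wcan Is your data the same as the sample data?

Sorry, the sample data given above was not the same, but I have changed it.

You're welcome! If this solution solved your problem, please mark it as accepted by clicking the check mark next to the answer.

One question: if the chunk size of the rows in df1 differs from df2, that is not a problem, right?

As long as each first-last name pair has only one email address in df2, it does not matter if df1 and df2 have different lengths. (I assume that is what you mean by chunk size.) If some rows in df1 have no matching email in df2, the email_small field will show as NaN.

One small issue: in the actual files, the column "first" in data1 is named "first_name", and the column "first_small" in data2 is also named "first_name". The drop statement actually removes the column from both data1 and data2. How can I prevent it from dropping the column from data1 and drop it only from data2?

If you want to keep first_name, just include first_name in the on=[] list and remove it from the drop list. The column will not be duplicated.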
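
A minimal sketch of that last suggestion, assuming the real files share the column name first_name while the last-name columns still differ (last_huge vs. last_small). Those names come from the comments, not from the actual files, and the data values are reused from the sample. Renaming df2's last-name column first lets both keys go into a single `on=` list, so nothing needs to be dropped afterwards.

```python
import pandas as pd

# Assumed layout: both frames call the first-name column "first_name".
df1 = pd.DataFrame({"first_name": ["alice", "bob", "carol"],
                    "last_huge": ["foo", "bar", "baz"]})
df2 = pd.DataFrame({"first_name": ["alice", "bob", "carol"],
                    "last_small": ["foo", "bar", "baz"],
                    "email_small": ["alice@xyz.com", "bob@abc.com",
                                    "carol@jkl.com"]})

# Keep only the needed columns and align the last-name column with df1.
right = (df2[["first_name", "last_small", "email_small"]]
         .rename(columns={"last_small": "last_huge"}))

# Shared key names go into on=[...]; each key appears once in the result.
result = df1.merge(right, how="left", on=["first_name", "last_huge"])

print(result)
```

Since both keys are listed in `on=`, the result keeps one first_name and one last_huge column plus email_small, and df1's columns are left intact.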