Python: cross-referencing pandas DataFrames and generating a new column
I want to generate a DataFrame that contains a list of a person's possible favorite crayon colors, based on their favorite color. I have two DataFrames with the necessary information:
df1 = pd.DataFrame({'person':['Jeff','Marie','Jenna','Mike'], 'color':['blue', 'purple', 'brown', 'green']}, columns=['person','color'])
df2 = pd.DataFrame({'possible_crayons':['christmas red','infra red','scarlet','sunset orange', 'neon carrot','lemon','forest green','pine','navy','aqua','periwinkle','royal purple'],'color':['red','red','red','orange','orange','yellow','green','green','blue','blue','purple','purple']}, columns=['possible_crayons','color'])
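I would like to cross-reference one DataFrame against the other by matching the df1 color entries with the df2 color entries, and return the corresponding possible_crayons values as a list in a new column of df1. Any entry with no match should be marked as NaN. The desired output is therefore: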
person favorite_color possible_crayons_list
Jeff blue [navy, aqua]
Marie purple [periwinkle, royal purple]
Jenna brown NaN
Mike green [forest green, pine]
I tried:
mergedDF = pd.merge(df1, df2, how='left')
However, this produces the following result:
person color possible_crayons
0 Jeff blue navy
1 Jeff blue aqua
2 Marie purple periwinkle
3 Marie purple royal purple
4 Jenna brown NaN
5 Mike green forest green
6 Mike green pine
Is there any way to get the list output I want?
Use the following approach:
df1 = pd.DataFrame({'person':['Jeff','Marie','Jenna','Mike'], 'color':['blue', 'purple', 'brown', 'green']}, columns=['person','color'])
df2 = pd.DataFrame({'possible_crayons':['christmas red','infra red','scarlet','sunset orange', 'neon carrot','lemon','forest green','pine','navy','aqua','periwinkle','royal purple'],'color':['red','red','red','orange','orange','yellow','green','green','blue','blue','purple','purple']}, columns=['possible_crayons','color'])
tmp = df2.groupby('color')['possible_crayons'].apply(list)
mergedDF = df1.merge(tmp, how='left', left_on='color', right_index=True)
print(mergedDF)
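To see why this works, it can help to look at the intermediate `tmp` object. The following is a minimal sketch that reuses the question's data: the `groupby` step collects the crayons for each color into one list, giving a Series indexed by color, and the left merge then looks up each person's color in that index. A color that is absent from `tmp` (here, brown) is left as NaN.

```python
import pandas as pd

df1 = pd.DataFrame({'person': ['Jeff', 'Marie', 'Jenna', 'Mike'],
                    'color': ['blue', 'purple', 'brown', 'green']})
df2 = pd.DataFrame({
    'possible_crayons': ['christmas red', 'infra red', 'scarlet',
                         'sunset orange', 'neon carrot', 'lemon',
                         'forest green', 'pine', 'navy', 'aqua',
                         'periwinkle', 'royal purple'],
    'color': ['red', 'red', 'red', 'orange', 'orange', 'yellow',
              'green', 'green', 'blue', 'blue', 'purple', 'purple']})

# Step 1: one list of crayons per color; the result is a Series
# named 'possible_crayons' whose index holds the color names.
tmp = df2.groupby('color')['possible_crayons'].apply(list)
print(tmp)

# Step 2: match df1's 'color' column against tmp's index.
# how='left' keeps every person, including those with no match.
mergedDF = df1.merge(tmp, how='left', left_on='color', right_index=True)
print(mergedDF)
```

Because the lists are built before the merge, each person stays on a single row and unmatched colors come through as a plain NaN rather than a list.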
We can use merge with how='left', and then groupby with as_index=False:
new_df= ( df1.merge(df2,how='left',on='color')
.groupby(['color','person'],as_index=False).agg(list) )
Output
print(new_df)
color person possible_crayons
0 blue Jeff [navy, aqua]
1 brown Jenna [nan]
2 green Mike [forest green, pine]
3 purple Marie [periwinkle, royal purple]
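Note that in this output Jenna's entry is the one-element list `[nan]`, not the bare NaN shown in the desired output. If that difference matters, one possible adjustment is sketched below. It is an assumption about what is wanted, not part of the answer above, and the helper name `drop_missing` is hypothetical: it filters the missing values out of each list and turns a list that ends up empty into NaN.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'person': ['Jeff', 'Marie', 'Jenna', 'Mike'],
                    'color': ['blue', 'purple', 'brown', 'green']})
df2 = pd.DataFrame({
    'possible_crayons': ['christmas red', 'infra red', 'scarlet',
                         'sunset orange', 'neon carrot', 'lemon',
                         'forest green', 'pine', 'navy', 'aqua',
                         'periwinkle', 'royal purple'],
    'color': ['red', 'red', 'red', 'orange', 'orange', 'yellow',
              'green', 'green', 'blue', 'blue', 'purple', 'purple']})

# Same merge-then-aggregate approach as in the answer.
new_df = (df1.merge(df2, how='left', on='color')
             .groupby(['color', 'person'], as_index=False).agg(list))

def drop_missing(crayons):
    # Hypothetical helper: keep only real crayon names; if nothing
    # is left, return NaN instead of an empty list.
    kept = [c for c in crayons if pd.notna(c)]
    return kept if kept else np.nan

new_df['possible_crayons'] = new_df['possible_crayons'].apply(drop_missing)
print(new_df)
```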
mergedDF2 = mergedDF.groupby('color')['possible_crayons'].apply(list).reset_index(name='new_possible_crayons')
Possible duplicate
Could you provide some details about how this solution works?