Python: cross-referencing dataframes and generating a new column


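I want to generate a dataframe that contains a list of the crayon colors a person might like best, based on their favorite color. I have two dataframes that contain the necessary information: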

import pandas as pd

df1 = pd.DataFrame({'person':['Jeff','Marie','Jenna','Mike'], 'color':['blue', 'purple', 'brown', 'green']}, columns=['person','color'])

df2 = pd.DataFrame({'possible_crayons':['christmas red','infra red','scarlet','sunset orange', 'neon carrot','lemon','forest green','pine','navy','aqua','periwinkle','royal purple'],'color':['red','red','red','orange','orange','yellow','green','green','blue','blue','purple','purple']}, columns=['possible_crayons','color'])
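I want to cross-reference one dataframe against the other: match each color entry in df1 against the color entries in df2, and return the corresponding possible_crayons values as a list in a new column of df1. Any color with no match should be marked as missing (NaN). So the desired output is: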

person favorite_color possible_crayons_list  
Jeff   blue           [navy, aqua]  
Marie  purple         [periwinkle, royal purple]  
Jenna  brown          NaN  
Mike   green          [forest green, pine]
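Because each color in df2 corresponds to a fixed set of crayons, the desired output can be thought of as a per-color lookup. A minimal sketch of that idea (an alternative to merging, not taken from the answers below): collapse df2 into a Series indexed by color whose values are lists, then map df1's color column through it. Colors absent from df2, such as brown, come back as NaN. Renaming the columns to favorite_color and possible_crayons_list simply mirrors the desired output and is an assumption about labeling.

```python
import pandas as pd

df1 = pd.DataFrame({'person': ['Jeff', 'Marie', 'Jenna', 'Mike'],
                    'color': ['blue', 'purple', 'brown', 'green']})
df2 = pd.DataFrame({'possible_crayons': ['christmas red', 'infra red', 'scarlet',
                                         'sunset orange', 'neon carrot', 'lemon',
                                         'forest green', 'pine', 'navy', 'aqua',
                                         'periwinkle', 'royal purple'],
                    'color': ['red', 'red', 'red', 'orange', 'orange', 'yellow',
                              'green', 'green', 'blue', 'blue', 'purple', 'purple']})

# Collapse df2 to one entry per color: a Series indexed by color holding lists
crayons_by_color = df2.groupby('color')['possible_crayons'].agg(list)

# Look up each person's color; colors missing from the lookup map to NaN
result = df1.rename(columns={'color': 'favorite_color'})
result['possible_crayons_list'] = result['favorite_color'].map(crayons_by_color)

print(result)
```

Since map works element by element against the lookup's index, df1 keeps exactly one row per person and no regrouping is needed afterwards.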
I tried:

mergedDF = pd.merge(df1, df2, how='left')
However, this produces the following result:

  person   color possible_crayons  
0   Jeff    blue             navy  
1   Jeff    blue             aqua  
2  Marie  purple       periwinkle  
3  Marie  purple     royal purple  
4  Jenna   brown              NaN  
5   Mike   green     forest green  
6   Mike   green             pine  
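The extra rows are expected behavior of a left merge on a one-to-many relationship: each df1 row is repeated once for every df2 row with the same color, and a row with no match (brown) is kept once with NaN. A quick check of the row counts, assuming the df1 and df2 from the question (validate='one_to_many' only asserts that each color appears once in df1):

```python
import pandas as pd

df1 = pd.DataFrame({'person': ['Jeff', 'Marie', 'Jenna', 'Mike'],
                    'color': ['blue', 'purple', 'brown', 'green']})
df2 = pd.DataFrame({'possible_crayons': ['christmas red', 'infra red', 'scarlet',
                                         'sunset orange', 'neon carrot', 'lemon',
                                         'forest green', 'pine', 'navy', 'aqua',
                                         'periwinkle', 'royal purple'],
                    'color': ['red', 'red', 'red', 'orange', 'orange', 'yellow',
                              'green', 'green', 'blue', 'blue', 'purple', 'purple']})

# Joins on the shared 'color' column; raises if a color repeats in df1
merged = pd.merge(df1, df2, how='left', validate='one_to_many')

# blue, purple and green each match two crayons; brown matches none but is kept
print(len(df1), len(merged))
print(merged.groupby('person').size().to_dict())
```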
Is there a way to get the list output I want?

Group df2 into lists first, then merge:

import pandas as pd

df1 = pd.DataFrame({'person':['Jeff','Marie','Jenna','Mike'], 'color':['blue', 'purple', 'brown', 'green']}, columns=['person','color'])
df2 = pd.DataFrame({'possible_crayons':['christmas red','infra red','scarlet','sunset orange', 'neon carrot','lemon','forest green','pine','navy','aqua','periwinkle','royal purple'],'color':['red','red','red','orange','orange','yellow','green','green','blue','blue','purple','purple']}, columns=['possible_crayons','color'])

tmp = df2.groupby('color')['possible_crayons'].apply(list)
mergedDF = df1.merge(tmp, how='left', left_on='color', right_index=True)

print(mergedDF)
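This approach works in two steps. First, groupby('color')['possible_crayons'].apply(list) collapses df2 into a Series indexed by color with one list per color; the Series keeps the name possible_crayons, which becomes the new column's name. Second, left_on='color' with right_index=True matches df1's color column against that index, and how='left' keeps Jenna with NaN because brown is not in the index. A sketch that prints the intermediate Series:

```python
import pandas as pd

df1 = pd.DataFrame({'person': ['Jeff', 'Marie', 'Jenna', 'Mike'],
                    'color': ['blue', 'purple', 'brown', 'green']})
df2 = pd.DataFrame({'possible_crayons': ['christmas red', 'infra red', 'scarlet',
                                         'sunset orange', 'neon carrot', 'lemon',
                                         'forest green', 'pine', 'navy', 'aqua',
                                         'periwinkle', 'royal purple'],
                    'color': ['red', 'red', 'red', 'orange', 'orange', 'yellow',
                              'green', 'green', 'blue', 'blue', 'purple', 'purple']})

# Step 1: one list of crayon names per color, as a Series indexed by 'color'
tmp = df2.groupby('color')['possible_crayons'].apply(list)
print(tmp.name)        # this name becomes the column name after the merge
print(tmp.to_dict())

# Step 2: match df1['color'] against tmp's index; unmatched colors get NaN
mergedDF = df1.merge(tmp, how='left', left_on='color', right_index=True)
print(mergedDF[['person', 'color', 'possible_crayons']])
```

Grouping before the merge means the merge is one-to-one, so df1 keeps exactly one row per person.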
We can merge with how='left' and then group with as_index=False:

new_df= ( df1.merge(df2,how='left',on='color')
             .groupby(['color','person'],as_index=False).agg(list) )
Output

print(new_df)
    color person            possible_crayons
0    blue   Jeff                [navy, aqua]
1   brown  Jenna                       [nan]
2   green   Mike        [forest green, pine]
3  purple  Marie  [periwinkle, royal purple]
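Note that this route returns [nan] for Jenna rather than the plain NaN shown in the desired output, because agg(list) collects the unmatched row's NaN into a list like any other value. If plain NaN matters, one option (a post-processing step added here, not part of the answer) is to drop missing values from each list and turn empty lists into NaN:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'person': ['Jeff', 'Marie', 'Jenna', 'Mike'],
                    'color': ['blue', 'purple', 'brown', 'green']})
df2 = pd.DataFrame({'possible_crayons': ['christmas red', 'infra red', 'scarlet',
                                         'sunset orange', 'neon carrot', 'lemon',
                                         'forest green', 'pine', 'navy', 'aqua',
                                         'periwinkle', 'royal purple'],
                    'color': ['red', 'red', 'red', 'orange', 'orange', 'yellow',
                              'green', 'green', 'blue', 'blue', 'purple', 'purple']})

new_df = (df1.merge(df2, how='left', on='color')
             .groupby(['color', 'person'], as_index=False).agg(list))

# Keep only real crayon names; an empty list (no match in df2) becomes NaN
new_df['possible_crayons'] = new_df['possible_crayons'].apply(
    lambda crayons: [c for c in crayons if pd.notna(c)] or np.nan)

print(new_df)
```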

mergedDF2 = mergedDF.groupby('color')['possible_crayons'].apply(list).reset_index(name='new_possible_crayons')

Possible duplicate.

Could you provide some details on how this solution works?