Python: pandas groupby and check whether one row's values are contained in another row's values


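I want to group by customer and match the items of rows whose count is 1 against the items of rows whose count is greater than 1. If all of the items match, the probable merge id should be added to a new column. For example: for Customer1, the items of id=3 are contained in id=2, so this is a match and the probable merge id is 1; similarly, for Customer2, id=7 has count 1 and its items are contained in the items of id=5, so it is a match and the probable merge id is 4.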

My DataFrame:

    count custmr    id  items
    3   Customer1   1   Cabbage, beet, Okra, root
    3   Customer1   2   Apple, Banana, Mango ,Pears, leafs
    1   Customer1   3   Mango leafs
    1   Customer1   4   tomato root
    4   Customer2   5   grapes,leach,guava,pappaya
    2   Customer2   6   blackberry,blueberry
    1   Customer2   7   pappaya
Expected output:

  count custmr     id        items                        probable_merge_id
    3   Customer1   1   Cabbage, beet, Okra, root   
    3   Customer1   2   Apple, Banana, Mango ,Pears, leafs  
    1   Customer1   3   Mango leafs                             2
    1   Customer1   4   tomato root 
    4   Customer2   5   grapes,leach,guava,pappaya  
    2   Customer2   6   blackberry,blueberry    
    1   Customer2   7   pappaya                                 4

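First create a cross join within each customer with merge, filter to the rows where count == 1, and convert the item strings to sets so they can be compared. Finally, create a Series to use with map.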

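# Pair every row with every row of the same customer (a per-customer cross join).
df1 = df.merge(df, on='custmr')
# Keep only pairs whose left-hand row has count == 1; copy to avoid SettingWithCopyWarning.
df1 = df1[df1['count_x'] == 1].copy()
# Split on whitespace or commas and convert each list of items to a set.
df1['items_x'] = df1['items_x'].str.split(r'\s+|,\s*').apply(set)
df1['items_y'] = df1['items_y'].str.split(r'\s+|,\s*').apply(set)
# `<` on sets tests for a proper subset, so a row never matches itself.
df1 = df1[df1['items_x'] < df1['items_y']]
print(df1)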
    count_x     custmr  id_x         items_x  count_y  id_y  \
9         1  Customer1     3  {Mango, leafs}        3     2   
22        1  Customer2     7       {pappaya}        4     5   

                                 items_y  
9   {Mango, Pears, leafs, Apple, Banana}  
22       {grapes, pappaya, leach, guava}  
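
The filter above relies on Python's set comparison operators: `<` is true only for a proper subset, while `<=` also accepts equal sets. A minimal standalone sketch of that behaviour, using plain `re.split` with the same pattern instead of pandas (the variable names `small` and `large` are illustrative only):

```python
import re

# Same split pattern as in the answer: whitespace, or a comma plus optional spaces.
pattern = r"\s+|,\s*"

small = set(re.split(pattern, "Mango leafs"))
large = set(re.split(pattern, "Apple,Banana,Mango,Pears,leafs"))

print(small < large)   # proper subset: True
print(small < small)   # a set is never a proper subset of itself: False
print(small <= small)  # plain subset accepts equality: True
```

Because the comparison is a proper subset, two rows with identical items would not match each other; switching to `<=` would change that, at the cost of needing another way to exclude self-matches.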

# Series mapping each count == 1 id (index) to the id whose items contain it (value).
s = df1.set_index('id_x')['id_y']
print(s)
id_x
3    2
7    5
Name: id_y, dtype: int64

# Look up each id in the mapping; ids without a match get NaN.
df['probable_merge_id'] = df['id'].map(s)
print(df)
   count     custmr  id                           items  probable_merge_id
0      3  Customer1   1          Cabbage,beet,Okra,root                NaN
1      3  Customer1   2  Apple,Banana,Mango,Pears,leafs                NaN
2      1  Customer1   3                     Mango leafs                2.0
3      1  Customer1   4                     tomato root                NaN
4      4  Customer2   5      grapes,leach,guava,pappaya                NaN
5      2  Customer2   6            blackberry,blueberry                NaN
6      1  Customer2   7                         pappaya                5.0
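
For anyone who wants to run the steps end to end, here is a self-contained sketch under a few assumptions that are not in the answer itself: the helper name `add_probable_merge_id` is hypothetical, candidates are restricted to rows with count greater than 1 (as the question describes), the subset test is `<=` rather than `<`, only the first match per id is kept, and the result is cast to the nullable `Int64` dtype so the ids are not shown as floats.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "count": [3, 3, 1, 1, 4, 2, 1],
    "custmr": ["Customer1"] * 4 + ["Customer2"] * 3,
    "id": [1, 2, 3, 4, 5, 6, 7],
    "items": [
        "Cabbage, beet, Okra, root",
        "Apple, Banana, Mango ,Pears, leafs",
        "Mango leafs",
        "tomato root",
        "grapes,leach,guava,pappaya",
        "blackberry,blueberry",
        "pappaya",
    ],
})


def add_probable_merge_id(frame):
    """Hypothetical helper: return a copy of frame with a probable_merge_id column."""
    out = frame.copy()
    # Tokenize on runs of non-space, non-comma characters, which avoids empty tokens.
    out["_tokens"] = out["items"].str.findall(r"[^\s,]+").apply(frozenset)

    # Per-customer cross join, then keep (count == 1) rows paired with (count > 1) rows.
    pairs = out.merge(out, on="custmr", suffixes=("_x", "_y"))
    pairs = pairs[(pairs["count_x"] == 1) & (pairs["count_y"] > 1)]

    # A pair matches when every left-hand token appears in the right-hand row.
    matches = pd.Series(
        [x <= y for x, y in zip(pairs["_tokens_x"], pairs["_tokens_y"])],
        index=pairs.index,
        dtype=bool,
    )
    pairs = pairs[matches]

    # Assumption: if several rows match, keep the first one encountered.
    mapping = pairs.drop_duplicates("id_x").set_index("id_x")["id_y"]
    out["probable_merge_id"] = out["id"].map(mapping).astype("Int64")
    return out.drop(columns="_tokens")


result = add_probable_merge_id(df)
print(result)
```

On the sample data this assigns id 2 to row id=3 and id 5 to row id=7, matching the answer's output above; all other rows are left missing.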
What code have you tried so far? Where are you stuck?