Python: extract duplicated rows based on a condition in another column


Given a DataFrame

df1

#  CustomerId  Product      csp
#      adp141    Toaster     1
#      adp141    Toaster     4
#      65782     Toaster     1
#      65782     Radio       2
#      74285     Radio       1
#      45984     Radio       1
#      55868     Toaster     1
#      55868     Radio       4
#      adp485    Radio       1
#      adp485    Radio       1
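I am trying to extract the rows that share the same CustomerId, keeping only the ids whose csp column contains just the values 1 and 4.

How can I get this final DataFrame?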

Final result:

#  CustomerId  Product      csp
#      adp141    Toaster     1
#      adp141    Toaster     4
#      55868     Toaster     1
#      55868     Radio       4

Let's try filter:

# df is the DataFrame from the question (df1); pandas is imported as pd
df.groupby('CustomerId').filter(lambda x: pd.Series([1, 4]).isin(x['csp']).all())
Out[72]: 
  CustomerId  Product  csp
0     adp141  Toaster    1
1     adp141  Toaster    4
6      55868  Toaster    1
7      55868    Radio    4
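
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the question's table, and storing CustomerId as strings is my assumption. The names wanted, contains_both and exactly are only illustrative. Note that the filter above keeps any group that contains both 1 and 4, even if other values are present; the second variant is stricter and may be closer to "only 1 and 4".

```python
import pandas as pd

# Sample data copied from the question; CustomerId as strings is an assumption.
df1 = pd.DataFrame({
    "CustomerId": ["adp141", "adp141", "65782", "65782", "74285",
                   "45984", "55868", "55868", "adp485", "adp485"],
    "Product": ["Toaster", "Toaster", "Toaster", "Radio", "Radio",
                "Radio", "Toaster", "Radio", "Radio", "Radio"],
    "csp": [1, 4, 1, 2, 1, 1, 1, 4, 1, 1],
})

wanted = {1, 4}  # illustrative name for the required csp values

# Keep groups that contain both 1 and 4 (other values may also appear).
contains_both = df1.groupby("CustomerId").filter(
    lambda g: wanted.issubset(g["csp"].tolist())
)

# Stricter variant: keep groups whose csp values are exactly {1, 4}.
exactly = df1.groupby("CustomerId").filter(
    lambda g: set(g["csp"].tolist()) == wanted
)

print(exactly)
```

On this sample data both variants keep the same four rows; they would differ only for a customer who has 1, 4 and some other csp value.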

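The idea from @YOBEN_S is very helpful, and it reminded me of an advantage of tuple immutability: tuples are hashable, so they can be matched with isin.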

Group the csp column by CustomerId and aggregate each group into a tuple:

outcome = (df
           .groupby(["CustomerId"])
           .csp
           .agg(tuple)
           .isin([(1,4)])
          )

outcome

CustomerId
45984     False
55868      True
65782     False
74285     False
adp141     True
adp485    False
Name: csp, dtype: bool

Set CustomerId as the index and filter with the boolean Series stored in the outcome variable:

#you can add reset_index to match your expected output
df.set_index("CustomerId").loc[outcome]

             Product    csp
CustomerId      
  adp141    Toaster     1
  adp141    Toaster     4
  55868     Toaster     1
  55868     Radio       4
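
One caveat with the tuple approach: (1, 4) only matches when the rows appear in that order, so a customer with csp 4 followed by 1 would be missed. Below is a rough sketch of an order-insensitive variant that also keeps the original row index. It assumes the same sample data as the question (CustomerId stored as strings), and the names sorted_keys, matches and mask are mine, not from the answers above.

```python
import pandas as pd

# Sample data copied from the question; CustomerId as strings is an assumption.
df1 = pd.DataFrame({
    "CustomerId": ["adp141", "adp141", "65782", "65782", "74285",
                   "45984", "55868", "55868", "adp485", "adp485"],
    "Product": ["Toaster", "Toaster", "Toaster", "Radio", "Radio",
                "Radio", "Toaster", "Radio", "Radio", "Radio"],
    "csp": [1, 4, 1, 2, 1, 1, 1, 4, 1, 1],
})

# Sort each group's csp values before building the tuple,
# so that (4, 1) and (1, 4) produce the same key.
sorted_keys = (
    df1.groupby("CustomerId")["csp"]
       .agg(lambda s: tuple(sorted(s.tolist())))
)

# One boolean per CustomerId: True when the sorted values are exactly (1, 4).
matches = sorted_keys.map(lambda key: key == (1, 4))

# Broadcast the per-customer flag back onto the rows and filter.
mask = df1["CustomerId"].map(matches).astype(bool)
result = df1[mask]

print(result)
```

Mapping the flag back through CustomerId avoids set_index, so the result keeps the original columns and row labels.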