Python: extract duplicate rows based on a condition on another column


Given a dataframe, df1:

#  CustomerId  Product      csp
#      adp141    Toaster     1
#      adp141    Toaster     4
#      65782     Toaster     1
#      65782     Radio       2
#      74285     Radio       1
#      45984     Radio       1
#      55868     Toaster     1
#      55868     Radio       4
#      adp485    Radio       1
#      adp485    Radio       1
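To reproduce the examples, the sample frame can be built as below. This is a sketch that assumes CustomerId is stored as strings, since it mixes ids such as adp141 with numeric-looking ids such as 65782, and that csp holds integers.

```python
import pandas as pd

# Sample data from the question. CustomerId is assumed to be a string column
# because it mixes ids like "adp141" with numeric-looking ids like "65782".
df1 = pd.DataFrame({
    "CustomerId": ["adp141", "adp141", "65782", "65782", "74285",
                   "45984", "55868", "55868", "adp485", "adp485"],
    "Product": ["Toaster", "Toaster", "Toaster", "Radio", "Radio",
                "Radio", "Toaster", "Radio", "Radio", "Radio"],
    "csp": [1, 4, 1, 2, 1, 1, 1, 4, 1, 1],
})
print(df1)
```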

I am trying to extract the duplicated rows per id: keep only the customers that have both a 1 and a 4 in the csp column for the same CustomerId value.

How can I get the following final dataframe?


Final result:

#  CustomerId  Product      csp
#      adp141    Toaster     1
#      adp141    Toaster     4
#      55868     Toaster     1
#      55868     Radio       4

Let's try groupby(...).filter:

# keep every customer whose csp column contains both 1 and 4
df.groupby('CustomerId').filter(lambda x: pd.Series([1, 4]).isin(x['csp']).all())
Out[72]: 
  CustomerId  Product  csp
0     adp141  Toaster    1
1     adp141  Toaster    4
6      55868  Toaster    1
7      55868    Radio    4
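filter calls the Python lambda once per group, which can get slow with many customers. A vectorized alternative (a different technique from the answer above, sketched here on the sample data) flags each row according to whether its CustomerId group contains a 1 and a 4, using groupby(...).transform("any"), and then combines the two masks. Like the filter version, it keeps every row of a qualifying customer, including rows with other csp values.

```python
import pandas as pd

df = pd.DataFrame({
    "CustomerId": ["adp141", "adp141", "65782", "65782", "74285",
                   "45984", "55868", "55868", "adp485", "adp485"],
    "Product": ["Toaster", "Toaster", "Toaster", "Radio", "Radio",
                "Radio", "Toaster", "Radio", "Radio", "Radio"],
    "csp": [1, 4, 1, 2, 1, 1, 1, 4, 1, 1],
})

# Row-level flags: does this row's customer have at least one csp == 1 / csp == 4?
has_1 = df["csp"].eq(1).groupby(df["CustomerId"]).transform("any")
has_4 = df["csp"].eq(4).groupby(df["CustomerId"]).transform("any")

# Keep rows of customers that have both values (same rule as the filter call).
out = df[has_1 & has_4]
print(out)
```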

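The idea from @YOBEN_S is very helpful and reminded me of a benefit of tuples being immutable (and therefore hashable, so they can be compared with isin):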

Group the csp column by CustomerId and aggregate each group into a tuple:

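outcome = (df
           .groupby(["CustomerId"])
           .csp
           .agg(tuple)       # one tuple of csp values per customer, in row order
           .isin([(1,4)])    # True only where that tuple is exactly (1, 4)
          )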

outcome

CustomerId
45984     False
55868      True
65782     False
74285     False
adp141     True
adp485    False
Name: csp, dtype: bool
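Because isin([(1,4)]) compares whole tuples, it matches only customers whose rows are exactly a 1 followed by a 4. A customer whose rows come in the order 4, 1 would be False. If row order should not matter (an assumption about the desired semantics, not part of the answer), one option is to sort each group before building the tuple. The customer zz999 below is hypothetical and added only to show the ordering case.

```python
import pandas as pd

df = pd.DataFrame({
    "CustomerId": ["adp141", "adp141", "65782", "65782", "74285",
                   "45984", "55868", "55868", "adp485", "adp485"],
    "Product": ["Toaster", "Toaster", "Toaster", "Radio", "Radio",
                "Radio", "Toaster", "Radio", "Radio", "Radio"],
    "csp": [1, 4, 1, 2, 1, 1, 1, 4, 1, 1],
})

# Hypothetical extra customer (not in the question) whose rows come as 4, then 1.
extra = pd.DataFrame({"CustomerId": ["zz999", "zz999"],
                      "Product": ["Radio", "Radio"],
                      "csp": [4, 1]})
df = pd.concat([df, extra], ignore_index=True)

# Sort each customer's csp values before building the tuple,
# so (4, 1) and (1, 4) both become (1, 4).
outcome = (df
           .groupby("CustomerId")
           .csp
           .agg(lambda s: tuple(sorted(s)))
           .isin([(1, 4)])
          )
print(outcome)
```

This still requires exactly one 1 and one 4: a customer with csp values 1, 1, 4 gives (1, 1, 4) and is not matched.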

Set CustomerId as the index and filter with the boolean Series stored in the outcome variable:

# chain .reset_index() to turn CustomerId back into a column and match the expected output
df.set_index("CustomerId").loc[outcome]

             Product    csp
CustomerId      
  adp141    Toaster     1
  adp141    Toaster     4
  55868     Toaster     1
  55868     Radio       4
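If you prefer to keep CustomerId as a regular column without going through set_index, one option is to map each row's CustomerId to the per-customer flag and use the result as a row mask. This is a sketch on the sample data; Series.map with a Series argument looks each value up in that Series' index.

```python
import pandas as pd

df = pd.DataFrame({
    "CustomerId": ["adp141", "adp141", "65782", "65782", "74285",
                   "45984", "55868", "55868", "adp485", "adp485"],
    "Product": ["Toaster", "Toaster", "Toaster", "Radio", "Radio",
                "Radio", "Toaster", "Radio", "Radio", "Radio"],
    "csp": [1, 4, 1, 2, 1, 1, 1, 4, 1, 1],
})

# Per-customer flag, as in the answer: csp values are exactly (1, 4).
outcome = df.groupby("CustomerId").csp.agg(tuple).isin([(1, 4)])

# Look up each row's CustomerId in `outcome` to get a row-level boolean mask.
mask = df["CustomerId"].map(outcome)
result = df[mask].reset_index(drop=True)
print(result)
```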