Python: extract duplicated rows based on a condition in another column
Given a dataframe:
df1
# CustomerId Product csp
# adp141 Toaster 1
# adp141 Toaster 4
# 65782 Toaster 1
# 65782 Radio 2
# 74285 Radio 1
# 45984 Radio 1
# 55868 Toaster 1
# 55868 Radio 4
# adp485 Radio 1
# adp485 Radio 1
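To make the snippets in this thread reproducible, here is a minimal sketch that rebuilds the frame above. Storing CustomerId as strings is an assumption (some ids, such as adp141, are not numeric), and the alias df is added only because the answers refer to the frame by that name.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: CustomerId is kept as a string column, since ids like
# "adp141" cannot be parsed as integers.
df1 = pd.DataFrame(
    {
        "CustomerId": ["adp141", "adp141", "65782", "65782", "74285",
                       "45984", "55868", "55868", "adp485", "adp485"],
        "Product": ["Toaster", "Toaster", "Toaster", "Radio", "Radio",
                    "Radio", "Toaster", "Radio", "Radio", "Radio"],
        "csp": [1, 4, 1, 2, 1, 1, 1, 4, 1, 1],
    }
)

# The answers call the frame df, so keep an independent copy under that name.
df = df1.copy()
print(df1)
```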
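I am trying to extract the rows whose id is duplicated, keeping only the customers that have exactly the values 1 and 4 in column (csp) for the same id in column (CustomerId).
How can I get this final dataframe?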
Expected result:
# CustomerId Product csp
# adp141 Toaster 1
# adp141 Toaster 4
# 55868 Toaster 1
# 55868 Radio 4
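Let's try filter. It keeps every CustomerId group whose csp values include both 1 and 4:
import pandas as pd
df = df.groupby('CustomerId').filter(lambda x: pd.Series([1, 4]).isin(x['csp']).all())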
Out[72]:
CustomerId Product csp
0 adp141 Toaster 1
1 adp141 Toaster 4
6 55868 Toaster 1
7 55868 Radio 4
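One caveat: the filter above checks that 1 and 4 are both present, so a customer with additional csp values would also pass. If "only 1 and 4" should be enforced strictly, a set comparison is one possible variant. The sketch below is an illustration rather than part of the original answer; the helper name has_exactly_1_and_4 and the customer hyp001 are hypothetical and exist only to show the difference.

```python
import pandas as pd

# Sample data from the question, plus one HYPOTHETICAL customer ("hyp001")
# whose csp values are 1, 4 and 2, added only to show how the checks differ.
df = pd.DataFrame(
    {
        "CustomerId": ["adp141", "adp141", "65782", "65782", "74285",
                       "45984", "55868", "55868", "adp485", "adp485",
                       "hyp001", "hyp001", "hyp001"],
        "Product": ["Toaster", "Toaster", "Toaster", "Radio", "Radio",
                    "Radio", "Toaster", "Radio", "Radio", "Radio",
                    "Radio", "Radio", "Radio"],
        "csp": [1, 4, 1, 2, 1, 1, 1, 4, 1, 1, 1, 4, 2],
    }
)


def has_exactly_1_and_4(group: pd.DataFrame) -> bool:
    """Return True when the group's distinct csp values are exactly {1, 4}."""
    return set(group["csp"]) == {1, 4}


# Check from the answer: 1 and 4 must both be present (extra values allowed).
loose = df.groupby("CustomerId").filter(
    lambda g: pd.Series([1, 4]).isin(g["csp"]).all()
)

# Strict check: no csp value other than 1 and 4 may appear in the group.
strict = df.groupby("CustomerId").filter(has_exactly_1_and_4)

print(loose["CustomerId"].unique())   # includes hyp001
print(strict["CustomerId"].unique())  # excludes hyp001
```

On the question's own data both versions return the same four rows, so the choice only matters when a customer can carry other csp values.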
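The idea from @YOBEN_S was very helpful and reminded me of an advantage of tuple immutability: tuples are hashable, so isin can match a whole tuple as a single value.
Group the csp column by CustomerId and collect each group into a tuple: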
outcome = (df
           .groupby(["CustomerId"])
           .csp
           .agg(tuple)        # one tuple of csp values per customer, in row order
           .isin([(1, 4)])    # True only for the exact tuple (1, 4)
          )
outcome
CustomerId
45984 False
55868 True
65782 False
74285 False
adp141 True
adp485 False
Name: csp, dtype: bool
Set CustomerId as the index and filter with the boolean Series stored in the outcome variable:
# you can add reset_index() to match your expected output
df.set_index("CustomerId").loc[outcome]
Product csp
CustomerId
adp141 Toaster 1
adp141 Toaster 4
55868 Toaster 1
55868 Radio 4
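Because agg(tuple) preserves row order, a customer whose rows appear as 4 then 1 would produce (4, 1) and not match. One way around this, sketched below as an assumption-laden variant rather than the original answer, is to sort the distinct values before building the tuple and then map the per-customer result back onto the rows, which also keeps the original index. The names keys, mask and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "CustomerId": ["adp141", "adp141", "65782", "65782", "74285",
                       "45984", "55868", "55868", "adp485", "adp485"],
        "Product": ["Toaster", "Toaster", "Toaster", "Radio", "Radio",
                    "Radio", "Toaster", "Radio", "Radio", "Radio"],
        "csp": [1, 4, 1, 2, 1, 1, 1, 4, 1, 1],
    }
)

# Sort each customer's distinct csp values so that row order and repeated
# values do not matter: rows (4, 1) or (1, 4, 4) both become (1, 4).
keys = df.groupby("CustomerId")["csp"].agg(lambda s: tuple(sorted(set(s))))

# One boolean per customer, True when the distinct values are exactly 1 and 4.
outcome = keys.isin([(1, 4)])

# Broadcast the per-customer flag back to every row, then filter.
mask = df["CustomerId"].map(outcome).astype(bool)
result = df[mask]
print(result)
```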