Python pandas conditional apply

python pandas


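I have duplicate customers with different statuses, because there is one row per customer subscription/product. I want to generate a new_status for each customer; for it to be "canceled", every one of that customer's subscription statuses must be "canceled".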

I used:

df['duplicated'] = df.groupby('Customer').cumcount()

to number each duplicate and mark the repeated values:
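A minimal sketch of the numbering step, using a sample frame rebuilt from the table below (the data is copied from the question's table; the column names follow the table's Customer and Status headers). cumcount() numbers the rows within each group 0, 1, 2, ... in their original order:

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    'Customer': ['X', 'X', 'X', 'Y', 'A', 'A', 'B', 'B'],
    'Status': ['canceled', 'canceled', 'active', 'canceled',
               'canceled', 'canceled', 'active', 'canceled'],
})

# Number each customer's rows 0, 1, 2, ... in the order they appear
df['duplicated'] = df.groupby('Customer').cumcount()
print(df['duplicated'].tolist())  # [0, 1, 2, 0, 0, 1, 0, 1]
```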

Customer | Status | new_status | duplicated
 X       |canceled|            | 0
 X       |canceled|            | 1
 X       |active  |            | 2
 Y       |canceled|            | 0
 A       |canceled|            | 0
 A       |canceled|            | 1
 B       |active  |            | 0
 B       |canceled|            | 1

So I want to use .apply and/or .loc to produce:

Customer | Status | new_status | duplicated
 X       |canceled|            | 0
 X       |canceled|            | 1
 X       |active  |            | 2
 Y       |canceled|            | 0
 A       |canceled| canceled   | 0
 A       |canceled| canceled   | 1
 B       |active  |            | 0
 B       |canceled|            | 1
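Since the question mentions .loc, here is a minimal sketch of that route, assuming the rule implied by the expected output above: a customer gets "canceled" only if it has more than one row and every row is "canceled". The mask names all_canceled and multi_row are illustrative. (The comments further down say Y should also be canceled; under that rule, drop the multi_row condition.)

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['X', 'X', 'X', 'Y', 'A', 'A', 'B', 'B'],
    'Status': ['canceled', 'canceled', 'active', 'canceled',
               'canceled', 'canceled', 'active', 'canceled'],
})
df['new_status'] = ''

# True on every row of a customer whose statuses are all 'canceled'
all_canceled = df['Status'].eq('canceled').groupby(df['Customer']).transform('all')
# True on every row of a customer that has more than one row
multi_row = df['Customer'].map(df['Customer'].value_counts()) > 1

# .loc writes only to the rows selected by the combined boolean mask
df.loc[all_canceled & multi_row, 'new_status'] = 'canceled'
print(df)
```

This avoids .apply entirely: both masks are computed with built-in pandas operations and a single .loc assignment fills the column.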

As far as I understand, you can try:

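import pandas as pd
# Per row: does this customer have only 'canceled' statuses? (False becomes NaN after map)
all_canceled = df.groupby('Customer')['Status'].transform(lambda x: x.eq('canceled').all())
df['new_status'] = all_canceled.map({True: 'cancelled'}).fillna(df['new_status'])
print(df)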

    Customer    Status new_status  duplicated
0   X         canceled             0         
1   X         canceled             1         
2   X         active               2         
3   Y         canceled  cancelled  0         
4   A         canceled  cancelled  0         
5   A         canceled  cancelled  1         
6   B         active               0         
7   B         canceled             1
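To see what each piece of that one-liner does, a sketch that splits it into steps on the same sample data: eq is the element-wise ==, a plain groupby(...).all() returns one value per customer, while transform broadcasts that value back onto every row, and map turns True into a label (anything not in the dict, here False, becomes NaN). Here fillna('') stands in for fillna(df.new_status), which assumes a new_status column already exists.

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['X', 'X', 'X', 'Y', 'A', 'A', 'B', 'B'],
    'Status': ['canceled', 'canceled', 'active', 'canceled',
               'canceled', 'canceled', 'active', 'canceled'],
})

is_canceled = df['Status'].eq('canceled')                        # element-wise == 'canceled'
per_customer = is_canceled.groupby(df['Customer']).all()         # one value per customer
per_row = is_canceled.groupby(df['Customer']).transform('all')   # same values, one per row
labels = per_row.map({True: 'cancelled'})                        # False -> NaN

print(per_customer.to_dict())        # {'A': True, 'B': False, 'X': False, 'Y': True}
print(labels.fillna('').tolist())
```

Note that this rule alone marks Y as well, which matches the first output above.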

Edit after the expected output was changed:

# Per group: the Status value repeats within the group AND the whole group is 'canceled'
df['new_status'] = (df.groupby('Customer')['Status']
                      .transform(lambda x: x.duplicated(keep=False) & x.eq('canceled').all())
                      .map({True: 'cancelled', False: ''}))
print(df)

  Customer    Status new_status  duplicated
0   X         canceled             0         
1   X         canceled             1         
2   X         active               2         
3   Y         canceled             0         
4   A         canceled  cancelled  0         
5   A         canceled  cancelled  1         
6   B         active               0         
7   B         canceled             1
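The edit relies on Series.duplicated(keep=False). A small illustration of how the keep argument changes which repeats are flagged; keep=False flags every occurrence of a repeated value, so a single-row group such as Y never gets True:

```python
import pandas as pd

s = pd.Series(['canceled', 'canceled', 'active'])

first = s.duplicated().tolist()              # default keep='first': later repeats only
last = s.duplicated(keep='last').tolist()    # earlier repeats only
every = s.duplicated(keep=False).tolist()    # all members of a repeated value

print(first)   # [False, True, False]
print(last)    # [True, False, False]
print(every)   # [True, True, False]
```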

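Compare the Status column with Series.eq (the method form of ==) and use GroupBy.transform with 'all' to check whether all values in each group are True; then test Customer with Series.duplicated and keep=False, which marks every duplicate. Finally, chain both masks together with bitwise AND (&) and set the values with numpy.where: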
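Why bitwise & rather than Python's and: and needs a single True/False for each operand, and pandas refuses to collapse a whole Series into one, whereas & combines the masks row by row. A small sketch with two hypothetical masks:

```python
import pandas as pd

m1 = pd.Series([True, True, False])
m2 = pd.Series([True, False, False])

both = m1 & m2          # element-wise AND, one result per row
print(both.tolist())    # [True, False, False]

raised = False
try:
    m1 and m2           # Python's `and` asks for bool(m1), which is ambiguous
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```

When building masks with == instead of .eq, wrap each comparison in parentheses, because & binds more tightly than ==.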

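Y is not duplicated, but it must be canceled. The formula may or may not include unique values.

Is performance important? Calling a function inside apply or transform can be slow if the DataFrame is large.

I don't think it's necessary; it should be 1800 cols. I think I can use .apply. Thanks, I'm still validating the values, but it looks correct. I edited the post only to make it as clear as possible, but the first o/p is closer to what I want. Thanks. I'd like to understand better how you did this, and when to use .map, .eq and .transform.

@RicardoFernandes No problem. x.eq('canceled').all() checks whether all grouped items in Status are equal to canceled, and all() returns True only if they are. Then map replaces True with 'cancelled' and False with a blank. It's best to take the code apart piece by piece; I think you'll understand it. :) Let me know if anything is unclear. Cheers

Thank you very much. I didn't know the .transform and .eq methods well.

@RicardoFernandes - yes, it groups by the Series. By the way, both solutions are correct. If my answer or the other one was helpful, don't forget to accept it. Thanks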
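import numpy as np

# m1: True on rows whose customer has only 'canceled' statuses
m1 = df['Status'].eq('canceled').groupby(df['Customer']).transform('all')
# m2: True on rows whose customer appears more than once
m2 = df['Customer'].duplicated(keep=False)

df['new_status'] = np.where(m1 & m2, 'cancelled', '')
print(df)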
  Customer    Status new_status  duplicated
0        X  canceled                      0
1        X  canceled                      1
2        X    active                      2
3        Y  canceled                      0
4        A  canceled  cancelled           0
5        A  canceled  cancelled           1
6        B    active                      0
7        B  canceled                      1
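On the performance remark in the comments: the lambda in the edited answer runs Python code once per group, while the m1/m2 version uses built-in groupby and vectorized operations. A sketch that checks both produce the same column on the sample data; how much faster the vectorized version is depends on your data, so time both (for example with timeit) on the real frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Customer': ['X', 'X', 'X', 'Y', 'A', 'A', 'B', 'B'],
    'Status': ['canceled', 'canceled', 'active', 'canceled',
               'canceled', 'canceled', 'active', 'canceled'],
})

# Edited answer: a Python lambda evaluated once per customer group
via_lambda = (df.groupby('Customer')['Status']
                .transform(lambda x: x.duplicated(keep=False) & x.eq('canceled').all())
                .map({True: 'cancelled', False: ''}))

# Mask version: built-in groupby 'all' plus vectorized duplicated and np.where
m1 = df['Status'].eq('canceled').groupby(df['Customer']).transform('all')
m2 = df['Customer'].duplicated(keep=False)
via_masks = pd.Series(np.where(m1 & m2, 'cancelled', ''), index=df.index)

print(via_lambda.tolist() == via_masks.tolist())  # True
```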