Python: comparing two duplicate values in one column of a table
I have a column of customer names, and a name is repeated when the customer has 2 products. I have to create a new status that groups the customer's statuses into a single status, depending on the case. So I need to compare customer X with the other X row to generate the new status.
Customer|Status |Cancaled_at|new status
X |Active |- |
X |Canceled|2019-xx-xx |
Y |Active |- |
Z |Active |- |
A |Canceled|- |
Desired output:
Customer|Status |Cancaled_at|new status
X |Active |- |Canceled
X |Canceled|2019-xx-xx |Canceled
Y |Active |- |
Z |Active |- |
A |Canceled|- |
I think you need:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Customer':['X','X','Y','Z','A'], 'status':['active','canceled','active','active','canceled'],
'Canceled_at':[None, '2019-01-01', None, None,None]})
# Mark rows that are canceled and have a cancellation date
df['new_status'] = np.where((df['status']=='canceled') & (~df['Canceled_at'].isnull()), 'canceled', None)
# Back-fill the mark onto earlier rows of the same customer
df['new_status'] = df.groupby('Customer')['new_status'].bfill()
print(df)
Output:
Canceled_at Customer status new_status
0 None X active canceled
1 2019-01-01 X canceled canceled
2 None Y active None
3 None Z active None
4 None A canceled None
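One caveat with the bfill approach above: it only fills rows that come before the canceled row within each customer. A minimal sketch of an order-independent variant, assuming the same sample data but with X's canceled row listed first, uses groupby with transform('any'). The names is_canceled and has_canceled are illustrative only and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same sample data as above, but X's canceled row now comes first
df = pd.DataFrame({
    'Customer': ['X', 'X', 'Y', 'Z', 'A'],
    'status': ['canceled', 'active', 'active', 'active', 'canceled'],
    'Canceled_at': ['2019-01-01', None, None, None, None],
})

# A row counts as canceled only if it also has a cancellation date
is_canceled = (df['status'] == 'canceled') & df['Canceled_at'].notna()

# True on every row of a customer that has at least one such row
has_canceled = is_canceled.groupby(df['Customer']).transform('any')

df['new_status'] = np.where(has_canceled, 'canceled', None)
print(df)
```

Here both X rows receive canceled regardless of row order, while A stays empty because its canceled row has no date.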
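There is a simple way to find all duplicated values in pandas:
df.loc[df.duplicated('Customer', keep=False), 'new_status'] = 'Canceled'
This sets the new_status column to Canceled for every row whose Customer value is duplicated. Customer A appears only once, so this line leaves row 4 as it was; the Canceled shown for A below is not produced by it.
The code uses duplicated with boolean indexing, and produces the following output: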
Customer Status Cancaled_at new_status
0 X Active - Canceled
1 X Canceled 2019-xx-xx Canceled
2 Y Active -
3 Z Active -
4 A Canceled - Canceled
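To make the behavior of duplicated concrete, here is a small sketch, assuming the question's sample data with - standing for a missing cancellation date. keep=False flags every occurrence of a repeated customer, which by itself ignores the status. Combining it with a per-customer canceled check (the name any_canceled is illustrative, not from the answer) restricts the update to duplicated customers that actually have a canceled product.

```python
import pandas as pd

# Sample data from the question; 'Cancaled_at' keeps the question's spelling
df = pd.DataFrame({
    'Customer': ['X', 'X', 'Y', 'Z', 'A'],
    'Status': ['Active', 'Canceled', 'Active', 'Active', 'Canceled'],
    'Cancaled_at': ['-', '2019-xx-xx', '-', '-', '-'],
})
df['new_status'] = ''

# keep=False marks all rows of a repeated customer, not just the later ones
dup = df.duplicated('Customer', keep=False)

# True on every row of a customer with at least one Canceled status
any_canceled = df['Status'].eq('Canceled').groupby(df['Customer']).transform('any')

df.loc[dup & any_canceled, 'new_status'] = 'Canceled'
print(dup.tolist())  # [True, True, False, False, False]
print(df)
```

With this mask only the two X rows are updated, and the single A row keeps an empty new_status.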
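Please share the expected output.
In this case, if customer X has a canceled product, the output should be a new status that is Canceled on both X rows. I will edit the last row; Canceled is not wanted there.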