Python: comparing two duplicate values in one column of a table

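I have a column of customer names in which a name is repeated when the customer has two products. I need to create a new status that consolidates the customer's statuses into a single one, depending on the case. So I have to compare the customer X row with the other X row to produce the new status.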

Customer|Status  |Cancaled_at|new status
X       |Active  |-          |
X       |Canceled|2019-xx-xx |
Y       |Active  |-          |
Z       |Active  |-          |
A       |Canceled|-          |

Desired output:

Customer|Status  |Cancaled_at|new status
X       |Active  |-          |Canceled
X       |Canceled|2019-xx-xx |Canceled
Y       |Active  |-          |
Z       |Active  |-          |
A       |Canceled|-          |

I think you need:

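import numpy as np
import pandas as pd

df = pd.DataFrame({'Customer': ['X', 'X', 'Y', 'Z', 'A'],
                   'status': ['active', 'canceled', 'active', 'active', 'canceled'],
                   'Canceled_at': [None, '2019-01-01', None, None, None]})

# Mark rows that are canceled and have a cancellation date
df['new_status'] = np.where((df['status'] == 'canceled') & df['Canceled_at'].notnull(), 'canceled', None)
# Propagate the mark backwards to the same customer's earlier rows
df['new_status'] = df.groupby('Customer')['new_status'].bfill()

print(df)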

Output:

   Canceled_at  Customer    status  new_status
0         None         X    active    canceled
1   2019-01-01         X  canceled    canceled
2         None         Y    active        None
3         None         Z    active        None
4         None         A  canceled        None
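
Note that bfill only fills rows that come before the canceled row within each customer, so the result depends on row order. A minimal order-independent sketch follows, assuming the same column names as the snippet above and that a row counts as canceled only when Canceled_at is set. The helper name add_group_status is hypothetical, and the X rows are deliberately swapped to show the difference:

```python
import pandas as pd

def add_group_status(df):
    """Mark every row of a customer that has at least one dated cancellation."""
    out = df.copy()
    # True for rows that are canceled AND carry a cancellation date
    is_canceled = out['status'].eq('canceled') & out['Canceled_at'].notna()
    # Broadcast "any such row in this customer's group" back to each row
    has_canceled = is_canceled.groupby(out['Customer']).transform('any')
    # Rows of flagged customers get 'canceled'; all others stay None
    out['new_status'] = None
    out.loc[has_canceled, 'new_status'] = 'canceled'
    return out

# Sample data with the canceled X row placed first (assumed ordering)
df = pd.DataFrame({
    'Customer': ['X', 'X', 'Y', 'Z', 'A'],
    'status': ['canceled', 'active', 'active', 'active', 'canceled'],
    'Canceled_at': ['2019-01-01', None, None, None, None],
})
result = add_group_status(df)
print(result)
```

Both X rows are marked regardless of which one comes first, while A stays empty because its cancellation has no date.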

There is a simple way to find all duplicated values in pandas:

# Use .loc instead of chained indexing so the assignment reliably modifies df
df.loc[df.duplicated('Customer', keep=False), 'new_status'] = 'Canceled'
This sets new_status to Canceled for every row of the DataFrame whose Customer value is duplicated.
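
For reference, the keep argument of duplicated controls which occurrences are flagged. Here is a small sketch on the sample customers (the variable names are illustrative only). Keep in mind that the mask looks only at the Customer column, so a customer with two active products would be flagged as well:

```python
import pandas as pd

customers = pd.Series(['X', 'X', 'Y', 'Z', 'A'], name='Customer')

# keep='first' (the default) flags only the later occurrences
later_only = customers.duplicated()
# keep=False flags every member of a duplicated group
all_dupes = customers.duplicated(keep=False)

print(later_only.tolist())  # [False, True, False, False, False]
print(all_dupes.tolist())   # [True, True, False, False, False]
```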

This code uses [function names missing in the source], and produces the following output:

    Customer    Status      Cancaled_at new_status
0   X           Active      -           Canceled
1   X           Canceled    2019-xx-xx  Canceled
2   Y           Active      -   
3   Z           Active      -   
4   A           Canceled    -           Canceled
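
The snippet that produced this table is not preserved. As one possible way to arrive at the same result (an assumption, not the original answer's code), every customer that has at least one Canceled row can be flagged with isin:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['X', 'X', 'Y', 'Z', 'A'],
    'Status': ['Active', 'Canceled', 'Active', 'Active', 'Canceled'],
    'Cancaled_at': ['-', '2019-xx-xx', '-', '-', '-'],
})

# Customers that appear in at least one Canceled row
canceled_customers = df.loc[df['Status'] == 'Canceled', 'Customer'].unique()

# Flag all rows of those customers; leave the rest blank
df['new_status'] = ''
df.loc[df['Customer'].isin(canceled_customers), 'new_status'] = 'Canceled'
print(df)
```

Like the table above, this also marks customer A, which differs from the desired output in the question.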

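Please share the expected output.
In this case, if customer X has a canceled product, the output should be a new status with Canceled on both X rows. I will edit the question.
In the last row I don't want Canceled.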