Python: How to count within a DataFrame using unique values and a condition?
Take this df:
import pandas as pd

df = pd.DataFrame({'client_id': [0, 0, 0, 1, 1, 1, 2, 2, 2],
                   'key': ['0_382', '0_382', '0_356', '1_365', float('nan'), '1_365', float('nan'), '2_284', '2_405'],
                   'operation': ['buy', 'sell', 'sell', 'buy', 'transfer', 'buy', 'fee', 'buy', 'buy']})
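I need to create a column named pos_id that gives each row an incremental value (1, 2, 3, ...) for the unique values of client_id and key, with a condition that skips rows whose operation is transfer or fee.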
The result should look like this:
   client_id    key operation  pos_id
0          0  0_382       buy       1
1          0  0_382      sell       1
2          0  0_356      sell       2
3          1  1_365       buy       1
4          1    NaN  transfer     NaN
5          1  1_365       buy       1
6          2    NaN       fee     NaN
7          2  2_284       buy       1
8          2  2_405       buy       2
Here are two methods.
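The first method assigns each ['client_id', 'key'] pair the same 'pos_id' within its 'client_id', regardless of whether the rows appear consecutively. Use where to mask the rows you want to ignore; groupby + ngroup with sort=False then numbers the unique combinations in order of first appearance. Finally, subtract the minimum within each 'client_id' and add 1 to get a counter that starts at 1.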
import numpy as np

s = (df.where(~df['operation'].isin(['transfer', 'fee']))
       .groupby(['client_id', 'key'], sort=False).ngroup()
       .replace(-1, np.nan))  # ngroup can label NaN group keys -1; turn those into NaN
df['pos_id'] = s - s.groupby(df['client_id']).transform('min') + 1
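As a minimal sketch, the steps of the first method can be wrapped in a function and tried on a case where a key repeats non-consecutively. The function name pos_id_any_order, the skip parameter, and the demo frame are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd


def pos_id_any_order(df, skip=("transfer", "fee")):
    """Number each (client_id, key) pair within a client, in order of first appearance."""
    keep = ~df["operation"].isin(skip)
    # Mask skipped rows so every column in them becomes NaN.
    masked = df.where(keep)
    # One global group number per (client_id, key) pair.
    codes = masked.groupby(["client_id", "key"], sort=False).ngroup()
    # Rows with NaN keys may come back as -1; make sure they are NaN.
    codes = codes.replace(-1, np.nan)
    # Shift the numbering so it restarts at 1 for every client.
    return codes - codes.groupby(df["client_id"]).transform("min") + 1


# Hypothetical data: key "a" shows up again after "b" and a skipped fee row.
demo = pd.DataFrame({
    "client_id": [0, 0, 0, 0],
    "key": ["a", "b", None, "a"],
    "operation": ["buy", "buy", "fee", "sell"],
})
print(pos_id_any_order(demo).tolist())
```

In this demo the last row should receive the same number as the first row, because both share the pair (0, "a"), while the fee row stays NaN.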
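The second method requires the input to be sorted at least by 'client_id', and it assigns the same 'pos_id' to equal keys only when they appear consecutively. Drop the rows you want to ignore, then flag every row whose key or client differs from the previous row, and take the cumulative sum of those flags within each 'client_id'.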
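# Blank out the rows to ignore, then drop them (they are now all-NaN).
s = (df.where(~df['operation'].isin(['transfer', 'fee']))
       .dropna(how='all'))
# True where the key or the client differs from the previous remaining row.
s = s['key'].ne(s['key'].shift()) | s['client_id'].ne(s['client_id'].shift())
# Running count of those changes per client; dropped rows end up as NaN.
df['pos_id'] = s.groupby(df['client_id']).cumsum()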
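The consecutive-only behaviour is easiest to see on a key that reappears later. The sketch below is a variation on the second method rather than the answer's exact code: it filters with boolean indexing instead of where + dropna, and the function name pos_id_consecutive, the skip parameter, and the demo frame are assumptions made for illustration.

```python
import pandas as pd


def pos_id_consecutive(df, skip=("transfer", "fee")):
    """Per-client counter that increases whenever the key changes between kept rows."""
    # Keep only the rows that should be numbered; the original index is preserved.
    kept = df[~df["operation"].isin(skip)]
    # True where a row starts a new run: key or client differs from the previous kept row.
    new_run = (kept["key"].ne(kept["key"].shift())
               | kept["client_id"].ne(kept["client_id"].shift()))
    # Running count of run starts within each client.
    counter = new_run.astype(int).groupby(kept["client_id"]).cumsum()
    # Put the skipped rows back as NaN.
    return counter.reindex(df.index)


# Hypothetical data: key "a" shows up again after "b" and a skipped fee row.
demo = pd.DataFrame({
    "client_id": [0, 0, 0, 0],
    "key": ["a", "b", None, "a"],
    "operation": ["buy", "buy", "fee", "sell"],
})
print(pos_id_consecutive(demo).tolist())  # [1.0, 2.0, nan, 3.0]
```

Here the second "a" gets 3 rather than 1, because it does not directly follow the first "a" among the kept rows. If repeated keys should share a number regardless of position, the first method is the better fit.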
For your input, the result is:
   client_id    key operation  pos_id
0          0  0_382       buy     1.0
1          0  0_382      sell     1.0
2          0  0_356      sell     2.0
3          1  1_365       buy     1.0
4          1    NaN  transfer     NaN
5          1  1_365       buy     1.0
6          2    NaN       fee     NaN
7          2  2_284       buy     1.0
8          2  2_405       buy     2.0