Python: How to count within a DataFrame using unique and conditional values?


Take this df:

import pandas as pd

df = pd.DataFrame({'client_id':[0, 0, 0, 1, 1, 1, 2, 2, 2],
                   'key':['0_382','0_382','0_356','1_365',float('nan'),'1_365',float('nan'),'2_284','2_405'],
                   'operation':['buy','sell','sell','buy','transfer','buy','fee','buy','buy']})
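
I need to create a column named pos_id that gives each row an incrementing value (1, 2, 3, ...) for each unique combination of client_id and key, with a condition that skips rows whose operation is transfer or fee.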

The result should look like this:

   client_id    key operation pos_id
0          0  0_382       buy      1
1          0  0_382      sell      1
2          0  0_356      sell      2
3          1  1_365       buy      1
4          1    NaN  transfer    NaN
5          1  1_365       buy      1
6          2    NaN       fee    NaN
7          2  2_284       buy      1
8          2  2_405       buy      2

Here are two approaches.

The first approach gives identical ['client_id', 'key'] combinations the same 'pos_id' within a 'client_id', regardless of whether they appear consecutively.

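Use DataFrame.where to mask the rows to ignore; groupby + ngroup with sort=False then numbers the unique combinations in order of first appearance. Finally, subtract the minimum within each 'client_id' so that every client's counter starts at 1.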

import numpy as np

s = (df.where(~df['operation'].isin(['transfer', 'fee']))
       .groupby(['client_id', 'key'], sort=False).ngroup()
       .replace(-1, np.nan))  # ngroup may label NaN group keys -1; map them to NaN

df['pos_id'] = s - s.groupby(df['client_id']).transform('min') + 1
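
To make the chain above easier to follow, here is a minimal sketch that splits it into named intermediate steps on the question's data. The variable names (masked, group_no, first_no) are only illustrative, and the extra astype(float) is an assumption added so that the result looks the same whether ngroup reports the masked rows as -1 or as NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'client_id': [0, 0, 0, 1, 1, 1, 2, 2, 2],
    'key': ['0_382', '0_382', '0_356', '1_365', np.nan,
            '1_365', np.nan, '2_284', '2_405'],
    'operation': ['buy', 'sell', 'sell', 'buy', 'transfer',
                  'buy', 'fee', 'buy', 'buy'],
})

# Step 1: blank out every column on the rows that must be ignored.
masked = df.where(~df['operation'].isin(['transfer', 'fee']))

# Step 2: number each (client_id, key) pair in order of first appearance.
# Masked rows have NaN keys; normalise their label to NaN either way.
group_no = (masked.groupby(['client_id', 'key'], sort=False).ngroup()
                  .astype(float).replace(-1, np.nan))

# Step 3: smallest group number per client (NaN rows are skipped by min).
first_no = group_no.groupby(df['client_id']).transform('min')

# Step 4: shift so that each client's counter starts at 1.
pos_id = group_no - first_no + 1
print(pos_id.tolist())
# [1.0, 1.0, 2.0, 1.0, nan, 1.0, nan, 1.0, 2.0]
```

The subtraction in step 4 is what turns the global group numbers into a per-client counter; it relies on sort=False keeping the numbers in order of first appearance.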

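The second approach requires the input to be sorted at least by 'client_id', and gives identical keys the same 'pos_id' only when they appear consecutively. Drop the rows to ignore, flag each row where the key or the client changes from the previous row, and take the cumulative sum of those flags within each 'client_id'.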

s = (df.where(~df['operation'].isin(['transfer', 'fee']))
       .dropna(how='all'))

s = s['key'].ne(s['key'].shift()) | s['client_id'].ne(s['client_id'].shift())
df['pos_id'] = s.groupby(df['client_id']).cumsum()
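
The two approaches only differ when a key reappears after a different key for the same client. The sketch below illustrates that on a small made-up frame; the function names, the SKIP list, and the demo data are hypothetical, and the second function filters the rows directly rather than using where + dropna, which should be equivalent here.

```python
import numpy as np
import pandas as pd

SKIP = ['transfer', 'fee']  # operations that get no pos_id

def pos_id_unique(df):
    """Same pos_id for a repeated key, wherever it appears (first approach)."""
    masked = df.where(~df['operation'].isin(SKIP))
    s = (masked.groupby(['client_id', 'key'], sort=False).ngroup()
               .astype(float).replace(-1, np.nan))
    return s - s.groupby(df['client_id']).transform('min') + 1

def pos_id_consecutive(df):
    """New pos_id whenever the key changes between kept rows (second approach)."""
    kept = df[~df['operation'].isin(SKIP)]
    changed = (kept['key'].ne(kept['key'].shift())
               | kept['client_id'].ne(kept['client_id'].shift()))
    counter = changed.astype(int).groupby(kept['client_id']).cumsum()
    return counter.reindex(df.index)  # skipped rows become NaN

# Made-up data: key 'a' comes back after 'b' for the same client.
demo = pd.DataFrame({
    'client_id': [0, 0, 0, 0],
    'key': ['a', 'b', np.nan, 'a'],
    'operation': ['buy', 'buy', 'fee', 'sell'],
})
demo['unique'] = pos_id_unique(demo)
demo['consecutive'] = pos_id_consecutive(demo)
print(demo[['unique', 'consecutive']])
#    unique  consecutive
# 0     1.0          1.0
# 1     2.0          2.0
# 2     NaN          NaN
# 3     1.0          3.0
```

Pick the first variant if a returning key should reuse its earlier number, and the second if every new run of a key should count as a new position.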

For your input, both approaches give the same result:

   client_id    key operation  pos_id
0          0  0_382       buy     1.0
1          0  0_382      sell     1.0
2          0  0_356      sell     2.0
3          1  1_365       buy     1.0
4          1    NaN  transfer     NaN
5          1  1_365       buy     1.0
6          2    NaN       fee     NaN
7          2  2_284       buy     1.0
8          2  2_405       buy     2.0