Python counter column based on another column

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'transaction_id': [12565, 12565, 12743, 12743, 13456, 13456, 13856],
                   'rep_id': [560, 560, 560, 560, 287, 287, 287]})

I want to create a new column with a counter, like this:

       transaction_id   rep_id  trans_num
    0           12565      560          1
    1           12565      560          1
    2           12743      560          2
    3           12743      560          2
    4           13456      287          1
    5           13456      287          1
    6           13856      287          2

Try using transform together with factorize:

df['new']=df.groupby('rep_id').transaction_id.transform(lambda x : pd.factorize(x)[0]+1)
df
Out[389]: 
   transaction_id  rep_id  new
0           12565     560    1
1           12565     560    1
2           12743     560    2
3           12743     560    2
4           13456     287    1
5           13456     287    1
6           13856     287    2
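For reference, here is a self-contained version of the factorize approach (a sketch; the column name trans_num is taken from the question). pd.factorize returns a (codes, uniques) pair in which the codes number the distinct values in order of first appearance, starting at 0, which is why the answer adds 1 inside each rep_id group.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': [12565, 12565, 12743, 12743, 13456, 13456, 13856],
    'rep_id': [560, 560, 560, 560, 287, 287, 287],
})

# pd.factorize returns (codes, uniques); codes number the distinct values
# in order of first appearance, starting at 0.
codes, uniques = pd.factorize(pd.Series([12743, 12565, 12743]))
print(codes)  # [0 1 0]

# Applied per rep_id group via transform; +1 turns the 0-based codes
# into a 1-based counter.
df['trans_num'] = (
    df.groupby('rep_id')['transaction_id']
      .transform(lambda s: pd.factorize(s)[0] + 1)
)
print(df)
```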

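Depending on your data (if transaction_id is always different whenever rep_id is different, i.e. no transaction_id is shared between reps, and rows of the same transaction are adjacent), we can also do the following: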

df['new'] = (df['transaction_id'].ne(df['transaction_id'].shift())
    .groupby(df['rep_id']).cumsum()
)
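A minimal sketch of how the shift/cumsum version works and why it needs that condition. The helper trans_num_shift and the small shared frame are hypothetical, made up for illustration. ne(shift()) flags each row whose transaction_id differs from the row directly above, and the per-rep cumulative sum counts those flags. When the last transaction of one rep has the same id as the first transaction of the next rep, no flag is raised at the boundary, so the second rep starts counting at 0.

```python
import pandas as pd

def trans_num_shift(df):
    """Hypothetical helper wrapping the shift/cumsum approach."""
    tid = df['transaction_id']
    # True on every row whose transaction_id differs from the row above;
    # the first row compares against NaN, so it is always True.
    is_new = tid.ne(tid.shift())
    # Running count of "new transaction" flags, restarted for each rep_id.
    return is_new.groupby(df['rep_id']).cumsum()

df = pd.DataFrame({
    'transaction_id': [12565, 12565, 12743, 12743, 13456, 13456, 13856],
    'rep_id': [560, 560, 560, 560, 287, 287, 287],
})
df['new'] = trans_num_shift(df)
print(df['new'].tolist())  # [1, 1, 2, 2, 1, 1, 2]

# Hypothetical data where two reps share transaction_id 200 across the boundary.
shared = pd.DataFrame({
    'transaction_id': [100, 200, 200, 300],
    'rep_id': [1, 1, 2, 2],
})
print(trans_num_shift(shared).tolist())  # [1, 2, 0, 1]: rep 2 starts at 0
```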

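Update: you can also use rank, although it behaves a bit differently (it numbers the transaction_id values in ascending order within each rep rather than in order of first appearance):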

df['new'] = df.groupby('rep_id')['transaction_id'].rank('dense').astype(int)

Output:

   transaction_id  rep_id  new
0           12565     560    1
1           12565     560    1
2           12743     560    2
3           12743     560    2
4           13456     287    1
5           13456     287    1
6           13856     287    2
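On the question's data both approaches agree, because transaction_id already increases within each rep. A sketch with hypothetical reordered data (the larger id appears first for rep 560) shows where they diverge: factorize follows order of appearance, while a dense rank follows sorted order.

```python
import pandas as pd

# Hypothetical data: within rep 560 the larger transaction_id appears first.
df = pd.DataFrame({
    'transaction_id': [12743, 12743, 12565, 12565, 13456, 13856],
    'rep_id': [560, 560, 560, 560, 287, 287],
})
g = df.groupby('rep_id')['transaction_id']

# Numbering by order of first appearance within each rep.
df['by_appearance'] = g.transform(lambda s: pd.factorize(s)[0] + 1)
# Dense rank: numbers distinct values in ascending order, with no gaps.
df['by_rank'] = g.rank(method='dense').astype(int)
print(df)
```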

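Different rep_ids can have the same transaction_id.

I see, I updated the answer with a different approach.

The new approach seems to work. Thanks.

Maybe this would also work:

df.groupby('rep_id')['transaction_id'].transform(lambda x: x.astype('category').cat.codes + 1)

@wwnde Sometimes the category will return codes that don't follow the original order ~ :-)

@BENY, I suspected as much, hence the "maybe". I'll leave my comment up to consolidate knowledge and to dispel the idea that it is an alternative to pd.factorize. Good answer.

@wwnde In general, astype('category') will be a bit slower than factorize, because it needs to do some bookkeeping, whereas factorize returns NumPy arrays directly. I have even seen factorize be faster than np.unique. That said, I generally try to avoid apply/transform with lambda functions. Still, the answer is good.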
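A sketch of both points from the comments, on the same hypothetical reordered data. When astype('category') infers the categories, it uses the sorted unique values, so cat.codes follow sorted order rather than order of appearance. The last part is a lambda-free way to get the order-of-appearance counter (drop_duplicates, then cumcount, then a left merge); it is an illustrative alternative, not something proposed in the thread.

```python
import pandas as pd

# Hypothetical data: within rep 560 the larger transaction_id appears first.
df = pd.DataFrame({
    'transaction_id': [12743, 12743, 12565, 12565, 13456, 13856],
    'rep_id': [560, 560, 560, 560, 287, 287],
})
g = df.groupby('rep_id')['transaction_id']

# Inferred categories are the sorted unique values, so codes follow sorted order.
df['cat_codes'] = g.transform(lambda s: s.astype('category').cat.codes + 1)
# pd.factorize numbers values in order of first appearance.
df['factorize'] = g.transform(lambda s: pd.factorize(s)[0] + 1)

# Lambda-free sketch: keep the first row of each (rep_id, transaction_id) pair,
# number those rows per rep, then map the numbers back with a left merge
# (a left merge keeps the row order of df).
firsts = df.drop_duplicates(['rep_id', 'transaction_id'])[['rep_id', 'transaction_id']].copy()
firsts['trans_num'] = firsts.groupby('rep_id').cumcount() + 1
out = df.merge(firsts, on=['rep_id', 'transaction_id'], how='left')
print(out)
```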