How to use a custom function with the dfply package in Python

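I am trying to use the dfply package to create an accumulator column based on a condition, but my custom function fails.

Taking the diamonds data as an example: I want to create an accumulator column that adds 1 when price is greater than 500 and adds 0 otherwise.

My code is as follows: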

import pandas as pd
from dfply import *

@make_symbolic
def accu(s, threshold):
    cur = 0
    res = []
    for x in s:
        if x > threshold:
            cur += 1
        res += [cur]
    return pd.Series(res)


(diamonds >> 
 mask(X.color == 'D', X.cut == 'Premium', X.carat > 0.32) >>
 mutate(row_id = row_number(X.price),        # Get the row number
        accu_id = accu(X.price, 500)) >>     # Get the accumulator, this step failed
 arrange(X.row_id) >>
 head(10)
)

The expected output is as follows:

price row_id accu_id
498   1      0
499   2      0
501   3      1
502   4      2
400   5      2
503   6      3

Try
return res
instead of
return pd.Series(res)
@Neapolitan, that worked! Thanks!
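
A likely reason the suggestion works, sketched with plain pandas rather than dfply itself: a freshly built pd.Series gets a new 0..n-1 index, while the filtered frame keeps its original row labels, so assigning the Series aligns by label and misplaces values. A plain list is assigned by position instead. The small frame below is a hypothetical stand-in for the filtered diamonds data, and the assumption that dfply's mutate assigns columns the way DataFrame.assign does is not confirmed by the thread.

```python
import pandas as pd


def accu(s, threshold):
    """Running count of the values in s that exceed threshold."""
    cur = 0
    res = []
    for x in s:
        if x > threshold:
            cur += 1
        res.append(cur)
    return res  # plain list: assigned by position, not by index label


# Hypothetical stand-in for the diamonds data (made-up prices).
df = pd.DataFrame({"price": [498, 499, 100, 501, 502, 400, 503]})

# Filtering drops a row but keeps the original labels: 0, 1, 3, 4, 5, 6.
filtered = df[df["price"] != 100]

# Wrapping the result in a new Series gives it labels 0..5, so assignment
# aligns by label: values shift and the row labelled 6 becomes NaN.
misaligned = filtered.assign(accu_id=pd.Series(accu(filtered["price"], 500)))

# Returning the list directly keeps each value on its own row.
by_position = filtered.assign(accu_id=accu(filtered["price"], 500))

# A vectorized alternative: the boolean mask keeps the frame's index,
# so its cumulative sum lines up without any loop.
vectorized = filtered.assign(accu_id=(filtered["price"] > 500).cumsum())

print(misaligned)
print(by_position)
print(vectorized)
```

If a Series is needed for some other reason, building it with the same index as the input, for example pd.Series(res, index=s.index), should avoid the misalignment as well.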