How to use a custom function in Python's dfply package


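I am trying to use the dfply package to create an accumulator column based on a condition, but my custom function fails.

Taking the diamonds data as an example: I want to create an accumulator column that increases by 1 whenever price is greater than 500 and stays unchanged otherwise.

My code is as follows: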

import pandas as pd
from dfply import *

@make_symbolic
def accu(s, threshold):
    cur = 0
    res = []
    for x in s:
        if x > threshold:
            cur += 1
        res += [cur]
    return pd.Series(res)


(diamonds >> 
 mask(X.color == 'D', X.cut == 'Premium', X.carat > 0.32) >>
 mutate(row_id = row_number(X.price),        # Get the row number
        accu_id = accu(X.price, 500)) >>     # Get the accumulator, this step failed
 arrange(X.row_id) >>
 head(10)
)

The expected output would look like this:

price row_id accu_id
498   1      0
499   2      0
501   3      1
502   4      2
400   5      2
503   6      3

Try

return res

instead of

return pd.Series(res)
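A likely reason this helps: pd.Series(res) carries a fresh 0..n-1 index, while the rows left after mask() keep their original labels, so pandas aligns the new column on labels that no longer match. A plain list is assigned by position instead. The snippet below is only a minimal sketch of the accumulator returning a list, checked against the sample prices from the question. It uses itertools.accumulate in place of the explicit loop, and the @make_symbolic decorator from the question is left off so the snippet runs without dfply installed; with dfply you would keep the decorator.

```python
from itertools import accumulate


def accu(values, threshold):
    """Running count of how many values seen so far exceed `threshold`.

    Returns a plain list (not a pd.Series), so that a column built from it
    is filled by position rather than aligned on an index.
    """
    # Each comparison contributes 1 (above threshold) or 0 (otherwise);
    # accumulate() keeps the running total.
    return list(accumulate(int(x > threshold) for x in values))


# Sample prices taken from the expected output in the question.
prices = [498, 499, 501, 502, 400, 503]
print(accu(prices, 500))  # [0, 0, 1, 2, 2, 3]
```

Inside the pipeline the call stays the same as in the question: accu_id = accu(X.price, 500).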

@Neapolitan, that worked! Thanks!