Python Featuretools - Rolling Primitive


I am trying to create a custom rolling-sum primitive with featuretools. Here is the code:

class RollingSumOnDatetime(TransformPrimitive):
    """Calculates the rolling sum on a Datetime time index column.
    Description:
        Given a list of values and a Datetime time index, return the rolling sum.
    """

    name = "rolling_sum_on_datetime"
    input_types = [Numeric, DatetimeTimeIndex]
    return_type = Numeric
    uses_full_entity = True
    description_template = "the rolling sum of {} on {}"

    def __init__(self, window=None,on=None):
        self.window = window
        self.on = on


    def get_function(self):
        def rolling_sum(to_roll, on_column):
            """method is passed a pandas series"""
            # create a DataFrame that has the both columns in it
            df = pd.DataFrame({to_roll.name: to_roll, on_column.name: on_column})
            rolled_df = df.rolling(window=self.window, on=on_column.name).sum()
            return rolled_df[to_roll.name]

        return rolling_sum


feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    n_jobs=10,
    target_entity="contracts",
    agg_primitives=agg_prim,
    trans_primitives=trans_prim,
    groupby_trans_primitives=[
        RollingSumOnDatetime(window="5D", on=es["days"]["datetime"])
    ],
    max_depth=2,
    drop_contains=["contract_id", "merchant_id"],
)

The first part of the code is the custom primitive, and in the second part I call the function. It raises an error:

ValueError: setting an array element with a sequence.

You need to remove on=es["days"]["datetime"] when you pass the primitive to groupby_trans_primitives. It is not a parameter in the initialization of RollingSumOnDatetime, so it does not apply.
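
To illustrate why no `on` argument is needed at construction time, here is a minimal pandas-only sketch of what the inner `rolling_sum` function does once it receives its two input Series. The sample data and the standalone `window` argument are assumptions made for illustration; in the primitive the window comes from `self.window`, and the datetime column arrives as the second input rather than through `__init__`.

```python
import pandas as pd


def rolling_sum(to_roll, on_column, window="5D"):
    """Time-based rolling sum of `to_roll`, using `on_column` as the clock."""
    # Put both Series in one DataFrame so `on` can name the datetime column.
    df = pd.DataFrame({to_roll.name: to_roll, on_column.name: on_column})
    rolled_df = df.rolling(window=window, on=on_column.name).sum()
    return rolled_df[to_roll.name]


# Hypothetical sample data, sorted by time as a time-based window requires.
times = pd.Series(
    pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-07", "2021-01-08"]),
    name="datetime",
)
amounts = pd.Series([1.0, 2.0, 4.0, 8.0], name="amount")

result = rolling_sum(amounts, times)
print(result.tolist())
```

With a "5D" offset window, each row sums the values whose timestamps fall within the five days up to and including that row's timestamp.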

Here is a minimal, reproducible example:

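import pandas as pd  # needed by rolling_sum below
from featuretools.primitives import AggregationPrimitive, TransformPrimitive
from featuretools.variable_types import Numeric, DatetimeTimeIndex


class RollingSumOnDatetime(TransformPrimitive):
    """Calculates the rolling sum on a Datetime time index column.
    Description:
        Given a list of values and a Datetime time index, return the rolling sum.
    """

    name = "rolling_sum_on_datetime"
    input_types = [Numeric, DatetimeTimeIndex]
    return_type = Numeric
    uses_full_entity = True
    description_template = "the rolling sum of {} on {}"

    def __init__(self, window=None):
        self.window = window

    def get_function(self):
        def rolling_sum(to_roll, on_column):
            """method is passed a pandas series"""
            # create a DataFrame that has the both columns in it
            df = pd.DataFrame({to_roll.name: to_roll, on_column.name: on_column})
            rolled_df = df.rolling(window=self.window, on=on_column.name).sum()
            return rolled_df[to_roll.name]

        return rolling_sum


import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity="transactions",
    agg_primitives=[],
    trans_primitives=[],
    groupby_trans_primitives=[
        RollingSumOnDatetime(window="5D")
    ]
)
feature_defs
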
If we print out feature_defs, we get:

[<Feature: session_id>,
 <Feature: amount>,
 <Feature: product_id>,
 <Feature: ROLLING_SUM_ON_DATETIME(amount, transaction_time, window=5D) by product_id>,
 <Feature: ROLLING_SUM_ON_DATETIME(amount, transaction_time, window=5D) by session_id>,
 <Feature: products.brand>,
 <Feature: sessions.customer_id>,
 <Feature: sessions.device>,
 <Feature: sessions.customers.zip_code>,
 <Feature: ROLLING_SUM_ON_DATETIME(amount, sessions.session_start, window=5D) by product_id>,
 <Feature: ROLLING_SUM_ON_DATETIME(amount, sessions.session_start, window=5D) by session_id>,
 <Feature: ROLLING_SUM_ON_DATETIME(amount, sessions.session_start, window=5D) by sessions.customer_id>]
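
The "by product_id" and "by session_id" suffixes in the feature names indicate that the rolling sum is computed separately within each group. As a rough pandas-only sketch of that idea (not featuretools' actual implementation), the following computes a 5-day rolling sum per group; the column names mirror the mock customer data, but the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical transactions, sorted by time within each product.
df = pd.DataFrame(
    {
        "product_id": ["a", "b", "a", "b"],
        "transaction_time": pd.to_datetime(
            ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-10"]
        ),
        "amount": [1.0, 10.0, 2.0, 20.0],
    }
)

parts = []
for _, group in df.groupby("product_id"):
    # Keep only the time column and the numeric column before rolling.
    sub = group[["transaction_time", "amount"]]
    rolled = sub.rolling(window="5D", on="transaction_time").sum()["amount"]
    parts.append(rolled)

# Restore the original row order so the result aligns with df.
rolling_by_product = pd.concat(parts).sort_index()
print(rolling_by_product.tolist())
```

Each row only accumulates amounts from earlier rows of the same product that fall inside the 5-day window, which is why the second row for product "b" does not include the first.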

Please post a minimal, reproducible example. I missed the on part in __init__.