Python Featuretools: rolling sum primitive
I'm trying to create a custom rolling-sum primitive with featuretools. Here is the code:
class RollingSumOnDatetime(TransformPrimitive):
    """Calculates the rolling sum on a Datetime time index column.
    Description:
        Given a list of values and a Datetime time index, return the rolling sum.
    """
    name = "rolling_sum_on_datetime"
    input_types = [Numeric, DatetimeTimeIndex]
    return_type = Numeric
    uses_full_entity = True
    description_template = "the rolling sum of {} on {}"

    def __init__(self, window=None, on=None):
        self.window = window
        self.on = on

    def get_function(self):
        def rolling_sum(to_roll, on_column):
            """method is passed a pandas series"""
            # create a DataFrame that has both columns in it
            df = pd.DataFrame({to_roll.name: to_roll, on_column.name: on_column})
            rolled_df = df.rolling(window=self.window, on=on_column.name).sum()
            return rolled_df[to_roll.name]
        return rolling_sum

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    n_jobs=10,
    target_entity="contracts",
    agg_primitives=agg_prim,
    trans_primitives=trans_prim,
    groupby_trans_primitives=[
        RollingSumOnDatetime(window="5D", on=es["days"]["datetime"])
    ],
    max_depth=2,
    drop_contains=["contract_id", "merchant_id"],
)

The first part of the code is the custom primitive; in the second part I call the function.
It gives this error:
ValueError: setting an array element with a sequence.
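You need to remove on=es["days"]["datetime"] when you pass the primitive to groupby_trans_primitives. It is not needed as an argument in the initialization of RollingSumOnDatetime, so it does not apply: the time index column already reaches the function as its second input (on_column), as declared in input_types.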
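The split between configuration and data can be sketched without featuretools. The class below is a hypothetical plain-Python stand-in (RollingSumSketch is not a featuretools class): the window is fixed at construction, while the value and time columns arrive only when the returned function is called. It assumes the times are sorted ascending and unique, and uses a window that excludes the left edge, which is my understanding of the pandas default for time-based windows.

```python
from datetime import datetime, timedelta


class RollingSumSketch:
    """Hypothetical stand-in for the primitive; not part of featuretools."""

    def __init__(self, window):
        # Configuration only: how far back each window reaches.
        self.window = window

    def get_function(self):
        def rolling_sum(values, times):
            # The columns are supplied here, at call time.
            # Assumes `times` is sorted ascending with no duplicates.
            out = []
            for i, t in enumerate(times):
                cutoff = t - self.window  # left edge, excluded from the window
                out.append(sum(v for v, s in zip(values[: i + 1], times[: i + 1])
                               if s > cutoff))
            return out

        return rolling_sum


times = [datetime(2021, 1, 1), datetime(2021, 1, 2),
         datetime(2021, 1, 4), datetime(2021, 1, 8)]
values = [1, 2, 3, 4]
rolled = RollingSumSketch(window=timedelta(days=5)).get_function()(values, times)
print(rolled)
```

Nothing about the time column is passed to the constructor; the caller hands it over together with the values.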
Here is a minimal, reproducible example:
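import pandas as pd
import featuretools as ft
from featuretools.primitives import AggregationPrimitive, TransformPrimitive
from featuretools.variable_types import Numeric, DatetimeTimeIndex

class RollingSumOnDatetime(TransformPrimitive):
    """Calculates the rolling sum on a Datetime time index column.
    Description:
        Given a list of values and a Datetime time index, return the rolling sum.
    """
    name = "rolling_sum_on_datetime"
    input_types = [Numeric, DatetimeTimeIndex]
    return_type = Numeric
    uses_full_entity = True
    description_template = "the rolling sum of {} on {}"

    def __init__(self, window=None):
        # only the window is configured here; the time column is an input
        self.window = window

    def get_function(self):
        def rolling_sum(to_roll, on_column):
            """method is passed a pandas series"""
            # create a DataFrame that has both columns in it
            df = pd.DataFrame({to_roll.name: to_roll, on_column.name: on_column})
            rolled_df = df.rolling(window=self.window, on=on_column.name).sum()
            return rolled_df[to_roll.name]
        return rolling_sum

es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity="transactions",
    agg_primitives=[],
    trans_primitives=[],
    groupby_trans_primitives=[
        RollingSumOnDatetime(window="5D")
    ]
)
feature_defs
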
If we print out feature_defs, we get:
[<Feature: session_id>,
<Feature: amount>,
<Feature: product_id>,
<Feature: ROLLING_SUM_ON_DATETIME(amount, transaction_time, window=5D) by product_id>,
<Feature: ROLLING_SUM_ON_DATETIME(amount, transaction_time, window=5D) by session_id>,
<Feature: products.brand>,
<Feature: sessions.customer_id>,
<Feature: sessions.device>,
<Feature: sessions.customers.zip_code>,
<Feature: ROLLING_SUM_ON_DATETIME(amount, sessions.session_start, window=5D) by product_id>,
<Feature: ROLLING_SUM_ON_DATETIME(amount, sessions.session_start, window=5D) by session_id>,
<Feature: ROLLING_SUM_ON_DATETIME(amount, sessions.session_start, window=5D) by sessions.customer_id>]
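The "by session_id" suffix in these feature names means the transform is computed within each group of rows that share that key. A minimal plain-Python sketch of that group-wise application follows; apply_by_group is a hypothetical helper, and a running total stands in for the rolling sum to keep the example short.

```python
from collections import defaultdict
from itertools import accumulate


def apply_by_group(keys, values, transform):
    """Apply `transform` to each group's values, keeping the original row order."""
    positions = defaultdict(list)
    for pos, key in enumerate(keys):
        positions[key].append(pos)

    result = [None] * len(values)
    for idxs in positions.values():
        group_out = transform([values[i] for i in idxs])
        # Write each group's output back to the rows it came from.
        for i, out in zip(idxs, group_out):
            result[i] = out
    return result


session_id = [1, 2, 1, 2, 1]
amount = [10, 20, 30, 40, 50]
# Running total per session (a stand-in for the rolling sum).
running = apply_by_group(session_id, amount, lambda xs: list(accumulate(xs)))
print(running)
```

Rows from different sessions never contribute to each other's totals, which is the behavior the grouped features describe.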
Please post a minimal, reproducible example. I missed the on part in the init.