Python scikit-learn: custom estimator: cannot clone
I have written my own estimator to automatically clean a specific dataset. I believe I followed the scikit-learn rules correctly:
from sklearn.base import BaseEstimator, TransformerMixin
import pandas as pd
from pathlib import Path
class cleaning(BaseEstimator, TransformerMixin):
    def __init__(self, to_drop=[], ins_threshold=0.6,
                 corr_threshold=0.7, attribute_filepath='attribute.xlsx'):  # no *args or **kwargs, provides methods get_params() and set_params()
        """
        Parameters:
        -----------
        to_drop (list) : columns to be dropped
        ins_threshold (float) : [0.0 - 1.0] insignificant threshold above which columns containing that proportion of NaN get dropped
        corr_threshold (float) : [0.0 - 1.0] correlation threshold above which correlated columns get dropped (first one is kept)
        attribute_filepath (str or pathlib.Path) : path to the Excel file containing attributes information
        """
        self.attribute_filepath = Path(attribute_filepath)
        self.ins_threshold = ins_threshold
        self.corr_threshold = corr_threshold
        self.to_drop = to_drop
        self.ins_col = None
        self.correlated_col = None
But I still get this error:
RuntimeError: Cannot clone object cleaning(attribute_filepath=PosixPath('MyFile.xlsx')), as the constructor either does not set or modifies parameter attribute_filepath
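I don't understand why: isn't self.attribute_filepath clearly defined in my __init__?

It's a bit late, but since I ran into a similar problem, I'll try to give an answer. As I see it, when you set self.attribute_filepath to something other than the argument passed to the constructor (here it is wrapped in Path(...)), you are violating the scikit-learn API convention.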
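One way to stay within that convention is sketched below, assuming the conversion to Path can wait until fit. The class name Cleaning, the fitted attributes attribute_path_ and to_drop_, the None default for to_drop, and the pass-through transform are illustrative assumptions, not part of the original code.

```python
from pathlib import Path

from sklearn.base import BaseEstimator, TransformerMixin, clone


class Cleaning(BaseEstimator, TransformerMixin):
    def __init__(self, to_drop=None, ins_threshold=0.6,
                 corr_threshold=0.7, attribute_filepath='attribute.xlsx'):
        # Store every argument untouched, under exactly the same name.
        self.to_drop = to_drop
        self.ins_threshold = ins_threshold
        self.corr_threshold = corr_threshold
        self.attribute_filepath = attribute_filepath

    def fit(self, X, y=None):
        # Conversion and defaults are resolved here, where the values are used.
        # The trailing underscore marks attributes derived during fit (assumed names).
        self.attribute_path_ = Path(self.attribute_filepath)
        self.to_drop_ = list(self.to_drop) if self.to_drop is not None else []
        return self

    def transform(self, X):
        # Placeholder: the real cleaning logic would go here.
        return X


# Cloning now succeeds, whether a str or a Path was passed in.
copy_of_estimator = clone(Cleaning(attribute_filepath=Path('MyFile.xlsx')))
```

Because __init__ only stores what it receives, the value returned by get_params() is the very object that was passed in, which is what the clone check expects.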
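In fact, the scikit-learn documentation, talking about the __init__ method, says (quoting):

There should be no logic, not even input validation, and the parameters should not be changed. The corresponding logic should be put where the parameters are used, typically in fit. The following is wrong:

def __init__(self, param1=1, param2=2, param3=3):
    # WRONG: parameters should not be modified
    if param1 > 1:
        param2 += 1
    self.param1 = param1
    # WRONG: the object's attributes should have exactly the name of
    # the argument in the constructor
    self.param3 = param2

Now, what happens if you pass your custom transformer to GridSearchCV()? I guess you are passing the transformer to a Pipeline, which in turn is passed to GridSearchCV(), or you are doing something similar, and are therefore calling fit() on the resulting instance, which clones the estimator (building a new instance through its __init__ method).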
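The check behind the "Cannot clone object" error can be illustrated without scikit-learn. The sketch below is a simplified stand-in, not the library's code: simplified_clone, ConvertsInInit and StoresAsGiven are hypothetical names, and the hand-written get_params replaces what BaseEstimator would normally provide.

```python
import copy
from pathlib import Path


def simplified_clone(estimator):
    """Simplified, hypothetical stand-in for the parameter check done when cloning."""
    params = estimator.get_params()
    # Copy each parameter, then rebuild the estimator through its constructor.
    new_params = {name: copy.deepcopy(value) for name, value in params.items()}
    new_estimator = type(estimator)(**new_params)
    params_set = new_estimator.get_params()
    for name, passed in new_params.items():
        # The constructor must store the very object it was given.
        if params_set[name] is not passed:
            raise RuntimeError(
                f"Cannot clone object: the constructor either does not set "
                f"or modifies parameter {name}"
            )
    return new_estimator


class ConvertsInInit:
    def __init__(self, attribute_filepath='attribute.xlsx'):
        # Path(...) always builds a new object, so the stored value
        # is never the object that was passed in.
        self.attribute_filepath = Path(attribute_filepath)

    def get_params(self):
        return {'attribute_filepath': self.attribute_filepath}


class StoresAsGiven:
    def __init__(self, attribute_filepath='attribute.xlsx'):
        self.attribute_filepath = attribute_filepath

    def get_params(self):
        return {'attribute_filepath': self.attribute_filepath}


try:
    simplified_clone(ConvertsInInit('MyFile.xlsx'))
except RuntimeError as exc:
    print('clone failed:', exc)

print('clone ok:', simplified_clone(StoresAsGiven('MyFile.xlsx')).attribute_filepath)
```

The first class fails for the same reason as the estimator in the question: the constructor replaces the argument with a freshly built Path, so the identity comparison cannot succeed.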