Implementing SMOTEENN in Dask RandomizedSearchCV
I successfully implemented a model using SMOTEENN and RF in a pipeline, like this:
import random
from multiprocessing import cpu_count  # needed for the n_jobs argument below
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import roc_curve, roc_auc_score, confusion_matrix
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN
from imblearn.pipeline import Pipeline
After loading the data and obtaining the X_train, X_test, y_train, and y_test matrices, I successfully ran the following sklearn randomized search:
seed = 1706
knn = 10
smoted = SMOTE(sampling_strategy = 'auto',
               k_neighbors = knn,
               random_state = seed)
mydata = pd.read_csv(datapath)
params_rf = {
    'rf__max_depth':[8, 14, 20, 26],
    'rf__min_samples_leaf':[8, 15, 22, 29],
    'rf__max_features':[6, 12, 18, 24, 30],
    'rf__n_estimators':[400, 800]
}
smote_enn = SMOTEENN(smote = smoted)
rf = RandomForestClassifier(criterion = 'gini')
pipeline = Pipeline([('smote_enn', smote_enn), ('rf', rf)]) #<-pipeline with smote and model steps
random.seed(1706)
grid_rf = RandomizedSearchCV(estimator = pipeline,
                             param_distributions = params_rf,
                             scoring = 'roc_auc',
                             cv = 8,
                             n_jobs = cpu_count()-2,
                             refit = True,
                             return_train_score = False,
                             n_iter = 80)
grid_rf.fit(X_train, y_train.values.ravel())
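I've opened a PR against dask-ml to handle imblearn components; you can find it here:
You can use it as a temporary workaround until the PR is accepted. The search fails because dask-ml uses sklearn's Pipeline, which does not handle fit_resample and does not pass the resampled y down the pipeline.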
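The difference described in this answer can be illustrated with a toy sketch. Everything below is hypothetical and pure Python: ToyOverSampler, ToyClassifier, fit_transform_only and fit_sampler_aware are made-up names, not code from sklearn, imblearn, or dask-ml. The sketch only shows the general idea that a pipeline treating every intermediate step as a transformer breaks on a sampler, while a sampler-aware pipeline resamples X and y together.

```python
from collections import Counter


class ToyOverSampler:
    """Hypothetical sampler: has fit and fit_resample, but no transform."""

    def fit(self, X, y):
        return self

    def fit_resample(self, X, y):
        counts = Counter(y)
        minority = min(counts, key=counts.get)
        # Crude stand-in for oversampling: repeat each minority row once.
        extra = [(row, label) for row, label in zip(X, y) if label == minority]
        X_res = list(X) + [row for row, _ in extra]
        y_res = list(y) + [label for _, label in extra]
        return X_res, y_res


class ToyClassifier:
    """Hypothetical final estimator: records the labels it was fitted on."""

    def fit(self, X, y):
        self.n_seen_ = len(y)
        self.counts_ = Counter(y)
        return self


def fit_transform_only(steps, X, y):
    """Treat every intermediate step as a transformer; y is never updated."""
    for _, step in steps[:-1]:
        X = step.fit(X, y).transform(X)  # a sampler has no transform method
    return steps[-1][1].fit(X, y)


def fit_sampler_aware(steps, X, y):
    """Resample X and y together when a step offers fit_resample."""
    for _, step in steps[:-1]:
        if hasattr(step, "fit_resample"):
            X, y = step.fit_resample(X, y)  # y changes length along with X
        else:
            X = step.fit(X, y).transform(X)
    return steps[-1][1].fit(X, y)


X = [[0.0], [0.1], [0.2], [0.3], [1.0]]
y = [0, 0, 0, 0, 1]
steps = [("sampler", ToyOverSampler()), ("clf", ToyClassifier())]

try:
    fit_transform_only(steps, X, y)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

model = fit_sampler_aware(steps, X, y)
print(error_message)
print(model.n_seen_, dict(model.counts_))
```

The first call fails with the same kind of AttributeError reported in the question, while the sampler-aware loop hands the classifier the resampled labels as well as the resampled rows.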
I ran into the same problem with Dask's RandomizedSearchCV. Apparently Dask requires every pipeline component to implement a transform() method, whereas sklearn's RandomizedSearchCV does not. I'll try to find a way around this.
The Dask version of the search that triggers the problem:
from dask_ml.model_selection import RandomizedSearchCV as DaskRandomGridSearchCV
grid_rf = DaskRandomGridSearchCV(estimator = pipeline,
                                 param_distributions = params_rf,
                                 scoring = 'roc_auc',
                                 cv = 8,
                                 ###n_jobs = cpu_count()-2, <-not needed b/c of dask
                                 refit = True,
                                 return_train_score = False,
                                 n_iter = 80)
grid_rf.fit(X_train, y_train.values.ravel())
It fails with:
AttributeError: 'SMOTEENN' object has no attribute 'transform'