Python sklearn: using RandomizedSearchCV with custom metrics and catching exceptions

Tags: python, scikit-learn, random-forest, cross-validation

I am using the RandomizedSearchCV function from sklearn with a random forest classifier. To look at several metrics at once, I use custom scoring:

from sklearn.metrics import make_scorer, roc_auc_score, recall_score, matthews_corrcoef, balanced_accuracy_score, accuracy_score

acc = make_scorer(accuracy_score)

auc_score = make_scorer(roc_auc_score)
recall = make_scorer(recall_score)
mcc = make_scorer(matthews_corrcoef)
bal_acc = make_scorer(balanced_accuracy_score)

scoring = {"roc_auc_score": auc_score, "recall": recall, "MCC" : mcc, 'Bal_acc' : bal_acc, "Accuracy": acc }
These custom scorers are then used in the randomized search:

rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid, n_iter=100, cv=split, verbose=2,
                               random_state=42, n_jobs=-1, error_score=np.nan, scoring = scoring, iid = True, refit="roc_auc_score")
The problem is that with my custom split, AUC throws an exception, because that particular split contains only one class label.
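To make the failure concrete, here is a minimal sketch that assumes a fold whose true labels all belong to one class. The helper name `auc_or_nan` and the toy scores are hypothetical and do not come from the question. Depending on the installed scikit-learn version, `roc_auc_score` may raise a `ValueError` here, or may warn and return NaN; the helper maps both outcomes to NaN.

```python
import math
import warnings

from sklearn.metrics import roc_auc_score


def auc_or_nan(y_true, y_score):
    """Hypothetical helper: ROC AUC, or NaN when the metric is undefined."""
    with warnings.catch_warnings():
        # Some scikit-learn releases may warn and return NaN instead of raising.
        warnings.simplefilter("ignore")
        try:
            return float(roc_auc_score(y_true, y_score))
        except ValueError:
            # Older releases raise when y_true contains a single class.
            return math.nan


# Made-up scores: the first fold has only class 1, the second has both classes.
single_class = auc_or_nan([1, 1, 1], [0.2, 0.7, 0.9])
two_classes = auc_or_nan([0, 1], [0.1, 0.9])
print(single_class, two_classes)
```

The single-class fold yields NaN, while the fold with both classes is scored normally.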

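I do not want to change the split, so is it possible to catch these exceptions in RandomizedSearchCV or in the make_scorer function? For example, if one of the metrics cannot be computed (because of an exception), simply record NaN and continue with the next model.

Edit: Apparently error_score covers model training but not the metric computation. If I use e.g. accuracy, everything works and I only get warnings for the folds that contain a single class label. If I use e.g. AUC as the metric, the exception is still thrown.

It would be great to get some ideas here.

Solution: define a custom scorer that handles the exception: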

import numpy as np

def custom_scorer(y_true, y_pred, actual_scorer):
    # Default to NaN so that an undefined metric does not abort the search.
    score = np.nan

    try:
        score = actual_scorer(y_true, y_pred)
    except ValueError:
        pass

    return score
This gives a new set of metrics:

acc = make_scorer(accuracy_score)
recall = make_scorer(custom_scorer, actual_scorer=recall_score)
new_auc = make_scorer(custom_scorer, actual_scorer=roc_auc_score)
mcc = make_scorer(custom_scorer, actual_scorer=matthews_corrcoef)
bal_acc = make_scorer(custom_scorer,actual_scorer=balanced_accuracy_score)

scoring = {"roc_auc_score": new_auc, "recall": recall, "MCC" : mcc, 'Bal_acc' : bal_acc, "Accuracy": acc }
This dictionary can in turn be passed to the scoring parameter of RandomizedSearchCV.

The second solution I found is:

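def custom_auc(clf, X, y_true):
    # Scorer with the (estimator, X, y) signature, so it is passed to
    # `scoring` directly, without make_scorer.
    score = np.nan
    y_pred = clf.predict_proba(X)
    try:
        score = roc_auc_score(y_true, y_pred[:, 1])
    except Exception:
        pass

    return score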
This can also be passed to the scoring parameter:

scoring = {"roc_auc_score": custom_auc, "recall": recall, "MCC" : mcc, 'Bal_acc' : bal_acc, "Accuracy": acc }

(adapted from)
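One detail of the second variant is worth spelling out: it catches Exception rather than only ValueError. The sketch below assumes a training fold that contains a single class, so predict_proba returns one column and `y_pred[:, 1]` raises an IndexError before roc_auc_score is even called. The toy data and the choice of DecisionTreeClassifier are assumptions made for illustration only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.tree import DecisionTreeClassifier


def custom_auc(clf, X, y_true):
    score = np.nan
    y_pred = clf.predict_proba(X)
    try:
        score = roc_auc_score(y_true, y_pred[:, 1])
    except Exception:
        pass
    return score


# Toy data (made up): the training fold contains only class 0.
X_train, y_train = [[0.0], [1.0], [2.0]], [0, 0, 0]
X_test, y_test = [[3.0], [4.0]], [0, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Only one class was seen during fit, so there is a single probability column.
proba_shape = clf.predict_proba(X_test).shape
# Indexing column 1 fails with IndexError; the scorer swallows it and returns NaN.
score = custom_auc(clf, X_test, y_test)
print(proba_shape, score)
```

With `except ValueError` in place of `except Exception`, the IndexError in this situation would propagate out of the scorer.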

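You can write a generic scorer that takes another scorer as input, computes the result, catches any exception it throws, and returns a fixed value instead.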

def custom_scorer(y_true, y_pred, actual_scorer):
    score = np.nan

    try:
      score = actual_scorer(y_true, y_pred)
    except Exception: 
      pass

    return score
You can then use it like this:

acc = make_scorer(custom_scorer, actual_scorer = accuracy_score)
auc_score = make_scorer(custom_scorer, actual_scorer = roc_auc_score, 
                        needs_threshold=True) # <== Added this to get correct roc
recall = make_scorer(custom_scorer, actual_scorer = recall_score)
mcc = make_scorer(custom_scorer, actual_scorer = matthews_corrcoef)
bal_acc = make_scorer(custom_scorer, actual_scorer = balanced_accuracy_score)
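
Comments:

It is not entirely clear what you want. You are using error_score=np.nan, which should do what you are asking for. Do you need anything else, or is it not working as expected?

I have added to the question above. Basically, it does not work as expected, because I still get the exception even with error_score set.

Oh yes, my mistake. error_score only covers estimator.fit(). Can you give an example of "AUC throws an exception because that particular split contains only one class label"?

"ValueError: Only one class present in y_true. ROC AUC score is not defined in that case." would be the exception I get (so far).

I saw your idea here and like the workaround. Unfortunately, it fails with: in get return _ForkingPickler.loads(res) AttributeError: Can't get attribute 'custom_scorer', and later in the stack trace: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable. Running it with n_jobs=1 works fine, so I guess it is a multiprocessing problem.

@JennyH Even with n_jobs=-1 I do not get any errors. Did you define the scorer in another file and try to import it, or is it in the same file? I have updated the answer with a minimal example; try it now.

I still had scikit-learn 0.20.0. Updating to 0.20.1 helped, and now it works like a charm. Sorry, and thanks for your MWE!

I let it run overnight and unfortunately an exception was thrown again. If I use predict_proba=True for AUC (not the threshold option, because AUC does not need it and it also gives an error), I again get: ValueError: got predict_proba of shape (96, 1), but need classifier with two classes for custom_scorer scoring

Minimal working example:
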
import numpy as np
def custom_scorer(y_true, y_pred, actual_scorer):
    score = np.nan

    try:
      score = actual_scorer(y_true, y_pred)
    except Exception: 
      pass

    return score


from sklearn.metrics import make_scorer, roc_auc_score, accuracy_score
acc = make_scorer(custom_scorer, actual_scorer = accuracy_score)
auc_score = make_scorer(custom_scorer, actual_scorer = roc_auc_score, 
                        needs_threshold=True) # <== Added this to get correct roc

from sklearn.datasets import load_iris
X, y = load_iris().data, load_iris().target

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, KFold
cvv = KFold(3)
params={'criterion':['gini', 'entropy']}
gc = GridSearchCV(DecisionTreeClassifier(), param_grid=params, cv =cvv, 
                  scoring={"roc_auc": auc_score, "accuracy": acc}, 
                  refit="roc_auc", n_jobs=-1, 
                  return_train_score = True, iid=False)
gc.fit(X, y)
print(gc.cv_results_)
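
Once undefined folds are scored as NaN, a plain mean over the folds is NaN as well. One option, which is an assumption here and not something the thread itself does, is to aggregate the per-fold columns while skipping NaN. The sketch below uses a hypothetical dictionary with made-up values, laid out like the `split<k>_test_<name>` columns of `cv_results_` (plain lists stand in for the arrays scikit-learn stores); `nan_mean` and `per_candidate_nan_mean` are illustrative helper names.

```python
import math


def nan_mean(values):
    """Mean of the non-NaN entries; NaN if every entry is NaN."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


def per_candidate_nan_mean(results, name):
    """Average the split<k>_test_<name> columns per candidate, skipping NaN."""
    suffix = "_test_" + name
    columns = [
        results[key]
        for key in sorted(results)
        if key.startswith("split") and key.endswith(suffix)
    ]
    # zip(*columns) regroups the scores so that each tuple holds one
    # candidate's scores across all folds.
    return [nan_mean(fold_scores) for fold_scores in zip(*columns)]


# Made-up scores for two parameter settings over three folds; the middle
# fold stands for a split where the metric was undefined.
fake_results = {
    "split0_test_roc_auc": [0.5, 0.75],
    "split1_test_roc_auc": [math.nan, math.nan],
    "split2_test_roc_auc": [1.0, 0.25],
}

means = per_candidate_nan_mean(fake_results, "roc_auc")
print(means)  # [0.75, 0.5]
```

Whether skipping undefined folds is appropriate depends on the evaluation goal, since the remaining folds then carry all the weight.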