Python: how to integrate the G-mean into the cross_validate function?
How can I compute the G-mean as part of the scoring parameter of the following cross-validation call:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
# X, y: feature matrix and binary labels, assumed to be loaded already
scores = cross_validate(LogisticRegression(class_weight='balanced', max_iter=100000),
                        X, y, cv=5, scoring=('roc_auc', 'average_precision', 'f1', 'recall', 'balanced_accuracy'))
# mean test score of each metric across the 5 folds
scores['test_roc_auc'].mean(), scores['test_average_precision'].mean(), scores['test_f1'].mean(), scores['test_recall'].mean(), scores['test_balanced_accuracy'].mean()
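For a binary problem, the per-fold G-mean can in fact be recovered from two scores the call above already requests: 'recall' is the recall of the positive class (TPR), and 'balanced_accuracy' is (TPR + TNR) / 2, so TNR = 2 * balanced_accuracy - TPR and G-mean = sqrt(TPR * TNR). The sketch below assumes binary labels with 1 as the positive class, and uses load_breast_cancer as stand-in data because the question's X, y are not shown; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Stand-in data (assumption: the question's own X, y are not shown)
X, y = load_breast_cancer(return_X_y=True)
scores = cross_validate(LogisticRegression(class_weight='balanced', max_iter=100000),
                        X, y, cv=5, scoring=('recall', 'balanced_accuracy'))

tpr = scores['test_recall']                          # recall of the positive class (label 1)
tnr = 2 * scores['test_balanced_accuracy'] - tpr     # binary case: balanced accuracy = (TPR + TNR) / 2
tnr = np.clip(tnr, 0.0, 1.0)                         # guard against floating-point rounding
g_mean_per_fold = np.sqrt(tpr * tnr)                 # G-mean = sqrt(TPR * TNR)
print(g_mean_per_fold.mean())
```

This trick only holds for two classes; for more classes a dedicated scorer, as in the answers below, is the general solution.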
Or:
You need to write a custom scorer. Here is an example; then, if it is the only scorer you want, you can pass it on its own:
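import numpy as np
from sklearn.metrics import accuracy_score

# y_test: true labels, y_test_pred: model predictions (both assumed to exist already)
g_mean = 1.0
labels = np.unique(y_test)
for label in labels:
    idx = (y_test == label)
    # accuracy restricted to one class = recall of that class
    g_mean *= accuracy_score(y_test[idx], y_test_pred[idx])
# geometric mean: K-th root of the product of the K per-class recalls (sqrt when K = 2)
g_mean = g_mean ** (1.0 / len(labels))
score = g_mean
print(score)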
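The loop above only scores one fixed train/test split. A minimal sketch of turning the same logic into a scorer that cross_validate can call on every fold: scikit-learn accepts any callable with the signature (estimator, X, y) as scoring. The function name g_mean_scorer is hypothetical, and load_breast_cancer is used as stand-in data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_validate


def g_mean_scorer(estimator, X, y):
    """Hypothetical scorer: the per-class loop above, applied to one validation fold."""
    y_pred = estimator.predict(X)
    labels = np.unique(y)
    g_mean = 1.0
    for label in labels:
        idx = (y == label)
        # accuracy on the samples of one class = recall of that class
        g_mean *= accuracy_score(y[idx], y_pred[idx])
    # K-th root of the product of the K per-class recalls
    return g_mean ** (1.0 / len(labels))


X, y = load_breast_cancer(return_X_y=True)
scores = cross_validate(LogisticRegression(class_weight='balanced', max_iter=100000),
                        X, y, cv=5, scoring=g_mean_scorer)
print(scores['test_score'])
```

With a single callable, the per-fold results appear under the key 'test_score'.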
I think you can use a custom scorer, as described in the documentation:
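# your_custom_function: a scorer built with make_scorer, or any callable with signature (estimator, X, y)
scores = cross_validate(LogisticRegression(class_weight='balanced', max_iter=100000),
                        X, y, cv=5, scoring=your_custom_function)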
Just pass it in as a custom scorer.
If scoring represents multiple scores, one can use:
a list or tuple of unique strings;
a callable returning a dictionary where the keys are the metric names and the values are the metric scores;
a dictionary with metric names as keys and callables as values.
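The second option in that list, a callable returning a dictionary, lets one function report the G-mean next to other metrics. A sketch, assuming binary labels and a scikit-learn version recent enough to accept a dict-returning scorer as the quoted documentation describes; multi_scorer is a hypothetical name and load_breast_cancer is stand-in data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import cross_validate


def multi_scorer(estimator, X, y):
    """Hypothetical multi-metric scorer: returns {metric name: score} for one fold."""
    y_pred = estimator.predict(X)
    recalls = recall_score(y, y_pred, average=None)  # [recall of class 0, recall of class 1]
    return {
        'g_mean': float(np.sqrt(np.prod(recalls))),  # binary case: sqrt(TNR * TPR)
        'recall': float(recalls[1]),                  # recall of the positive class (label 1)
        'roc_auc': roc_auc_score(y, estimator.predict_proba(X)[:, 1]),
    }


X, y = load_breast_cancer(return_X_y=True)
scores = cross_validate(LogisticRegression(class_weight='balanced', max_iter=100000),
                        X, y, cv=5, scoring=multi_scorer)
print(sorted(k for k in scores if k.startswith('test_')))
```

Each dictionary key becomes a 'test_<name>' entry in the result, just as with the dictionary-of-scorers form.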
Set greater_is_better=True, since the best value is close to 1. Additional parameters of geometric_mean_score can be passed directly to make_scorer.
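Both points can be checked with a metric that ships with scikit-learn. In the sketch below, pos_label=0 is forwarded by make_scorer to recall_score, which turns sensitivity into specificity (TNR), one of the two factors of the binary G-mean; the second scorer shows that greater_is_better=False makes the scorer return the negated value. The split and the variable names are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = LogisticRegression(class_weight='balanced', max_iter=100000).fit(X_train, y_train)

# pos_label=0 is not a make_scorer option: it is forwarded to recall_score,
# so the scorer computes the recall of class 0, i.e. the specificity.
specificity_scorer = make_scorer(recall_score, greater_is_better=True, pos_label=0)
spec = specificity_scorer(model, X_test, y_test)

# greater_is_better=False flips the sign of the returned score.
neg_spec = make_scorer(recall_score, greater_is_better=False, pos_label=0)(model, X_test, y_test)
print(spec, neg_spec)
```

For a metric like the G-mean, where 1 is best, leaving greater_is_better=False would make model selection tools prefer the worst models.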
from sklearn.metrics import make_scorer
from imblearn.metrics import geometric_mean_score
gm_scorer = make_scorer(geometric_mean_score, greater_is_better=True, average='binary')
Edit: to specify multiple metrics, pass a dict to the scoring parameter (the last example below does this).
Full example:
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from imblearn.metrics import geometric_mean_score
X, y = load_breast_cancer(return_X_y=True)
gm_scorer = make_scorer(geometric_mean_score, greater_is_better=True)
scores = cross_validate(
LogisticRegression(class_weight='balanced',max_iter=100000),
X,y,
cv=5,
scoring=gm_scorer
)
scores
>>>
{'fit_time': array([0.76488066, 0.69808364, 1.22158527, 0.94157672, 1.01577377]),
'score_time': array([0.00103951, 0.00100923, 0.00065804, 0.00071168, 0.00068736]),
'test_score': array([0.91499142, 0.93884403, 0.9860133 , 0.92439026, 0.9525989 ])}
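Do I need to pass **kwargs in the custom function's arguments?
Your function definition should look like this: def geometric_mean_score(y_test, y_pred, **kwargs). Then wrap it with make_scorer like this: make_scorer(geometric_mean_score). This returns your custom scorer, which you should be able to pass to the cross_validate function.
Why use average='binary'? Also, is it possible to use 'roc_auc' and 'average_precision' together with gm_scorer? I tried scoring=('gm_scorer', 'roc_auc', 'average_precision'), but it did not work.
@ForestGump Yes, you can pass several metrics (one of which is a custom function) to scoring through a dictionary. A tuple may contain only the names of built-in scorers, which is why the string 'gm_scorer' is not recognized: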
scores = cross_validate(
LogisticRegression(class_weight='balanced',max_iter=100000),
X,y,
cv=5,
scoring={'gm_scorer': gm_scorer, 'AUC': 'roc_auc', 'Avg_Precision': 'average_precision'}
)
scores
>>>
{'fit_time': array([1.03509665, 0.96399784, 1.49760461, 1.13874388, 1.32006526]),
'score_time': array([0.00560617, 0.00357151, 0.0057447 , 0.00566769, 0.00549698]),
'test_gm_scorer': array([0.91499142, 0.93884403, 0.9860133 , 0.92439026, 0.9525989 ]),
'test_AUC': array([0.99443171, 0.99344907, 0.99801587, 0.97949735, 0.99765258]),
'test_Avg_Precision': array([0.99670544, 0.99623085, 0.99893162, 0.98640759, 0.99861043])}
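To get back to the single mean per metric that the question computes with .mean(), the dictionary returned above can be reduced in one comprehension. The sketch below swaps imblearn's geometric_mean_score for a small hypothetical g_mean function built on sklearn's recall_score (same definition: K-th root of the per-class recalls), so it runs without imbalanced-learn; it also adds 'balanced_accuracy' as an assumption, to compare the two.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_validate


def g_mean(y_true, y_pred):
    """Hypothetical stand-in for imblearn's geometric_mean_score."""
    recalls = recall_score(y_true, y_pred, average=None)  # per-class recalls
    return float(np.prod(recalls) ** (1.0 / len(recalls)))


X, y = load_breast_cancer(return_X_y=True)
scoring = {
    'gm_scorer': make_scorer(g_mean, greater_is_better=True),
    'AUC': 'roc_auc',
    'Avg_Precision': 'average_precision',
    'balanced_accuracy': 'balanced_accuracy',
}
scores = cross_validate(LogisticRegression(class_weight='balanced', max_iter=100000),
                        X, y, cv=5, scoring=scoring)

# Mean over the 5 folds for every metric, like the .mean() calls in the question
means = {name: scores[f'test_{name}'].mean() for name in scoring}
print(means)
```

For two classes, balanced accuracy is the arithmetic mean of the two recalls and the G-mean is their geometric mean, so on every fold the G-mean is at most the balanced accuracy; a large gap between them signals that one class is recalled much worse than the other.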