Python: Custom 'precision at k' scoring object in sklearn for GridSearchCV

Tags: python, scikit-learn, cross-validation, grid-search

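I'm currently trying to tune hyperparameters with GridSearchCV in scikit-learn using a "precision at k" scoring metric, which gives me the precision if I classify the top k-th percentile of my classifier's scores as the positive class. I know a custom scorer can be created with make_scorer and a score function. This is what I have right now: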

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.grid_search import GridSearchCV
from sklearn.linear_model import LogisticRegression

def precision_at_k(y_true, y_score, k):
    df = pd.DataFrame({'true': y_true, 'score': y_score}).sort_values('score')
    threshold = df.iloc[int(k*len(df)),1]
    y_pred = pd.Series([1 if i >= threshold else 0 for i in df['score']])
    return metrics.precision_score(y_true, y_pred)

custom_scorer = metrics.make_scorer(precision_at_k, needs_proba=True, k=0.1)

X = np.random.randn(100, 10)
Y = np.random.binomial(1, 0.3, 100)

train_index = range(0, 70)
test_index = range(70, 100)
train_x = X[train_index]
train_Y = Y[train_index]
test_x = X[test_index]
test_Y = Y[test_index]

clf = LogisticRegression()
params = {'C': [0.01, 0.1, 1, 10]}
clf_gs = GridSearchCV(clf, params, scoring=custom_scorer)
clf_gs.fit(train_x, train_Y)

However, calling fit gives me "Exception: Data must be 1-dimensional", and I don't know why. Can anyone help? Thanks in advance.

The values passed to pd.DataFrame here should be of type 'list' rather than 'numpy.arrays'.

So just try converting y_true and y_score to Python lists:

df = pd.DataFrame({'true': y_true.tolist(), 'score': y_score.tolist()}).sort_values('score')
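
As an alternative that avoids pandas altogether, here is a minimal sketch of the metric written against NumPy arrays. It rests on several assumptions that are not in the original post: that k is the fraction of samples to label positive, that the highest-scoring fraction is the one to flag, and that in some scikit-learn versions the scorer may hand the score function the two-column predict_proba output, which could be what triggers the "1-dimensional" complaint. The name precision_at_k_np is hypothetical.

```python
import numpy as np


def precision_at_k_np(y_true, y_score, k=0.1):
    """Precision when the top `k` fraction of samples by score is labelled positive.

    Hypothetical helper: assumes binary labels in {0, 1} and 0 < k <= 1.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)

    # Assumption: a 2-D input is predict_proba output, so keep the last
    # column (the positive class when classes are ordered [0, 1]).
    if y_score.ndim == 2:
        y_score = y_score[:, -1]

    # Number of samples to flag as positive; always flag at least one.
    n_top = max(1, int(round(k * len(y_score))))

    # Indices of the n_top highest scores; stable sort keeps ties deterministic.
    top_idx = np.argsort(-y_score, kind="stable")[:n_top]

    # Fraction of the flagged samples that are truly positive.
    return float(np.mean(y_true[top_idx] == 1))


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    print(precision_at_k_np(labels, scores, k=0.3))
```

A function like this can be passed to make_scorer with k as a keyword argument, as in the question. The keyword used to request probabilities has changed across scikit-learn releases (the question uses needs_proba=True), so check the make_scorer documentation for the installed version.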

After reading this, I found a nice implementation; I hope it helps.