Predicting with a cutoff in Python using scikit-learn

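I am using Python for data mining for the first time, and I am having trouble tuning parameters and finding their best values (cutoff, classwt, sampsize). I am trying to find the cutoff value for the different classes with a random forest in scikit-learn. I am using the following code: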

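import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
# In the old release shown in the traceback this lives in sklearn.cross_validation.
from sklearn.model_selection import cross_val_score

def cutoff_predict(rf, trainArr, cutoff):
    # Label a sample 1 when its predicted probability for class 1 exceeds the cutoff.
    return (rf.predict_proba(trainArr)[:, 1] > cutoff).astype(int)

score = []

def custom_f1(cutoff):
    # Build a scorer with the (estimator, X, y) signature that cross_val_score expects.
    def f1_cutoff(rf, trainArr, y):
        ypred = cutoff_predict(rf, trainArr, cutoff)
        return f1_score(y, ypred)
    return f1_cutoff

# trainArr (features) and trainRes (labels) are the training data, defined elsewhere.
for cutoff in np.arange(0.1, 0.9, 0.1):
    rf = RandomForestClassifier(n_estimators=100)  # Random forest generation for classification
    rf.fit(trainArr, trainRes)  # Fit the random forest model
    validated = cross_val_score(rf, trainArr, trainRes, cv=10, scoring=custom_f1(cutoff))
    score.append(validated)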
But I get the following error:

    IndexError                                Traceback (most recent call last)
<ipython-input-14-f8b808ce9a4d> in <module>()
     94     rf = RandomForestClassifier(n_estimators=100) #Random forest generation for Classification
     95     rf.fit(trainArr, trainRes) #Fit the random forest model
---> 96     validated=cross_val_score(rf,trainArr,trainRes,cv=10,scoring=custom_f1(cutoff))
     97     score.append(validated)

C:\Python27\Anaconda\lib\site-packages\sklearn\cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
   1350     X, y = indexable(X, y)
   1351 
-> 1352     cv = _check_cv(cv, X, y, classifier=is_classifier(estimator))
   1353     scorer = check_scoring(estimator, scoring=scoring)
   1354     # We clone the estimator to make sure that all the folds are

C:\Python27\Anaconda\lib\site-packages\sklearn\cross_validation.pyc in _check_cv(cv, X, y, classifier, warn_mask)
   1604         if classifier:
   1605             if type_of_target(y) in ['binary', 'multiclass']:
-> 1606                 cv = StratifiedKFold(y, cv, indices=needs_indices)
   1607             else:
   1608                 cv = KFold(_num_samples(y), cv, indices=needs_indices)

C:\Python27\Anaconda\lib\site-packages\sklearn\cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
    432         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
    433             for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 434                 label_test_folds = test_folds[y == label]
    435                 # the test split can be too big because we used
    436                 # KFold(max(c, self.n_folds), self.n_folds) instead of

IndexError: too many indices for array
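
The post does not show how trainRes was built, so what follows is only a guess at the cause. The failing line indexes the 1-D array test_folds with the mask `y == label`; if `y` were 2-D (for example a single-column array of shape (n, 1)), NumPy raises this kind of IndexError. A minimal sketch, using made-up arrays that borrow their names from the traceback:

```python
import numpy as np

# Hypothetical labels stored as a column, shape (6, 1), instead of shape (6,).
y_column = np.array([[0], [1], [0], [1], [1], [0]])
test_folds = np.zeros(len(y_column), dtype=int)  # 1-D, like test_folds in the traceback


def mask_index_fails(y):
    """Return True if indexing test_folds with the mask (y == 1) raises IndexError."""
    try:
        test_folds[y == 1]
    except IndexError:
        return True
    return False


column_fails = mask_index_fails(y_column)        # 2-D mask applied to a 1-D array
flat_fails = mask_index_fails(y_column.ravel())  # 1-D mask after flattening

print(column_fails, flat_fails)
```

If that turns out to be the situation here, passing a flattened label array such as `np.asarray(trainRes).ravel()` to cross_val_score would be the first thing to try.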

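What is the problem here? Also: in R we have the option of tuning the cutoff parameter (cutoff = 1/(number of classes)). Is there a similar parameter for the random forest in the scikit-learn package that can be tuned in Python?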
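
On the cutoff question: as far as I know, RandomForestClassifier has no direct counterpart to R's cutoff argument, although it does accept class_weight and max_samples, which play roles loosely similar to classwt and sampsize. One workaround is to threshold predict_proba yourself and compare cutoffs by cross-validated F1. The sketch below assumes a binary problem and uses synthetic data from make_classification in place of trainArr and trainRes; the helper name make_f1_scorer is made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score

# Synthetic, mildly imbalanced binary data standing in for trainArr / trainRes.
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.7, 0.3], random_state=0)


def make_f1_scorer(cutoff):
    """Return a scorer(estimator, X, y) that thresholds P(class 1) at `cutoff`."""
    def scorer(estimator, X_test, y_test):
        proba = estimator.predict_proba(X_test)[:, 1]
        return f1_score(y_test, (proba > cutoff).astype(int))
    return scorer


cutoffs = np.round(np.arange(0.1, 0.9, 0.1), 1)
mean_f1 = {}
for cutoff in cutoffs:
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    # cross_val_score clones and refits rf on each fold, so no separate fit call is needed.
    scores = cross_val_score(rf, X, y, cv=5, scoring=make_f1_scorer(cutoff))
    mean_f1[float(cutoff)] = scores.mean()

# Cutoff with the highest mean F1 across folds.
best_cutoff = max(mean_f1, key=mean_f1.get)
print(best_cutoff, mean_f1[best_cutoff])
```

Note that y here is 1-D, which is what the stratified splitter expects. The best cutoff found this way depends on the data and the metric, so treat it as a tuning knob rather than a fixed rule like 1/(number of classes).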

What error are you getting? Your post doesn't specify.
@ASCIITHENASI Sorry.. I have updated the question.
Looks much better now :-)