Predicting with a cutoff in Python using scikit-learn
When I first started data mining in Python, I ran into the problem of tuning parameters and finding the best values for them (cutoff, classwt, sampsize). I am trying to find the cutoff value for the different classes using a random forest in scikit-learn. I am using the following code:
import numpy as np
import sklearn.metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_validation import cross_val_score

def cutoff_predict(rf, trainArr, cutoff):
    # predict class 1 when its predicted probability exceeds the cutoff
    return (rf.predict_proba(trainArr)[:, 1] > cutoff).astype(int)

score = []

def custom_f1(cutoff):
    # cross_val_score calls the scorer as scorer(estimator, X, y)
    def f1_cutoff(rf, trainArr, y):
        ypred = cutoff_predict(rf, trainArr, cutoff)
        return sklearn.metrics.f1_score(y, ypred)
    return f1_cutoff

for cutoff in np.arange(0.1, 0.9, 0.1):
    rf = RandomForestClassifier(n_estimators=100)  # random forest for classification
    rf.fit(trainArr, trainRes)  # fit the random forest model
    validated = cross_val_score(rf, trainArr, trainRes, cv=10, scoring=custom_f1(cutoff))
    score.append(validated)
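For reference, here is a minimal self-contained sketch of the same thresholding idea that runs end to end. The data (`make_classification`) is synthetic, the estimator sizes are reduced for speed, and it uses the current `sklearn.model_selection` import path rather than the old `sklearn.cross_validation` shown in my traceback; `X`/`y` stand in for my `trainArr`/`trainRes`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import f1_score

def cutoff_predict(clf, X, cutoff):
    # Predict class 1 when its predicted probability exceeds the cutoff
    return (clf.predict_proba(X)[:, 1] > cutoff).astype(int)

def custom_f1(cutoff):
    # cross_val_score accepts a callable scorer with signature (estimator, X, y)
    def f1_cutoff(clf, X, y):
        ypred = cutoff_predict(clf, X, cutoff)
        return f1_score(y, ypred)
    return f1_cutoff

X, y = make_classification(n_samples=300, random_state=0)

scores = []
for cutoff in np.arange(0.1, 0.9, 0.1):
    # no need to fit here: cross_val_score clones and fits the estimator per fold
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    validated = cross_val_score(rf, X, y, cv=5, scoring=custom_f1(cutoff))
    scores.append(validated.mean())

print(len(scores))  # one mean cross-validated F1 per cutoff
```

Note that `rf.fit(...)` before `cross_val_score` is unnecessary, since `cross_val_score` clones and refits the estimator on each fold.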
But I get the following error:
IndexError Traceback (most recent call last)
<ipython-input-14-f8b808ce9a4d> in <module>()
94 rf = RandomForestClassifier(n_estimators=100) #Random forest generation for Classification
95 rf.fit(trainArr, trainRes) #Fit the random forest model
---> 96 validated=cross_val_score(rf,trainArr,trainRes,cv=10,scoring=custom_f1(cutoff))
97 score.append(validated)
C:\Python27\Anaconda\lib\site-packages\sklearn\cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
1350 X, y = indexable(X, y)
1351
-> 1352 cv = _check_cv(cv, X, y, classifier=is_classifier(estimator))
1353 scorer = check_scoring(estimator, scoring=scoring)
1354 # We clone the estimator to make sure that all the folds are
C:\Python27\Anaconda\lib\site-packages\sklearn\cross_validation.pyc in _check_cv(cv, X, y, classifier, warn_mask)
1604 if classifier:
1605 if type_of_target(y) in ['binary', 'multiclass']:
-> 1606 cv = StratifiedKFold(y, cv, indices=needs_indices)
1607 else:
1608 cv = KFold(_num_samples(y), cv, indices=needs_indices)
C:\Python27\Anaconda\lib\site-packages\sklearn\cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
432 for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
433 for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 434 label_test_folds = test_folds[y == label]
435 # the test split can be too big because we used
436 # KFold(max(c, self.n_folds), self.n_folds) instead of
IndexError: too many indices for array
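For what it's worth, one common trigger for this exact IndexError (an assumption here, since `trainRes` itself isn't shown) is passing the labels as a 2D column vector: `StratifiedKFold`'s boolean indexing (`test_folds[y == label]`) then indexes a 1D array with a 2D mask. A minimal sketch of the shape problem and the `np.ravel` fix:

```python
import numpy as np

trainRes = np.array([[0], [1], [0], [1]])  # 2D column vector, as often loaded from file
print(trainRes.shape)  # (4, 1)

y = np.ravel(trainRes)  # flatten to the 1D shape the CV splitters expect
print(y.shape)  # (4,)

mask = (y == 1)  # boolean indexing now works against a 1D array
print(mask.sum())  # 2
```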
What is the problem here? Also: in R, we can choose to tune the 'cutoff' parameter (cutoff = 1/(number of classes)). Is there a similar parameter in the random forest of the scikit-learn package that can be tuned in Python?

Comments:
What error are you getting? Your post doesn't specify. – @ASCIITHENASI
Sorry, I updated the question. It looks much better now :-)
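On the R comparison: as far as I know, `RandomForestClassifier` has no direct `cutoff` argument; the closest built-in knob is `class_weight`, which reweights the classes during training rather than moving the decision threshold (thresholding `predict_proba`, as in the code above, is the other option). A hedged sketch on synthetic imbalanced data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic 80/20 imbalanced binary problem
X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)

# class_weight="balanced" weights classes inversely to their frequencies
# during training, instead of shifting the prediction cutoff afterwards
rf = RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=0)
rf.fit(X, y)

proba = rf.predict_proba(X)
print(proba.shape)  # (200, 2): one probability column per class
```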