Create K dataframes from train_index and test_index for K-fold cross-validation in Python using sklearn.cross_validation.KFold()


python scikit-learn


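I am using sklearn.cross_validation.KFold() in Python with 5-fold cross-validation to evaluate my model's performance. The model does well on 4 of the folds and very poorly on one particular fold. Since I'm new to data science, I'd like to know how to retrieve the data of a specific fold, so that I can inspect that set and figure out how to fix the problem.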

This is easy. The scikit-learn documentation for KFold has an example that does exactly this:

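import numpy as np
from sklearn.model_selection import KFold  # sklearn.cross_validation was removed in scikit-learn 0.20

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])  # feature array: 4 samples, 2 features
y = np.array([1, 2, 3, 4])  # target array
kf = KFold(n_splits=2)  # split the 4 samples into 2 folds

for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    # index the arrays with the fold's row positions
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]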

TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
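To pull out one particular fold (for example the one that scores badly), the generator returned by `kf.split` can be turned into a list and indexed directly. This is a minimal sketch; the 10-sample `X`, `y` and the choice `k = 3` are hypothetical placeholders, not the asker's data.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical data: 10 samples, 2 features
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

kf = KFold(n_splits=5)
folds = list(kf.split(X))  # materialise all (train_index, test_index) pairs

k = 3  # hypothetical: the (0-based) fold that scored badly
train_index, test_index = folds[k]
X_test_k, y_test_k = X[test_index], y[test_index]
print("Fold", k, "test rows:", test_index)
```

This only reproduces the folds of an earlier run if the split is deterministic, i.e. `shuffle=False` (the default) or a fixed `random_state`.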

You will also need to print the performance computed at each fold, so you know which fold is the bad one.
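A minimal sketch of printing a score per fold and then retrieving the worst fold's test rows. It assumes a classification task; the synthetic data from `make_classification` and the `LogisticRegression` model are stand-ins for the asker's own data and model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic stand-in data (assumption: binary classification)
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

kf = KFold(n_splits=5)
scores = []
for fold, (train_index, test_index) in enumerate(kf.split(X)):
    model = LogisticRegression()
    model.fit(X[train_index], y[train_index])
    score = model.score(X[test_index], y[test_index])  # accuracy on this fold
    scores.append(score)
    print(f"Fold {fold}: accuracy = {score:.3f}")

# Locate the lowest-scoring fold and pull out its test rows for inspection
worst = int(np.argmin(scores))
_, worst_test = list(kf.split(X))[worst]
print("Worst fold:", worst, "test rows:", worst_test)
```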

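What library are you using? And which language, R or Python? You didn't specify either in the tags.

Sorry. I'm using the scikit-learn library with Python.

Please add the code you have tried to your question. It's good practice, and it also helps other users find a solution to your problem.

Can I write the code below without Excel, creating the dataframes directly instead of using some kind of loop?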

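import pandas as pd
from sklearn.model_selection import KFold

# X50 (features) and Y (target) are the asker's pandas objects, defined elsewhere
kf = KFold(n_splits=3)
# The context manager saves and closes the workbook on exit
# (ExcelWriter.save() was removed in pandas 2.0)
with pd.ExcelWriter("Kfoldcrossvalidation.xlsx") as writer:
    # Split the same frame that is indexed below, so the row positions match
    for fold, (train_index, test_index) in enumerate(kf.split(X50), start=1):
        print("Fold: %s" % fold)
        X_train, X_test = X50.iloc[train_index], X50.iloc[test_index]
        y_train, y_test = Y.iloc[train_index], Y.iloc[test_index]
        print(y_test)
        # One worksheet per fold: "sheet 1", "sheet 2", ...
        y_test.to_excel(writer, sheet_name="sheet " + str(fold))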
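To keep every fold in memory as DataFrames instead of writing them to Excel, the folds can be collected in a list. Some iteration over `kf.split` is unavoidable because it is a generator, but a list comprehension keeps it to one expression. This is a minimal sketch; the small `X50` and `Y` below are hypothetical stand-ins for the asker's objects.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-ins for the asker's X50 (features) and Y (target)
X50 = pd.DataFrame(np.arange(12).reshape(6, 2), columns=["a", "b"])
Y = pd.Series(np.arange(6), name="target")

kf = KFold(n_splits=3)
# One dict per fold holding its train/test DataFrames and Series
folds = [
    {
        "X_train": X50.iloc[train_index],
        "X_test": X50.iloc[test_index],
        "y_train": Y.iloc[train_index],
        "y_test": Y.iloc[test_index],
    }
    for train_index, test_index in kf.split(X50)
]

# Inspect, e.g., the test targets of the second fold (index 1)
print(folds[1]["y_test"])
```

Because `.iloc` keeps the original index labels, each fold's rows can be traced back to the full dataset.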