Python 3.x: Can train_test_split and KFold be used together?

python-3.x machine-learning scikit-learn

I have a simple dataset. First, I tried splitting the dataset with train_test_split(). Then I tried using KFold(). The code is shown below:

import numpy as np
from numpy import ndarray
from typing import List
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, train_test_split


def call(X_train: ndarray, X_test: ndarray, y_train: ndarray, y_test: ndarray,
         k: int, repetitions: int) -> List:
    rep_sub = []
    for reps in range(repetitions):
        fold_sub = []
        kf = KFold(n_splits=k, shuffle=True)
        # train_index and test_index are row positions in X_train (the array passed to split)
        for train_index, test_index in kf.split(X_train):
            preds = LinearRegression().fit(X_train[train_index], y_train[train_index]).predict(X_test[test_index])
            sub = preds - y_test[test_index]
            fold_sub.extend(sub)
        rep_sub.extend(fold_sub)
    return rep_sub


if __name__ == "__main__":
    X = np.array([[1, 2], [1, 2], [1, 2], [1, 2], [1, 2], [3, 4], [1, 2], [1, 2], [1, 2], [1, 2], [1, 2], [3, 4]])
    y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    all_preds = call(X_train, X_test, y_train, y_test, k=2, repetitions=2)

I get the error:

IndexError: index 4 is out of bounds for axis 0 with size 3
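The array shapes account for this error. A minimal sketch reusing the question's toy data: with test_size=0.2 on 12 rows, train_test_split leaves 9 training rows and 3 test rows, and the test folds from KFold(n_splits=2).split(X_train) together cover every row index of X_train (0 to 8). Any index of 3 or more is therefore out of bounds for X_test. The helper max_fold_index is hypothetical, and random_state=0 is added only to make the sketch repeatable.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Same toy data as in the question
X = np.array([[1, 2], [1, 2], [1, 2], [1, 2], [1, 2], [3, 4],
              [1, 2], [1, 2], [1, 2], [1, 2], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


def max_fold_index(data: np.ndarray, k: int) -> int:
    """Hypothetical helper: largest test index that KFold(n_splits=k).split(data) yields."""
    kf = KFold(n_splits=k, shuffle=True, random_state=0)
    return max(int(test_index.max()) for _, test_index in kf.split(data))


print(X_train.shape, X_test.shape)  # (9, 2) (3, 2)
print(max_fold_index(X_train, 2))   # 8, but X_test only has rows 0, 1 and 2
```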

Could you explain what I am doing wrong? I need to use 5-fold external validation.
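One common way to combine the two is sketched below. It is an assumption about the goal, not the asker's stated protocol: hold out a test set with train_test_split, run 5-fold cross-validation on the training portion only (so every index from kf.split(X_train) is used on X_train and y_train), then refit on the whole training portion and score once on X_test. The function name cv_then_holdout and the choice of mean absolute error as the metric are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, train_test_split


def cv_then_holdout(X, y, k=5, seed=42):
    """Hypothetical helper: k-fold CV on the training split, then one held-out evaluation."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)

    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    fold_errors = []
    for train_index, val_index in kf.split(X_train):
        # Both index arrays refer to rows of X_train, so only X_train/y_train are indexed
        model = LinearRegression().fit(X_train[train_index], y_train[train_index])
        preds = model.predict(X_train[val_index])
        fold_errors.append(mean_absolute_error(y_train[val_index], preds))

    # Refit on the whole training split and evaluate once on the untouched test split
    final_model = LinearRegression().fit(X_train, y_train)
    test_error = mean_absolute_error(y_test, final_model.predict(X_test))
    return fold_errors, test_error


# Usage with the question's toy data (9 training rows, so 5 folds is possible)
X = np.array([[1, 2], [1, 2], [1, 2], [1, 2], [1, 2], [3, 4],
              [1, 2], [1, 2], [1, 2], [1, 2], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
fold_errors, test_error = cv_then_holdout(X, y, k=5)
print(len(fold_errors))  # 5
```

Keeping the held-out rows out of every cross-validation fold means the final test score never influences the per-fold results.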

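KFold(...).split(X) creates (train, test) subsets of indices from the provided X. Therefore, you can only use them to index into X. What you are trying to do is use the test indices to index into something that is not X. Without commenting on why you would want to use this odd combination of train_test_split and KFold, you should either use test_index to index into the provided X_train, or ignore it. Here are two ways to do this (again, without commenting on why you would use this approach):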
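Case 1 (use test_index to index into X_train):

preds = LinearRegression().fit(
    X_train[train_index], y_train[train_index]).predict(X_train[test_index])
sub = preds - y_train[test_index]

Case 2 (ignore test_index and predict on the held-out X_test):

preds = LinearRegression().fit(
    X_train[train_index], y_train[train_index]).predict(X_test)
sub = preds - y_test

Thanks for your answer. Actually, I am trying to make every sub have the same dimension. If I use k=2, I get sub arrays of the same dimension, but if I use k=3, I get sub arrays of different dimensions. Could you give me some ideas (or any approach) for getting the same dimension for every sub, for any k? For example, if I use k=3, I will get 3 sub arrays, and they should all have the same dimension. Likewise, if I use k=2, I will get 4 sub arrays, and they should all have the same dimension. This does not mean that the dimension of sub must be the same for k=3 and k=2 (they may differ).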
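Some background on why the sub lengths vary. Per the scikit-learn KFold documentation, the first n_samples % n_splits folds contain n_samples // n_splits + 1 samples and the rest contain n_samples // n_splits. So the approach that predicts on X_train[test_index] yields equal-length sub arrays only when the number of training rows is divisible by k: with 9 training rows, k=3 gives folds of 3, 3 and 3, while k=2 gives folds of 5 and 4. The approach that predicts on X_test always yields len(X_test) values per fold, so those arrays can be stacked into one 2-D array. This is a sketch under those assumptions; fold_sizes and case2_subs are hypothetical names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, train_test_split


def fold_sizes(n_samples: int, k: int) -> list:
    """Hypothetical helper: test-fold sizes produced by KFold(n_splits=k) on n_samples rows."""
    kf = KFold(n_splits=k)
    return [len(test_index) for _, test_index in kf.split(np.zeros((n_samples, 1)))]


def case2_subs(X_train, X_test, y_train, y_test, k, repetitions):
    """Hypothetical helper: residuals on X_test, one row per (repetition, fold)."""
    rows = []
    for _ in range(repetitions):
        kf = KFold(n_splits=k, shuffle=True)
        for train_index, _ in kf.split(X_train):
            model = LinearRegression().fit(X_train[train_index], y_train[train_index])
            # Every row has len(X_test) entries, regardless of k
            rows.append(model.predict(X_test) - y_test)
    return np.vstack(rows)  # shape: (repetitions * k, len(X_test))


print(fold_sizes(9, 2))  # [5, 4]
print(fold_sizes(9, 3))  # [3, 3, 3]

X = np.array([[1, 2], [1, 2], [1, 2], [1, 2], [1, 2], [3, 4],
              [1, 2], [1, 2], [1, 2], [1, 2], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
subs = case2_subs(X_train, X_test, y_train, y_test, k=3, repetitions=2)
print(subs.shape)  # (6, 3)
```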