K-fold in Python

Given a large DataFrame on which I am running 5-fold cross-validation, how can I store each fold in its own training and test arrays?

See the scikit-learn documentation here:

Here is the example they give:

>>> import numpy as np
>>> from sklearn.model_selection import KFold

>>> X = ["a", "b", "c", "d"]
>>> kf = KFold(n_splits=2)
>>> for train, test in kf.split(X):
...     print("%s %s" % (train, test))
[2 3] [0 1]
[0 1] [2 3]

Each fold is constituted by two arrays: the first one is related to the training set, and the second one to the test set. Thus, one can create the training/test sets using numpy indexing:

>>> X = np.array([[0., 0.], [1., 1.], [-1., -1.], [2., 2.]])
>>> y = np.array([0, 1, 0, 1])
>>> X_train, X_test, y_train, y_test = X[train], X[test], y[train], y[test]

My DataFrame has thousands of values, but I would like to store them like this:

V_train, V_test, W_train, W_test, X_train, X_test, Y_train, Y_test, Z_train, Z_test

You can do the following:

import pandas as pd
from sklearn.model_selection import KFold

X = pd.DataFrame()  # replace with your own DataFrame; it needs at least 5 rows
kf = KFold(n_splits=5)

((V_train_ids, V_test_ids), 
 (W_train_ids, W_test_ids),
 (X_train_ids, X_test_ids), 
 (Y_train_ids, Y_test_ids), 
 (Z_train_ids, Z_test_ids)) = list(kf.split(X))

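Edit:

After that, you have the row indices of the training and test parts of each fold. To get the training and test objects themselves, select the rows by those indices: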

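# kf.split returns positional row indices, so select rows with .iloc;
# plain X[ids] on a DataFrame would look the ids up as column labels.
((V_train, V_test),
 (W_train, W_test),
 (X_train, X_test),
 (Y_train, Y_test),
 (Z_train, Z_test)) = ((X.iloc[V_train_ids], X.iloc[V_test_ids]),
                       (X.iloc[W_train_ids], X.iloc[W_test_ids]),
                       (X.iloc[X_train_ids], X.iloc[X_test_ids]),
                       (X.iloc[Y_train_ids], X.iloc[Y_test_ids]),
                       (X.iloc[Z_train_ids], X.iloc[Z_test_ids]))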

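... Sorry, what is your question? The code shows how to do this. Just loop over your folds and store the results in some container.

I'm new to programming, so I know what I need to do, but I don't know how to code it.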
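The "loop over your folds and store the results in some container" suggestion could look roughly like the sketch below. It is only an illustration: the small `df` is a made-up stand-in for the asker's DataFrame, and the `folds` dictionary and `fold_names` list are hypothetical names chosen here, with the labels V to Z taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for the real DataFrame (any frame with >= 5 rows works).
df = pd.DataFrame({"feature": np.arange(10), "target": np.arange(10) % 2})

fold_names = ["V", "W", "X", "Y", "Z"]  # labels taken from the question
kf = KFold(n_splits=len(fold_names))

# One entry per fold, each holding its training and test rows.
folds = {}
for name, (train_ids, test_ids) in zip(fold_names, kf.split(df)):
    # kf.split yields positional indices, so rows are selected with .iloc.
    folds[name] = {"train": df.iloc[train_ids], "test": df.iloc[test_ids]}

# Individual names can still be pulled out if they are really needed.
V_train, V_test = folds["V"]["train"], folds["V"]["test"]
```

Keeping the folds in a dictionary avoids ten separate variables and makes it easy to iterate over all folds later, for example with `for name, parts in folds.items()`.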