NumPy: How do I split a dataset into training, test, and cross-validation sets?

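First, I normalized the numeric data in a 1000x20 array, then created another array containing a random permutation of the row indices of the normalized data. How do I split this new array into training, cross-validation, and test sets?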

    In[150]:
    row_indices = np.random.permutation(X_norm.shape[0])

    In[151]:  
    # Create a Training Set - 60 percent of data - 600x20
    X_train = 

    # Create a Cross Validation Set - 20 percent - 200x20
    X_crossVal = 

    # Create a Test Set - 20 percent - 200x20
    X_test = 

    # If you performed the above calculations correctly, then X_train 
    # should have 600 rows and 20 columns, X_crossVal should have 200 rows 
    # and 20 columns, and X_test should have 200 rows and 20 columns. You 
    # can verify this by filling the code below:

    In[152]:
    # Print the shape of X_train
    X_train.shape

    # Print the shape of X_crossVal


    # Print the shape of X_test

Please forgive how bad I am at using Stack Overflow.

You can use np.split to split the data into chunks of predefined sizes:

X_train, X_crossVal, X_test = np.split(row_indices, [600, 800])
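To make the behaviour of np.split concrete, here is a minimal sketch. It assumes a 1000-element permutation like row_indices in the question; the seed and the names train_idx, cv_idx, and test_idx are illustrative choices, not part of the original post. Note that the three pieces are one-dimensional arrays of row indices, not rows of data.

```python
import numpy as np

# Arbitrary seed, assumed here only so the run is reproducible.
rng = np.random.default_rng(0)
row_indices = rng.permutation(1000)  # shuffled row indices 0..999

# Split points 600 and 800 produce the pieces [0:600], [600:800], [800:].
train_idx, cv_idx, test_idx = np.split(row_indices, [600, 800])

print(train_idx.shape, cv_idx.shape, test_idx.shape)  # (600,) (200,) (200,)
```

The list passed as the second argument holds positions to cut at, not chunk sizes, which is why 800 (600 + 200) appears rather than 200.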

You should also make sure to use print when displaying the shapes:

print(X_train.shape)

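I still get (20,) instead of (600, 20) when I print X_train.shape.

Does row_indices contain the index of every element of the 1000x20 array? If so, X_train, X_crossVal, and X_test actually contain indices into the array, not the elements themselves.

Is this homework? You need to split the shuffled indices, then use each split to select rows from X_norm. For example, do X_norm[train_index] after splitting.

Can you use other libraries? If so, this can be done very easily.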
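The point made in the comments, splitting the shuffled indices and then using each piece to select rows, can be sketched as follows. X_norm here is random stand-in data with the 1000x20 shape described in the question, and the seed and the *_idx names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed (assumption)
X_norm = rng.standard_normal((1000, 20))  # stand-in for the normalized data
row_indices = rng.permutation(X_norm.shape[0])

# Step 1: split the shuffled indices 60/20/20.
train_idx, cv_idx, test_idx = np.split(row_indices, [600, 800])

# Step 2: use each index array to pick whole rows out of X_norm.
X_train = X_norm[train_idx]
X_crossVal = X_norm[cv_idx]
X_test = X_norm[test_idx]

print(X_train.shape)     # (600, 20)
print(X_crossVal.shape)  # (200, 20)
print(X_test.shape)      # (200, 20)
```

Indexing a 2-D array with a 1-D integer array selects entire rows, so each subset keeps all 20 columns.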

X_train = X_norm[row_indices[0:600]]

X_crossVal = X_norm[row_indices[600:800]]

X_test = X_norm[row_indices[800:1000]]

print(X_train.shape)
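As a quick sanity check, the slicing form used in this answer and a single np.split call on the shuffled rows should give identical subsets. The sketch below assumes random stand-in data for X_norm and an arbitrary seed; the names a_train, b_train, and so on are hypothetical and exist only for the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed (assumption)
X_norm = rng.standard_normal((1000, 20))  # stand-in for the normalized data
row_indices = rng.permutation(X_norm.shape[0])

# Variant A: slice the index array, then select rows.
a_train = X_norm[row_indices[0:600]]
a_cv = X_norm[row_indices[600:800]]
a_test = X_norm[row_indices[800:1000]]

# Variant B: shuffle the rows once, then split along axis 0 (the default).
b_train, b_cv, b_test = np.split(X_norm[row_indices], [600, 800])

same = (np.array_equal(a_train, b_train)
        and np.array_equal(a_cv, b_cv)
        and np.array_equal(a_test, b_test))
print(same)  # True
```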