Randomly select samples from a 2D matrix and keep the indices in Python
I have a NumPy 2D matrix of data in Python, and I want to downsample it by keeping 25% of the initial samples. To do this, I use np.random.randint as follows:
reduced_train_face = face_train[np.random.randint(face_train.shape[0], size=300), :]
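However, I have a second matrix that contains the labels associated with the faces, and I want to reduce it in the same way. How can I keep the indices used for the reduced matrix and apply them to the train_lbls matrix?

Why not keep the selected indices and use them to select data from both matrices?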
import numpy as np
# setting up matrices
np.random.seed(1234) # make example repeatable
# seeding is optional; it only makes the output
# match the results shown below
face_train = np.random.rand(8,3)
train_lbls= np.random.rand(8)
print('face_train:\n', face_train)
print('labels:\n', train_lbls)
# Setting the random indexes
random_idxs= np.random.randint(face_train.shape[0], size=4)
print('random_idxs:\n', random_idxs)
# Using the same indexes to select rows from both arrays
reduced_train_face = face_train[random_idxs, :]
reduced_labels = train_lbls[random_idxs]
print('reduced_train_face:\n', reduced_train_face)
print('reduced_labels:\n', reduced_labels)
This gives the following output:
face_train:
[[ 0.19151945 0.62210877 0.43772774]
[ 0.78535858 0.77997581 0.27259261]
[ 0.27646426 0.80187218 0.95813935]
[ 0.87593263 0.35781727 0.50099513]
[ 0.68346294 0.71270203 0.37025075]
[ 0.56119619 0.50308317 0.01376845]
[ 0.77282662 0.88264119 0.36488598]
[ 0.61539618 0.07538124 0.36882401]]
labels:
[ 0.9331401 0.65137814 0.39720258 0.78873014 0.31683612 0.56809865
0.86912739 0.43617342]
random_idxs:
[1 7 5 4]
reduced_train_face:
[[ 0.78535858 0.77997581 0.27259261]
[ 0.61539618 0.07538124 0.36882401]
[ 0.56119619 0.50308317 0.01376845]
[ 0.68346294 0.71270203 0.37025075]]
reduced_labels:
[ 0.65137814 0.43617342 0.56809865 0.31683612]
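The question asks to keep 25% of the samples, so the sample size can be derived from the number of rows instead of being hard-coded. The following is a minimal sketch under that assumption; the name n_keep is illustrative and does not appear in the original post. Note that np.random.randint samples with replacement, so the same row may be picked more than once.

```python
import numpy as np

np.random.seed(1234)  # optional, only for repeatability
face_train = np.random.rand(8, 3)
train_lbls = np.random.rand(8)

# Keep 25% of the rows (n_keep is a hypothetical helper name)
n_keep = face_train.shape[0] // 4

# Draw the row indices once, then reuse them for both arrays
random_idxs = np.random.randint(face_train.shape[0], size=n_keep)
reduced_train_face = face_train[random_idxs, :]
reduced_labels = train_lbls[random_idxs]

print(reduced_train_face.shape, reduced_labels.shape)
```

Because both arrays are indexed with the same random_idxs, row i of reduced_train_face still corresponds to element i of reduced_labels.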
You can fix the seed before applying each extraction:
import numpy as np
# Each label corresponds to the first element of each row of face_train
labels_train = np.array(range(0,15,3))
face_train = np.array(range(15)).reshape(5,3)
np.random.seed(0)
reduced_train_face = face_train[np.random.randint(face_train.shape[0], size=3), :]
np.random.seed(0)
reduced_train_labels = labels_train[np.random.randint(labels_train.shape[0], size=3)]
print(reduced_train_face, reduced_train_labels)
# [[12, 13, 14], [ 0, 1, 2], [ 9, 10, 11]], [12, 0, 9]
With the same seed, both arrays are reduced in the same way.
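Edit: I suggest using np.random.choice, to make sure each sample is selected only once rather than twice.

Also, the label matrix is 1D, so I guess .shape(0) is unnecessary, right?

@konstantin, in fact it is best to combine Gabe's suggestion with np.random.choice rather than np.random.random (see my edit).

If you don't use shape[0], it will raise an error, because shape returns (x,) for an array of length x.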
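The two suggestions above (keep the indices, and draw them with np.random.choice) can be combined roughly as follows. This is a sketch rather than the answerers' exact code: the replace=False argument is an assumption about how duplicates were meant to be avoided, and idxs is an illustrative name.

```python
import numpy as np

np.random.seed(0)  # optional, only for repeatability

# Each label equals the first element of the matching row of face_train
face_train = np.arange(15).reshape(5, 3)
labels_train = np.arange(0, 15, 3)

# Draw 3 distinct row indices; replace=False prevents picking a row twice
idxs = np.random.choice(face_train.shape[0], size=3, replace=False)

# Apply the same indices to both arrays so rows and labels stay aligned
reduced_train_face = face_train[idxs, :]
reduced_train_labels = labels_train[idxs]

print(idxs)
print(reduced_train_face)
print(reduced_train_labels)
```

With this approach the seed only affects repeatability; the alignment between rows and labels comes from reusing idxs, not from reseeding.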