How to exclude rows/columns from numpy.ndarray data

python numpy


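Suppose we have a numpy.ndarray, say with shape (100, 200), and a list of indices that we want to exclude from the data. How would you do that? Something like this:

import numpy

a = numpy.random.rand(100, 200)
indices = numpy.random.randint(100, size=20)
b = a[-indices, :]  # imaginary code, what to replace here?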

Thanks.

This is ugly, but it works:

b = numpy.array([a[i] for i in range(a.shape[0]) if i not in indices])

You can try the following:

a = numpy.random.rand(100, 200)
indices = numpy.random.randint(100, size=20)
mask = numpy.ones(a.shape, dtype=bool)  # one boolean per element of a
mask[indices, :] = False                # mark the excluded rows
b = a[mask]  # note: a full-shape boolean mask returns a flattened 1D array
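
If the 2D shape needs to be preserved, one possible variation is to build the mask over rows only, so it holds one boolean per row rather than one per element. The sketch below is an illustration under that assumption, not part of the answer above; the name row_mask is hypothetical.

```python
import numpy

a = numpy.random.rand(100, 200)
indices = numpy.random.randint(100, size=20)

# One boolean per row instead of one per element (hypothetical name: row_mask).
row_mask = numpy.ones(a.shape[0], dtype=bool)
row_mask[indices] = False

# Boolean indexing along axis 0 keeps whole rows, so the result stays 2D.
b = a[row_mask]
print(b.shape)
```

Because indices may contain repeated values, the number of rows removed equals the number of unique indices, not necessarily 20.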

You can use

b = numpy.delete(a, indices, axis=0)
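
Since the question asks about both rows and columns, here is a minimal sketch of numpy.delete on a small array, where the effect of the axis argument is easy to see. The array contents and variable names are made up for illustration.

```python
import numpy

a = numpy.arange(12).reshape(3, 4)

# axis=0 drops rows; axis=1 drops columns. The input array is left unchanged.
without_row_1 = numpy.delete(a, [1], axis=0)
without_cols_0_2 = numpy.delete(a, [0, 2], axis=1)

print(without_row_1)     # rows 0 and 2 remain
print(without_cols_0_2)  # columns 1 and 3 remain
```

numpy.delete returns a new array, so the original data is only affected if the result is assigned back to the same name.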

You can try:

import numpy as np

a = np.random.rand(100, 200)
indices = np.random.randint(100, size=20)
b = a[np.setdiff1d(np.arange(100), indices), :]

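This avoids creating a mask array of the same size as your data. Note that this example produces a 2D array b, rather than the flattened array produced by the mask-based answer.

A rough investigation of the runtime versus memory cost of this approach seems to suggest that delete is faster, while indexing with setdiff1d is easier on memory consumption: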

In [75]: %timeit b = np.delete(a, indices, axis=0)
The slowest run took 7.47 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 24.7 µs per loop

In [76]: %timeit c = a[np.setdiff1d(np.arange(100),indices),:]
10000 loops, best of 3: 48.4 µs per loop

In [77]: %memit b = np.delete(a, indices, axis=0)
peak memory: 52.27 MiB, increment: 0.85 MiB

In [78]: %memit c = a[np.setdiff1d(np.arange(100),indices),:]
peak memory: 52.39 MiB, increment: 0.12 MiB
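
The timing comparison above relies on IPython magics. Outside IPython, something similar can be sketched with the standard-library timeit module. The helper function names below are hypothetical, and the measured times will depend on the machine and NumPy version, so they may not match the figures above.

```python
import timeit
import numpy as np

a = np.random.rand(100, 200)
indices = np.random.randint(100, size=20)

def with_delete():
    return np.delete(a, indices, axis=0)

def with_setdiff1d():
    return a[np.setdiff1d(np.arange(a.shape[0]), indices), :]

# Both approaches should keep the same rows in the same order.
assert np.array_equal(with_delete(), with_setdiff1d())

# Timings vary between runs and machines, so no expected values are given.
for func in (with_delete, with_setdiff1d):
    seconds = timeit.timeit(func, number=1000)
    print(f"{func.__name__}: {seconds / 1000 * 1e6:.1f} µs per call")
```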

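This solution needs an array exactly the same size as my original data, which in my case is huge. The time and space complexity of this solution is O(n^2), which is not practical for my data.

This is basically the method that np.delete uses. Look at where it constructs keep = ones(N, dtype=bool); keep[obj,] = False.

For a list of numeric indices, np.delete uses the mask solution that you rejected earlier because it takes up too much memory.

@hpaulj The documentation for delete says: “out : ndarray. A copy of arr with the elements specified by obj removed.” Do you mean that it uses a numpy.ma masked array? It doesn't sound like that to me.

No, not a masked array; masking as in boolean indexing.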
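
To make the distinction in the last two comments concrete, here is a small sketch contrasting boolean indexing with a numpy.ma masked array. The array and the names keep, hide, dropped and masked are chosen for illustration only; this is not a reproduction of what np.delete does internally.

```python
import numpy

a = numpy.arange(6).reshape(3, 2)

# Boolean indexing: rows where keep is False are dropped from the result.
keep = numpy.ones(a.shape[0], dtype=bool)
keep[[1]] = False
dropped = a[keep]

# numpy.ma masked array: the shape is unchanged; masked entries are only
# hidden from computations such as sum().
hide = numpy.zeros(a.shape, dtype=bool)
hide[1, :] = True
masked = numpy.ma.masked_array(a, mask=hide)

print(dropped.shape)  # (2, 2)
print(masked.shape)   # (3, 2)
print(masked.sum())   # sum over the unmasked entries only
```

Boolean indexing yields a smaller copy, whereas the masked array keeps all of the original data and carries the mask alongside it.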