Python: removing all zeros from an array (tags: python, arrays, python-2.7, numpy)


I have an array of shape [120000, 3] in which only the first 1500 elements are useful; all the others are 0.

Here is an example:

[15.0, 14.0, 13.0]
[11.0, 7.0, 8.0]
[4.0, 1.0, 3.0]
[0.0, 0.0, 0.0]
[0.0, 0.0, 0.0]
[0.0, 0.0, 0.0]
[0.0, 0.0, 0.0]

I need a way to remove all the [0.0, 0.0, 0.0] elements. I tried writing this, but it does not work:

for point in points:
    if point[0] == 0.0 and point[1] == 0.0 and point[2] == 0.0:
        np.delete(points, point)

Edit

All the solutions in the comments work, but I gave the green checkmark to the one I used. Thanks, everyone.

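Don't use a for loop, because loops are slow. Calling np.delete repeatedly inside a for loop performs poorly, since every call returns a new copy of the array.

Instead, create a mask: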

zero_rows = (points == 0).all(1)

This is a Boolean array of length 120000 that is True wherever all the elements in that row are 0.

Then find the first such row:

first_invalid = np.where(zero_rows)[0][0]

Finally, slice the array:

points[:first_invalid]
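
Putting the three steps together gives roughly the sketch below. The small sample array and the names trimmed and filtered are only for illustration. It assumes at least one all-zero row exists, otherwise np.where(...)[0][0] raises an IndexError. The last statement is an extra alternative, not part of the steps above: Boolean indexing drops all-zero rows wherever they occur, not just at the end.

```python
import numpy as np

# Small stand-in for the (120000, 3) array: 3 useful rows, then zero padding.
points = np.array([
    [15.0, 14.0, 13.0],
    [11.0, 7.0, 8.0],
    [4.0, 1.0, 3.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])

# True for every row whose elements are all 0.
zero_rows = (points == 0).all(axis=1)

# Index of the first all-zero row (assumes one exists).
first_invalid = np.where(zero_rows)[0][0]

# Basic slicing returns a view of the useful rows; no data is copied.
trimmed = points[:first_invalid]

# Alternative: Boolean indexing removes all-zero rows anywhere in the array.
filtered = points[~zero_rows]
print(trimmed)
```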

A simple iterative solution:

y = [i for i in x if i != [0.0, 0.0, 0.0]]

A better solution (Python 3.x):

Output:

[[15.0, 14.0, 13.0], [11.0, 7.0, 8.0], [4.0, 1.0, 3.0]]
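
The list comprehension compares each row with a plain list, so it expects x to be a list of lists. Here is a minimal sketch with assumed sample data. For a NumPy array the comparison i != [0.0, 0.0, 0.0] is element-wise and cannot be used directly in an if, so the sketch converts with tolist() first (arr and y_from_array are names made up for the example).

```python
import numpy as np

x = [[15.0, 14.0, 13.0], [11.0, 7.0, 8.0], [4.0, 1.0, 3.0],
     [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# On a list of lists, list != list yields a single bool.
y = [i for i in x if i != [0.0, 0.0, 0.0]]

# For an ndarray, convert to nested lists before comparing.
arr = np.array(x)
y_from_array = [i for i in arr.tolist() if i != [0.0, 0.0, 0.0]]
print(y)
```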

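There are a few related approaches, which fall into two camps. You can use a vectorized approach that computes a single Boolean array with all. Alternatively, you can compute the index of the first row that contains only 0 elements, either with a for loop or with next and a generator expression.

For performance, I recommend using numba with a manual for loop. Here is an example, but see the benchmark below for a more efficient variant: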

from numba import jit

@jit(nopython=True)
def trim_enum_nb(A):
    for idx in range(A.shape[0]):
        if (A[idx]==0).all():
            break
    return A[:idx]
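
The next-based variant mentioned above can also be sketched without numba. One caveat worth illustrating: next raises StopIteration when no all-zero row exists, so this sketch passes len(A) as a default. The function name first_zero_row and the sample data are made up for the example.

```python
import numpy as np

def first_zero_row(A):
    """Return the index of the first all-zero row, or len(A) if there is none."""
    return next((idx for idx, row in enumerate(A) if (row == 0).all()), len(A))

A = np.array([[15, 14, 13], [11, 7, 8], [4, 1, 3], [0, 0, 0], [0, 0, 0]])
trimmed = A[:first_zero_row(A)]

# With no zero rows at all, the default keeps the whole array.
no_padding = A[:3]
kept = no_padding[:first_zero_row(no_padding)]
print(trimmed.shape, kept.shape)
```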

Performance benchmarking

Test code setup

import numpy as np
from numba import jit

np.random.seed(0)

n = 120000
k = 1500

A = np.random.randint(1, 10, (n, 3))
A[k:, :] = 0

Functions

def trim_enum_loop(A):
    for idx, row in enumerate(A):
        if (row==0).all():
            break
    return A[:idx]

@jit(nopython=True)
def trim_enum_nb(A):
    for idx in range(A.shape[0]):
        if (A[idx]==0).all():
            break
    return A[:idx]

@jit(nopython=True)
def trim_enum_nb2(A):
    for idx in range(A.shape[0]):
        # scan the row and stop at the first non-zero element
        all_zero = True
        for col in range(A.shape[1]):
            if A[idx, col] != 0:
                all_zero = False
                break
        if all_zero:
            return A[:idx]
    return A

def trim_enum_gen(A):
    idx = next(idx for idx, row in enumerate(A) if (row==0).all())
    return A[:idx]

def trim_vect(A):
    idx = np.where((A == 0).all(1))[0][0]
    return A[:idx]

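def trim_searchsorted(A):
    # view each row as one byte string; this is 'S12' for three 4-byte values
    B = np.frombuffer(A, 'S%d' % (A.itemsize * A.shape[1]))
    idx = A.shape[0] - np.searchsorted(B[::-1], B[-1:], 'right')[0]
    return A[:idx]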

# check all results are the same
assert (trim_vect(A) == trim_enum_loop(A)).all()
assert (trim_vect(A) == trim_enum_nb(A)).all()
assert (trim_vect(A) == trim_enum_nb2(A)).all()
assert (trim_vect(A) == trim_enum_gen(A)).all()
assert (trim_vect(A) == trim_searchsorted(A)).all()
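
To compare the functions, a small timing harness along these lines could be used. This is only a sketch: it redefines two of the pure-NumPy variants so that it runs on its own, uses the standard-library timeit module, and prints whatever your machine measures. No timings are claimed here, and the timings dictionary is a name introduced for the example.

```python
import timeit
import numpy as np

np.random.seed(0)
n, k = 120000, 1500
A = np.random.randint(1, 10, (n, 3))
A[k:, :] = 0

def trim_vect(A):
    idx = np.where((A == 0).all(1))[0][0]
    return A[:idx]

def trim_enum_gen(A):
    idx = next(idx for idx, row in enumerate(A) if (row == 0).all())
    return A[:idx]

# Best of 3 repeats, 10 calls each, reported per call in microseconds.
timings = {}
for func in (trim_vect, trim_enum_gen):
    best = min(timeit.repeat(lambda: func(A), number=10, repeat=3))
    timings[func.__name__] = best / 10 * 1e6
    print("%s: %.1f us per call" % (func.__name__, timings[func.__name__]))
```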

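I know this is all over, but I thought I would give my answer anyway :)

You can then use a simple list comprehension (note that all(i) also drops rows that contain only some zeros):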

[i for i in x if all(i)]

which outputs:

[[15.0, 14.0, 13.0],[11.0, 7.0, 8.0],[4.0, 1.0, 3.0]]

This takes:

0.0000010866 # seconds or 1.0866 microseconds

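Take that timing with a grain of salt; it is really inconsistent. Give me a couple of seconds to get a better estimate.

When:

I get a time of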

0.01199 # seconds

This time depends heavily on whether the values are 0; zeros are much faster because they are skipped.
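
Note that all(i) and any(i) filter differently when a row contains only some zeros. Here is a small sketch with made-up data (the row [4.0, 0.0, 3.0] is added purely for illustration). If only the [0.0, 0.0, 0.0] rows should go, any(i) is the closer match.

```python
x = [[15.0, 14.0, 13.0], [4.0, 0.0, 3.0], [0.0, 0.0, 0.0]]

# all(i) keeps a row only if every element is non-zero.
kept_all = [i for i in x if all(i)]

# any(i) keeps a row if at least one element is non-zero,
# so only the [0.0, 0.0, 0.0] rows are dropped.
kept_any = [i for i in x if any(i)]

print(kept_all)  # [[15.0, 14.0, 13.0]]
print(kept_any)  # [[15.0, 14.0, 13.0], [4.0, 0.0, 3.0]]
```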

For logarithmic complexity, you can use np.searchsorted after casting the data row by row:

B=np.frombuffer(A,'S12')
index=B.size-np.searchsorted(B[::-1],B[-1:],'right')[0]

index will be the number of non-null items, provided the leading items are all non-null.

Test:

>>> %timeit B.size-np.searchsorted(B[::-1],B[-1:],'right')[0]
2.2 µs
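
The 'S12' dtype assumes that each row occupies 12 bytes (three 4-byte values). As a hedged sketch, the width can be derived from the array itself so that the same idea works for other dtypes. trim_bytes is a hypothetical name, and the approach assumes a C-contiguous array whose all-zero rows sit at the end, with at least one such row present.

```python
import numpy as np

def trim_bytes(A):
    # One fixed-width byte string per row; all-zero rows compare equal to b''.
    row_bytes = A.itemsize * A.shape[1]
    B = np.frombuffer(A, dtype='S%d' % row_bytes)
    # Reversed, the empty strings come first, so a binary search for b''
    # counts the trailing zero rows in O(log n).
    n_zero = np.searchsorted(B[::-1], B[-1:], 'right')[0]
    return A[:A.shape[0] - n_zero]

A = np.zeros((10, 3), dtype=np.int64)
A[:4] = [[15, 14, 13], [11, 7, 8], [4, 1, 3], [2, 2, 2]]
print(trim_bytes(A).shape)  # (4, 3)
```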

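This would be interesting to compare with John Zwinck's answer in terms of performance.

Why is the (Python 3.x) solution better (and why does it not work as well in Python 2)?

Could you explain the if in trim_enum_gen? What is .all()?

See the relevant documentation. If you have a specific question about how they are used here, I can try to explain further.

numba for numba: replacing

if (A[idx]==0).all():

with for j in range(3): if v[j]==0: break is four times faster ;)

@B.M., good point, updated. It took me a while to write the nested loop. My current solution seems to be about 100 times faster than the previous numba one!

I think you need

A.shape[0]-np.searchsorted(B[::-1],B[-1:],'right')[0]

Nice solution, +1. I have also added this solution to the timings in my post; I hope you don't mind.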

import numpy as np

# a is the input array of shape (n, 3)
to_remove = np.array([0.0, 0.0, 0.0])
b = np.empty((0, 3), float)
for elem in a:
    # keep only the rows that differ from [0.0, 0.0, 0.0]
    if not np.array_equal(elem, to_remove):
        b = np.vstack((b, elem))