Python: How do I find all elements of a 2D NumPy array that match a specific list?

python arrays performance numpy

I have a 2D NumPy array, for example:

array([[1, 1, 0, 2, 2],
       [1, 1, 0, 2, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

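I want to keep only the elements of this array that appear in a specific list, for example (1, 3, 4), and set everything else to 0. The expected result for the example is: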

array([[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

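I know I could do this as suggested here, but that is only reasonably efficient for the example case. In practice, iterating with a for loop and numpy.logical_or is very slow, because the list of possible values runs into the thousands (and the NumPy array is roughly 1000 x 1000).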
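For reference, the loop-based approach described above might look like the following sketch. The code the question refers to is not shown, so this is only an assumption about its shape, and the helper name `mask_by_loop` is hypothetical. It makes one full pass over the array per candidate value, which is why it scales poorly once the list holds thousands of values.

```python
import numpy as np

def mask_by_loop(a, values):
    """Keep entries of `a` that appear in `values`; zero out the rest.

    Hypothetical helper illustrating the slow baseline: one comparison
    pass over the whole array for every value in `values`.
    """
    keep = np.zeros(a.shape, dtype=bool)
    for v in values:
        # Accumulate matches for this value into the running mask.
        keep = np.logical_or(keep, a == v)
    return a * keep

A = np.array([[1, 1, 0, 2, 2],
              [1, 1, 0, 2, 0],
              [0, 0, 0, 0, 0],
              [3, 3, 0, 4, 4],
              [3, 3, 0, 4, 4]])

result = mask_by_loop(A, [1, 3, 4])
```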

You can use np.in1d -

A*np.in1d(A,[1,3,4]).reshape(A.shape)

Also, np.where could be used -

np.where(np.in1d(A,[1,3,4]).reshape(A.shape),A,0)
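As a self-contained sketch of the two variants above: this version uses `np.isin`, which (assuming NumPy 1.13 or later) performs the same membership test as `np.in1d` but keeps the shape of its input, so no reshape is needed. The variable names are illustrative only.

```python
import numpy as np

A = np.array([[1, 1, 0, 2, 2],
              [1, 1, 0, 2, 0],
              [0, 0, 0, 0, 0],
              [3, 3, 0, 4, 4],
              [3, 3, 0, 4, 4]])
wanted = [1, 3, 4]

# Boolean mask with the same shape as A: True where the element is in `wanted`.
mask = np.isin(A, wanted)

# Variant 1: multiply by the mask, so non-matching entries become 0.
out_mul = A * mask

# Variant 2: choose between A and a fill value, element by element.
out_where = np.where(mask, A, 0)
```

The `np.where` form also lets you pick a fill value other than 0, which the multiplication form cannot do.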

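You can also use np.searchsorted to find such matches, by calling it with its optional 'side' argument set to 'left' and to 'right'. For a value that is present in the sorted list, the two calls return different indices; for a value that is absent, they return the same index. Thus, an equivalent of np.in1d(A,[1,3,4]) is -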

M = np.searchsorted([1,3,4],A.ravel(),'left') != \
    np.searchsorted([1,3,4],A.ravel(),'right')

Thus, the final output would be -

out = A*M.reshape(A.shape)
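To see why comparing the 'left' and 'right' results works as a membership test, here is a small worked sketch on a 1D query array. The names `sorted_vals` and `queries` are illustrative.

```python
import numpy as np

sorted_vals = np.array([1, 3, 4])          # must be sorted ascending
queries = np.array([0, 1, 2, 3, 4, 5])

# 'left' gives the first position where the query could be inserted,
# 'right' gives the last such position.
left = np.searchsorted(sorted_vals, queries, side='left')
right = np.searchsorted(sorted_vals, queries, side='right')

# The two positions differ only when the query equals an existing element.
is_member = left != right

print(left)       # [0 0 1 1 2 3]
print(right)      # [0 1 1 2 3 3]
print(is_member)  # [False  True False  True  True False]
```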

Please note that if the input search list is not sorted, you need to pass its argsort indices through the optional sorter argument of np.searchsorted.
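A minimal sketch of the unsorted case, assuming the search list is given as `unsorted_vals` (an illustrative name): the `sorter` argument receives the indices from `np.argsort`, so the list itself does not have to be reordered.

```python
import numpy as np

A = np.array([[1, 1, 0, 2, 2],
              [1, 1, 0, 2, 0],
              [0, 0, 0, 0, 0],
              [3, 3, 0, 4, 4],
              [3, 3, 0, 4, 4]])
unsorted_vals = np.array([4, 1, 3])

# Indices that would sort unsorted_vals into ascending order.
order = np.argsort(unsorted_vals)

flat = A.ravel()
left = np.searchsorted(unsorted_vals, flat, side='left', sorter=order)
right = np.searchsorted(unsorted_vals, flat, side='right', sorter=order)

# Same membership test as before, reshaped back to A's shape.
out = A * (left != right).reshape(A.shape)
```

Sorting the list once up front with `np.sort` and then calling `np.searchsorted` without `sorter` should give the same result.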

Sample run -

In [321]: A
Out[321]: 
array([[1, 1, 0, 2, 2],
       [1, 1, 0, 2, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

In [322]: A*np.in1d(A,[1,3,4]).reshape(A.shape)
Out[322]: 
array([[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

In [323]: np.where(np.in1d(A,[1,3,4]).reshape(A.shape),A,0)
Out[323]: 
array([[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

In [324]: M = np.searchsorted([1,3,4],A.ravel(),'left') != \
     ...:     np.searchsorted([1,3,4],A.ravel(),'right')
     ...: A*M.reshape(A.shape)
     ...: 
Out[324]: 
array([[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

Runtime test and output verification -

In [309]: # Inputs
     ...: A = np.random.randint(0,1000,(400,500))
     ...: lst = np.sort(np.random.randint(0,1000,(100))).tolist()
     ...: 
     ...: def func1(A,lst):                         
     ...:   return A*np.in1d(A,lst).reshape(A.shape)
     ...: 
     ...: def func2(A,lst):                         
     ...:   return np.where(np.in1d(A,lst).reshape(A.shape),A,0)
     ...: 
     ...: def func3(A,lst):                         
     ...:   mask = np.searchsorted(lst,A.ravel(),'left') != \
     ...:          np.searchsorted(lst,A.ravel(),'right')
     ...:   return A*mask.reshape(A.shape)
     ...: 

In [310]: np.allclose(func1(A,lst),func2(A,lst))
Out[310]: True

In [311]: np.allclose(func1(A,lst),func3(A,lst))
Out[311]: True

In [312]: %timeit func1(A,lst)
10 loops, best of 3: 30.9 ms per loop

In [313]: %timeit func2(A,lst)
10 loops, best of 3: 30.9 ms per loop

In [314]: %timeit func3(A,lst)
10 loops, best of 3: 28.6 ms per loop

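Use np.in1d:

np.in1d(arr, [1,3,4]).reshape(arr.shape)

As the name suggests, in1d operates on the flattened array, so you need to reshape the result afterwards.

I used the np.where(…) variant, since it is the most intuitive to understand. Thanks a lot.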