Python numpy - return the indices if a value exists in one of the rows of a 3D array
How can I do this in NumPy? Thanks.
Input:
A = np.array([0, 1, 2, 3])
B = np.array([[3, 2, 0], [0, 2, 1], [2, 3, 1], [3, 0, 1]])
Output:
result = [[0, 1, 3], [1, 2, 3], [0, 1, 2], [0, 2, 3]]
In Python:
import numpy as np

A = np.array([0, 1, 2, 3])
B = np.array([[3, 2, 0], [0, 2, 1], [2, 3, 1], [3, 0, 1]])
result = []
for x, valA in enumerate(A):
    inArray = []  # indices of the rows of B that contain valA
    for y, valB in enumerate(B):
        if valA in valB:
            inArray.append(y)
    result.append(inArray)
print(result)
# result = [[0, 1, 3], [1, 2, 3], [0, 1, 2], [0, 2, 3]]
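Approach #1
Here's one using broadcasting: compare `A[:,None,None]` against `B`, reduce with `.any(-1)`, and split the resulting row indices per value of `A`. This is the `index_1din2d` function listed in the runtime test.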
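As a minimal sketch of this approach, the snippet below mirrors the `index_1din2d` function from the runtime test and applies it to the sample input from the question. Comparing `A[:, None, None]` against `B` broadcasts to a boolean array of shape `(len(A), B.shape[0], B.shape[1])`, and `.any(-1)` reduces it to one flag per (value, row) pair. The variable name `hits` is introduced here for illustration only, and the grouping step assumes every value of `A` occurs in at least one row of `B`.

```python
import numpy as np

A = np.array([0, 1, 2, 3])
B = np.array([[3, 2, 0], [0, 2, 1], [2, 3, 1], [3, 0, 1]])

# hits[i, j] is True when A[i] appears somewhere in row j of B.
hits = (A[:, None, None] == B).any(-1)

# R indexes into A, C holds the matching row indices of B; both are ordered by R.
R, C = np.where(hits)

# Cut C at every position where R moves on to the next value of A.
out = np.split(C, np.flatnonzero(R[1:] > R[:-1]) + 1)
result = [rows.tolist() for rows in out]
print(result)  # [[0, 1, 3], [1, 2, 3], [0, 1, 2], [0, 2, 3]]
```

If some value of `A` is absent from every row of `B`, `R` simply skips it, so the list comes back one group short and the groups no longer line up with `A`.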
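Approach #2
Assuming `A` and `B` hold non-negative integers, we can treat those values as indices on a `2D` grid, so that `B` holds the column indices for each row. Once we have the `2D` grid corresponding to `B`, we only need to consider the columns that intersect with `A`. Finally, we get the indices of the `True` values in that `2D` grid, which give us the `R` and `C` values. This should be much more memory-efficient.
Thus, an alternative approach would look something like this -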
ncols = B.max()+1
nrows = B.shape[0]
mask = np.zeros((nrows,ncols),dtype=bool)
mask[np.arange(nrows)[:,None],B] = 1
mask[:,~np.in1d(np.arange(mask.shape[1]),A)] = 0
R,C = np.where(mask.T)
out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
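To make the grid idea concrete, here is a small sketch that runs the same steps on the question's `B` with a shorter `A = [0, 2]`, and returns the intermediate mask alongside the result. The wrapper name `rows_containing` is hypothetical, and `np.isin` is swapped in for `np.in1d` on the assumption that a recent NumPy is installed, where `np.in1d` is deprecated. The sketch also assumes `A` is sorted and unique, since the groups come out in increasing value order.

```python
import numpy as np

def rows_containing(A, B):
    """Hypothetical wrapper around the grid-based steps (non-negative ints only)."""
    nrows, ncols = B.shape[0], B.max() + 1
    mask = np.zeros((nrows, ncols), dtype=bool)
    # Row i of the grid gets True in every column whose index occurs in B[i].
    mask[np.arange(nrows)[:, None], B] = True
    # Blank out the columns (values) that are not in A.
    mask[:, ~np.isin(np.arange(ncols), A)] = False
    # Transposing makes np.where iterate value by value, then row by row.
    R, C = np.where(mask.T)
    return mask, np.split(C, np.flatnonzero(R[1:] > R[:-1]) + 1)

A = np.array([0, 2])
B = np.array([[3, 2, 0], [0, 2, 1], [2, 3, 1], [3, 0, 1]])
mask, out = rows_containing(A, B)
print(mask.astype(int))
print([rows.tolist() for rows in out])  # [[0, 1, 3], [0, 1, 2]]
```

The grid has `B.shape[0]` rows and `B.max() + 1` columns regardless of how long `A` is, which is where the memory saving over the broadcasted 3D comparison comes from.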
Sample run -
In [43]: A
Out[43]: array([0, 1, 2, 3])
In [44]: B
Out[44]:
array([[3, 2, 0],
       [0, 2, 1],
       [2, 3, 1],
       [3, 0, 1]])
In [45]: out
Out[45]: [array([0, 1, 3]), array([1, 2, 3]), array([0, 1, 2]), array([0, 2, 3])]
Runtime test
Scaling up the dataset size by `100x`, here is a quick runtime test result -
In [85]: def index_1din2d(A,B):
    ...:     R,C = np.where((A[:,None,None] == B).any(-1))
    ...:     out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
    ...:     return out
    ...:
    ...: def index_1din2d_initbased(A,B):
    ...:     ncols = B.max()+1
    ...:     nrows = B.shape[0]
    ...:     mask = np.zeros((nrows,ncols),dtype=bool)
    ...:     mask[np.arange(nrows)[:,None],B] = 1
    ...:     mask[:,~np.in1d(np.arange(mask.shape[1]),A)] = 0
    ...:     R,C = np.where(mask.T)
    ...:     out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
    ...:     return out
    ...:
In [86]: A = np.unique(np.random.randint(0,10000,(400)))
    ...: B = np.random.randint(0,10000,(400,300))
    ...:
In [87]: %timeit [np.where((B == x).sum(axis = 1))[0] for x in A]
1 loop, best of 3: 161 ms per loop # @Psidom's soln
In [88]: %timeit index_1din2d(A,B)
10 loops, best of 3: 91.5 ms per loop
In [89]: %timeit index_1din2d_initbased(A,B)
10 loops, best of 3: 33.4 ms per loop
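Further performance boost
Alternatively, we could create the `2D` grid in a transposed manner in the second approach. The idea is to avoid the transpose in `R,C = np.where(mask.T)`, which seemed to be the bottleneck. So, a modified version of the second approach and the associated runtimes would look like this -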
In [135]: def index_1din2d_initbased_v2(A,B):
     ...:     nrows = B.max()+1
     ...:     ncols = B.shape[0]
     ...:     mask = np.zeros((nrows,ncols),dtype=bool)
     ...:     mask[B,np.arange(ncols)[:,None]] = 1
     ...:     mask[~np.in1d(np.arange(mask.shape[0]),A)] = 0
     ...:     R,C = np.where(mask)
     ...:     out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
     ...:     return out
     ...:
In [136]: A = np.unique(np.random.randint(0,10000,(400)))
     ...: B = np.random.randint(0,10000,(400,300))
     ...:
In [137]: %timeit index_1din2d_initbased(A,B)
10 loops, best of 3: 57.5 ms per loop
In [138]: %timeit index_1din2d_initbased_v2(A,B)
10 loops, best of 3: 25.9 ms per loop
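Before relying on the faster variant, it may be worth checking that both versions agree. The sketch below re-declares the two functions from the timings above and compares their outputs on random data. The seed, the smaller array sizes, and the use of `np.isin` in place of `np.in1d` are assumptions made for illustration and are not part of the original benchmark.

```python
import numpy as np

def index_1din2d_initbased(A, B):
    ncols = B.max() + 1
    nrows = B.shape[0]
    mask = np.zeros((nrows, ncols), dtype=bool)
    mask[np.arange(nrows)[:, None], B] = True
    mask[:, ~np.isin(np.arange(mask.shape[1]), A)] = False
    R, C = np.where(mask.T)          # transpose needed to group by value
    return np.split(C, np.flatnonzero(R[1:] > R[:-1]) + 1)

def index_1din2d_initbased_v2(A, B):
    nrows = B.max() + 1              # one grid row per possible value
    ncols = B.shape[0]               # one grid column per row of B
    mask = np.zeros((nrows, ncols), dtype=bool)
    mask[B, np.arange(ncols)[:, None]] = True
    mask[~np.isin(np.arange(mask.shape[0]), A)] = False
    R, C = np.where(mask)            # already value-major, no transpose
    return np.split(C, np.flatnonzero(R[1:] > R[:-1]) + 1)

rng = np.random.default_rng(0)       # arbitrary seed, for repeatability only
B = rng.integers(0, 50, size=(40, 30))
A = np.unique(rng.integers(0, 50, size=20))

out1 = index_1din2d_initbased(A, B)
out2 = index_1din2d_initbased_v2(A, B)
same = len(out1) == len(out2) and all(
    np.array_equal(a, b) for a, b in zip(out1, out2)
)
print(same)
```

The two functions build the same grid, one as the transpose of the other, so the comparison is expected to hold for any non-negative integer input.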
An option combining `numpy` and a list comprehension:
import numpy as np
[np.where((B == x).sum(axis = 1))[0] for x in A]
# [array([0, 1, 3]), array([1, 2, 3]), array([0, 1, 2]), array([0, 2, 3])]
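The comprehension handles one value of `A` at a time: `B == x` marks the matching cells, the row-wise sum counts matches per row, and `np.where` keeps the rows with a non-zero count. A closely related sketch, assuming only presence matters, uses `.any(axis=1)` with `np.flatnonzero` instead. The extra value `7` is added here only to show what happens when a value appears in no row.

```python
import numpy as np

A = np.array([0, 1, 2, 3, 7])   # 7 is deliberately absent from B
B = np.array([[3, 2, 0], [0, 2, 1], [2, 3, 1], [3, 0, 1]])

# One pass over B per value of A; each pass yields the rows containing that value.
result = [np.flatnonzero((B == x).any(axis=1)).tolist() for x in A]
print(result)  # [[0, 1, 3], [1, 2, 3], [0, 1, 2], [0, 2, 3], []]
```

Because each value gets its own entry, the output stays aligned with `A` even when a value is missing, at the cost of a Python-level loop over `A`.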