Finding duplicate matrices in Python?
python, numpy, matrix, duplicates, vectorization
I have an array with a.shape: (80000, 38, 38). I want to check whether there are any duplicate or similar (38, 38) matrices along the first dimension (in this case, there are 80000 of these matrices).
I could run two for loops:
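import numpy as np
matches = []
for i in range(a.shape[0]):
    for g in range(a.shape[0]):
        # reduce with .all(); a bare array inside `if` raises ValueError
        if (np.abs(a[i,:,:] - a[g,:,:]) < tolerance).all():
            matches.append((i, g))  # save the index here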
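But this seems extremely inefficient. I know there is numpy.unique, but I am not sure I understand how it works when you have a set of 2-dimensional matrices.
How can I do this efficiently? Is there a way to use broadcasting to find the element-wise differences between all of the matrices?
Detecting exact duplicate blocks
Here is one approach using np.lexsort -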
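A minimal sketch of the lexsort-based approach, pieced together from the calls shown in the sample run (In [152] and In [153]). The function name `unique_blocks_lexsort` and the variable `is_first` are illustrative choices, not part of the original answer.

```python
import numpy as np

def unique_blocks_lexsort(a):
    """Return the unique 2D blocks of a 3D array, compared along axis 0."""
    # Flatten each block into one row so blocks can be compared row-wise
    ar = a.reshape(a.shape[0], -1)
    # Lexicographic sort brings identical rows next to each other
    sortidx = np.lexsort(ar.T)
    sorted_rows = ar[sortidx]
    # A row starts a new group when it differs from the row before it
    is_first = np.append(True, (sorted_rows[1:] != sorted_rows[:-1]).any(axis=1))
    return a[sortidx][is_first]

# Same input as the exact-duplicate sample run
a = np.array([[[12, 4], [0, 1]],
              [[2, 4], [3, 2]],
              [[12, 4], [0, 1]],
              [[3, 4], [1, 3]],
              [[2, 4], [3, 2]],
              [[3, 0], [2, 1]]])
out = unique_blocks_lexsort(a)
```

The unique blocks come back in lexsort order (last element of each flattened block as the primary key), not in their original order along the first axis.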
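Here is another approach that treats each block of elements along axes=(1,2) as an indexing tuple and uses that to determine which blocks are unique -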
# Reshape a to a 2D as required in few places later on
ar = a.reshape(a.shape[0],-1)
# Get dimension shape considering each block in axes(1,2) as an indexing tuple
dims = np.append(1,(ar[:,:-1].max(0)+1).cumprod())
# Finally get unique indexing tuples' indices that represent unique
# indices along first axis for indexing into input array and thus get
# the desired output of unique blocks along the axes(1,2)
out = a[np.unique(ar.dot(dims),return_index=True)[1]]
Sample run -
1] Input:
In [151]: a
Out[151]:
array([[[12, 4],
[ 0, 1]],
[[ 2, 4],
[ 3, 2]],
[[12, 4],
[ 0, 1]],
[[ 3, 4],
[ 1, 3]],
[[ 2, 4],
[ 3, 2]],
[[ 3, 0],
[ 2, 1]]])
2] Output:
In [152]: ar = a.reshape(a.shape[0],-1)
...: sortidx = np.lexsort(ar.T)
...:
In [153]: a[sortidx][np.append(True,(np.diff(ar[sortidx],axis=0)!=0).any(1))]
Out[153]:
array([[[12, 4],
[ 0, 1]],
[[ 3, 0],
[ 2, 1]],
[[ 2, 4],
[ 3, 2]],
[[ 3, 4],
[ 1, 3]]])
In [154]: dims = np.append(1,(ar[:,:-1].max(0)+1).cumprod())
In [155]: a[np.unique(ar.dot(dims),return_index=True)[1]]
Out[155]:
array([[[12, 4],
[ 0, 1]],
[[ 3, 0],
[ 2, 1]],
[[ 2, 4],
[ 3, 2]],
[[ 3, 4],
[ 1, 3]]])
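As a further option that is not part of the approaches above, np.unique accepts an `axis` argument in NumPy 1.13 and newer, which speaks to the question of how it behaves on a stack of 2D matrices. The sketch below assumes such a NumPy version; the names `first_idx`, `counts` and `dup_idx` are illustrative.

```python
import numpy as np

a = np.array([[[12, 4], [0, 1]],
              [[2, 4], [3, 2]],
              [[12, 4], [0, 1]],
              [[3, 4], [1, 3]],
              [[2, 4], [3, 2]],
              [[3, 0], [2, 1]]])

# axis=0 makes np.unique treat every (2, 2) block as a single item
uniq, first_idx, counts = np.unique(
    a, axis=0, return_index=True, return_counts=True
)

# Every position that is not a first occurrence is a repeat of an earlier block
dup_mask = np.ones(a.shape[0], dtype=bool)
dup_mask[first_idx] = False
dup_idx = np.flatnonzero(dup_mask)
```

This only finds exact matches; it does not apply a tolerance.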
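Detecting similar blocks
For the similarity criterion, assuming you mean that every element of the absolute difference abs(a[i,:,:] - a[g,:,:]) must be below tolerance, here is a vectorized approach that gets the indices of all similar blocks along axes(1,2) in the input array -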
R,C = np.triu_indices(a.shape[0],1)
mask = (np.abs(a[R] - a[C]) < tolerance).all(axis=(1,2))
I,G = R[mask], C[mask]
Sample run -
In [267]: a
Out[267]:
array([[[12, 4],
[ 0, 1]],
[[ 2, 4],
[ 3, 2]],
[[13, 4],
[ 0, 1]],
[[ 3, 4],
[ 1, 3]],
[[ 2, 4],
[ 3, 2]],
[[12, 5],
[ 1, 1]]])
In [268]: tolerance = 2
In [269]: R,C = np.triu_indices(a.shape[0],1)
...: mask = (np.abs(a[R] - a[C]) < tolerance).all(axis=(1,2))
...: I,G = R[mask], C[mask]
...:
In [270]: I
Out[270]: array([0, 0, 1, 2])
In [271]: G
Out[271]: array([2, 5, 4, 5])
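With 80000 blocks there are roughly 3.2 billion index pairs, so materialising a[R] and a[C] in one go is unlikely to fit in memory. The sketch below applies the same criterion tile by tile; `similar_pairs_chunked` and its `chunk` parameter are hypothetical names introduced here, and the code assumes a signed integer or float dtype so the subtraction does not wrap around.

```python
import numpy as np

def similar_pairs_chunked(a, tolerance, chunk=64):
    """Return index arrays (I, G), I < G, of blocks whose element-wise
    absolute difference is below `tolerance` everywhere."""
    n = a.shape[0]
    flat = a.reshape(n, -1)
    I_parts, G_parts = [], []
    for r0 in range(0, n, chunk):
        rows = flat[r0:r0 + chunk, None, :]        # shape (c, 1, m)
        # Start at r0 so only the upper triangle of tiles is visited
        for c0 in range(r0, n, chunk):
            cols = flat[None, c0:c0 + chunk, :]    # shape (1, c, m)
            close = (np.abs(rows - cols) < tolerance).all(axis=2)
            i_loc, g_loc = np.nonzero(close)
            i_glob, g_glob = i_loc + r0, g_loc + c0
            # Drop self-pairs and mirrored pairs inside diagonal tiles
            keep = i_glob < g_glob
            I_parts.append(i_glob[keep])
            G_parts.append(g_glob[keep])
    return np.concatenate(I_parts), np.concatenate(G_parts)

# Same input and tolerance as the sample run above
a = np.array([[[12, 4], [0, 1]],
              [[2, 4], [3, 2]],
              [[13, 4], [0, 1]],
              [[3, 4], [1, 3]],
              [[2, 4], [3, 2]],
              [[12, 5], [1, 1]]])
I, G = similar_pairs_chunked(a, tolerance=2)
```

Tiling bounds the size of each temporary array to about chunk * chunk * 38 * 38 elements, but the total work is still quadratic in the number of blocks.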