Finding duplicate matrices in Python?


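I have an array a with a.shape: (80000, 38, 38). I want to check whether any of the (38, 38) matrices along the first dimension (there are 80000 of them here) are duplicates of, or similar to, one another. I could run two for loops: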
for i in range(a.shape[0]):
    for g in range(a.shape[0]):
        if a[i,:,:] - a[g,:,:] < tolerance:
            # save the index here


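But this seems extremely inefficient. I know there is numpy.unique, but I'm not sure I understand how it works when you have a set of 2-dimensional matrices.
How can I do this efficiently? Is there a way to use broadcasting to find the differences between all elements of all the matrices?

Detecting exact duplicate blocks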
Here's one approach using np.lexsort; the code appears in the sample run below (In [152] and In [153]).
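As a minimal sketch, the lexsort steps from the sample run can be wrapped into a function. The function name unique_blocks_lexsort and the variable names is_first and sorted_rows are my own choices for illustration; the logic follows In [152] and In [153].

```python
import numpy as np

def unique_blocks_lexsort(a):
    """Return the unique blocks of `a` along axis 0 (hypothetical helper name)."""
    # Flatten each block into one row so that rows can be compared directly
    ar = a.reshape(a.shape[0], -1)
    # Lex-sort the rows so that identical rows end up next to each other
    sortidx = np.lexsort(ar.T)
    sorted_rows = ar[sortidx]
    # A sorted row starts a new group if it differs from the row before it
    is_first = np.append(True, (np.diff(sorted_rows, axis=0) != 0).any(axis=1))
    return a[sortidx][is_first]

# Same input as in the sample run for exact duplicates
a = np.array([[[12, 4], [0, 1]],
              [[2, 4], [3, 2]],
              [[12, 4], [0, 1]],
              [[3, 4], [1, 3]],
              [[2, 4], [3, 2]],
              [[3, 0], [2, 1]]])
out = unique_blocks_lexsort(a)
```

The result comes back in lex-sorted order, not in order of first appearance in a.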

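Here's another approach that treats each block of elements in axes=(1,2) as an indexing tuple, to determine its uniqueness among the other blocks -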
# Reshape a to a 2D as required in few places later on
ar = a.reshape(a.shape[0],-1)

# Get dimension shape considering each block in axes(1,2) as an indexing tuple
dims = np.append(1,(ar[:,:-1].max(0)+1).cumprod())

# Finally get unique indexing tuples' indices that represent unique
# indices along first axis for indexing into input array and thus get 
# the desired output of unique blocks along the axes(1,2)
out = a[np.unique(ar.dot(dims),return_index=True)[1]]


Sample run -
1] Input:
In [151]: a
Out[151]: 
array([[[12,  4],
        [ 0,  1]],

       [[ 2,  4],
        [ 3,  2]],

       [[12,  4],
        [ 0,  1]],

       [[ 3,  4],
        [ 1,  3]],

       [[ 2,  4],
        [ 3,  2]],

       [[ 3,  0],
        [ 2,  1]]])

2] Output:
In [152]: ar = a.reshape(a.shape[0],-1)
     ...: sortidx = np.lexsort(ar.T)
     ...: 

In [153]: a[sortidx][np.append(True,(np.diff(ar[sortidx],axis=0)!=0).any(1))]
Out[153]: 
array([[[12,  4],
        [ 0,  1]],

       [[ 3,  0],
        [ 2,  1]],

       [[ 2,  4],
        [ 3,  2]],

       [[ 3,  4],
        [ 1,  3]]])

In [154]: dims = np.append(1,(ar[:,:-1].max(0)+1).cumprod())

In [155]: a[np.unique(ar.dot(dims),return_index=True)[1]]
Out[155]: 
array([[[12,  4],
        [ 0,  1]],

       [[ 3,  0],
        [ 2,  1]],

       [[ 2,  4],
        [ 3,  2]],

       [[ 3,  4],
        [ 1,  3]]])
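Since the question mentions numpy.unique: in NumPy 1.13 and later it accepts an axis argument, which treats each (38, 38) block along the first axis as one item. This is a different technique from the two approaches above, shown here only as a sketch on the same sample input. It may be worth considering because the indexing-tuple approach assumes non-negative integer data, and its cumulative product can overflow for blocks with many elements.

```python
import numpy as np

a = np.array([[[12, 4], [0, 1]],
              [[2, 4], [3, 2]],
              [[12, 4], [0, 1]],
              [[3, 4], [1, 3]],
              [[2, 4], [3, 2]],
              [[3, 0], [2, 1]]])

# Treat every block along the first axis as a single item
uniq, first_idx, counts = np.unique(a, axis=0, return_index=True, return_counts=True)

# Blocks that occur more than once
dup_blocks = uniq[counts > 1]

# Unique blocks in order of first appearance instead of sorted order
uniq_in_order = a[np.sort(first_idx)]
```

Here first_idx holds the position of the first occurrence of each unique block, so it can also be used to index any array aligned with a.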

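Detecting similar blocks
For the similarity criterion, assuming you mean that every element of the absolute difference np.abs(a[i,:,:] - a[g,:,:]) must be below tolerance, here's a vectorized approach to get the indices of all similar blocks along axes (1,2) of the input array -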
R,C = np.triu_indices(a.shape[0],1)
mask = (np.abs(a[R] - a[C]) < tolerance).all(axis=(1,2))
I,G = R[mask], C[mask]

Sample run -
In [267]: a
Out[267]: 
array([[[12,  4],
        [ 0,  1]],

       [[ 2,  4],
        [ 3,  2]],

       [[13,  4],
        [ 0,  1]],

       [[ 3,  4],
        [ 1,  3]],

       [[ 2,  4],
        [ 3,  2]],

       [[12,  5],
        [ 1,  1]]])

In [268]: tolerance = 2

In [269]: R,C = np.triu_indices(a.shape[0],1)
     ...: mask = (np.abs(a[R] - a[C]) < tolerance).all(axis=(1,2))
     ...: I,G = R[mask], C[mask]
     ...: 

In [270]: I
Out[270]: array([0, 0, 1, 2])

In [271]: G
Out[271]: array([2, 5, 4, 5])
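One caveat for the 80000-block case in the question: np.triu_indices(80000, 1) yields about 3.2 billion index pairs, and a[R] - a[C] would hold a full (38, 38) block for each of them, which is unlikely to fit in memory. A possible workaround, sketched below, keeps the same criterion but compares one block at a time against all later blocks. The helper name similar_pairs is hypothetical and the loop is not part of the answer above.

```python
import numpy as np

def similar_pairs(a, tolerance):
    """Index pairs (i, g) with i < g whose blocks differ by less than
    `tolerance` in every element (hypothetical helper name)."""
    I, G = [], []
    for i in range(a.shape[0] - 1):
        # Broadcast block i against all later blocks: shape (n-i-1, rows, cols)
        close = (np.abs(a[i + 1:] - a[i]) < tolerance).all(axis=(1, 2))
        g = np.flatnonzero(close) + i + 1
        I.append(np.full(g.size, i))
        G.append(g)
    if not I:
        # Fewer than two blocks: no pairs to report
        return np.array([], dtype=int), np.array([], dtype=int)
    return np.concatenate(I), np.concatenate(G)

# Same input and tolerance as in the sample run above
a = np.array([[[12, 4], [0, 1]],
              [[2, 4], [3, 2]],
              [[13, 4], [0, 1]],
              [[3, 4], [1, 3]],
              [[2, 4], [3, 2]],
              [[12, 5], [1, 1]]])
I, G = similar_pairs(a, tolerance=2)
```

This still performs on the order of n squared block comparisons, so it trades speed for bounded memory and does not lower the overall cost.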
