Finding the indices of common values in two arrays in Python

python arrays performance numpy indices

I am using Python 2.7. I have two arrays, A and B. To find the indices of the elements in A that are present in B, I can do

A_inds = np.in1d(A,B)

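I also want the indices of the elements in B that are present in A, that is, the indices in B of the same overlapping elements I found with the code above.

Currently, I run the same line again, as follows: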

B_inds = np.in1d(B,A)

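But this extra computation seems unnecessary. Is there a more computationally efficient way to obtain both A_inds and B_inds at once?

I am open to using either list or array methods.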

np.unique and np.searchsorted can be used together to solve it -

import numpy as np

def unq_searchsorted(A,B):
    # Get unique elements of A and B and the indices based on the uniqueness
    unqA,idx1 = np.unique(A,return_inverse=True)
    unqB,idx2 = np.unique(B,return_inverse=True)

    # Create mask equivalent to np.in1d(A,B) and np.in1d(B,A) for unique elements
    mask1 = (np.searchsorted(unqB,unqA,'right') - np.searchsorted(unqB,unqA,'left'))==1
    mask2 = (np.searchsorted(unqA,unqB,'right') - np.searchsorted(unqA,unqB,'left'))==1

    # Map back to all non-unique indices to get equivalent of np.in1d(A,B), 
    # np.in1d(B,A) results for non-unique elements
    return mask1[idx1],mask2[idx2]

Runtime test and verification of results -

In [233]: def org_app(A,B):
     ...:     return np.in1d(A,B), np.in1d(B,A)
     ...: 

In [234]: A = np.random.randint(0,10000,(10000))
     ...: B = np.random.randint(0,10000,(10000))
     ...: 

In [235]: np.allclose(org_app(A,B)[0],unq_searchsorted(A,B)[0])
Out[235]: True

In [236]: np.allclose(org_app(A,B)[1],unq_searchsorted(A,B)[1])
Out[236]: True

In [237]: %timeit org_app(A,B)
100 loops, best of 3: 7.69 ms per loop

In [238]: %timeit unq_searchsorted(A,B)
100 loops, best of 3: 5.56 ms per loop
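
Note that both np.in1d and unq_searchsorted return boolean masks rather than integer positions. As a minimal sketch of how the masks map back to indices, the snippet below repeats the function so that it runs on its own and applies it to small made-up arrays (the toy values and the names A_mask, B_mask, A_idx, B_idx are illustrative assumptions, not part of the original answer). np.flatnonzero is one way to turn each mask into positions.

```python
import numpy as np

def unq_searchsorted(A, B):
    # Same logic as the function above, repeated so this snippet is self-contained
    unqA, idx1 = np.unique(A, return_inverse=True)
    unqB, idx2 = np.unique(B, return_inverse=True)
    # For unique sorted arrays, 'right' - 'left' is 1 exactly where the value is present
    mask1 = (np.searchsorted(unqB, unqA, 'right') - np.searchsorted(unqB, unqA, 'left')) == 1
    mask2 = (np.searchsorted(unqA, unqB, 'right') - np.searchsorted(unqA, unqB, 'left')) == 1
    return mask1[idx1], mask2[idx2]

# Hypothetical toy inputs, chosen only for illustration
A = np.array([3, 1, 4, 1, 5])
B = np.array([5, 9, 1, 6])

A_mask, B_mask = unq_searchsorted(A, B)

# Convert the boolean masks into integer positions
A_idx = np.flatnonzero(A_mask)  # positions in A whose value also occurs in B
B_idx = np.flatnonzero(B_mask)  # positions in B whose value also occurs in A
```

Here the values 1 and 5 are shared, so the masks select every occurrence of them in each array, duplicates included.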

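If the two input arrays are already sorted and unique, the performance boost would be substantial. The solution function then simplifies to -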

def unq_searchsorted_v1(A,B):
    out1 = (np.searchsorted(B,A,'right') - np.searchsorted(B,A,'left'))==1
    out2 = (np.searchsorted(A,B,'right') - np.searchsorted(A,B,'left'))==1  
    return out1,out2

Subsequent runtime tests -

In [275]: A = np.random.randint(0,100000,(20000))
     ...: B = np.random.randint(0,100000,(20000))
     ...: A = np.unique(A)
     ...: B = np.unique(B)
     ...: 

In [276]: np.allclose(org_app(A,B)[0],unq_searchsorted_v1(A,B)[0])
Out[276]: True

In [277]: np.allclose(org_app(A,B)[1],unq_searchsorted_v1(A,B)[1])
Out[277]: True

In [278]: %timeit org_app(A,B)
100 loops, best of 3: 8.83 ms per loop

In [279]: %timeit unq_searchsorted_v1(A,B)
100 loops, best of 3: 4.94 ms per loop
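
For the sorted-and-unique case, a different technique from the one in this answer may also be worth trying: np.intersect1d accepts return_indices=True and returns the common values together with their positions in both inputs in a single call. This is only a sketch under the assumption that NumPy 1.15 or later is installed (the return_indices parameter does not exist in older releases). The toy arrays are made up for illustration, and no timing claim is made here.

```python
import numpy as np

# Hypothetical toy inputs; both are already unique (and here also sorted)
A = np.array([1, 3, 4, 5])
B = np.array([1, 5, 6, 9])

# assume_unique=True skips the internal np.unique calls;
# return_indices=True also returns positions in A and in B
common, A_idx, B_idx = np.intersect1d(A, B, assume_unique=True, return_indices=True)

# A[A_idx] and B[B_idx] both equal the common values
```

Unlike the mask-based functions above, this yields integer indices directly. If the inputs contain duplicates and assume_unique is left at False, only the first occurrence of each common value is reported.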

A simple multiprocessing implementation will get you a bit more speed:

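import time
import numpy as np
from multiprocessing import Process, Queue

a = np.random.randint(0, 20, 1000000)
b = np.random.randint(0, 20, 1000000)

def original(a, b, q):
    # Compute the membership mask and send it back through the queue
    q.put(np.in1d(a, b))

if __name__ == '__main__':
    t0 = time.time()
    q = Queue()
    q2 = Queue()
    # One process per direction: a in b, and b in a
    p = Process(target=original, args=(a, b, q))
    p2 = Process(target=original, args=(b, a, q2))
    p.start()
    p2.start()
    res = q.get()
    res2 = q2.get()

    print(time.time() - t0)

>>> 0.21398806572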

Divakar's unq_searchsorted(A,B) method took 0.271834135056 seconds on my machine.

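What are the input array sizes? Are they 1D?

Large. On the order of 10^6 or 10^7.

Do those arrays have unique elements? Are they sorted?

Unfortunately, no. There are many repeated elements, about 5-10% of the array. And yes, they are 1D.

The elements are not strictly sorted. In fact, they are tuples. Maybe I should have mentioned this earlier.

Thank you, this will definitely be useful. For now, though, I am looking for the fastest method on a single core, since I will later distribute the whole code across multiple cores.

Can this be extended to 3 arrays? (Or even n arrays?)

@hm8 I think a new question would be appropriate, since it doesn't look like a simple extension.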