Python: Most efficient way to implement numpy.in1d for multiple arrays


python arrays sorting numpy indexing


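What is the best way to implement a function that takes an arbitrary number of 1D arrays and returns a tuple containing the indices of the matching values (if any)?

Here is some pseudocode of what I want to do: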

a = np.array([1, 0, 4, 3, 2])
b = np.array([1, 2, 3, 4, 5])
c = np.array([4, 2])

(ind_a, ind_b, ind_c) = return_equals(a, b, c)
# ind_a = [2, 4]
# ind_b = [1, 3]
# ind_c = [0, 1]

(ind_a, ind_b, ind_c) = return_equals(a, b, c, sorted_by=a)
# ind_a = [2, 4]
# ind_b = [3, 1]
# ind_c = [0, 1]

def return_equals(*args, sorted_by=None):
    ...

To start with, I would try:

def return_equals(*args):
    x=[]
    c=args[-1]
    for a in args:
        x.append(np.nonzero(np.in1d(a,c))[0])
    return x

If I add a

d = np.array([1, 0, 4, 3, 0])

(it has only one match; what if there are no matches?)

then

return_equals(a, b, d, c)

produces:

[array([2, 4], dtype=int32),
 array([1, 3], dtype=int32),
 array([2], dtype=int32),
 array([0, 1], dtype=int32)]
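As for the case with no matches at all, here is a minimal sketch of the same loop. It assumes a NumPy version that provides `np.isin`, which is used here in place of `np.in1d`. The name `return_equals_isin` and the array `e` are hypothetical, added only for illustration:

```python
import numpy as np

def return_equals_isin(*args):
    # Match every input against the last array, as in the loop above.
    # np.isin stands in for np.in1d (an assumption about the NumPy version).
    c = args[-1]
    return [np.nonzero(np.isin(a, c))[0] for a in args]

a = np.array([1, 0, 4, 3, 2])
c = np.array([4, 2])
e = np.array([7, 8, 9])  # hypothetical input sharing no values with c

result = return_equals_isin(a, e, c)
# result[1] is an empty index array, since e has no value in common with c
```

An input without matches simply yields an empty index array, so no special handling should be needed.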

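Since the lengths of both the input arrays and the returned arrays can vary, you can't really vectorize the problem. That is, it takes some special gymnastics to perform the operation across all inputs at once. And if the number of arrays is small compared to their typical length, I wouldn't worry about speed. Iterating a few times is not expensive. What is expensive is iterating over hundreds of values.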

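You could, of course, pass keyword arguments on to in1d.

It's not clear what you are trying to do with the sorted_by parameter. Is that something you could just as easily apply to the arrays before passing them to this function?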
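One possible reading of `sorted_by`, going by the expected output in the question, is that the common values are reported in the order in which they appear in the reference array. The sketch below works under that assumption. It also assumes that every array holds unique values, and it matches against the values common to all inputs rather than against the last array only. The name `return_equals_sorted` is hypothetical:

```python
from functools import reduce
import numpy as np

def return_equals_sorted(*arrays, sorted_by=None):
    # Values present in every input (sorted, unique).
    common = reduce(np.intersect1d, arrays)
    if sorted_by is not None:
        # Assumed meaning: keep the common values in the order
        # in which they occur in the reference array.
        ref = np.asarray(sorted_by)
        common = ref[np.isin(ref, common)]
    result = []
    for arr in arrays:
        arr = np.asarray(arr)
        order = np.argsort(arr, kind="stable")
        # Position of each common value within arr (requires unique values).
        idx = order[np.searchsorted(arr, common, sorter=order)]
        result.append(idx if sorted_by is not None else np.sort(idx))
    return tuple(result)

a = np.array([1, 0, 4, 3, 2])
b = np.array([1, 2, 3, 4, 5])
c = np.array([4, 2])

plain = return_equals_sorted(a, b, c)
by_a = return_equals_sorted(a, b, c, sorted_by=a)
# plain -> ([2, 4], [1, 3], [0, 1]); by_a -> ([2, 4], [3, 1], [0, 1])
```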

A list comprehension version of this iteration:
 [np.nonzero(np.in1d(x,c))[0] for x in [a,b,d,c]]

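I can imagine concatenating the arrays into one longer array, applying in1d, and then splitting the result into subarrays. There is an np.split, but it requires you to tell it how many elements to put in each sublist. That means somehow determining how many matches there are for each argument. Doing that without looping could be tricky.

The pieces for this (which still need to be wrapped up as a function) are: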
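A minimal sketch of how such pieces might fit together, not necessarily the code originally posted here. It assumes `np.isin` is available, derives the split points with `np.searchsorted`, and matches against the last array as in the loop version. The names `return_equals_split`, `abc`, `split_points` and `offsets` are illustrative:

```python
import numpy as np

def return_equals_split(*arrays):
    lens = [len(x) for x in arrays]
    abc = np.concatenate(arrays)            # one long array
    C = np.cumsum(lens)                     # end position of each input in abc
    I = np.nonzero(np.isin(abc, arrays[-1]))[0]  # match positions in abc
    # Number of matches that fall before each boundary in C
    split_points = np.searchsorted(I, C[:-1])
    S = np.split(I, split_points)
    # Shift each group back to indices local to its own input array
    offsets = np.concatenate(([0], C[:-1]))
    return [s - off for s, off in zip(S, offsets)]

a = np.array([1, 0, 4, 3, 2])
b = np.array([1, 2, 3, 4, 5])
c = np.array([4, 2])
d = np.array([1, 0, 4, 3, 0])

result = return_equals_split(a, b, d, c)
# For these inputs split_points works out to [2, 4, 5]
```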
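Here (2,4,5) are the split points. They come from the number of elements of I between successive values of C (with I the indices of the matches in the concatenated array and C the cumulative lengths of the inputs), i.e. the number of elements that match each of a, b, ...

You can use numpy.intersect1d with reduce for this: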
from functools import reduce
import numpy as np

def return_equals(*arrays):
    matched = reduce(np.intersect1d, arrays)  # values common to all inputs
    return np.array([np.where(np.in1d(array, matched))[0] for array in arrays])

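reduce may be a bit slow here because we are creating intermediate NumPy arrays (with a large number of inputs it could be very slow). We can avoid that by using Python's set and its .intersection() method: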
matched = np.array(list(set(arrays[0]).intersection(*arrays[1:])))
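Put together as a complete function, the set-based variant might look like the sketch below. It assumes the inputs are NumPy arrays and that `np.isin` is available. The name `return_equals_set` is hypothetical, and the common values are sorted only to make the intermediate array deterministic:

```python
import numpy as np

def return_equals_set(*arrays):
    # Intersect plain Python values; no intermediate NumPy arrays are built.
    first, rest = arrays[0], arrays[1:]
    common = set(first.tolist()).intersection(*(arr.tolist() for arr in rest))
    matched = np.array(sorted(common))
    return [np.nonzero(np.isin(arr, matched))[0] for arr in arrays]

a = np.array([1, 0, 4, 3, 2])
b = np.array([1, 2, 3, 4, 5])
c = np.array([4, 2])

result = return_equals_set(a, b, c)
# result -> [array([2, 4]), array([1, 3]), array([0, 1])]
```

A list is returned instead of `np.array(...)` so that inputs with differing numbers of matches (for example, when values repeat) do not have to form a rectangular array.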


Related GitHub ticket:

In pure Python:
def return_equal(*args):
    rtr=[]
    for i, arr in enumerate(args):
        rtr.append([j for j, e in enumerate(arr) if 
                    all(e in a for a in args[0:i]) and 
                    all(e in a for a in args[i+1:])])
    return rtr    

>>> return_equal(a,b,c) 
[[2, 4], [1, 3], [0, 1]]
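A possible variant of the same pure-Python idea, sketched under the assumption that the elements are hashable. It computes the common values once with a set, so each element is checked with a single set lookup instead of a scan over every other sequence. The name `return_equal_sets` is hypothetical:

```python
def return_equal_sets(*args):
    # Values present in every input; set membership is O(1) on average,
    # instead of scanning each sequence for every element.
    common = set(args[0]).intersection(*args[1:])
    return [[j for j, e in enumerate(arr) if e in common] for arr in args]

a = [1, 0, 4, 3, 2]
b = [1, 2, 3, 4, 5]
c = [4, 2]
print(return_equal_sets(a, b, c))
# prints [[2, 4], [1, 3], [0, 1]]
```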

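This solution basically concatenates all the input 1D arrays into one big 1D array so that the required operations can be performed in a vectorized manner. The only place it uses a loop is at the start, where it gets the lengths of the input arrays, and that should cost very little at runtime.

Here is the function implementation -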
import numpy as np

def return_equals(*argv):
    # Concatenate input arrays into one big array for vectorized processing
    A = np.concatenate((argv[:]))

    # lengths of input arrays
    narr = len(argv)
    lens = np.zeros((1,narr),int).ravel()
    for i in range(narr):
        lens[i] = len(argv[i])  

    N = A.size

    # Start indices of each group of identical elements from different input arrays
    # in a sorted version of the huge concatenated input array
    start_idx = np.where(np.append([True],np.diff(np.sort(A))!=0))[0]

    # Runlengths of islands of identical elements
    runlens = np.diff(np.append(start_idx,N))

    # Starting and all indices of the positions in concatenate array that has 
    # islands of identical elements which are present across all input arrays
    good_start_idx = start_idx[runlens==narr]
    good_all_idx = good_start_idx[:,None] + np.arange(narr)

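    # Get offsetted indices and sort them to get the desired output.
    # A stable argsort keeps equal values in input order, so column j of each
    # matched group belongs to input array j before its offset is subtracted.
    idx = np.argsort(A, kind='stable')[good_all_idx] - np.append([0],lens[:-1].cumsum())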
    return np.sort(idx.T,1)
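The run-length idea can also be expressed more compactly with `np.unique(..., return_counts=True)`. This is a sketch of a related technique, not a rewrite of the function above. It relies on the same assumption that each input array contains unique values, so a value whose count equals the number of arrays must occur in all of them. The name `common_indices_unique` is hypothetical:

```python
import numpy as np

def common_indices_unique(*arrays):
    # Concatenate, then count how often each distinct value occurs overall.
    A = np.concatenate(arrays)
    values, counts = np.unique(A, return_counts=True)
    # With unique values per array, count == number of arrays means
    # the value is present in every input.
    common = values[counts == len(arrays)]
    return [np.nonzero(np.isin(arr, common))[0] for arr in arrays]

a = np.array([1, 0, 4, 3, 2])
b = np.array([1, 2, 3, 4, 5])
c = np.array([4, 2])

result = common_indices_unique(a, b, c)
# result -> [array([2, 4]), array([1, 3]), array([0, 1])]
```

If an array may contain repeated values, the counts no longer identify the common values and this shortcut should not be used.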

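Do these input arrays always have unique values?

The values are not sorted, but they are unique. Yes.

Is this an intentionally non-numpy approach? Oddly, it is actually faster in timings than all the numpy ones, but I don't think that's obvious. Ashwini Chaudhary's second suggestion, using set(a).intersection(b, c), is also fast, but it is mostly Python vs all numpy...

np.in1d takes time to sort and unique the arrays, so it has overhead. Especially for small tests, pure list operations are often faster than numpy operations.

How is d determined?

d is just another example, giving a case where the number of matches found is not 2. I was trying to generalize the problem.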
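To check the timing remarks on your own machine, a small harness along these lines could be used. It is only a sketch: the functions `with_numpy` and `with_sets` are illustrative stand-ins for the approaches discussed, not the posted answers verbatim, and the numbers will depend on array sizes, NumPy version and hardware:

```python
import timeit
import numpy as np

a = np.array([1, 0, 4, 3, 2])
b = np.array([1, 2, 3, 4, 5])
c = np.array([4, 2])

def with_numpy(*arrays):
    # All-NumPy: repeated intersect1d, then isin per input.
    common = arrays[0]
    for arr in arrays[1:]:
        common = np.intersect1d(common, arr)
    return [np.nonzero(np.isin(arr, common))[0] for arr in arrays]

def with_sets(*arrays):
    # Mostly Python: set intersection and list comprehensions.
    lists = [arr.tolist() for arr in arrays]
    common = set(lists[0]).intersection(*lists[1:])
    return [[j for j, e in enumerate(lst) if e in common] for lst in lists]

t_numpy = timeit.timeit(lambda: with_numpy(a, b, c), number=1000)
t_sets = timeit.timeit(lambda: with_sets(a, b, c), number=1000)
print(f"numpy: {t_numpy:.4f}s  sets: {t_sets:.4f}s")
```

Repeating the measurement with much longer arrays is worthwhile, since the fixed overhead of the NumPy calls matters less as the inputs grow.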