Python: mapping elements of a multidimensional array to their indices

python arrays numpy indexing

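I'm using a function get_tuples(length, total), taken from another source, to generate an array of all tuples of a given length and sum; an example and the function are shown below. After creating the array, I need a way to return the indices of a given number of elements in the array. By converting the array to a list I can do this with .index(), as shown below. However, this solution, or any other solution that is also based on searching (for example np.where), takes a lot of time to find the indices. Since all elements of the array s in the example are distinct, I wonder whether we can construct a one-to-one mapping, i.e. a function that, given an element of the array, returns its index by doing some additions and multiplications on the values of that element. Any ideas, if this is possible? Thanks!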

import numpy as np

def get_tuples(length, total):
    if length == 1:
        yield (total,)
        return

    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t

Example

s = np.array(list(get_tuples(4, 20)))

The array s

In [1]: s
Out[1]:
array([[ 0,  0,  0, 20],
       [ 0,  0,  1, 19],
       [ 0,  0,  2, 18],
       ...,
       [19,  0,  1,  0],
       [19,  1,  0,  0],
       [20,  0,  0,  0]])

Example of elements to find the index for (note: in reality this is 1000+ elements)

elements_to_find = np.array([[ 0,  0,  0, 20],
                             [ 0,  0,  7, 13],
                             [ 0,  5,  5, 10],
                             [ 0,  0,  5, 15],
                             [ 0,  2,  4, 14]])

Convert the array to a list

s_list = s.tolist()

Find the indices

indx = [s_list.index(i) for i in elements_to_find.tolist()]

Output

In [2]: indx
Out[2]: [0, 7, 100, 5, 45]

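You can use binary search to speed up the lookup.

Binary search makes each lookup O(log n), instead of the O(n) linear scan done by .index.

We don't need to sort the tuples, since the generator already produces them in sorted order.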

import bisect

def get_tuples(length, total):
    """Generate all tuples of `length` non-negative ints that sum to `total`."""
    if length == 1:
        yield (total,)
        return

    yield from ((i,) + t for i in range(total + 1) for t in get_tuples(length - 1, total - i))

def find_indexes(x, indexes):
    """Return the position in x (a sorted list of tuples) of each entry of indexes."""
    if len(indexes) > 100:
        # Faster to build a tuple -> index dict when we have a large
        # number to check
        d = dict(zip(x, range(len(x))))
        return [d[tuple(i)] for i in indexes]
    else:
        # bisect_left assumes every tuple is actually present in x
        return [bisect.bisect_left(x, tuple(i)) for i in indexes]

# Generate tuples (in this case 4, 20)
x = list(get_tuples(4, 20))

# Tuples are generated in sorted order [(0,0,0,20), ...(20,0,0,0)]
# which allows binary search to be used
indexes = [[ 0,  0,  0, 20],
           [ 0,  0,  7, 13],
           [ 0,  5,  5, 10],
           [ 0,  0,  5, 15],
           [ 0,  2,  4, 14]]

y = find_indexes(x, indexes)
print('Found indexes:', *y)
print('Indexes & Tuples:')
for i in y:
  print(i, x[i])

Output

Found indexes: 0 7 100 5 45
Indexes & Tuples:
0 (0, 0, 0, 20)
7 (0, 0, 7, 13)
100 (0, 5, 5, 10)
5 (0, 0, 5, 15)
45 (0, 2, 4, 14)
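One caveat with the bisect-based lookup: bisect.bisect_left returns an insertion point, so a tuple that is not in the list still yields a plausible-looking index. A minimal sketch of a checked lookup follows; find_index_checked is a hypothetical helper name that does not appear in the answer, and raising ValueError on a miss is an assumption chosen to mirror list.index.

```python
import bisect

def get_tuples(length, total):
    """Yield all tuples of `length` non-negative ints summing to `total`."""
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t

def find_index_checked(sorted_tuples, target):
    """Return the index of target in sorted_tuples, or raise ValueError."""
    target = tuple(target)
    pos = bisect.bisect_left(sorted_tuples, target)
    # bisect_left only gives the insertion point, so confirm an exact match.
    if pos == len(sorted_tuples) or sorted_tuples[pos] != target:
        raise ValueError(f"{target} is not in the list")
    return pos

x = list(get_tuples(4, 20))
print(find_index_checked(x, [0, 5, 5, 10]))  # 100
```

The extra comparison costs one tuple equality test per lookup, so the lookup stays O(log n).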

Performance

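Scenario 1: the tuples have already been computed and we only want to find the indexes of certain tuples

For instance, x = list(get_tuples(4, 20)) has already been executed.

Searching for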

indexes = [[ 0,  0,  0, 20],
           [ 0,  0,  7, 13],
           [ 0,  5,  5, 10],
           [ 0,  0,  5, 15],
           [ 0,  2,  4, 14]]

Binary search

%timeit find_indexes(x, indexes)
100000 loops, best of 3: 11.2 µs per loop

Computing the index from the tuple alone (@PaulPanzer's method)

%timeit get_idx(indexes)
10000 loops, best of 3: 92.7 µs per loop

In this scenario, with the tuples already precomputed, binary search is about 8x faster (11.2 µs vs. 92.7 µs).

Scenario 2: the tuples have not been precomputed

%%timeit
import bisect

def find_indexes(x, t):
    " finds the index of each tuple in list t (assumes x is sorted) "
    return [bisect.bisect_left(x, tuple(i)) for i in t]

# Generate tuples (in this case 4, 20)
x = list(get_tuples(4, 20))

indexes = [[ 0,  0,  0, 20],
           [ 0,  0,  7, 13],
           [ 0,  5,  5, 10],
           [ 0,  0,  5, 15],
           [ 0,  2,  4, 14]]

y = find_indexes(x, indexes)

100 loops, best of 3: 2.69 ms per loop

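The timing of @PaulPanzer's approach is unchanged in this scenario: 92.97 µs.

=> @PaulPanzer's approach is about 29x faster when the tuples would otherwise have to be computed first.

Scenario 3: a large number of indexes (suggested by @PJORR). A large number of random indexes is generated.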

x = list(get_tuples(4, 20))
xnp = np.array(x)
indices = xnp[np.random.randint(0,len(xnp), 2000)]
indexes = indices.tolist()
%timeit find_indexes(x, indexes)
#Result: 1000 loops, best of 3: 1.1 ms per loop
%timeit get_idx(indices)
#Result: 1000 loops, best of 3: 716 µs per loop

In this case @PaulPanzer's approach is about 53% faster (716 µs vs. 1.1 ms).
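When many lookups are made against the same list, the tuple-to-index dictionary that find_indexes builds for large inputs could also be created once and reused, so that only the per-query hash lookups are repeated. A minimal sketch, assuming the full list fits in memory; build_index_map is a hypothetical helper name and was not part of the timings above.

```python
def get_tuples(length, total):
    """Yield all tuples of `length` non-negative ints summing to `total`."""
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t

def build_index_map(tuples):
    """Map each tuple to its position; built once, reused for every query."""
    return {t: i for i, t in enumerate(tuples)}

x = list(get_tuples(4, 20))
index_of = build_index_map(x)

queries = [[0, 0, 0, 20], [0, 0, 7, 13], [0, 5, 5, 10]]
# Lists are not hashable, so each query is converted to a tuple first.
print([index_of[tuple(q)] for q in queries])  # [0, 7, 100]
```

A missing tuple raises KeyError here, which avoids the silent wrong answer a bare bisect_left can give.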

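Here is a formula that computes the index from the tuple alone, i.e. it does not need to see the full array. Computing the index of an N-tuple requires evaluating N-1 binomial coefficients. The following implementation is (partly) vectorized: it accepts ND arrays, but the tuples must be in the last dimension.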

import numpy as np
from scipy.special import comb

# unfortunately, comb with option exact=True is not vectorized
def bc(N,k):
    return np.round(comb(N,k)).astype(int)

def get_idx(s):
    N = s.shape[-1] - 1
    R = np.arange(1,N)
    ps = s[...,::-1].cumsum(-1)
    B = bc(ps[...,1:-1]+R,1+R)
    return bc(ps[...,-1]+N,N) - ps[...,0] - 1 - B.sum(-1)

# OP's generator
def get_tuples(length, total):
    if length == 1:
        yield (total,)
        return

    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))

# compute each index
r = get_idx(s)

# expected: 0,1,2,3,...
assert (r == np.arange(len(r))).all()
print("all ok")

#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0,  0,  0, 20],
                            [ 0,  0,  7, 13],
                            [ 0,  5,  5, 10],
                            [ 0,  0,  5, 15],
                            [ 0,  2,  4, 14]])

print(get_idx(elements_to_find))

Sample run:

all ok
[  0   7 100   5  45]

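How to derive the formula:

1. Use a counting argument (stars and bars) to express the full partition count #part(N, k), where N is the total and k is the length, as a single binomial coefficient: (N + k - 1) choose (k - 1).

2. Count back to front: it is not hard to verify that after the i-th full iteration of the outer loop of OP's generator, exactly #part(N - i, k) tuples have not yet been enumerated. Indeed, what is left are all partitions p1 + p2 + ... = N with p1 >= i; we can write p1 = q1 + i, so that q1 + p2 + ... = N - i, and this latter partition is unconstrained, so we can use 1. to count it.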
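The counting argument above can also be written as a plain loop with exact integer arithmetic: at each position, the number of tuples skipped is the count for the remaining total minus the count of tuples that still lie ahead. The sketch below is an illustration, not the vectorized get_idx from the answer; count_tuples and tuple_index are hypothetical names, and math.comb requires Python 3.8 or later.

```python
from math import comb

def get_tuples(length, total):
    """OP's generator, repeated here so the check below is self-contained."""
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t

def count_tuples(total, length):
    """Number of tuples of `length` non-negative ints summing to `total`."""
    return comb(total + length - 1, length - 1)

def tuple_index(t):
    """Position of t in the generator's (lexicographic) output order."""
    remaining = sum(t)
    index = 0
    # The last entry is fixed by the others, so it never contributes.
    for pos, value in enumerate(t[:-1]):
        length = len(t) - pos
        # Tuples sharing the prefix t[:pos] whose entry at pos is < value.
        index += count_tuples(remaining, length) - count_tuples(remaining - value, length)
        remaining -= value
    return index

# Compare against the enumeration order for the example from the question.
assert all(tuple_index(t) == i for i, t in enumerate(get_tuples(4, 20)))
print(tuple_index((0, 5, 5, 10)))  # 100
```

Because it uses Python integers throughout, this version avoids the floating-point rounding step in bc, at the price of not being vectorized.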

Comments:

Do you have access to the inputs of get_tuples?

Do you mean the arguments 4 and 20 in the example? Yes, I set them.

OK, but then why can't you use that information to compute the order in which the elements appear?

How would you do that? That is exactly what I am trying to find. There should be a formula that, given an element of the array, e.g. [0, 0, 0, 20], returns 0, and so on.

What is the function find_index?

@PJORR Thanks, I somehow missed it because it is only one line.

Thanks! From my experiments I found that when the number of indexes is 1000+, @PaulPanzer's method is about 3 times faster, even if x = list(get_tuples(4, 20)) has already been executed. Probably the choice between the two methods should depend on how many elements need to be looked up.

@PJORR However, my test used get_tuples(4, 20), which has 1771 indexes. I wonder why in this case my measurements show bisect to be faster. My measurements used the code as shown, in a Jupyter notebook. Are you using the updated algorithm? I removed the sort since it is unnecessary.

I don't mean get_tuples(4, 20). I mean, try the indexing experiment above with the elements to look up having shape (2000, 4) rather than (5, 4). For example: xnp = np.array(x), indices = xnp[np.random.randint(0, len(xnp), 2000)], indexes = indices.tolist(), then compare %timeit get_idx(indices) and %timeit find_indexes(x, indexes).

This might be better with some more explanation of get_idx.

I found that using binom instead of comb makes it slightly faster: from scipy.special import binom, then def bc(N, k): return np.round(binom(N, k)).astype(int)