Python NumPy: advanced indexing with a 2D array of row indices, without broadcasting the output

python arrays numpy

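I have an ndarray called array with ndim 3, and an index ndarray called idxs with ndim 2 that specifies indices into the first dimension of array. The first dimension of idxs matches the second dimension of array, i.e. idxs.shape[0] == array.shape[1].

I want to get an ndarray called result with ndim 3 and shape (idxs.shape[1], array.shape[1], array.shape[2]), like this: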
for i0 in range(idxs.shape[1]):
    for i1 in range(array.shape[1]):
        result[i0, i1] = array[idxs[i1, i0], i1]

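How can I get this more directly?
I have considered using advanced indexing, but I am not sure what that would look like.
In Theano, the following works: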
dim1 = theano.tensor.arange(array.shape[1])
result = array[idxs[dim1], dim1]

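Create a 2D grid of indices corresponding to the row indices, idxs[i1, i0], and use an N x 1 array for the column indices. When array is indexed this way, the column indices are broadcast to the shape of the row indices. That gives a broadcasting-based approach, like so -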
# Get 2D grid of row indices corresponding to two nested loops
row_idx = idxs[np.arange(array.shape[1])[:,None],np.arange(idxs.shape[1])]

# Use column indices along with row_idx to index into array.
# The column indices are broadcast when given as an N x 1 array.
result = array[row_idx,np.arange(array.shape[1])[:,None]].T
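To make the broadcasting step above concrete, here is a minimal sketch on a small 2D input. The sizes m and n, the seeded random generator, and the names col_idx, rows_b, cols_b and picked are assumptions chosen for illustration; they are not part of the original answer.

```python
import numpy as np

# Hypothetical small sizes so the shapes are easy to follow.
m, n = 5, 4
rng = np.random.default_rng(0)
array = rng.random((m, n))
idxs = rng.integers(0, m, size=(n, m))    # row indices into array, shape (n, m)

# Grid of row indices, built exactly as in the answer: shape (n, m).
row_idx = idxs[np.arange(array.shape[1])[:, None], np.arange(idxs.shape[1])]

# Column indices as an N x 1 array: shape (n, 1).
col_idx = np.arange(array.shape[1])[:, None]

# NumPy broadcasts both index arrays to a common shape before indexing.
rows_b, cols_b = np.broadcast_arrays(row_idx, col_idx)
print(rows_b.shape, cols_b.shape)         # both (n, m); cols_b[i1, i0] == i1

# picked[i1, i0] == array[idxs[i1, i0], i1]; transposing gives result[i0, i1].
picked = array[row_idx, col_idx]
result = picked.T
print(result.shape)                       # (m, n)
```

The transpose at the end is only needed because the row-index grid is laid out as (i1, i0) while the desired output is ordered (i0, i1).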

Note that, as @ali_m mentioned in the comments, np.ix_ can also be used to create row_idx, like so -
row_idx = idxs[np.ix_(np.arange(array.shape[1]),np.arange(idxs.shape[1]))]
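As a quick sanity check, the sketch below compares the two ways of building the grid of row indices. The small idxs array and the names grid_manual and grid_ix are made up for illustration. Because both index ranges cover idxs completely, both grids come out equal to idxs itself, which is the observation made later in the comments.

```python
import numpy as np

# Hypothetical small index array with shape (n, m).
n, m = 4, 5
idxs = np.arange(n * m).reshape(n, m)

# Grid built by hand: an (n, 1) index array broadcast against an (m,) one.
grid_manual = idxs[np.arange(n)[:, None], np.arange(m)]

# Same grid via np.ix_, which returns an (n, 1) and a (1, m) index array.
grid_ix = idxs[np.ix_(np.arange(n), np.arange(m))]

print(np.array_equal(grid_manual, grid_ix))   # True
print(np.array_equal(grid_ix, idxs))          # True: the grid is idxs itself
```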

Runtime test and output verification
Function definitions:
def broadcasted_indexing(array,idxs):
    row_idx = idxs[np.arange(array.shape[1])[:,None],np.arange(idxs.shape[1])]
    return array[row_idx,np.arange(array.shape[1])[:,None]].T

def forloop(array,idxs):
    result = np.zeros((idxs.shape[1],array.shape[1]))
    for i0 in range(idxs.shape[1]):
        for i1 in range(array.shape[1]):
            result[i0, i1] = array[idxs[i1, i0], i1]
    return result

Runtime test and output verification:
In [149]: # Inputs
     ...: m = 500
     ...: n = 400
     ...: array = np.random.rand(m,n)
     ...: idxs = np.random.randint(0,array.shape[1],(n,m))
     ...: 

In [150]: np.allclose(broadcasted_indexing(array,idxs),forloop(array,idxs))
Out[150]: True

In [151]: %timeit forloop(array,idxs)
10 loops, best of 3: 136 ms per loop

In [152]: %timeit broadcasted_indexing(array,idxs)
100 loops, best of 3: 5.01 ms per loop

Your for loop does the following:
out[i, j] == array[idxs[j, i], j]

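That is, the (j, i)th element of idxs gives the row index into array for the (i, j)th element of out. The corresponding set of column indices into array is just the sequence of integers from 0 to idxs.shape[0] - 1 (which in this case happens to be the same as array.shape[1] - 1, but need not be in general).
Your for loop can therefore be replaced with a single array indexing operation, like this: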
def simplified(array, idxs):
    return array[idxs.T, np.arange(idxs.shape[0])]
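The tests in this thread use a 2D array, while the question describes a 3D one. The sketch below suggests that the same indexing expression carries over, with the trailing axis kept as-is. The helper forloop_3d, the sizes, and the names array3, idxs3 and out are hypothetical and were not part of the original answers.

```python
import numpy as np

def simplified(array, idxs):
    # Same expression as in the answer above.
    return array[idxs.T, np.arange(idxs.shape[0])]

def forloop_3d(array, idxs):
    # Reference loop from the question, written out for a 3D array (hypothetical helper).
    result = np.empty((idxs.shape[1], array.shape[1], array.shape[2]), dtype=array.dtype)
    for i0 in range(idxs.shape[1]):
        for i1 in range(array.shape[1]):
            result[i0, i1] = array[idxs[i1, i0], i1]
    return result

rng = np.random.default_rng(0)
array3 = rng.random((6, 4, 3))              # shape (m, n, k)
idxs3 = rng.integers(0, 6, size=(4, 7))     # idxs.shape[0] == array.shape[1]

out = simplified(array3, idxs3)
print(out.shape)                            # (7, 4, 3)
print(np.array_equal(out, forloop_3d(array3, idxs3)))
```

Only the first two axes are indexed, so the third axis is passed through untouched, giving the shape (idxs.shape[1], array.shape[1], array.shape[2]) requested in the question.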

We can test correctness and speed against the functions from @Divakar's answer:
m, n = 500, 400
array = np.random.rand(m, n)
idxs = np.random.randint(n, size=(n, m))

print(np.allclose(forloop(array, idxs), simplified(array, idxs)))
# True

%timeit forloop(array, idxs)
# 10 loops, best of 3: 101 ms per loop

%timeit broadcasted_indexing(array, idxs)
# 100 loops, best of 3: 4.1 ms per loop

%timeit simplified(array, idxs)
# 1000 loops, best of 3: 1.66 ms per loop
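The %timeit lines above only work inside IPython. For anyone running plain Python, a rough equivalent with the standard timeit module might look like the sketch below. The repeat counts, the seeded generator and the timings dictionary are arbitrary choices for illustration, and the numbers it prints will differ from the ones quoted above.

```python
import timeit
import numpy as np

def forloop(array, idxs):
    result = np.zeros((idxs.shape[1], array.shape[1]))
    for i0 in range(idxs.shape[1]):
        for i1 in range(array.shape[1]):
            result[i0, i1] = array[idxs[i1, i0], i1]
    return result

def simplified(array, idxs):
    return array[idxs.T, np.arange(idxs.shape[0])]

m, n = 500, 400
rng = np.random.default_rng(0)
array = rng.random((m, n))
idxs = rng.integers(0, m, size=(n, m))      # valid row indices into array

# Check correctness before timing anything.
assert np.allclose(forloop(array, idxs), simplified(array, idxs))

timings = {}
for name, func in [("forloop", forloop), ("simplified", simplified)]:
    # Best of 3 single runs, in seconds.
    best = min(timeit.repeat(lambda: func(array, idxs), number=1, repeat=3))
    timings[name] = best
    print(f"{name}: {best * 1e3:.2f} ms")
```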

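There is a convenience function, np.ix_, designed specifically for this purpose.

@ali_m Thanks! I always forget that one; I have just added it. So it can replace the way row_idx is created, but it seems np.ix_ does not work with 2D array inputs, so the last step still needs "broadcasted indexing", right? I don't even know what to call it :)

Actually, row_idx is exactly the same as idxs, so you could just do array[idxs, np.arange(array.shape[1])[:, None]].T (or array[idxs.T, np.r_[:array.shape[1]]], for compactness).

@ali_m Ah, yes! At first I was not assuming idxs.shape[0] == array.shape[1], but with that assumption idxs does appear to be the same as row_idx. That is another great tool, thanks a lot! Please post all of this as an answer!

Thanks a lot for the simplified version. I don't quite understand why this is not the same as array[idxs.T]. Does NumPy always try to match up multiple index vectors, but not do so when the index for a dimension is implicit?

This sort of thing often confuses me too. The 1D case is easier to understand: idxs.T is interpreted as a 2D array of row indices, so if array were an (m,) 1D array, then array[idxs.T] would have shape (m, n) (since you are sampling from each row multiple times). In your case, array is already (m, n), so the result of array[idxs.T] is (m, n, n), because numpy preserves the "existing" column dimension. To collapse the "existing" column dimension, you need to provide another 1D index vector for it.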
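The shapes described in the last comment can be checked directly. The sketch below uses small, made-up sizes and hypothetical names (vec, full, collapsed); it also shows that the collapsed result is the "diagonal" of the (m, n, n) array over its last two axes.

```python
import numpy as np

# Hypothetical small sizes: array is (m, n), idxs is (n, m).
m, n = 5, 4
rng = np.random.default_rng(0)
array = rng.random((m, n))
idxs = rng.integers(0, m, size=(n, m))

vec = rng.random(m)                          # 1D case
print(vec[idxs.T].shape)                     # (5, 4), i.e. (m, n)

full = array[idxs.T]                         # column axis is kept
print(full.shape)                            # (5, 4, 4), i.e. (m, n, n)

collapsed = array[idxs.T, np.arange(n)]      # second index vector collapses it
print(collapsed.shape)                       # (5, 4), i.e. (m, n)

# collapsed[i, j] == full[i, j, j]
print(np.array_equal(collapsed, full[:, np.arange(n), np.arange(n)]))
```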