How can I define multiple slices of a numpy array from start/end indices without iterating?


python arrays numpy


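I have a numpy array of integers.

I also have two other arrays holding the start and length (or, alternatively, start and end) indices into this array, which identify the integer sequences I need to process. The sequences vary in length.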

import numpy

x = numpy.array([2, 3, 5, 7, 9, 12, 15, 21, 27, 101, 250])  # can have millions of elements

starts = numpy.array([2, 7])  # can have thousands of elements
ends = numpy.array([5, 9])

# required output is x[2:5], x[7:9] in a flat 1D array
# [5, 7, 9, 21, 27]

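I can do this easily with a for loop, but the application is very performance-sensitive, so I am looking for a way to do it without iterating in Python.

Any help would be appreciated.

Doug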
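For reference, the loop-based baseline the question alludes to might look like the sketch below. The asker did not show their loop, so this particular form (a list comprehension fed to `np.concatenate`) is an assumption, and the name `baseline` is only illustrative.

```python
import numpy as np

x = np.array([2, 3, 5, 7, 9, 12, 15, 21, 27, 101, 250])
starts = np.array([2, 7])
ends = np.array([5, 9])

# Baseline (assumed form): slice in a Python-level loop, then join the pieces.
# Each x[s:e] is a cheap view; np.concatenate performs the actual copy.
# Note: np.concatenate raises on an empty list, so this assumes at least one slice.
baseline = np.concatenate([x[s:e] for s, e in zip(starts, ends)])
print(baseline)
```

With the sample data this yields the elements 5, 7, 9, 21, 27, which is what any loop-free version should reproduce.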

Approach #1

One vectorized approach is to create a mask via broadcasting:

In [16]: r = np.arange(len(x))

In [18]: x[((r>=starts[:,None]) & (r<ends[:,None])).any(0)]
Out[18]: array([ 5,  7,  9, 21, 27])
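To make the broadcasting step easier to follow, here is the same one-liner unpacked as a self-contained sketch. The intermediate names `in_slice` and `mask` are introduced here for illustration and are not part of the original answer.

```python
import numpy as np

x = np.array([2, 3, 5, 7, 9, 12, 15, 21, 27, 101, 250])
starts = np.array([2, 7])
ends = np.array([5, 9])

r = np.arange(len(x))  # positions 0..10, shape (11,)

# starts[:, None] has shape (2, 1); comparing it with r broadcasts to (2, 11):
# one row per slice, one column per position in x.
in_slice = (r >= starts[:, None]) & (r < ends[:, None])
print(in_slice.shape)

# Collapse the rows: a position is kept if any slice covers it.
mask = in_slice.any(axis=0)
result = x[mask]
print(result)
```

Two consequences follow from this construction. The intermediate boolean array holds `len(starts) * len(x)` entries, so thousands of slices over millions of elements can need gigabytes of memory. Also, because the result comes from a single mask, elements appear in the order they occur in `x`, and a position covered by more than one slice is returned only once.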

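Approach #4

For completeness, here is another approach, this time with a loop, that selects each slice and assigns it into a preallocated array. It should work well when the slices are taken from a large array: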

lens = ends-starts
out = np.empty(lens.sum(),dtype=x.dtype)
start = 0
for (i,j,l) in zip(starts,ends,lens):
    out[start:start+l] = x[i:j]
    start += l

If the number of iterations is large, a minor optimization is possible that reduces the work done in each iteration:

lens = ends-starts
lims = np.r_[0,lens].cumsum()
out = np.empty(lims[-1],dtype=x.dtype)
for (i,j,s,t) in zip(starts,ends,lims[:-1],lims[1:]):
    out[s:t] = x[i:j]
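The cumulative offsets used above can also drive a loop-free variant that builds one flat index array and gathers with it. This is a sketch that is not taken from the original answer: the function name `gather_slices` is hypothetical, and it assumes `ends >= starts` element-wise.

```python
import numpy as np

def gather_slices(x, starts, ends):
    """Return x[starts[0]:ends[0]], x[starts[1]:ends[1]], ... as one flat array.

    Hypothetical helper; assumes ends >= starts element-wise.
    """
    lens = ends - starts
    # Position in the output where each slice begins: [0, lens[0], lens[0]+lens[1], ...]
    out_starts = np.cumsum(lens) - lens
    # For output position p inside slice k, the source index is
    # p - out_starts[k] + starts[k]; np.repeat expands the per-slice shift.
    idx = np.arange(lens.sum()) + np.repeat(starts - out_starts, lens)
    return x[idx]

x = np.array([2, 3, 5, 7, 9, 12, 15, 21, 27, 101, 250])
starts = np.array([2, 7])
ends = np.array([5, 9])
flat = gather_slices(x, starts, ends)
print(flat)
```

The temporary index array has one entry per output element, so its memory use scales with the total slice length rather than with `len(starts) * len(x)`. Unlike a mask, it keeps the slices in the order given and repeats elements if slices overlap.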

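Would there be overlaps, like x[2:5], x[3:9], etc.? If so, would the overlapping elements be included as many times as they occur?

No overlaps possible.

@scotsman60 So, would the expected output be x[2:5]+x[3:9] or x[2:5]+x[5:9]?

Thanks @Divakar!!! The first solution is exactly what I was looking for. If I run out of memory I can conditionally switch to one of the other approaches, but if I run out of memory in this operation, I will have bigger problems elsewhere in my application...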
