Python NumPy: select data and sum it into an array


I have a (large) data array and a (large) list of lists of indices, e.g.,

data = [1.0, 10.0, 100.0]
contribs = [[1, 2], [0], [0, 1]]
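For each entry in contribs, I'd like to sum up the corresponding values of data and put the sums into an array. For the example above, the expected result would be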

out = [110.0, 1.0, 11.0]
Doing this in a loop works,

import numpy

c = numpy.zeros(len(contribs))
for k, indices in enumerate(contribs):
    for idx in indices:
        c[k] += data[idx]
but since data and contribs are both large, it takes too long.

I have the feeling this can be improved using numpy's fancy indexing.


Any hints?

One possibility would be

import numpy as np

data = np.array(data)
out = [np.sum(data[c]) for c in contribs]
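This should at least be faster than the double loop, since the inner loop is replaced by one fancy-indexing call per entry.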
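To check the "faster than the double loop" claim on your own machine, a rough timing sketch could look like the following. The array sizes, the random test data, and the function names `double_loop` and `per_entry_sum` are assumptions made for illustration; they are not from the question.

```python
import timeit

import numpy as np

# Hypothetical problem sizes and random test data (assumptions, not from the question).
rng = np.random.default_rng(0)
n_data, n_groups = 10_000, 2_000
data = rng.random(n_data)
contribs = [
    rng.integers(0, n_data, size=int(rng.integers(1, 20))).tolist()
    for _ in range(n_groups)
]


def double_loop(data, contribs):
    """The pure-Python double loop from the question."""
    c = np.zeros(len(contribs))
    for k, indices in enumerate(contribs):
        for idx in indices:
            c[k] += data[idx]
    return c


def per_entry_sum(data, contribs):
    """One fancy-indexing call and one np.sum per entry of contribs."""
    return np.array([np.sum(data[c]) for c in contribs])


# Both versions should agree before their timings are compared.
assert np.allclose(double_loop(data, contribs), per_entry_sum(data, contribs))

t_loop = timeit.timeit(lambda: double_loop(data, contribs), number=3)
t_sum = timeit.timeit(lambda: per_entry_sum(data, contribs), number=3)
print(f"double loop:   {t_loop:.3f} s")
print(f"per-entry sum: {t_sum:.3f} s")
```

The actual speedup depends on how long the index lists are; with very short lists the per-call overhead of np.sum can eat much of the gain.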

Here's an almost vectorized* approach:

import numpy as np

# Get the lengths of the list elements in contribs and their cumulative sums,
# to be used for creating an ID array later on.
clens = np.cumsum([len(item) for item in contribs])

# Set up an ID array that holds the same ID for all indices coming from the
# same list element in contribs. These IDs are used to accumulate values from
# a corresponding array that is created by indexing into the data array with
# a flattened contribs. (Assumes that no list element in contribs is empty.)
id_arr = np.zeros(clens[-1],dtype=int)
id_arr[clens[:-1]] = 1
out = np.bincount(id_arr.cumsum(),np.take(data,np.concatenate(contribs)))
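This approach involves some setup work. So the benefits would hopefully show up when it is fed decent-sized input arrays and a decent number of list elements in contribs, which would correspond to the looping in an otherwise loopy solution.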


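*Please note that this is called almost vectorized because the only looping performed here is at the start, where we get the lengths of the list elements. But that part is not computationally demanding, so it should have minimal effect on the total runtime.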
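As a variation on the bincount idea above, the ID array can also be built with np.repeat, and passing minlength lets empty index lists end up as 0.0 in the output. This is a minimal sketch rather than the answerer's code; the function name `sum_groups_bincount` is made up for illustration.

```python
import numpy as np


def sum_groups_bincount(data, contribs):
    """Sum data[indices] for every index list in contribs (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    # Length of every index list; this is the only Python-level loop.
    lens = np.array([len(c) for c in contribs], dtype=int)
    if lens.sum() == 0:
        # No indices at all: every sum is zero.
        return np.zeros(len(contribs))
    # All indices in one flat array, in the order they appear in contribs.
    flat = np.concatenate([np.asarray(c, dtype=int) for c in contribs])
    # Group ID of every flat index: 0 repeated lens[0] times, then 1, and so on.
    ids = np.repeat(np.arange(len(contribs)), lens)
    # Weighted bincount adds data[flat] into the bin of its group ID;
    # minlength keeps trailing empty groups in the output.
    return np.bincount(ids, weights=data[flat], minlength=len(contribs))


out = sum_groups_bincount([1.0, 10.0, 100.0], [[1, 2], [0], [0, 1]])
print(out)
```

For the example from the question this gives the expected sums 110.0, 1.0 and 11.0.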

I'm not sure it works in all cases, but for your example, with data as a numpy.array:

import numpy

# Flatten "contribs"
f = [j for i in contribs for j in i]

# Get the start offsets of the "ranges" of data[f] that will be summed in the next step
i = [0] + numpy.cumsum([len(c) for c in contribs]).tolist()[:-1]

# Take the required sums
numpy.add.reduceat(data[f], i)
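One case where the reduceat approach needs care is an empty index list: when two consecutive offsets are equal, reduceat returns the single element at that offset instead of 0. The sketch below works around this by reducing only over the non-empty lists. The function name `sum_groups_reduceat` is made up for illustration, and treating an empty list as a sum of 0.0 is an assumption.

```python
import numpy as np


def sum_groups_reduceat(data, contribs):
    """Sum data[indices] per index list using np.add.reduceat (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    lens = np.array([len(c) for c in contribs], dtype=int)
    out = np.zeros(len(contribs))
    nonempty = lens > 0
    if not nonempty.any():
        # Nothing to sum: every entry stays 0.0.
        return out
    # Flatten contribs into one index array.
    flat = np.concatenate([np.asarray(c, dtype=int) for c in contribs])
    # Start offset of every group inside the flattened array.
    starts = np.cumsum(lens) - lens
    # Reduce only over the non-empty groups; their start offsets are strictly
    # increasing, so each reduceat segment is exactly one group.
    out[nonempty] = np.add.reduceat(data[flat], starts[nonempty])
    return out


out = sum_groups_reduceat([1.0, 10.0, 100.0], [[1, 2], [0], [0, 1]])
print(out)
```

For the example from the question this gives the expected sums 110.0, 1.0 and 11.0.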

Have you tried any fancy indexing?

Sure, but the fact that the entries of contribs have different lengths makes it hard. I didn't get anywhere.