Python NumPy: select data and sum it into an array


python numpy


I have a (large) data array and a (large) list of index lists, e.g.,

data = [1.0, 10.0, 100.0]
contribs = [[1, 2], [0], [0, 1]]

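For each entry in contribs, I'd like to sum the corresponding values of data and put the results into an array. For the example above, the expected result would be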
out = [110.0, 1.0, 11.0]

Doing this in a loop works,
import numpy

c = numpy.zeros(len(contribs))
for k, indices in enumerate(contribs):
    for idx in indices:
        c[k] += data[idx]

but since data and contribs are both large, it takes too long. I have the feeling this can be improved using NumPy's fancy indexing.
Any hints?
One possibility would be
import numpy as np

data = np.array(data)
out = [np.sum(data[c]) for c in contribs]

This should at least be faster than the double loop.
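For reference, a self-contained run of this idea on the example from the question might look like the following sketch. Wrapping the result in np.array is an addition here, so that the output is an array as the question asks.

```python
import numpy as np

data = np.array([1.0, 10.0, 100.0])
contribs = [[1, 2], [0], [0, 1]]

# Fancy-index data with each index list, then sum the selected values.
# The Python-level loop now runs once per entry of contribs
# instead of once per individual index.
out = np.array([data[c].sum() for c in contribs])
print(out.tolist())  # [110.0, 1.0, 11.0]
```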
Here's an almost vectorized* approach:
import numpy as np

# Get the lengths of the list elements in contribs and their cumulative sums,
# to be used for creating an ID array later on.
clens = np.cumsum([len(item) for item in contribs])

# Set up an ID array that holds the same ID for every index of the same list
# element in contribs. These IDs are used to accumulate values from a
# corresponding array, created by indexing into data with a flattened contribs.
id_arr = np.zeros(clens[-1], dtype=int)
id_arr[clens[:-1]] = 1
out = np.bincount(id_arr.cumsum(), np.take(data, np.concatenate(contribs)))

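This approach involves some setup work, so the benefits should show up with decently sized input arrays and a decent number of list elements in contribs, which corresponds to the number of iterations in the loop-based solutions.
*Please note that this is called almost vectorized because the only looping performed here is at the start, where we get the lengths of the list elements. Since that part is not computationally demanding, it should have minimal effect on the total runtime.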
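As a variant of the same idea, the ID array can also be built with np.repeat instead of the cumsum trick. This is only a sketch, not the code from the answer above: the function name sum_by_group is made up for illustration, and the guard for the all-empty case and the minlength argument are assumptions added so that empty lists in contribs yield 0.

```python
import numpy as np

def sum_by_group(data, contribs):
    """Sum data[indices] for every index list in contribs (illustrative helper)."""
    data = np.asarray(data, dtype=float)
    lengths = np.array([len(item) for item in contribs], dtype=np.intp)
    # Nothing to add up: every group sum is 0.
    if lengths.sum() == 0:
        return np.zeros(len(contribs))
    # One group ID per flattened index, e.g. [0, 0, 1, 2, 2] for the example.
    group_ids = np.repeat(np.arange(len(contribs)), lengths)
    flat_idx = np.concatenate([np.asarray(item, dtype=np.intp) for item in contribs])
    # bincount adds up all weights that share the same group ID;
    # minlength keeps trailing empty groups in the result.
    return np.bincount(group_ids, weights=data[flat_idx], minlength=len(contribs))

out = sum_by_group([1.0, 10.0, 100.0], [[1, 2], [0], [0, 1]])
```

For the example from the question, out holds the values 110.0, 1.0 and 11.0.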
I'm not sure whether this works in all cases, but for your example, with data as a numpy.array:
import numpy

# Flatten "contribs"
flat = [idx for indices in contribs for idx in indices]

# Get the start offsets of the "ranges" of data[flat] that will be summed in the next step
starts = [0] + numpy.cumsum([len(indices) for indices in contribs]).tolist()[:-1]

# Take the required sums
out = numpy.add.reduceat(data[flat], starts)
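One case where the reduceat approach needs care is an empty list in contribs: for an empty segment, numpy.add.reduceat does not return 0 but the single element at that start offset, and a start offset equal to the length of the flattened array raises an error. The sketch below is one possible way around this, assuming empty lists should sum to 0. The name reduceat_sums is made up for illustration.

```python
import numpy as np

def reduceat_sums(data, contribs):
    """Group sums via np.add.reduceat, with empty groups set to 0 (illustrative)."""
    data = np.asarray(data, dtype=float)
    lengths = np.array([len(item) for item in contribs], dtype=np.intp)
    out = np.zeros(len(contribs))
    nonempty = lengths > 0
    if not nonempty.any():
        return out
    flat = np.concatenate([np.asarray(item, dtype=np.intp) for item in contribs])
    # Start offset of each group inside the flattened index array.
    starts = np.cumsum(lengths) - lengths
    # Pass only the starts of non-empty groups, so the offsets are strictly
    # increasing and every segment contains at least one element.
    out[nonempty] = np.add.reduceat(data[flat], starts[nonempty])
    return out

out = reduceat_sums([1.0, 10.0, 100.0], [[1, 2], [], [0]])
```

Here the middle group is empty, so out holds 110.0, 0.0 and 1.0.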

Have you tried any fancy indexing?
Sure, but the fact that the entries of contribs have different lengths makes it difficult. I didn't get anywhere.