Python: put numpy array items into "bins"



I have a numpy array containing some integers, e.g.

a = numpy.array([1, 6, 6, 4, 1, 1, 4])
Now I want to put all items into "bins" by value, so that the bin labeled 1 contains all indices of a that hold the value 1. For the above example:

bins = {
    1: [0, 4, 5],
    6: [1, 2],
    4: [3, 6],
    }
A combination of numpy.unique and numpy.where works:

uniques = numpy.unique(a)
bins = {u: numpy.where(a == u)[0] for u in uniques}
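But this doesn't seem ideal, since the number of unique entries can be large. Each numpy.where call scans all of a, so the total work grows with the number of unique values times the length of a.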
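For reference, the question's approach can be wrapped in a function. The sketch below makes the cost visible: every a == u comparison builds a mask over the whole array, once per unique value. The function name bins_unique_where is hypothetical.

```python
import numpy as np

def bins_unique_where(a):
    """Group indices of a by value, using one full-array scan per unique value."""
    bins = {}
    for u in np.unique(a):
        # a == u builds a boolean mask over the entire array: O(len(a)) per value
        bins[u] = np.where(a == u)[0]
    return bins

a = np.array([1, 6, 6, 4, 1, 1, 4])
bins = bins_unique_where(a)
print({int(k): v.tolist() for k, v in bins.items()})
# {1: [0, 4, 5], 4: [3, 6], 6: [1, 2]}
```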

Here's one sort-based approach -

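import numpy as np

def groupby_uniqueness_dict(a):
    # A stable sort keeps the indices inside each group in ascending order
    sidx = a.argsort(kind='stable')
    b = a[sidx]                      # values in sorted order
    # Positions where the sorted value changes mark the group boundaries
    cut_idx = np.flatnonzero(b[1:] != b[:-1])+1
    parts = np.split(sidx, cut_idx)  # one index array per unique value
    # The first element of each group supplies that group's key
    out = dict(zip(b[np.r_[0,cut_idx]], parts))
    return out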
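To show what the sort-based version does, here are its intermediate arrays for the question's example. This sketch assumes a stable sort (kind='stable'), so that each group's indices come out in ascending order. The name starts is introduced only for illustration.

```python
import numpy as np

a = np.array([1, 6, 6, 4, 1, 1, 4])

sidx = a.argsort(kind='stable')                # indices that sort a
b = a[sidx]                                    # sorted values
cut_idx = np.flatnonzero(b[1:] != b[:-1]) + 1  # where a new value starts
starts = np.r_[0, cut_idx]                     # first position of every group

print(sidx.tolist())       # [0, 4, 5, 3, 6, 1, 2]
print(b.tolist())          # [1, 1, 1, 4, 4, 6, 6]
print(cut_idx.tolist())    # [3, 5]
print(b[starts].tolist())  # [1, 4, 6]
print([p.tolist() for p in np.split(sidx, cut_idx)])  # [[0, 4, 5], [3, 6], [1, 2]]
```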
A faster version that avoids np.split -

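import numpy as np

def groupby_uniqueness_dict_v2(a):
    sidx = a.argsort(kind='stable')  # append .tolist() to get dict values as lists
    b = a[sidx]
    cut_idx = np.flatnonzero(b[1:] != b[:-1])+1
    # Start/stop offsets of every group: 0, each cut point, then len(b)
    idxs = np.r_[0,cut_idx, len(b)]
    # Slice sidx directly instead of calling np.split
    out = {b[i]:sidx[i:j] for i,j in zip(idxs[:-1], idxs[1:])}
    return out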
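The inline comment in groupby_uniqueness_dict_v2 suggests converting the argsort result to a list. When sidx is a list, each sidx[i:j] slice is a plain Python list rather than an array, which is the "values as lists" setting in Case #2 of the timings. Here is a sketch of that variant. The name groupby_uniqueness_dict_lists is hypothetical, and converting the keys to Python int is an extra choice made here for readability.

```python
import numpy as np

def groupby_uniqueness_dict_lists(a):
    # Hypothetical variant: .tolist() makes every slice below a Python list
    sidx = a.argsort(kind='stable').tolist()
    b = a[sidx]  # indexing an array with a list of ints still works
    cut_idx = np.flatnonzero(b[1:] != b[:-1]) + 1
    idxs = np.r_[0, cut_idx, len(b)].tolist()  # plain ints for list slicing
    return {int(b[i]): sidx[i:j] for i, j in zip(idxs[:-1], idxs[1:])}

print(groupby_uniqueness_dict_lists(np.array([1, 6, 6, 4, 1, 1, 4])))
# {1: [0, 4, 5], 4: [3, 6], 6: [1, 2]}
```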
Sample run -

In [161]: a
Out[161]: array([1, 6, 6, 4, 1, 1, 4])

In [162]: groupby_uniqueness_dict(a)
Out[162]: {1: array([0, 4, 5]), 4: array([3, 6]), 6: array([1, 2])}
Runtime test

Other approach -
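The timings call a function named defaultdict_app. The sketch below assumes it is the defaultdict-and-append loop from the answer further down, wrapped in a function. Its body is a hypothetical reconstruction and should not be read as the benchmarked code.

```python
from collections import defaultdict

import numpy as np

def defaultdict_app(a):
    # Hypothetical reconstruction: one Python-level pass over the array,
    # appending each index to the list for its value
    d = defaultdict(list)
    for ix, val in enumerate(a):
        d[val].append(ix)
    return d

d = defaultdict_app(np.array([1, 6, 6, 4, 1, 1, 4]))
print({int(k): v for k, v in d.items()})
# {1: [0, 4, 5], 6: [1, 2], 4: [3, 6]}
```

Keys appear in first-seen order (1, 6, 4) here, whereas the sort-based functions return keys in ascending order.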

Timings -

Case #1: dict values stored as arrays

In [226]: a = np.random.randint(0,1000, 10000)

In [227]: %timeit defaultdict_app(a)
     ...: %timeit groupby_uniqueness_dict(a)
     ...: %timeit groupby_uniqueness_dict_v2(a)
100 loops, best of 3: 4.06 ms per loop
100 loops, best of 3: 3.06 ms per loop
100 loops, best of 3: 2.02 ms per loop

In [228]: a = np.random.randint(0,10000, 100000)

In [229]: %timeit defaultdict_app(a)
     ...: %timeit groupby_uniqueness_dict(a)
     ...: %timeit groupby_uniqueness_dict_v2(a)
10 loops, best of 3: 43.5 ms per loop
10 loops, best of 3: 29.1 ms per loop
100 loops, best of 3: 19.9 ms per loop
Case #2: dict values stored as lists

In [238]: a = np.random.randint(0,1000, 10000)

In [239]: %timeit defaultdict_app(a)
     ...: %timeit groupby_uniqueness_dict(a)
     ...: %timeit groupby_uniqueness_dict_v2(a)
100 loops, best of 3: 4.15 ms per loop
100 loops, best of 3: 4.5 ms per loop
100 loops, best of 3: 2.44 ms per loop

In [240]: a = np.random.randint(0,10000, 100000)

In [241]: %timeit defaultdict_app(a)
     ...: %timeit groupby_uniqueness_dict(a)
     ...: %timeit groupby_uniqueness_dict_v2(a)
10 loops, best of 3: 57.5 ms per loop
10 loops, best of 3: 54.6 ms per loop
10 loops, best of 3: 34 ms per loop
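The %timeit lines are IPython magics. A rough way to repeat the comparison in a plain script is the standard-library timeit module. This sketch redeclares two of the approaches so that it runs on its own. The defaultdict_app body is the same hypothetical reconstruction of the loop shown further down, and best_ms is an illustrative helper. Absolute numbers depend on the machine and NumPy version.

```python
import timeit
from collections import defaultdict

import numpy as np

def defaultdict_app(a):
    # Hypothetical reconstruction: Python-level loop, one append per element
    d = defaultdict(list)
    for ix, val in enumerate(a):
        d[val].append(ix)
    return d

def groupby_uniqueness_dict_v2(a):
    # Sort once, then slice the argsort result wherever the value changes
    sidx = a.argsort(kind='stable')
    b = a[sidx]
    cut_idx = np.flatnonzero(b[1:] != b[:-1]) + 1
    idxs = np.r_[0, cut_idx, len(b)]
    return {b[i]: sidx[i:j] for i, j in zip(idxs[:-1], idxs[1:])}

def best_ms(func, a, number=20, repeat=3):
    """Best-of-repeat time per call, in milliseconds."""
    times = timeit.repeat(lambda: func(a), number=number, repeat=repeat)
    return min(times) / number * 1e3

if __name__ == "__main__":
    a = np.random.randint(0, 1000, 10000)
    for f in (defaultdict_app, groupby_uniqueness_dict_v2):
        print(f"{f.__name__}: {best_ms(f, a):.2f} ms per loop")
```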

A defaultdict with append does the job:

from collections import defaultdict

d = defaultdict(list)  # a missing key starts out as an empty list

# Append each position to the list for its value
for ix, val in enumerate(a):
    d[val].append(ix)
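When a is a NumPy array, enumerate(a) yields NumPy integer scalars, so the keys of d are a NumPy integer type rather than Python int. NumPy integer scalars hash and compare equal to the matching Python ints, so lookups such as d[1] still work. Converting the keys matters only for things like JSON output. Here is a small check.

```python
from collections import defaultdict

import numpy as np

a = np.array([1, 6, 6, 4, 1, 1, 4])

d = defaultdict(list)
for ix, val in enumerate(a):
    d[val].append(ix)

key = next(iter(d))
print(type(key))  # a NumPy integer type, not int
print(d[1])       # [0, 4, 5]: a Python int finds the NumPy integer key

# Plain dict with Python int keys, e.g. for json.dumps
plain = {int(k): v for k, v in d.items()}
print(plain)      # {1: [0, 4, 5], 6: [1, 2], 4: [3, 6]}
```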

Here's an approach that uses broadcasting, np.where() and np.split():

In [66]: unique = np.unique(a)

In [67]: rows, cols = np.where(unique[:, None] == a)

In [68]: indices = np.split(cols, np.where(np.diff(rows) != 0)[0] + 1)

In [69]: dict(zip(unique, indices))
Out[69]: {1: array([0, 4, 5]), 4: array([3, 6]), 6: array([1, 2])}
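Here are the same steps wrapped in a function. unique[:, None] == a broadcasts to a boolean matrix of shape (number of unique values, len(a)), so memory grows with both sizes. np.where on a 2-D array returns indices in row-major order. Row indices therefore never decrease, and column indices ascend within each row. Splitting cols wherever rows changes yields one sorted index array per unique value. The name groupby_broadcast is hypothetical.

```python
import numpy as np

def groupby_broadcast(a):
    # Hypothetical wrapper around the IPython session above
    unique = np.unique(a)
    # (len(unique), len(a)) boolean matrix: row r marks where a == unique[r]
    rows, cols = np.where(unique[:, None] == a)
    # A new row index means a new unique value, i.e. the start of a new group
    indices = np.split(cols, np.where(np.diff(rows) != 0)[0] + 1)
    return dict(zip(unique, indices))

out = groupby_broadcast(np.array([1, 6, 6, 4, 1, 1, 4]))
print({int(k): v.tolist() for k, v in out.items()})
# {1: [0, 4, 5], 4: [3, 6], 6: [1, 2]}
```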