Python: populating an occurrence matrix from arrays of column/row indices

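I'm looking for an efficient way to create an occurrence matrix from two arrays of indices, one holding the row indices and the other the column indices in that matrix.

I have: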

#matrix will be size 4x3 in this example
#array of rows idxs, with values from 0 to 3
[0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3]
#array of columns idxs, with values from 0 to 2
[0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2]
I need to create an occurrence matrix like:

[[1  0  0]
 [0  2  0]
 [0  1  2]
 [2  1  5]]
I can create an array of one-hot vectors in the simple case, but I can't get it to work when the same index pair occurs more than once:

import numpy as np

n_rows    = 4
n_columns = 3
#data
rows    = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])
#empty matrix
new_matrix = np.zeros([n_rows, n_columns])
#adding 1 for each [row, column] occurrence:
new_matrix[rows, columns] += 1
print(new_matrix)
This returns:

[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  1.]
 [ 1.  1.  1.]]
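Indexing and adding a value like this doesn't seem to work when an index pair occurs more than once, although plain indexing for printing seems to work just fine: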

print(new_matrix[rows, :])

So maybe I'm missing something? Or can this not be done, and I need to look for another way to do it?

Use np.add.at, specifying a tuple of indices:

>>> np.add.at(new_matrix, (rows, columns), 1)
>>> new_matrix
array([[ 1.,  0.,  0.],
       [ 0.,  2.,  0.],
       [ 0.,  1.,  2.],
       [ 2.,  1.,  5.]])
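np.add.at operates on the array in place, adding 1 at the indices specified by the (rows, columns) tuple, once for every occurrence. Unlike new_matrix[rows, columns] += 1, it is unbuffered, so repeated indices accumulate.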
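
To make the difference concrete, here is a small side-by-side sketch using the data from the question. The variable names buffered and unbuffered are only illustrative, and the integer dtype is a choice made here for readability.

```python
import numpy as np

rows = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])

# Fancy-index "+=" is buffered: each distinct cell is written once,
# so repeated (row, column) pairs are not accumulated.
buffered = np.zeros((4, 3), dtype=int)
buffered[rows, columns] += 1

# np.add.at is unbuffered: every occurrence of a pair adds 1.
unbuffered = np.zeros((4, 3), dtype=int)
np.add.at(unbuffered, (rows, columns), 1)

print(buffered.sum())    # number of distinct cells touched
print(unbuffered.sum())  # number of index pairs, i.e. len(rows)
```

The second total equals len(rows), which is a quick sanity check that no occurrence was dropped.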

Approach #1

We can convert those pairs to linear indices and then use np.bincount -

Sample run -

In [242]: n_rows    = 4
     ...: n_columns = 3
     ...: 
     ...: rows    = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
     ...: columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])

In [243]: bincount_app(rows, columns, n_rows, n_columns)
Out[243]: 
array([[1, 0, 0],
       [0, 2, 0],
       [0, 1, 2],
       [2, 1, 5]])
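
The sample run above calls bincount_app, but its definition does not appear in the text. A minimal sketch that matches the name and the output shown, assuming it flattens each pair to a row-major linear index (with n_columns as the stride, a choice made here) and counts with np.bincount:

```python
import numpy as np

def bincount_app(rows, columns, n_rows, n_columns):
    # Assumed implementation, not taken from the original post.
    # Flatten each (row, column) pair to its row-major position.
    lidx = n_columns * rows + columns
    # Count occurrences; minlength keeps cells that are never hit as 0.
    counts = np.bincount(lidx, minlength=n_rows * n_columns)
    return counts.reshape(n_rows, n_columns)

rows = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])
result = bincount_app(rows, columns, 4, 3)
print(result)
```
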
Approach #2

Alternatively, we can sort the linear indices and get the counts using slicing, which gives a second approach, like so -

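def mask_diff_app(rows, columns, n_rows, n_columns):
    # Row-major linear index of each pair; the stride must be n_columns
    # so that the indices line up with new_matrix.flat.
    lidx = n_columns*rows + columns
    lidx.sort()
    # True at the start of every run of equal indices, plus an end sentinel.
    mask = np.concatenate(([True],lidx[1:] != lidx[:-1],[True]))
    # Run lengths = number of occurrences of each distinct index.
    count = np.diff(np.flatnonzero(mask))
    new_matrix = np.zeros([n_rows, n_columns],dtype=int)
    new_matrix.flat[lidx[mask[:-1]]] = count
    return new_matrix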
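
To see what the mask and diff steps produce, the intermediate arrays can be inspected on the question's data. This is only an illustrative walk-through; the names starts, counts and unique_cells are introduced here and are not part of the original answer.

```python
import numpy as np

rows = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])
n_columns = 3

# Sorted row-major linear indices: equal indices end up adjacent.
lidx = np.sort(n_columns * rows + columns)

# True where a new run of equal indices begins, plus a sentinel at the end.
mask = np.concatenate(([True], lidx[1:] != lidx[:-1], [True]))

starts = np.flatnonzero(mask)   # run boundaries, including len(lidx)
counts = np.diff(starts)        # length of each run
unique_cells = lidx[mask[:-1]]  # one linear index per run

print(unique_cells)
print(counts)
```

The pairs (unique_cells, counts) are the same information np.unique(lidx, return_counts=True) would return.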
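Approach #3

This is also straightforward with a sparse matrix (csr_matrix), since it accumulates repeated indices on its own. The benefit is memory efficiency: because the result is a sparse matrix, the saving is noticeable if you are filling only a small number of places in the output and a sparse matrix output is acceptable.

The implementation would look something like this -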

from scipy.sparse import csr_matrix

def sparse_matrix_app(rows, columns, n_rows, n_columns):
    out_shp = (n_rows, n_columns)
    data = np.ones(len(rows),dtype=int)
    return csr_matrix((data, (rows, columns)), shape=out_shp)
If you need a regular/dense array, simply do -

sparse_matrix_app(rows, columns, n_rows, n_columns).toarray()
Sample output -

In [319]: sparse_matrix_app(rows, columns, n_rows, n_columns).toarray()
Out[319]: 
array([[1, 0, 0],
       [0, 2, 0],
       [0, 1, 2],
       [2, 1, 5]])

Benchmarking

Other approach (the np.add.at solution, timed below as add_at_app) -
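
The timings below call add_at_app, which is not defined anywhere in the text. A plausible sketch, assuming it simply wraps the np.add.at answer in a function with the same signature as the other approaches:

```python
import numpy as np

def add_at_app(rows, columns, n_rows, n_columns):
    # Assumed wrapper, not taken from the original post.
    new_matrix = np.zeros([n_rows, n_columns], dtype=int)
    # Unbuffered add: each occurrence of a (row, column) pair counts.
    np.add.at(new_matrix, (rows, columns), 1)
    return new_matrix

rows = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])
check = add_at_app(rows, columns, 4, 3)
print(check)
```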

Timings

Case #1: Output array of shape (1000, 1000) and number of indices = 10k

In [307]: # Setup
     ...: n_rows = 1000
     ...: n_columns = 1000
     ...: rows = np.random.randint(0,1000,(10000))
     ...: columns = np.random.randint(0,1000,(10000))

In [308]: %timeit add_at_app(rows, columns, n_rows, n_columns)
     ...: %timeit bincount_app(rows, columns, n_rows, n_columns)
     ...: %timeit mask_diff_app(rows, columns, n_rows, n_columns)
     ...: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns)
1000 loops, best of 3: 1.05 ms per loop
1000 loops, best of 3: 424 µs per loop
1000 loops, best of 3: 1.05 ms per loop
1000 loops, best of 3: 1.41 ms per loop
Case #2: Output array of shape (1000, 1000) and number of indices = 100k

In [309]: # Setup
     ...: n_rows = 1000
     ...: n_columns = 1000
     ...: rows = np.random.randint(0,1000,(100000))
     ...: columns = np.random.randint(0,1000,(100000))

In [310]: %timeit add_at_app(rows, columns, n_rows, n_columns)
     ...: %timeit bincount_app(rows, columns, n_rows, n_columns)
     ...: %timeit mask_diff_app(rows, columns, n_rows, n_columns)
     ...: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns)
100 loops, best of 3: 11.4 ms per loop
1000 loops, best of 3: 1.27 ms per loop
100 loops, best of 3: 7.44 ms per loop
10 loops, best of 3: 20.4 ms per loop
Case #3: Sparsity in the output

As stated earlier, the sparse approach needs sparsity in the output to do well. Such a case would look like this -

In [314]: # Setup
     ...: n_rows = 5000
     ...: n_columns = 5000
     ...: rows = np.random.randint(0,5000,(1000))
     ...: columns = np.random.randint(0,5000,(1000))

In [315]: %timeit add_at_app(rows, columns, n_rows, n_columns)
     ...: %timeit bincount_app(rows, columns, n_rows, n_columns)
     ...: %timeit mask_diff_app(rows, columns, n_rows, n_columns)
     ...: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns)
100 loops, best of 3: 11.7 ms per loop
100 loops, best of 3: 11.1 ms per loop
100 loops, best of 3: 11.1 ms per loop
1000 loops, best of 3: 269 µs per loop
If you need a dense array, we lose the memory efficiency and hence the performance as well -

In [317]: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns).toarray()
100 loops, best of 3: 11.7 ms per loop
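
The memory side of this trade-off can be estimated without materialising the dense array. The sketch below mirrors the Case #3 setup; the seeded default_rng generator and the names sparse_bytes and dense_bytes are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

n_rows, n_columns = 5000, 5000
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
rows = rng.integers(0, n_rows, 1000)
columns = rng.integers(0, n_columns, 1000)

data = np.ones(len(rows), dtype=int)
sp = csr_matrix((data, (rows, columns)), shape=(n_rows, n_columns))

# CSR stores three arrays: values, column indices and row pointers.
sparse_bytes = sp.data.nbytes + sp.indices.nbytes + sp.indptr.nbytes
# Size the dense equivalent would need, computed without allocating it.
dense_bytes = n_rows * n_columns * sp.dtype.itemsize

print(sparse_bytes, dense_bytes)
```

With only 1000 index pairs spread over 25 million cells, the dense array is dominated by zeros, which is why calling .toarray() gives back the cost of the dense approaches.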

Beautiful! Thanks a lot.