Generating a transition matrix in Python

python numpy

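I have two very large matrices and need to compute the transition matrix between them. For example, matrix A:

1 2 3
3 2 1
2 1 3

Matrix B: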

3 2 1
1 2 3
3 2 1

Then the transition matrix (rows are states in A, columns are states in B) should be:

   1    2    3
1  0   1/3  2/3
2  0   2/3  1/3
3  1    0    0

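I am currently iterating over both matrices with nested for loops and incrementing the counts in the transition matrix, but this is very slow. Is there a more efficient way? Thanks.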

I assume a and b are NumPy arrays. You can build the transition matrix (TM) as a SciPy sparse matrix:

import numpy as np 
import scipy.sparse as sp
from itertools import chain
from collections import Counter

a = np.array([[1,2,3],[3,2,1],[2,1,3]])
b = np.array([[3,2,1],[1,2,3],[3,2,1]])

Find and count all the transitions that actually occur:

cntr = Counter(chain.from_iterable(list(zip(*x)) for x in (zip(a,b))))
#Counter({(3, 1): 3, (1, 3): 2, (2, 2): 2, (1, 2): 1, (2, 3): 1})

Build a sparse matrix of counts whose rows and columns represent the states:

# Split the (from, to) keys into explicit row and column index tuples.
rows, cols = zip(*cntr.keys())
transition = sp.csr_matrix((list(cntr.values()), (rows, cols)))

Normalize the matrix (row 0 and column 0 are dropped because the states start at 1):

transition[1:,1:] / transition[1:,1:].sum(axis=1)
#array([[ 0.        ,  0.33333333,  0.66666667],
#       [ 0.        ,  0.66666667,  0.33333333],
#       [ 1.        ,  0.        ,  0.        ]])
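The Counter step above loops over every pair in Python. As a hedged alternative sketch (not part of the answer itself), the raveled arrays can be handed directly to scipy.sparse.coo_matrix, which sums entries that share the same (row, column) position when converted to CSR. The helper name sparse_transition_counts and its n_states parameter are hypothetical, and the states are assumed to be non-negative integers below n_states.

```python
import numpy as np
import scipy.sparse as sp

def sparse_transition_counts(a, b, n_states):
    """Count transitions a[i, j] -> b[i, j] as a sparse matrix (hypothetical helper).

    Assumes every state is an integer in the range 0 .. n_states - 1.
    """
    rows = np.asarray(a).ravel()
    cols = np.asarray(b).ravel()
    ones = np.ones(rows.size, dtype=np.int64)
    coo = sp.coo_matrix((ones, (rows, cols)), shape=(n_states, n_states))
    # Converting COO to CSR sums the entries that share a (row, col) position.
    return coo.tocsr()

a = np.array([[1, 2, 3], [3, 2, 1], [2, 1, 3]])
b = np.array([[3, 2, 1], [1, 2, 3], [3, 2, 1]])
counts = sparse_transition_counts(a, b, n_states=4).toarray()
# Drop the unused state 0, then divide each row by its own total.
probs = counts[1:, 1:] / counts[1:, 1:].sum(axis=1, keepdims=True)
```

Converting to a dense array before normalizing is only reasonable when the number of states is small; for many states the division would have to stay sparse.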

A more general transition-matrix constructor using np.add.at:

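def trans(A, B):
    # Map each distinct state to an index 0..k-1; the inverse arrays hold those indices.
    Au, Ar = np.unique(A, return_inverse=True)
    Bu, Br = np.unique(B, return_inverse=True)
    indices = (Ar.ravel(), Br.ravel())
    out = np.zeros((Au.size, Bu.size))
    # Unbuffered add, so repeated (row, col) pairs are each counted.
    np.add.at(out, indices, 1)
    # keepdims=True so that each row is divided by its own total.
    out /= out.sum(axis=1, keepdims=True)
    return out, Au, Bu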

# A and B are the two matrices from the question.
trans(A, B)
Out:
(array([[ 0.        ,  0.33333333,  0.66666667],
        [ 0.        ,  0.66666667,  0.33333333],
        [ 1.        ,  0.        ,  0.        ]]),
 array([1, 2, 3]),
 array([1, 2, 3]))
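Row normalization depends on how NumPy broadcasts the vector of row totals. The following is a minimal sketch, using a made-up count matrix whose rows have different totals, of why the totals need to stay a column vector. In the question's example every row total happens to be 3, which hides the difference.

```python
import numpy as np

# Hypothetical count matrix whose rows have different totals (3 and 1).
counts = np.array([[1.0, 2.0],
                   [1.0, 0.0]])

# sum(axis=1) has shape (2,), so broadcasting lines it up with the columns:
# column j ends up divided by the total of row j.
by_columns = counts / counts.sum(axis=1)

# keepdims=True keeps shape (2, 1), so each row is divided by its own total.
by_rows = counts / counts.sum(axis=1, keepdims=True)

row_sums_wrong = by_columns.sum(axis=1)
row_sums_right = by_rows.sum(axis=1)
```

Only by_rows is row-stochastic, meaning that every row sums to 1.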

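Same general approach as @DanielF's, but with a faster implementation (10x in my test case). The trick is to avoid np.add.at, which is very useful but not the fastest. I have left out the steps that are identical in both variants (finding the unique values and normalizing to probabilities).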
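For reference, here is a hedged sketch of how the omitted steps might be combined with counting by np.bincount over flattened pair indices (built with np.ravel_multi_index). The function name trans_bincount is hypothetical, and this is one possible assembly rather than the answerer's own code.

```python
import numpy as np

def trans_bincount(A, B):
    """Row-stochastic transition matrix from A to B (hypothetical helper name)."""
    # Map arbitrary state labels to indices 0..k-1, as in the np.add.at version.
    Au, Ar = np.unique(A, return_inverse=True)
    Bu, Br = np.unique(B, return_inverse=True)
    shape = (Au.size, Bu.size)
    # Encode each (row, col) pair as a single flat index and count occurrences.
    flat = np.ravel_multi_index((Ar.ravel(), Br.ravel()), shape)
    counts = np.bincount(flat, minlength=Au.size * Bu.size).reshape(shape)
    # Every label in Au occurs at least once in A, so no row total is zero.
    probs = counts / counts.sum(axis=1, keepdims=True)
    return probs, Au, Bu

A = np.array([[1, 2, 3], [3, 2, 1], [2, 1, 3]])
B = np.array([[3, 2, 1], [1, 2, 3], [3, 2, 1]])
probs, row_states, col_states = trans_bincount(A, B)
```

The returned row_states and col_states give the state label for each row and column of probs, which matters when the labels are not consecutive integers.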

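Comments:

A transition matrix is a stochastic matrix of probabilities. What is your definition of a transition matrix?

Sorry, what I want is the stochastic matrix of probabilities, but I figured that once you have all the transition counts, computing the probabilities is easy. Thanks. I have changed my example transition matrix.

What are the allowed "states" (entries) of your matrices? Just small integers?

This fails unless I use a.max() and b.max(); for example, it breaks if the initial matrices are zero-indexed (or have non-contiguous values).

Thanks! I found your answer very helpful. I was testing it on my data yesterday. Although it is slower than Daniel F's answer, it uses far less memory: about 10000 MB for my 50000*50000 matrices.

Are you sure it should be axis=0 and not 1? Shouldn't the matrix be row-stochastic?

Yes, that makes more sense. I wasn't sure which way to normalize.

Hi Daniel, thanks for solving my problem. However, when I run this on my data I get a memory error. My matrices are about 50000*50000 and my computer has 32 GB of RAM.

Thank you very much! My dataset is too large, so I am just leaving Python running over the weekend. I guess I will see the results on Monday!

Hi Paul, thanks for solving my problem. I learned a lot from your answer. However, I ran into the same memory error as with Daniel F's answer. I guess my dataset is too large? My matrices are about 50000*50000 and my computer has 32 GB of RAM. Thanks.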
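A rough back-of-the-envelope sketch of why those memory errors are plausible. The thread does not state the dtype of the matrices, so the dtypes below are assumptions; what can be said is that the index arrays produced by np.unique(..., return_inverse=True) and np.ravel_multi_index have one entry per input element and are 64-bit integers on a typical 64-bit platform.

```python
import numpy as np

n = 50_000  # matrix side length mentioned in the comments

def gigabytes(n_elements, dtype):
    """Size in GB (10**9 bytes) of an array with n_elements items of dtype."""
    return n_elements * np.dtype(dtype).itemsize / 1e9

# One full-size index temporary with 64-bit entries (assumed 64-bit platform).
per_index_array = gigabytes(n * n, np.int64)   # 20.0
# One input matrix if it were stored as 16-bit integers (assumed dtype).
per_input_int16 = gigabytes(n * n, np.int16)   # 5.0
```

Under these assumptions, two such index arrays alone would already exceed 32 GB, independently of how compactly the inputs themselves are stored.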

>>> import numpy as np
>>> from timeit import repeat
>>> 
>>> # Benchmark: f_df counts with np.add.at, f_pp counts with np.bincount on flat indices.
>>> A = np.random.randint(0, 100, (100, 100))
>>> B = np.random.randint(0, 100, (100, 100))
>>> 
>>> def f_df(A, B):
...     out = np.zeros((100, 100), int)
...     np.add.at(out, (A.ravel(), B.ravel()), 1)
...     return out
... 
>>> def f_pp(A, B):
...     return np.bincount(np.ravel_multi_index((A, B), (100, 100)).ravel(), minlength=10000).reshape(100, 100)
... 
>>> np.all(f_df(A, B) == f_pp(A, B))
True
>>> 
>>> repeat('f_df(A, B)', globals=globals(), number=1000)
[0.7909002639353275, 0.7779529448598623, 0.7819221799727529]
>>> repeat('f_pp(A, B)', globals=globals(), number=1000)
[0.07678529410623014, 0.07394189992919564, 0.0735252988524735]
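The f_pp approach builds one flat index array as large as the input, which is where very large matrices can run out of memory. One possible way to limit that temporary, sketched here under the assumption that every state is an integer in 0 .. n_states - 1, is to count one block of rows at a time and accumulate the results. The function name transition_counts_chunked and the block_rows parameter are hypothetical.

```python
import numpy as np

def transition_counts_chunked(A, B, n_states, block_rows=1024):
    """Count transitions A[i, j] -> B[i, j] one block of rows at a time.

    Hypothetical helper: assumes every entry is an integer in 0 .. n_states - 1.
    """
    counts = np.zeros(n_states * n_states, dtype=np.int64)
    for start in range(0, A.shape[0], block_rows):
        a_blk = A[start:start + block_rows].ravel()
        b_blk = B[start:start + block_rows].ravel()
        # The flat-index temporary only covers the current block of rows.
        flat = np.ravel_multi_index((a_blk, b_blk), (n_states, n_states))
        counts += np.bincount(flat, minlength=counts.size)
    return counts.reshape(n_states, n_states)

rng = np.random.default_rng(0)
A = rng.integers(0, 100, (100, 100))
B = rng.integers(0, 100, (100, 100))
chunked = transition_counts_chunked(A, B, n_states=100, block_rows=7)
whole = transition_counts_chunked(A, B, n_states=100, block_rows=100)
```

The block size trades peak memory against the number of Python-level iterations; the counts are the same for any block size.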