What is an efficient way to compute a transition matrix from time series data in Python?
I am trying to compute a transition matrix from time series data. I wrote a custom function, shown in the code below:
import numpy as np

def compute_transition_matrix(data, n, step = 1):
    # P[i][j] counts the transitions from state i to state j
    P = np.zeros((n, n))
    m = len(data)
    for i in range(m):
        initial, final = i, i + step
        if final < m:
            P[data[initial]][data[final]] += 1
    # Turn each row of counts into probabilities
    sums = np.sum(P, axis = 1)
    for i in range(n):
        for j in range(n):
            P[i][j] = P[i][j] / sums[i]
    return P

print(compute_transition_matrix([3, 0, 1, 3, 2, 6, 5, 4, 7, 5, 4], 8, 1))
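However, I was wondering whether there is a way to do this using built-in functions from NumPy/pandas/scikit-learn?
I am not sure whether there is a built-in function for this, but I can think of doing it in NumPy (using as_strided and np.unique) like this: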
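def compute_transition_matrix2(data, n, step = 1):
    t = np.array(data)
    total_inds = t.size - (step + 1) + 1
    # Each row of t_strided is a pair (t[i], t[i + step])
    t_strided = np.lib.stride_tricks.as_strided(
        t,
        shape = (total_inds, 2),
        strides = (t.strides[0], step * t.strides[0]))
    inds, counts = np.unique(t_strided, axis = 0, return_counts = True)
    P = np.zeros((n, n))
    P[inds[:, 0], inds[:, 1]] = counts
    sums = P.sum(axis = 1)
    # Avoid divide-by-zero errors by normalizing only the non-zero rows
    P[sums != 0] = P[sums != 0] / sums[sums != 0][:, None]
    # P = P / P.sum(axis = 1)[:, None]
    return P

print(compute_transition_matrix2([3, 0, 1, 3, 2, 6, 5, 4, 7, 5, 4], 8, 1))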
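For comparison, the same counts can be accumulated without building a strided view at all. The following is only a sketch and not part of the original answer: the function name transition_matrix_add_at is hypothetical, and it assumes step >= 1 and integer states in the range 0 to n - 1.

```python
import numpy as np

def transition_matrix_add_at(data, n, step=1):
    """Hypothetical helper: count transitions with np.add.at, then normalize rows."""
    t = np.asarray(data)
    # Source and destination states of every transition (assumes step >= 1)
    src, dst = t[:-step], t[step:]
    counts = np.zeros((n, n))
    # np.add.at accumulates repeated (src, dst) pairs correctly,
    # unlike counts[src, dst] += 1, which would count each pair only once
    np.add.at(counts, (src, dst), 1)
    sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions stay all-zero instead of becoming NaN
    return np.divide(counts, sums, out=np.zeros_like(counts), where=sums != 0)

P = transition_matrix_add_at([3, 0, 1, 3, 2, 6, 5, 4, 7, 5, 4], 8)
print(P)
```

Slicing gives ordinary views of the array, so this variant does not depend on computing the strides by hand.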
Intermediate values from my code (for reference):
t_strided =
array([[3, 0],
       [0, 1],
       [1, 3],
       [3, 2],
       [2, 6],
       [6, 5],
       [5, 4],
       [4, 7],
       [7, 5],
       [5, 4]])
inds, counts =
(array([[0, 1],
        [1, 3],
        [2, 6],
        [3, 0],
        [3, 2],
        [4, 7],
        [5, 4],
        [6, 5],
        [7, 5]]),
 array([1, 1, 1, 1, 1, 1, 2, 1, 1]))
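The pairs shown above can also be produced with sliding_window_view (available in NumPy 1.20 and later), which checks bounds for you, whereas as_strided does not. This is a sketch under that version assumption; the helper name transition_pairs is hypothetical and not from the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def transition_pairs(data, step=1):
    """Hypothetical helper: return an array whose rows are (t[i], t[i + step])."""
    t = np.asarray(data)
    # Each window holds step + 1 consecutive values; keep only its first and last
    windows = sliding_window_view(t, step + 1)
    return windows[:, [0, -1]]

pairs = transition_pairs([3, 0, 1, 3, 2, 6, 5, 4, 7, 5, 4])
inds, counts = np.unique(pairs, axis=0, return_counts=True)
print(pairs)
print(inds, counts)
```

The resulting inds and counts can then be used to fill P in the same way as in compute_transition_matrix2.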
Timing comparison:
def compute_transition_matrix(data, n, step = 1):
    P = np.zeros((n, n))
    m = len(data)
    for i in range(m):
        initial, final = i, i + step
        if final < m:
            P[data[initial]][data[final]] += 1
    sums = np.sum(P, axis = 1)
    for i in range(n):
        if sums[i] != 0: # Added this check
            for j in range(n):
                P[i][j] = P[i][j] / sums[i]
    return P

print(compute_transition_matrix([3, 0, 1, 3, 2, 6, 5, 4, 7, 5, 4], 8, 1))
# Generate some random large data
n = 1000
t = np.random.choice(np.arange(n), size = n)
data = list(t)
%timeit compute_transition_matrix(data, n, 1)
# 433 ms ± 21.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit compute_transition_matrix2(data, n, 1)
# 5.5 ms ± 304 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
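Since the question also mentions pandas, here is one way it might be done with pd.crosstab. This is a sketch rather than a benchmarked alternative: the function name transition_matrix_crosstab is hypothetical, step >= 1 is assumed, and no timing is claimed for it.

```python
import numpy as np
import pandas as pd

def transition_matrix_crosstab(data, n, step=1):
    """Hypothetical helper: build the transition matrix from a pandas cross-tabulation."""
    t = np.asarray(data)
    # Source and destination states of every transition (assumes step >= 1)
    src, dst = t[:-step], t[step:]
    counts = pd.crosstab(src, dst)
    # Make sure all n states appear as rows and columns, even if never observed
    counts = counts.reindex(index=range(n), columns=range(n), fill_value=0)
    sums = counts.sum(axis=1)
    # Divide each row by its total; rows with no transitions are divided by 1 and stay zero
    return counts.div(sums.where(sums != 0, 1), axis=0).to_numpy()

P = transition_matrix_crosstab([3, 0, 1, 3, 2, 6, 5, 4, 7, 5, 4], 8)
print(P)
```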
Very detailed explanation. Thanks! +1 for the benchmark results. Are you a PhD?
Thanks! No, I just have some experience with Markov chains and NumPy!