Is there any way to apply a transition matrix to strings in Python or R?

I have the following lines:

johnsonsu(a=0.35, b=0.76, loc=973796.40, scale=134834.36)
johnsonsu(a=0.35, b=0.76, loc=973796.40, scale=134834.36)
gausshyper(a=1.50, b=0.67, c=2.50, z=3.68, loc=77873.97, scale=2249451.03)
gausshyper(a=1.50, b=0.67, c=2.50, z=3.68, loc=77873.97, scale=2249451.03)
gausshyper(a=1.50, b=0.67, c=2.50, z=3.68, loc=77873.97, scale=2249451.03)
johnsonsu(a=0.35, b=0.76, loc=973796.40, scale=134834.36)
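They are the distributions and parameters of some data. We want to apply a transition matrix to them to get their probabilities. We have tried many different pieces of code, but we always get errors because of the differing data types.

We tried the solutions in these posts:

The best solution we have tried so far is: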

import pandas as pd
# transitions: a list of strings like the lines above (a larger instance than the one shown in the post)
df = pd.DataFrame(columns = ['state', 'next_state'])
for i, val in enumerate(transitions[:-1]): # We don't care about last state
    df_stg = pd.DataFrame(index=[0])
    df_stg['state'], df_stg['next_state'] = transitions[i], transitions[i+1]
    df = pd.concat([df, df_stg], axis = 0)
cross_tab = pd.crosstab(df['state'], df['next_state'])
cross_tab.div(cross_tab.sum(axis=1), axis=0)
Result:

state   alpha(a=1.10, loc=-94626.86, scale=1135344.81)  dgamma(a=0.61, loc=820000.00, scale=1885232.33) dgamma(a=0.78, loc=780000.00, scale=349653.54)  dgamma(a=0.81, loc=761200.00, scale=404939.11)  dweibull(c=0.77, loc=730000.00, scale=356863.56)    dweibull(c=0.90, loc=700000.00, scale=375807.48)    foldcauchy(c=2.59, loc=1423.70, scale=313236.41)    gausshyper(a=1.50, b=0.67, c=2.50, z=3.68, loc=77873.97, scale=2249451.03)  gennorm(beta=0.12, loc=725000.01, scale=0.00)   gennorm(beta=0.19, loc=545200.00, scale=38.09)  gennorm(beta=0.33, loc=575900.00, scale=7595.02)    gennorm(beta=0.33, loc=580090.00, scale=9423.99)    gennorm(beta=0.34, loc=532822.50, scale=7547.83)    gennorm(beta=0.42, loc=750000.00, scale=22359.35)   gennorm(beta=0.47, loc=666600.00, scale=42042.13)   johnsonsu(a=-0.02, b=0.50, loc=770186.45, scale=32359.52)   johnsonsu(a=-0.49, b=0.40, loc=561967.63, scale=65812.06)   johnsonsu(a=0.31, b=0.47, loc=835025.10, scale=53272.01)    johnsonsu(a=0.35, b=0.76, loc=973796.40, scale=134834.36)   loglaplace(c=1.63, loc=-927.08, scale=640927.08)    loglaplace(c=2.42, loc=-1009.51, scale=773124.55)   pearson3(skew=2.13, loc=908886.62, scale=577310.56) t(df=0.08, loc=700000.00, scale=1.71)   vonmises_line(kappa=2.01, loc=741142.93, scale=449091.04)
alpha(a=1.10, loc=-94626.86, scale=1135344.81)  19  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
dgamma(a=0.61, loc=820000.00, scale=1885232.33) 0   19  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
dgamma(a=0.78, loc=780000.00, scale=349653.54)  0   0   19  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
dgamma(a=0.81, loc=761200.00, scale=404939.11)  0   0   0   19  0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
dweibull(c=0.77, loc=730000.00, scale=356863.56)    0   0   0   0   19  0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
dweibull(c=0.90, loc=700000.00, scale=375807.48)    0   0   0   0   0   19  0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0
foldcauchy(c=2.59, loc=1423.70, scale=313236.41)    0   0   0   0   1   0   19  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
gausshyper(a=1.50, b=0.67, c=2.50, z=3.68, loc=77873.97, scale=2249451.03)  0   0   0   0   0   0   0   19  0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0
gennorm(beta=0.12, loc=725000.01, scale=0.00)   0   0   0   1   0   0   0   0   19  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
gennorm(beta=0.19, loc=545200.00, scale=38.09)  0   0   0   0   0   0   0   0   0   19  0   0   0   0   0   0   0   0   0   0   0   1   0   0
gennorm(beta=0.33, loc=575900.00, scale=7595.02)    0   0   0   0   0   1   0   0   0   0   19  0   0   0   0   0   0   0   0   0   0   0   0   0
gennorm(beta=0.33, loc=580090.00, scale=9423.99)    0   0   0   0   0   0   0   0   0   0   0   19  1   0   0   0   0   0   0   0   0   0   0   0
gennorm(beta=0.34, loc=532822.50, scale=7547.83)    0   0   0   0   0   0   0   0   0   0   0   0   19  0   0   0   1   0   0   0   0   0   0   0
gennorm(beta=0.42, loc=750000.00, scale=22359.35)   0   0   0   0   0   0   1   0   0   0   0   0   0   19  0   0   0   0   0   0   0   0   0   0
gennorm(beta=0.47, loc=666600.00, scale=42042.13)   0   0   0   0   0   0   0   0   0   0   0   1   0   0   19  0   0   0   0   0   0   0   0   0
johnsonsu(a=-0.02, b=0.50, loc=770186.45, scale=32359.52)   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   19  0   0   0   0   0   0   0   0
johnsonsu(a=-0.49, b=0.40, loc=561967.63, scale=65812.06)   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   19  0   0   0   0   0   0   0
johnsonsu(a=0.31, b=0.47, loc=835025.10, scale=53272.01)    0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   19  0   0   0   0   0   0
johnsonsu(a=0.35, b=0.76, loc=973796.40, scale=134834.36)   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   19  0   0   0   0   0
loglaplace(c=1.63, loc=-927.08, scale=640927.08)    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   19  0   0   0   0
loglaplace(c=2.42, loc=-1009.51, scale=773124.55)   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   19  0   0   0
pearson3(skew=2.13, loc=908886.62, scale=577310.56) 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   19  0   0
t(df=0.08, loc=700000.00, scale=1.71)   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   19  1
vonmises_line(kappa=2.01, loc=741142.93, scale=449091.04)   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   19

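The probabilities are wrong. The last snippet outputs 0 for most of the values in the transition matrix. However, where the index and the column are the same, the value becomes 19.

I have solved this problem. I just noticed that when the data is not shuffled, the output consists of 19s and 0s. So I shuffled the data and then ran the code, and the result met the requirement.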
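The 19s on the diagonal are consistent with each distribution string occurring in a run of 20 consecutive rows: 19 transitions stay in the same state and one moves on to the next. The following is a minimal sketch under that assumption. The labels 'A', 'B', 'C' and the run length of 20 are hypothetical stand-ins, not the original data.

```python
import pandas as pd

# Hypothetical stand-in data: three states, each repeated 20 times in a row.
transitions = ['A'] * 20 + ['B'] * 20 + ['C'] * 20

# Pair every state with its successor; the last state has no successor.
pairs = pd.DataFrame(
    list(zip(transitions[:-1], transitions[1:])),
    columns=['state', 'next_state'],
)

# Count how often each state is followed by each other state.
counts = pd.crosstab(pairs['state'], pairs['next_state'])
print(counts)
```

With unshuffled data of this shape, almost all of the mass sits on the diagonal, which matches the pattern in the table above.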

In this example, I will use characters instead of the distributions and parameters, to keep things simpler.

transitions = ['A', 'B', 'B', 'C', 'A', 'A', 'A', 'Z']
from itertools import islice

def window(seq, n=2):
    """Sliding window width n from seq.  From old itertools recipes."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

import pandas as pd

pairs = pd.DataFrame(window(transitions), columns=['state1', 'state2'])
counts = pairs.groupby('state1')['state2'].value_counts()
probs = (counts / counts.sum()).unstack()

DF_probs = pd.DataFrame(probs)
df = DF_probs.fillna(0)
The result is:

state2         A         B         C         Z
state1                                        
A       0.285714  0.142857  0.000000  0.142857
B       0.000000  0.142857  0.142857  0.000000
C       0.142857  0.000000  0.000000  0.000000
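Note that counts / counts.sum() divides every count by the total number of pairs (7 here), so the table above holds joint frequencies and its rows do not sum to 1. If row-wise transition probabilities are what is wanted (an assumption about the intent, not something the post states), one possible sketch uses pd.crosstab with normalize='index':

```python
import pandas as pd

transitions = ['A', 'B', 'B', 'C', 'A', 'A', 'A', 'Z']

# Pair every state with its successor; the last state has no successor.
pairs = pd.DataFrame(
    list(zip(transitions[:-1], transitions[1:])),
    columns=['state1', 'state2'],
)

# normalize='index' divides each row by its own total,
# so every row of the result sums to 1.
row_probs = pd.crosstab(pairs['state1'], pairs['state2'], normalize='index')
print(row_probs)
```

Here row A becomes 0.5, 0.25, 0.0, 0.25 for next states A, B, C, Z, because A is followed by A twice, by B once and by Z once. State Z has no row because it only appears as the final state.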

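What have you tried so far?
@TobiasWilfert The post has been updated with the solutions from the different links.
The question is unclear. Please provide your imports, the errors, and more details.
@jasonm The post has been updated.
So what is the error?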