R: Predicting conditional probabilities from a context with only one state


It seems that PST cannot predict from contexts that consist of a single state (for example, EX-EX).

Consider the following code:

# Load libraries
library(RCurl)
library(TraMineR)
library(PST)

# Get data
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/c2539d06771317c5f4c8d3a2052a73fc485a09c6/challenge_level.csv")

# Load and transform data: parse the downloaded text directly
# (the original read a local copy, thread_level.csv, which overwrote the download)
data <- read.table(text = x, sep = ",", header = FALSE, stringsAsFactors = FALSE)

# Create sequence object
data.seq <- seqdef(data[2:nrow(data),2:ncol(data)], missing = NA, right= NA, nr = "*")

# Make a tree
S1 <- pstree(data.seq, ymin = 0.05, L = 6, lik = TRUE, with.missing = TRUE)

# Mine the context
context <- seqdef("EX-EX")
p_context <- predict(S1.p1, context, decomp = F, output = "prob")
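
This means that predict() cannot be run.

How can I predict the conditional probability of the next state given a context that contains only one state, possibly repeated several times?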

This was a seqdef() issue that has been fixed since TraMineR version 1.8-12.
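
Because the behaviour depends on the installed version, it may help to check it before rerunning the code. The following is only a minimal sketch: it assumes the 1.8-12 threshold mentioned above, and the variable names min_fixed and has_fix are made up for illustration.

```r
# Version in which the seqdef() fix landed, according to the answer above.
min_fixed <- package_version("1.8-12")

if (requireNamespace("TraMineR", quietly = TRUE)) {
  installed <- packageVersion("TraMineR")
  has_fix <- installed >= min_fixed
  message("TraMineR ", format(installed),
          if (has_fix) " includes the fix." else " predates the fix; consider updating.")
} else {
  has_fix <- FALSE
  message("TraMineR is not installed.")
}
```

If has_fix is FALSE, updating TraMineR from CRAN should be enough to get the corrected seqdef().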

Here is what I get with TraMineR 1.8-13:

> context <- seqdef("EX-EX")
 [>] 1 distinct states appear in the data: 
     1 = EX
 [>] state coding:
       [alphabet]  [label]  [long label] 
     1  EX          EX       EX
 [>] 1 sequences in the data set
 [>] min/max sequence length: 2/2
> p_context <- predict(S1, context, decomp = F, output = "prob")
 [>] 1 sequence(s) - min/max length: 2/2
 [>] max. context length: L=6
 [>] found 2 distinct context(s)
 [>] total time: 0.019 secs
> p_context
           prob
[1] 0.000476372

Note that I replaced the undefined S1.p1 with S1.

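This works for contexts that repeat the same token, such as EX-EX. However, a context that is only one token long (for example, EX) is still not computed, and here the problem appears to lie in predict() rather than in seqdef(). A single EX is not really a sequence, so its probability is simply the probability of EX occurring in the data. You can obtain it, for example, with seqstatf(data.seq)["EX", 2]/100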
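
To make the single-state case concrete, here is a small sketch on hypothetical toy sequences (not the thread data from the question). It assumes that seqstatf() returns one row per state with Freq and Percent columns, and it cross-checks the result against a plain base R count. Note that this is the overall share of EX in the data, not a probability conditional on a preceding context.

```r
library(TraMineR)

# Hypothetical toy sequences, used only for illustration.
toy <- c("EX-EX-FA", "FA-EX-EX", "EX-FA-FA")
toy.seq <- seqdef(toy)

# Overall share of EX: the Percent column of seqstatf(), rescaled to [0, 1].
p_EX <- seqstatf(toy.seq)["EX", "Percent"] / 100

# Cross-check with base R: proportion of all positions occupied by EX.
p_EX_manual <- mean(unlist(strsplit(toy, "-")) == "EX")

print(c(seqstatf = p_EX, manual = p_EX_manual))
```

Selecting the column by name ("Percent") rather than by position (2) is a stylistic choice here; both should refer to the same column.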