R: Predicting the event probability for each factor combination with a coxph model

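My question is relatively simple, but I could not find a clear answer on other forums. I am running a coxph model to predict the survival of individual plants that received one of two treatments at three different sites. The individuals were monitored for three years. My data and the associated model look like this: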

library(survival)

# Generate data
mydata <- data.frame(Site = as.factor(sample(c("SiteA", "SiteB", "SiteC"), 100, replace = TRUE)), 
                     Treatment = as.factor(sample(c("Treat.A", "Treat.B"), 100, replace = TRUE)), 
                     Time = sample(c(1, 2, 3), 100, replace = TRUE), 
                     Surv = sample(c(0, 1), 100, replace = TRUE)) # Alive is 0, death is 1


# Model
mymodel <- coxph(Surv(Time , Surv) ~ Treatment*Site, 
              data = mydata)
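
I think you are running into this problem because the surv, lower, and upper elements of the object returned by survfit are matrices, not vectors. survfit gives you survival curves, not point predictions. The columns of these matrices correspond to the covariate combinations in the rows of the data frame passed to survfit, and the rows correspond to the full, ordered set of time points observed in the original data. If you want the fitted values at a specific time t, you need to extract the row for that time. Here the observed times are 1, 2, and 3, so row t is time t and you can use fitted$surv[t, ]; in general, look up the row through fitted$time.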
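The matrix layout described above can be inspected directly. The following is a minimal sketch that regenerates the simulated data from the question so it runs on its own; the names grid, fit, and row_t are illustrative and not part of the original post.

```r
library(survival)

# Simulated data as in the question (Surv: 0 = alive, 1 = dead)
set.seed(123)
mydata <- data.frame(
  Site      = as.factor(sample(c("SiteA", "SiteB", "SiteC"), 100, replace = TRUE)),
  Treatment = as.factor(sample(c("Treat.A", "Treat.B"), 100, replace = TRUE)),
  Time      = sample(seq(3), 100, replace = TRUE),
  Surv      = sample(c(0, 1), 100, replace = TRUE)
)
mymodel <- coxph(Surv(Time, Surv) ~ Treatment * Site, data = mydata)

# One row per covariate combination (hypothetical helper table)
grid <- expand.grid(Site = levels(mydata$Site),
                    Treatment = levels(mydata$Treatment))
fit <- survfit(mymodel, newdata = grid)

dim(fit$surv)   # rows = time points, columns = rows of grid
fit$time        # the time point that each matrix row refers to

# Look the row up by time instead of assuming that row t is time t
row_t <- which(fit$time == 3)
fit$surv[row_t, ]
```

Matching on fit$time keeps the lookup valid when the observed times are not simply 1, 2, 3.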

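To address your specific question, one option is to create a new data frame containing only the covariate combinations you want, apply your model to that data frame, and then extract the row for the time point you want. So, here: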

library(survival)

# Generate data
set.seed(123)
mydata <- data.frame(Site = as.factor(sample(c("SiteA", "SiteB", "SiteC"), 100, replace = TRUE)), 
                     Treatment = as.factor(sample(c("Treat.A", "Treat.B"), 100, replace = TRUE)), 
                     Time = sample(seq(3), 100, replace = TRUE), 
                     Surv = sample(c(0, 1), 100, replace = TRUE)) # Alive is 0, death is 1


# Model
mymodel <- coxph(Surv(Time , Surv) ~ Treatment*Site, data = mydata)

# use expand.grid to get a table with all possible combinations of Site and Treatment
newdata <- with(mydata, expand.grid(Site = unique(Site), Treatment = unique(Treatment)))
# add a vector for your time of interest for clarity's sake; it won't actually factor into survfit
newdata$time = 3

# run survfit on that new table
fitted <- survfit(mymodel, newdata = newdata)

# extract the fitted values for the time slice of interest to you, here 3
newdata$fit <- fitted$surv[3,]
newdata$lower <- fitted$lower[3,]
newdata$upper <- fitted$upper[3,]

# result
print(newdata)
   Site Treatment time       fit      lower     upper
1 SiteA   Treat.B    3 0.3149307 0.15064889 0.6583612
2 SiteC   Treat.B    3 0.1721691 0.04597197 0.6447887
3 SiteB   Treat.B    3 0.3979556 0.18679672 0.8478130
4 SiteA   Treat.A    3 0.6117692 0.37752270 0.9913616
5 SiteC   Treat.A    3 0.3390650 0.15646255 0.7347769
6 SiteB   Treat.A    3 0.3128776 0.13297313 0.7361819
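As a possible alternative to indexing the matrices by row number, summary() on a survfit object accepts a times argument and returns the estimates at those times. This is a sketch under the same simulated data; the object names s and grid are illustrative, and as.vector() is used only to flatten the one-row result.

```r
library(survival)

# Simulated data as in the question (Surv: 0 = alive, 1 = dead)
set.seed(123)
mydata <- data.frame(
  Site      = as.factor(sample(c("SiteA", "SiteB", "SiteC"), 100, replace = TRUE)),
  Treatment = as.factor(sample(c("Treat.A", "Treat.B"), 100, replace = TRUE)),
  Time      = sample(seq(3), 100, replace = TRUE),
  Surv      = sample(c(0, 1), 100, replace = TRUE)
)
mymodel <- coxph(Surv(Time, Surv) ~ Treatment * Site, data = mydata)

grid <- expand.grid(Site = levels(mydata$Site),
                    Treatment = levels(mydata$Treatment))

# Ask for the estimates at time 3 rather than picking a row by position
s <- summary(survfit(mymodel, newdata = grid), times = 3)

grid$fit   <- as.vector(s$surv)
grid$lower <- as.vector(s$lower)
grid$upper <- as.vector(s$upper)
print(grid)
```

This avoids relying on the position of time 3 within the fitted curve.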
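Use predict.coxph with a time value. With type = "expected", predict returns the expected number of events (the cumulative hazard) for the follow-up time and covariates in each row, so exp(-expected) is the predicted survival probability at that time: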

testset <-data.frame( Time=3, Surv=1,  # the Surv value is just a placeholder
                      Treatment=factor(rep(c("Treat.A", "Treat.B"),times=3)) , 
                      Site=factor(rep(c("SiteA", "SiteB", "SiteC"), each=2)))

testset$Surv3yr <- exp( -predict(mymodel, newdata=testset, type="expected") )
testset
  Time Surv Treatment  Site   Surv3yr
1    3    1   Treat.A SiteA 0.1633725
2    3    1   Treat.B SiteA 0.3906895
3    3    1   Treat.A SiteB 0.3432062
4    3    1   Treat.B SiteB 0.2940677
5    3    1   Treat.A SiteC 0.5411742
6    3    1   Treat.B SiteC 0.2047518
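Because the outputs quoted in this thread come from different random draws, it can help to run both approaches on the same fitted model. The sketch below does that; the column names surv_survfit and surv_predict and the helper table pred_input are illustrative. Whether the two columns agree to the last digit depends on how each function estimates the baseline hazard, so treat this as a check rather than a guarantee.

```r
library(survival)

# Simulated data as in the question (Surv: 0 = alive, 1 = dead)
set.seed(123)
mydata <- data.frame(
  Site      = as.factor(sample(c("SiteA", "SiteB", "SiteC"), 100, replace = TRUE)),
  Treatment = as.factor(sample(c("Treat.A", "Treat.B"), 100, replace = TRUE)),
  Time      = sample(seq(3), 100, replace = TRUE),
  Surv      = sample(c(0, 1), 100, replace = TRUE)
)
mymodel <- coxph(Surv(Time, Surv) ~ Treatment * Site, data = mydata)

grid <- expand.grid(Site = levels(mydata$Site),
                    Treatment = levels(mydata$Treatment))

# Approach 1: survival curves from survfit, row matched on time 3
fit <- survfit(mymodel, newdata = grid)
surv_survfit <- fit$surv[which(fit$time == 3), ]

# Approach 2: predict with type = "expected"; Time is the follow-up time,
# Surv is only a placeholder for the status column
pred_input <- transform(grid, Time = 3, Surv = 1)
surv_predict <- exp(-predict(mymodel, newdata = pred_input, type = "expected"))

grid$surv_survfit <- as.vector(surv_survfit)
grid$surv_predict <- as.vector(surv_predict)
print(grid)
```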