Running multiple survival analyses with a loop in R

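I have recently been doing survival analysis in R. I have two data frames: geneDf, which holds gene expression, and survDf, which holds the follow-up data. Here is a sample: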

# Data frame: geneDf
geneID=c("EGFR","Her2","E2F1","PTEN")
patient1=c(12,23,56,23)
patient2=c(23,34,11,6)
patient3=c(56,44,32,45)
patient4=c(23,64,45,23)
geneDf=data.frame(patient1,patient2,patient3,patient4,geneID)
> geneDf
  patient1 patient2 patient3 patient4 geneID
1       12       23       56       23   EGFR
2       23       34       44       64   Her2
3       56       11       32       45   E2F1
4       23        6       45       23   PTEN
# Data frame: survDf
ID=c("patient1","patient2","patient3","patient4")
time=c(23,7,34,56)
status=c(1,0,1,1)
survDf=data.frame(ID,time,status)
> survDf
        ID time status
1 patient1   23      1
2 patient2    7      0
3 patient3   34      1
4 patient4   56      1

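I extract the expression data for one gene from geneDf, use the median of its expression as the cutoff, run the survival analysis with the "survival" package, and get the p-value from survdiff. In the code below I use the gene "EGFR" as an example: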

# Extract the expression of one gene
targetGene <- subset(geneDf, grepl("EGFR", geneDf$geneID))
targetGene$geneID <- NULL
# Transpose so that patients are rows, then tidy the format
targetGene <- t(targetGene)
targetGene <- data.frame(as.factor(rownames(targetGene)), targetGene)
colnames(targetGene) <- c("ID", "Expression")
rownames(targetGene) <- NULL
# Dichotomise at the median: 1 = below the median, 2 = at or above it
targetGene$Expression1 <- targetGene$Expression
targetGene$Expression1[targetGene$Expression < median(targetGene$Expression)] <- 1
targetGene$Expression1[targetGene$Expression >= median(targetGene$Expression)] <- 2
# Survival analysis
library(survival)
## Add survival object
survDf$SurvObj <- with(survDf, Surv(time, status == 1))
## Kaplan-Meier estimator by expression group
km <- survfit(SurvObj ~ targetGene$Expression1, data = survDf, conf.type = "log-log")
sdf <- survdiff(Surv(time, status) ~ targetGene$Expression1, data = survDf)
# Get the p-value
p.val <- 1 - pchisq(sdf$chisq, length(sdf$n) - 1)
> p.val
[1] 0.1572992

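You first need to clarify your statistical method. By dichotomising at the median you throw away a great deal of information, and you are not correcting for multiple comparisons. Seek competent statistical help before putting more effort into a strategy that is almost guaranteed to produce garbage.

Thank you very much for the comment, BondedDust! I am actually using these two data frames to select some candidate genes for further study, so this is only the first step of the procedure. Using the median as the cutoff is a reasonable selection method, and the results will be validated in later experiments. Right now I really need help running the analysis above for multiple genes.

This is an ugly script, but it works. In Data10 the first column must hold the time, the second column the status, and the remaining columns whatever treatment (grouping variable) you want to test, with patients as row names.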
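The Data10 layout expected by the function (time in column 1, status in column 2, one grouping column per variable after that, patients as row names) could be assembled from the question's geneDf and survDf roughly as follows. This is only a sketch: the intermediate names `expr` and `groups` are made up for illustration, and it assumes the median cutoff from the question (1 = below the median, 2 = at or above it).

```r
# Sample data from the question
geneDf <- data.frame(
  patient1 = c(12, 23, 56, 23),
  patient2 = c(23, 34, 11, 6),
  patient3 = c(56, 44, 32, 45),
  patient4 = c(23, 64, 45, 23),
  geneID   = c("EGFR", "Her2", "E2F1", "PTEN")
)
survDf <- data.frame(
  ID     = c("patient1", "patient2", "patient3", "patient4"),
  time   = c(23, 7, 34, 56),
  status = c(1, 0, 1, 1)
)

# Transpose so that patients are rows and genes are columns
# (`expr` is a hypothetical helper name)
expr <- t(as.matrix(geneDf[, setdiff(names(geneDf), "geneID")]))
colnames(expr) <- as.character(geneDf$geneID)

# Dichotomise every gene at its own median: 1 = below, 2 = at or above
groups <- apply(expr, 2, function(x) ifelse(x < median(x), 1, 2))

# Align rows by patient ID instead of relying on row order
groups <- groups[match(survDf$ID, rownames(groups)), , drop = FALSE]

# Time first, status second, one grouping column per gene afterwards
Data10 <- data.frame(time = survDf$time, status = survDf$status, groups,
                     row.names = survDf$ID)
Data10
```

Matching on the patient ID matters once the two tables are not in the same order; the question's code silently assumes that they are.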
loopsurff <- function(Data10) {
  library(plyr)
  library(survival)
  # One column of `combos` per variable to test:
  # row 1 = time column, row 2 = status column, row 3 = grouping column
  n_vars <- ncol(Data10) - 2
  combos <- rbind(rep(1, n_vars), rep(2, n_vars), 3:ncol(Data10))
  vv <- adply(combos, 2, function(x) {
    fit <- survdiff(Surv(Data10[, x[1]], Data10[, x[2]]) ~ Data10[, x[3]], data = Data10)
    # Log-rank p-value; degrees of freedom = number of groups - 1
    p <- 1 - pchisq(fit$chisq, length(fit$n) - 1)
    data.frame(var1 = colnames(Data10)[x[3]],
               p.value = as.numeric(sprintf("%.3f", p)))
  })
  vv  # return the table of p-values visibly
}
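For comparison, here is a minimal sketch of the same per-gene loop without plyr, with a multiple-comparison adjustment added in response to the concern raised in the comments. The hard-coded Data10 is the question's sample data already dichotomised at each gene's median, and the Benjamini-Hochberg method is an assumption chosen for illustration. Nobody in the thread recommends a specific correction.

```r
library(survival)

# Question's sample data, dichotomised at each gene's median
# (1 = below the median, 2 = at or above it)
Data10 <- data.frame(
  time   = c(23, 7, 34, 56),
  status = c(1, 0, 1, 1),
  EGFR   = c(1, 2, 2, 2),
  Her2   = c(1, 1, 2, 2),
  E2F1   = c(2, 1, 1, 2),
  PTEN   = c(2, 1, 2, 2),
  row.names = paste0("patient", 1:4)
)

# Every column other than time and status is treated as a gene
genes <- setdiff(names(Data10), c("time", "status"))

# One log-rank test per gene; gene names must be syntactic R names
p_raw <- sapply(genes, function(g) {
  f   <- as.formula(paste("Surv(time, status) ~", g))
  fit <- survdiff(f, data = Data10)
  1 - pchisq(fit$chisq, length(fit$n) - 1)
})

# Raw and adjusted p-values side by side
# ("BH" is an assumed choice, not something the thread prescribes)
result <- data.frame(gene       = genes,
                     p.value    = unname(p_raw),
                     p.adjusted = p.adjust(p_raw, method = "BH"),
                     row.names  = NULL)
result[order(result$p.value), ]
```

Building the formula from the column name keeps the gene label in the fitted call, which makes the individual survdiff printouts easier to read. With only four patients the numbers mean very little, so this shows only the mechanics of the loop.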