R: Efficient formula creation in lm
I would like to write generic code for some iterative modeling I am doing on genetic data. Here is a subset of my data frame:
> head(exprTarget)
patient CDR Diagnosis diag_key UNC93B1 CTSC PLEK LGALS9 GRN CYTH4 C1QA C1QC C1QB LAPTM5 CTSS FCER1G ALOX5AP
16955 16955 2 MCI 1 2.468387 3.306170 1.669025 2.197085 4.817537 2.303606 3.126281 3.537686 4.077572 4.660030 2.960342 1.0880424 2.0820685
16365 16365 5 AD 2 2.312767 3.205852 1.276787 1.942052 4.924718 2.461212 2.641784 3.592875 3.758567 4.215387 2.536174 0.9872809 0.7559553
17155 17155 5 AD 2 3.276758 4.039103 2.482880 3.347225 5.465345 2.990894 6.004585 6.108294 6.762214 5.708623 4.358901 2.5924355 3.6172763
17135 17135 5 AD 2 2.245509 3.056953 1.877469 2.083920 4.492934 1.827284 2.584534 3.012729 3.369049 3.892801 2.990098 0.7350252 1.1568519
16625 16625 4 AD 2 2.575806 3.978674 2.060418 2.327522 4.981906 2.685569 4.694788 4.725954 5.460863 5.260811 4.021172 2.5871655 3.3241311
16295 16295 4 AD 2 3.107424 3.701104 2.880653 2.880653 5.115831 2.723281 4.224342 4.717155 5.110232 5.031450 3.980189 2.0809520 1.9699207
I am trying to use diag_key as my response variable, with all columns to its right as predictors; that is, the desired formula is:
lm(diag_key ~ . - patient - CDR - Diagnosis, data= exprTarget)
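I would like to make this more generic. Specifically, I want to be able to pass only the column number that separates my clinical annotations from the gene expression data. In the example above that would be column 4, diag_key, although it may differ for other implementations.
My current goal is to recreate the formula above from that information. Here is my current attempt; note that response corresponds to the separating column number, i.e. 4 in the example above: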
clinical<- colnames(exprTarget)[1:(response-1)]
lm(as.formula(paste("exprTarget[,response] ~ . ",clinical, sep = "-")), data= exprTarget)
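You can use reformulate for this. I based it on the column name of the response, because that seems safer than using a column index directly, but you can of course use a column index instead.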
resp_col = "diag_key"
idx = match(resp_col, names(exprTarget))
my_formula = reformulate(names(exprTarget)[(idx+1):ncol(exprTarget)], response=names(exprTarget)[idx])
You can package this as a function:
lm_form = function(data, resp_col) {
idx = match(resp_col, names(data))
form = reformulate(names(data)[(idx+1):ncol(data)], response=names(data)[idx])
lm(form, data=data)
}
my_model = lm_form(exprTarget, "diag_key")
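Since the question asks for a column number rather than a name, the same idea can be sketched with an index. This is only an illustration: lm_form_idx and the toy data frame are hypothetical names that do not appear in the original post, and the toy data merely stands in for exprTarget.

```r
# Hypothetical helper: build the formula from a column position instead of a name.
lm_form_idx <- function(data, response) {
  # Require at least one predictor column to the right of the response.
  stopifnot(response >= 1, response < ncol(data))
  predictors <- names(data)[(response + 1):ncol(data)]
  form <- reformulate(predictors, response = names(data)[response])
  lm(form, data = data)
}

# Toy data (assumption): two annotation columns, then the response, then two predictors.
set.seed(1)
toy <- data.frame(id = 1:10,
                  grp = rep(c("a", "b"), 5),
                  y = rnorm(10),
                  g1 = rnorm(10),
                  g2 = rnorm(10))

fit <- lm_form_idx(toy, 3)
formula(fit)  # inspect the formula that was built
```

The stopifnot guard is there because (response + 1):ncol(data) counts backwards when the response is the last column, which would silently pick the wrong columns.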
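Constructing the formula can be avoided altogether. When lm is given a data frame in place of a formula, it uses the first column as the response and the remaining columns as predictors: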
nc <- ncol(exprTarget)
fm <- lm(exprTarget[response:nc])
Note: the input used above, in reproducible form, is:
Lines <- "
patient CDR Diagnosis diag_key UNC93B1 CTSC PLEK LGALS9 GRN CYTH4 C1QA C1QC C1QB LAPTM5 CTSS FCER1G ALOX5AP
16955 16955 2 MCI 1 2.468387 3.306170 1.669025 2.197085 4.817537 2.303606 3.126281 3.537686 4.077572 4.660030 2.960342 1.0880424 2.0820685
16365 16365 5 AD 2 2.312767 3.205852 1.276787 1.942052 4.924718 2.461212 2.641784 3.592875 3.758567 4.215387 2.536174 0.9872809 0.7559553
17155 17155 5 AD 2 3.276758 4.039103 2.482880 3.347225 5.465345 2.990894 6.004585 6.108294 6.762214 5.708623 4.358901 2.5924355 3.6172763
17135 17135 5 AD 2 2.245509 3.056953 1.877469 2.083920 4.492934 1.827284 2.584534 3.012729 3.369049 3.892801 2.990098 0.7350252 1.1568519
16625 16625 4 AD 2 2.575806 3.978674 2.060418 2.327522 4.981906 2.685569 4.694788 4.725954 5.460863 5.260811 4.021172 2.5871655 3.3241311
16295 16295 4 AD 2 3.107424 3.701104 2.880653 2.880653 5.115831 2.723281 4.224342 4.717155 5.110232 5.031450 3.980189 2.0809520 1.9699207"
exprTarget <- read.table(text = Lines)
response <- 4
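To see that passing a data frame to lm behaves like the explicit formula, the two fits can be compared on a small example. This is a minimal sketch; the toy data frame and the names fit_df and fit_fo are assumptions made for illustration and are not part of the original answer.

```r
# Toy data (assumption): one annotation column, then the response, then two predictors.
set.seed(42)
toy <- data.frame(id = 1:12,
                  y = rnorm(12),
                  g1 = rnorm(12),
                  g2 = rnorm(12))
response <- 2      # position of the response column in toy
nc <- ncol(toy)

fit_df <- lm(toy[response:nc])          # data frame passed directly, no formula built
fit_fo <- lm(y ~ g1 + g2, data = toy)   # equivalent explicit formula

# The coefficients of the two fits should agree.
all.equal(coef(fit_df), coef(fit_fo))
```

Subsetting with toy[response:nc] drops the annotation columns, so the response ends up as the first column of what lm receives.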