Equivalent of SAS PROC MIXED in R
Tags: r, sas, lme4, mixed-models
I am trying to convert the following SAS code to R and get the same results that SAS gives. Here is the SAS code:
DATA plants;
INPUT sample $ treatmt $ y ;
cards;
1 trt1 6.426264755
1 trt1 6.95419631
1 trt1 6.64385619
1 trt2 7.348728154
1 trt2 6.247927513
1 trt2 6.491853096
2 trt1 2.807354922
2 trt1 2.584962501
2 trt1 3.584962501
2 trt2 3.906890596
2 trt2 3
2 trt2 3.459431619
3 trt1 2
3 trt1 4.321928095
3 trt1 3.459431619
3 trt2 3.807354922
3 trt2 3
3 trt2 2.807354922
4 trt1 0
4 trt1 0
4 trt1 0
4 trt2 0
4 trt2 0
4 trt2 0
;
RUN;
PROC MIXED ASYCOV NOBOUND DATA=plants ALPHA=0.05 method=ML;
CLASS sample treatmt;
MODEL y = treatmt ;
RANDOM int treatmt/ subject=sample ;
RUN;
I get the following covariance parameter estimates from SAS:
I tried the following in R, but got different results:
s=as.factor(sample)
lmer(y~ 1+treatmt+(1|treatmt:s),REML=FALSE)
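I don't know whether you will be able to get exactly the same results in R as in SAS, but I was able to get close by handling the
contrasts, as described in the following reference (page 6):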
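When comparing estimates produced by SAS PROC MIXED and by lmer, one
must be careful to consider the contrasts that are used to define the
effects of factors. In SAS, a model with an intercept and a qualitative
factor is defined in terms of the intercept and the indicator
variables for all but the last level of the factor. The default
behaviour in S is to use the Helmert contrasts for the factor. On a
balanced factor these provide a set of orthogonal contrasts. In R
the default is the "treatment" contrasts, which are almost the same as
the SAS parameterization except that they drop the indicator of the first
level, not the last level. When in doubt, check which contrasts are being
used with the contrasts function. To make comparisons easier,
you may find it worthwhile to declare
options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly"))
at the beginning of your session.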
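To make the difference between the two parameterizations concrete, here is a minimal base R sketch. The small data frame `d` is made up for illustration and only mirrors the two levels of `treatmt` from the question. Contrasts are passed through `contrasts.arg` so that the global option is not changed.

```r
# Toy two-level factor mirroring the question's treatmt column (illustrative only)
d <- data.frame(treatmt = factor(rep(c("trt1", "trt2"), each = 3)))

# R default: treatment contrasts drop the FIRST level (trt1 is the baseline)
X_r <- model.matrix(~ treatmt, d,
                    contrasts.arg = list(treatmt = "contr.treatment"))

# SAS-style: contr.SAS drops the LAST level (trt2 is the baseline)
X_sas <- model.matrix(~ treatmt, d,
                      contrasts.arg = list(treatmt = "contr.SAS"))

colnames(X_r)    # "(Intercept)" "treatmttrt2"
colnames(X_sas)  # "(Intercept)" "treatmttrt1"
```

With the SAS-style coding the fixed-effect column is named `treatmttrt1`, which is the name that appears in the lmer output shown further down. The choice of contrasts changes only the fixed-effect parameterization; it should not change the variance component estimates.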
dput:
df <- structure(list(sample = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L),
treatmt = c("trt1", "trt1", "trt1", "trt2", "trt2", "trt2",
"trt1", "trt1", "trt1", "trt2", "trt2", "trt2", "trt1", "trt1",
"trt1", "trt2", "trt2", "trt2", "trt1", "trt1", "trt1", "trt2",
"trt2", "trt2"), y = c(6.426264755, 6.95419631, 6.64385619,
7.348728154, 6.247927513, 6.491853096, 2.807354922, 2.584962501,
3.584962501, 3.906890596, 3, 3.459431619, 2, 4.321928095,
3.459431619, 3.807354922, 3, 2.807354922, 0, 0, 0, 0, 0,
0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-24L), .Names = c("sample", "treatmt", "y"))
Current output:
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: y ~ 1 + treatmt + (1 | treatmt:sample)
Data: df
AIC BIC logLik deviance df.resid
80.3564 85.0686 -36.1782 72.3564 20
Random effects:
Groups Name Std.Dev.
treatmt:sample (Intercept) 2.344
Residual 0.564
Number of obs: 24, groups: treatmt:sample, 8
Fixed Effects:
(Intercept) treatmttrt1
3.3391 -0.1072
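You are using the SAS option NOBOUND, which allows negative variance
estimates, and you do get a negative estimate. This is not possible with
lmer, because lmer constrains variances to be non-negative.
We can try to obtain the SAS results by hand. First, note that the equivalent lmer
syntax is: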
lmer(y ~ 1 + treatment + (1+treatment|sample), REML=FALSE, data = dat)
Let's maximize the log-likelihood while allowing negative variances:
dattxt <- "1 trt1 6.426264755
1 trt1 6.95419631
1 trt1 6.64385619
1 trt2 7.348728154
1 trt2 6.247927513
1 trt2 6.491853096
2 trt1 2.807354922
2 trt1 2.584962501
2 trt1 3.584962501
2 trt2 3.906890596
2 trt2 3
2 trt2 3.459431619
3 trt1 2
3 trt1 4.321928095
3 trt1 3.459431619
3 trt2 3.807354922
3 trt2 3
3 trt2 2.807354922
4 trt1 0
4 trt1 0
4 trt1 0
4 trt2 0
4 trt2 0
4 trt2 0
"
dat <- read.table(text = dattxt)
names(dat) <- c("sample", "treatment", "y")
dat$sample <- as.factor(dat$sample)
opts <- options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly"))
library(lme4)
fit <- lmer(y ~ 1 + treatment + (1+treatment|sample), REML=FALSE, data = dat)
vc <- VarCorr(fit) # variance components of the lmer fit, used below as starting values
# marginal variance matrix in function of variance components
Vfun <- function(fit, vcs){
Z <- getME(fit, "Z")
n <- getME(fit, "n")
l_i <- getME(fit, "l_i")
sigma2_a <- vcs[1]
sigma2_b <- vcs[2]
sigma_ab <- vcs[3]
sigma2 <- vcs[4]
G <- matrix(c(sigma2_a, sigma_ab, sigma_ab, sigma2_b), nrow = 2)
R <- Diagonal(n, sigma2)
Z %*% bdiag(rep(list(G),l_i)) %*% t(Z) + R
}
# minus log-likelihood
library(mvtnorm)
logLHD <- function(params, fit){
X <- getME(fit, "X")
beta <- params[1:ncol(X)]
y <- getME(fit, "y")
vcs <- tail(params, length(params)-ncol(X))
V <- as.matrix(Vfun(fit, vcs))
if(any(eigen(V)$values <= 0)){
return(runif(1, 1e7, 1e8)) # return a high-value if V is not positive
}
-dmvnorm(y, c(X%*%beta), sigma = V, log = TRUE)
}
# optimization of log-likelihood
library(dfoptim)
start <-
c(fixef(fit), vc$sample[1,1], vc$sample[2,2], vc$sample[1,2], sigma(fit)^2)
names(start)[3:6] <-
c("sample.Intercept", "sample.trt1", "covariance", "sigma2")
opt <- hjkb(start, logLHD, lower=c(-Inf,-Inf,-Inf,-Inf,-Inf,0), fit=fit)
### results
opt$par
# (Intercept) treatmenttrt1 sample.Intercept sample.trt1 covariance sigma2
# 3.33912840 -0.10721533 5.50671885 -0.16909628 0.07275635 0.31812378
Note that the log-likelihood is indeed better maximized when a negative variance is allowed:
### remark: lmer achieves a lower log-likelihood
logLik(fit)
# 'log Lik.' -27.88947 (df=6)
-opt$value
# -26.43355
I would appreciate it if someone could explain the gymnastics required.
Edit
Sorry, that was not the right model. The model is:
lmer(y ~ 1 + treatment + (1|sample/treatment), REML=FALSE, data = dat)
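In lme4 formula syntax, `(1|sample/treatment)` is shorthand for `(1|sample) + (1|sample:treatment)`, which corresponds to the SAS statement `RANDOM int treatmt / subject=sample`. The first attempt in the question kept only the interaction term. Below is a minimal base R sketch of the two grouping factors; the data frame rebuilds only the design (4 samples, 2 treatments, 3 replicates), not the responses, and the names `g_sample` and `g_nested` are made up for illustration.

```r
# Design from the question: 4 samples x 2 treatments x 3 replicates (no responses)
dat_design <- expand.grid(rep = 1:3,
                          treatment = c("trt1", "trt2"),
                          sample = factor(1:4))

# Grouping factor for (1 | sample)
g_sample <- dat_design$sample

# Grouping factor for (1 | sample:treatment): one level per sample-by-treatment cell
g_nested <- interaction(dat_design$sample, dat_design$treatment, drop = TRUE)

nlevels(g_sample)  # 4
nlevels(g_nested)  # 8
```

The 8 levels of the interaction factor match the "groups: treatmt:sample, 8" line in the lmer output shown in the question.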
The following reproduces the SAS results:
opts <- options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly"))
library(lme4)
fit <- lmer(y ~ 1+treatment+(1|sample/treatment), REML=FALSE, data = dat)
vc <- VarCorr(fit)
Vfun <- function(fit, vcs){
Z <- getME(fit, "Z")
n <- getME(fit, "n")
l_i <- getME(fit, "l_i")
G <- Diagonal(sum(l_i), rep(vcs[1:2], l_i))
R <- Diagonal(n, vcs[3])
Z %*% G %*% t(Z) + R
}
library(mvtnorm)
logLHD <- function(params, fit){
X <- getME(fit, "X")
beta <- params[1:ncol(X)]
y <- getME(fit, "y")
vcs <- tail(params, length(params)-ncol(X))
V <- as.matrix(Vfun(fit, vcs))
if(any(eigen(V)$values <= 0)) return(runif(1, 1e7, 1e8))
-dmvnorm(y, c(X%*%beta), sigma = V, log = TRUE)
}
library(dfoptim)
start <- c(fixef(fit), vc[[1]], vc[[2]], sigma(fit)^2)
opt <- hjkb(start, logLHD, lower=c(-Inf,-Inf,-Inf,-Inf,0), fit=fit)
opt$par[3:5]
# -0.08454877 5.57947601 0.31812697
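A negative variance component can still give a valid model as long as the marginal covariance matrix V = ZGZ' + R stays positive definite, which is what the eigenvalue check in `logLHD` enforces. The base R sketch below rebuilds V for the six observations of a single sample using the three estimates printed above. It assumes that the first value belongs to the sample:treatment term and the second to sample, and that the rows are ordered as three `trt1` observations followed by three `trt2` observations.

```r
# Variance components copied from opt$par[3:5] above
# (assumed order: sample:treatment, sample, residual)
s2_st  <- -0.08454877
s2_s   <-  5.57947601
s2_res <-  0.31812697

# k x k matrix of ones
J <- function(k) matrix(1, k, k)

# Marginal covariance of the 6 observations within one sample:
# shared sample effect + shared sample-by-treatment effect + residual
V1 <- s2_s * J(6) + s2_st * kronecker(diag(2), J(3)) + s2_res * diag(6)

ev <- eigen(V1, symmetric = TRUE)$values
all(ev > 0)  # TRUE: V1 is positive definite despite the negative component
```

The smallest eigenvalue equals 3 * s2_st + s2_res, roughly 0.064, so the fit is close to the edge of the admissible region but still inside it.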
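Comments:
Have you read this? I don't see any programming question here.
Maybe this can help you.
I wonder whether this has to do with the contrasts?
From the reading I suggested and from @agstudy, the main difference is in how the contrasts are defined. For simplicity the author suggests declaring options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly")). After applying this option I was able to match the signs of the coefficients: treatment is now negative (-.1072) and the intercept is still positive (3.3391).
Thanks Amsterl and agstudy. I will look into the link you provided.
@Amsterl, your values are close. I will try starting from your code. Thanks a lot!
Hi Amsterl, thank you very much for taking the time to help me. I am interested in the variance components of the random effects, not in the fixed-effect coefficients. I get the following covariance estimates from SAS: Intercept sample ==> 5.5795, treatmt sample ==> 0.08455, Residual ==> 0.3181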