R equivalent of SAS PROC MIXED


I am trying to convert the following SAS code to R so that I get the same results as in SAS. Here is the SAS code:

    DATA plants; 
    INPUT  sample $  treatmt $ y ; 
    cards; 

    1   trt1    6.426264755 
    1   trt1    6.95419631 
    1   trt1    6.64385619 
    1   trt2    7.348728154 
    1   trt2    6.247927513 
    1   trt2    6.491853096 
    2   trt1    2.807354922 
    2   trt1    2.584962501 
    2   trt1    3.584962501 
    2   trt2    3.906890596 
    2   trt2    3 
    2   trt2    3.459431619 
    3   trt1    2 
    3   trt1    4.321928095 
    3   trt1    3.459431619 
    3   trt2    3.807354922 
    3   trt2    3 
    3   trt2    2.807354922 
    4   trt1    0 
    4   trt1    0 
    4   trt1    0 
    4   trt2    0 
    4   trt2    0 
    4   trt2    0 
    ; 
    RUN; 

    PROC MIXED ASYCOV NOBOUND  DATA=plants ALPHA=0.05 method=ML; 
    CLASS sample treatmt; 
    MODEL  y = treatmt ; 
    RANDOM int treatmt/ subject=sample ; 
    RUN; 
I get the following covariance parameter estimates from SAS:

I tried the following in R, but got different results:

s=as.factor(sample) 
lmer(y~ 1+treatmt+(1|treatmt:s),REML=FALSE) 

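I don't know whether you can reproduce the SAS results exactly in R, but I was able to get close by handling the contrasts, as described here (page 6):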

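When comparing estimates produced by SAS PROC MIXED and by lmer, one must be careful to consider the contrasts that are used to define the effects of factors. In SAS, a model with an intercept and a qualitative factor is defined in terms of the intercept and the indicator variables for all but the last level of the factor. The default behaviour in S is to use the Helmert contrasts for the factor. On a balanced factor these provide a set of orthogonal contrasts. In R the default is the "treatment" contrasts, which are almost the same as the SAS parameterization except that they drop the indicator of the first level, not the last level. When in doubt, check which contrasts are being used with the contrasts function. To make comparisons easier, you may find it worthwhile to declare

options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly"))

at the beginning of your session.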
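To make the difference between the two parameterizations concrete, here is a small base-R sketch. It is only an illustration: the factor `trt` and the objects `mm_default` and `mm_sas` are made-up names, and the contrasts are set per call through `contrasts.arg` rather than globally through `options()`.

```r
# A two-level factor, similar to treatmt in the question (illustrative data)
trt <- factor(c("trt1", "trt1", "trt2", "trt2"))

# R default: treatment contrasts drop the indicator of the FIRST level
contr.treatment(levels(trt))

# SAS-style: contr.SAS drops the indicator of the LAST level
contr.SAS(levels(trt))

# Effect on the design matrix, and hence on coefficient names and signs
mm_default <- model.matrix(~ trt, contrasts.arg = list(trt = "contr.treatment"))
mm_sas     <- model.matrix(~ trt, contrasts.arg = list(trt = "contr.SAS"))
colnames(mm_default)
colnames(mm_sas)
```

With the SAS-style contrasts the reference level becomes trt2, which would explain why the fitted coefficient is labelled with trt1 and changes sign relative to the default parameterization.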

dput:

df <- structure(list(sample = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), 
    treatmt = c("trt1", "trt1", "trt1", "trt2", "trt2", "trt2", 
    "trt1", "trt1", "trt1", "trt2", "trt2", "trt2", "trt1", "trt1", 
    "trt1", "trt2", "trt2", "trt2", "trt1", "trt1", "trt1", "trt2", 
    "trt2", "trt2"), y = c(6.426264755, 6.95419631, 6.64385619, 
    7.348728154, 6.247927513, 6.491853096, 2.807354922, 2.584962501, 
    3.584962501, 3.906890596, 3, 3.459431619, 2, 4.321928095, 
    3.459431619, 3.807354922, 3, 2.807354922, 0, 0, 0, 0, 0, 
    0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-24L), .Names = c("sample", "treatmt", "y"))
Current output:

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y ~ 1 + treatmt + (1 | treatmt:sample)
   Data: df
     AIC      BIC   logLik deviance df.resid 
 80.3564  85.0686 -36.1782  72.3564       20 
Random effects:
 Groups         Name        Std.Dev.
 treatmt:sample (Intercept) 2.344   
 Residual                   0.564   
Number of obs: 24, groups:  treatmt:sample, 8
Fixed Effects:
(Intercept)  treatmttrt1  
     3.3391      -0.1072  

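You are using the SAS option NOBOUND, which allows negative variance estimates, and a negative estimate is exactly what you get. This is not possible with lmer, because lmer constrains variances to be non-negative.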
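A negative variance component does not by itself make the fitted model invalid: what the Gaussian likelihood needs is a positive definite marginal covariance matrix V = Z G Z' + R. The base-R sketch below checks this for one sample's 6 x 6 block of V. It assumes the nested variance-components structure (sample, and treatment within sample) and plugs in rounded values of the size obtained further down in this thread; the names `s2_sample`, `s2_trt`, `s2_resid` and `V_block` are made up for illustration.

```r
# Assumed variance components (rounded from the estimates shown later on)
s2_sample <- 5.5795    # sample intercept
s2_trt    <- -0.08455  # treatment within sample (negative)
s2_resid  <- 0.3181    # residual

# k x k matrix of ones
J <- function(k) matrix(1, k, k)

# Covariance of the 6 observations of one sample:
# 2 treatments x 3 replicates, ordered as in the data
V_block <- s2_sample * J(6) +
  s2_trt * kronecker(diag(2), J(3)) +
  s2_resid * diag(6)

# V is block diagonal over samples, so it is positive definite
# if and only if this block is
ev <- eigen(V_block, symmetric = TRUE, only.values = TRUE)$values
min(ev) > 0
```

Under these assumptions the smallest eigenvalue is s2_resid + 3 * s2_trt, so the negative component is admissible as long as it stays above -s2_resid / 3.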

We can try to obtain the SAS results by hand. First, note that the equivalent lmer syntax is:

lmer(y ~ 1 + treatment + (1+treatment|sample), REML=FALSE, data = dat)
Let's maximize the log-likelihood while allowing negative variances:

dattxt <- "1 trt1  6.426264755 
1 trt1  6.95419631 
1 trt1  6.64385619 
1 trt2  7.348728154 
1 trt2  6.247927513 
1 trt2  6.491853096 
2 trt1  2.807354922 
2 trt1  2.584962501 
2 trt1  3.584962501 
2 trt2  3.906890596 
2 trt2  3 
2 trt2  3.459431619 
3 trt1  2 
3 trt1  4.321928095 
3 trt1  3.459431619 
3 trt2  3.807354922 
3 trt2  3 
3 trt2  2.807354922 
4 trt1  0 
4 trt1  0 
4 trt1  0 
4 trt2  0 
4 trt2  0 
4 trt2  0 
"

dat <- read.table(text = dattxt)
names(dat) <- c("sample", "treatment", "y")
dat$sample <- as.factor(dat$sample)

opts <- options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly"))

library(lme4)
fit <- lmer(y ~ 1 + treatment + (1+treatment|sample), REML=FALSE, data = dat) 

# marginal covariance matrix as a function of the variance components
Vfun <- function(fit, vcs){
  Z <- getME(fit, "Z")
  n <- getME(fit, "n")
  l_i <- getME(fit, "l_i")
  sigma2_a <- vcs[1]
  sigma2_b <- vcs[2]
  sigma_ab <- vcs[3]
  sigma2 <- vcs[4]
  G <- matrix(c(sigma2_a, sigma_ab, sigma_ab, sigma2_b), nrow = 2)
  R <- Diagonal(n, sigma2)
  Z %*% bdiag(rep(list(G),l_i)) %*% t(Z) + R
}


# minus log-likelihood
library(mvtnorm)
logLHD <- function(params, fit){
  X <- getME(fit, "X")
  beta <- params[1:ncol(X)]
  y <- getME(fit, "y")
  vcs <- tail(params, length(params)-ncol(X))
  V <- as.matrix(Vfun(fit, vcs))
  if(any(eigen(V)$values <= 0)){
    return(runif(1, 1e7, 1e8)) # large penalty when V is not positive definite
  }
  -dmvnorm(y, c(X%*%beta), sigma = V, log = TRUE)  
}

# optimization of log-likelihood
library(dfoptim)
vc <- VarCorr(fit) # variance components of the lmer fit, used as starting values
start <- 
  c(fixef(fit), vc$sample[1,1], vc$sample[2,2], vc$sample[1,2], sigma(fit)^2)
names(start)[3:6] <- 
  c("sample.Intercept", "sample.trt1", "covariance", "sigma2")
opt <- hjkb(start, logLHD, lower=c(-Inf,-Inf,-Inf,-Inf,-Inf,0), fit=fit)

### results 
opt$par
# (Intercept) treatmenttrt1 sample.Intercept  sample.trt1 covariance     sigma2 
# 3.33912840    -0.10721533       5.50671885  -0.16909628 0.07275635 0.31812378 
Note that the log-likelihood does reach a higher maximum when negative variances are allowed:

### remark: lmer achieves a lower log-likelihood
logLik(fit)
# 'log Lik.' -27.88947 (df=6)
-opt$value
# -26.43355
I would be grateful if someone could explain the gymnastics required.


EDIT: Sorry, that was not the right model. The model is:

lmer(y ~ 1 + treatment + (1|sample/treatment), REML=FALSE, data = dat)
And this gives the SAS results:

opts <- options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly"))
library(lme4)
fit <- lmer(y ~ 1+treatment+(1|sample/treatment), REML=FALSE, data = dat) 
vc <- VarCorr(fit)

Vfun <- function(fit, vcs){
  Z <- getME(fit, "Z")
  n <- getME(fit, "n")
  l_i <- getME(fit, "l_i")
  G <- Diagonal(sum(l_i), rep(vcs[1:2], l_i))
  R <- Diagonal(n, vcs[3])
  Z %*% G %*% t(Z) + R
}

library(mvtnorm)
logLHD <- function(params, fit){
  X <- getME(fit, "X")
  beta <- params[1:ncol(X)]
  y <- getME(fit, "y")
  vcs <- tail(params, length(params)-ncol(X))
  V <- as.matrix(Vfun(fit, vcs))
  if(any(eigen(V)$values <= 0)) return(runif(1, 1e7, 1e8))
  -dmvnorm(y, c(X%*%beta), sigma = V, log = TRUE)  
}

library(dfoptim)
start <- c(fixef(fit), vc[[1]], vc[[2]], sigma(fit)^2)
opt <- hjkb(start, logLHD, lower=c(-Inf,-Inf,-Inf,-Inf,0), fit=fit)
opt$par[3:5]
# -0.08454877    5.57947601    0.31812697 
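
Because this design is balanced (4 samples, 2 treatments, 3 replicates per cell), the unconstrained ML estimates can also be cross-checked without an optimizer. The sketch below is not part of the answer above. It assumes the nested variance-components model, in which the marginal covariance has three distinct eigenvalues (sigma2, sigma2 + 3 * sigma2_trt, and sigma2 + 3 * sigma2_trt + 6 * sigma2_sample), and it assumes that under ML each eigenvalue is estimated by the sum of squares of its stratum divided by the stratum dimension (16, 4 and 4). The variable names are made up for illustration.

```r
# Data from the question, in the original order
y <- c(6.426264755, 6.95419631, 6.64385619, 7.348728154, 6.247927513,
       6.491853096, 2.807354922, 2.584962501, 3.584962501, 3.906890596,
       3, 3.459431619, 2, 4.321928095, 3.459431619, 3.807354922, 3,
       2.807354922, 0, 0, 0, 0, 0, 0)
sample    <- gl(4, 6)                                  # 6 observations per sample
treatment <- gl(2, 3, length = 24, labels = c("trt1", "trt2"))

# Sums of squares of the balanced two-way layout
tab <- anova(lm(y ~ treatment * sample))
ss  <- setNames(tab[["Sum Sq"]], rownames(tab))

# ML estimates of the three eigenvalues: stratum SS / stratum dimension
lambda_resid  <- ss[["Residuals"]] / 16
lambda_trt    <- ss[["treatment:sample"]] / 4
lambda_sample <- ss[["sample"]] / 4

# Back-transform to variance components; nothing forces them to be >= 0
sigma2        <- lambda_resid
sigma2_trt    <- (lambda_trt - lambda_resid) / 3
sigma2_sample <- (lambda_sample - lambda_trt) / 6
c(treatment_in_sample = sigma2_trt, sample = sigma2_sample, residual = sigma2)
```

If these assumptions hold, the three values should land close to opt$par[3:5] above. The treatment-within-sample component comes out negative because the interaction stratum shows less variation than the residual stratum.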

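Have you read this? I don't see a programming question here. Maybe this can help you.

I wonder whether this has to do with the contrasts?

From the reading I suggested and from @agstudy, the main difference lies in how the contrasts are defined. The author suggests setting options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly")) for convenience. After applying this option, I was able to get the signs of the coefficients to match: the treatment coefficient is now negative (-0.1072) and the intercept is still positive (3.3391).

Thanks Amsterl and agstudy. I will look into the link you provided.

@Amsterl, your values are close. I will try starting from your code. Thanks a lot!

Hi Amsterl, thank you very much for taking the time to help me. I am interested in the variance components of the random effects, not in the fixed-effect coefficients. I get the following covariance parameter estimates from SAS: Intercept sample ==> 5.5795, treatmt sample ==> 0.08455, Residual ==> 0.3181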