Constrained Newton-Raphson estimation

I am trying to use the Newton-Raphson algorithm in R to maximize a log-likelihood function that I wrote for a very specific problem. I will be honest and say that estimation methods are over my head, but I know that many people in my field (psychometrics) use NR algorithms for estimation, so I am trying to use this approach, at least to start with. I have a series of nested functions that return a scalar log-likelihood estimate for a specific vector of data:

log.likelihoodSL <- function(x,sxdat1,item) {
  theta <- x[1]
  rho <- x[2]
  log.lik <- 0
  ## accumulate the log-likelihood over the response vector
  for (it in 1:length(sxdat1)) {
    val <- as.numeric(sxdat1[it])
    apars <- item[it,1:3]  # slope parameters for item 'it'
    cpars <- item[it,4:6]  # intercept parameters for item 'it'
    log.lik <- log.lik + as.numeric(log.pSL(theta,rho,apars,cpars,val))
  }
  return(log.lik)
}

## log response probability: a mixture of the two processes, weighted by rho
log.pSL <- function(theta,rho,apars,cpars,val) {
  p <- (rho * e.aSL(theta,apars,cpars,val)) + ((1-rho) * e.nrm(theta,apars,cpars,val))
  log.p <- log(p)
  return(log.p)
}

e.aSL <- function(theta,apars,cpars,val) {
  if (val==1) {
    aprob <- e.nrm(theta,apars,cpars,val)
  } else if (val==2) {
    aprob <- 1 - e.nrm(theta,apars,cpars,val)
  } else
    aprob <- 0
  return(aprob)
}

## nominal response model: a softmax over the response categories
e.nrm <- function(theta,apars,cpars,val) {
  nprob <- exp(apars*theta + cpars)/sum(exp((apars*theta) + cpars))
  nprob <- nprob[val]
  return(nprob)
}
Here are the variables in question:

> item
             V1        V2         V3           V4         V5        V6
 [1,] 0.2494625 0.3785529 -0.6280155 -0.096817808 -0.7549263 0.8517441
 [2,] 0.2023690 0.4582290 -0.6605980 -0.191895013 -0.8391203 1.0310153
 [3,] 0.2044005 0.3019147 -0.5063152 -0.073135691 -0.6061725 0.6793082
 [4,] 0.2233619 0.4371988 -0.6605607 -0.160377714 -0.8233197 0.9836974
 [5,] 0.2257933 0.2851198 -0.5109131 -0.044494872 -0.5970246 0.6415195
 [6,] 0.2047308 0.3438725 -0.5486033 -0.104356236 -0.6693569 0.7737131
 [7,] 0.3402220 0.2724951 -0.6127172  0.050795183 -0.6639092 0.6131140
 [8,] 0.2513672 0.3263046 -0.5776718 -0.056203015 -0.6779823 0.7341853
 [9,] 0.2008285 0.3389165 -0.5397450 -0.103565987 -0.6589961 0.7625621
[10,] 0.2890680 0.2700661 -0.5591341  0.014251386 -0.6219001 0.6076488
[11,] 0.3127214 0.2572715 -0.5699929  0.041587479 -0.6204483 0.5788608
[12,] 0.2697048 0.2965255 -0.5662303 -0.020115553 -0.6470669 0.6671825
[13,] 0.2799978 0.3219374 -0.6019352 -0.031454750 -0.6929045 0.7243592
[14,] 0.2773233 0.2822723 -0.5595956 -0.003711768 -0.6314010 0.6351127
[15,] 0.2433519 0.2632824 -0.5066342 -0.014947878 -0.5774375 0.5923853
[16,] 0.2947281 0.3605812 -0.6553092 -0.049389825 -0.7619178 0.8113076
[17,] 0.2290081 0.3114185 -0.5404266 -0.061807853 -0.6388839 0.7006917
[18,] 0.3824588 0.2543871 -0.6368459  0.096053788 -0.6684247 0.5723709
[19,] 0.2405821 0.3903595 -0.6309416 -0.112333048 -0.7659758 0.8783089
[20,] 0.2424331 0.3028480 -0.5452811 -0.045311136 -0.6360968 0.6814080
The two parameters of the function log.likelihoodSL() that I want to maximize are theta and rho. I want to constrain theta to lie between -3 and 3, and rho between 0 and 1, but I don't know how to do that with my current setup. Can anyone help me out? Do I need an estimation method other than Newton-Raphson, or is there a way to do this with the maxNR function from the maxLik package that I am currently using? Thanks!


EDIT: The vector x, which holds the starting values for the parameters theta and rho, is just c(0,0), since that is the 'average' or 'default' assumption about these parameters (in terms of their substantive interpretation).

The data in a more convenient form:

sxdat1 <- c(2,1,3,1,3,3,2,2,3,2,2,2,2,2,3,2,3,2,2,2)
item <- matrix(c(
0.2494625,0.3785529,-0.6280155,-0.096817808,-0.7549263,0.8517441,
0.2023690,0.4582290,-0.6605980,-0.191895013,-0.8391203,1.0310153,
0.2044005,0.3019147,-0.5063152,-0.073135691,-0.6061725,0.6793082,
0.2233619,0.4371988,-0.6605607,-0.160377714,-0.8233197,0.9836974,
0.2257933,0.2851198,-0.5109131,-0.044494872,-0.5970246,0.6415195,
0.2047308,0.3438725,-0.5486033,-0.104356236,-0.6693569,0.7737131,
0.3402220,0.2724951,-0.6127172,0.050795183,-0.6639092,0.6131140,
0.2513672,0.3263046,-0.5776718,-0.056203015,-0.6779823,0.7341853,
0.2008285,0.3389165,-0.5397450,-0.103565987,-0.6589961,0.7625621,
0.2890680,0.2700661,-0.5591341,0.014251386,-0.6219001,0.6076488,
0.3127214,0.2572715,-0.5699929,0.041587479,-0.6204483,0.5788608,
0.2697048,0.2965255,-0.5662303,-0.020115553,-0.6470669,0.6671825,
0.2799978,0.3219374,-0.6019352,-0.031454750,-0.6929045,0.7243592,
0.2773233,0.2822723,-0.5595956,-0.003711768,-0.6314010,0.6351127,
0.2433519,0.2632824,-0.5066342,-0.014947878,-0.5774375,0.5923853,
0.2947281,0.3605812,-0.6553092,-0.049389825,-0.7619178,0.8113076,
0.2290081,0.3114185,-0.5404266,-0.061807853,-0.6388839,0.7006917,
0.3824588,0.2543871,-0.6368459,0.096053788,-0.6684247,0.5723709,
0.2405821,0.3903595,-0.6309416,-0.112333048,-0.7659758,0.8783089,
0.2424331,0.3028480,-0.5452811,-0.045311136,-0.6360968,0.6814080),
               byrow=TRUE,ncol=6)
Note the warnings produced as rho drifts negative when you run the maxNR call (the full code is at the end of this answer). However, maxNR manages to recover from them and arrives at an estimate (theta = -1, rho = 0.63) that lies in the interior of the feasible set. L-BFGS-B cannot handle non-finite intermediate results, but the bounds keep the algorithm away from those problematic regions.
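
For reference, here is a minimal sketch of what the equivalent raw optim() call could look like under the same box constraints (my illustration, not part of the original answer; fnscale = -1 tells optim to maximize):

## maximize the log-likelihood directly; the box constraints keep
## theta in [-3, 3] and rho strictly inside (0, 1)
fit <- optim(par = c(0, 0.5), fn = log.likelihoodSL,
             method = "L-BFGS-B",
             lower = c(-3, 2e-3), upper = c(3, 1 - 2e-3),
             control = list(fnscale = -1),
             sxdat1 = sxdat1, item = item)
fit$par  ## should land near theta = -1.00, rho = 0.64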

I chose to use the bbmle package rather than optim directly: bbmle is a wrapper around optim (and other optimization tools) that offers some features specific to likelihood estimation (profiling, confidence intervals, likelihood ratio tests between models, etc.).
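
As a sketch of those extras (assuming the fitted mle2 object m1 created in the code at the end of this answer), profile-likelihood confidence intervals take only a couple of lines:

library(bbmle)
## profile the fitted model, then extract likelihood-profile
## confidence intervals for theta and rho
pp <- profile(m1)
confint(pp)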

A few points:

  • start rho at 0.5 rather than on the boundary
  • set the bounds for rho slightly inside [0,1] (L-BFGS-B does not always respect the bounds exactly when it computes finite-difference approximations to the derivatives)
  • specify the input data via the data argument
In this case I get the same results as with maxNR:

 ## Call:
 ## mle2(minuslogl = NLL, start = c(theta = 0, rho = 0.5), 
 ##     method = "L-BFGS-B", data = list(sxdat1 = sxdat1, item = item), 
 ##     lower = c(theta = -3, rho = 0.002), upper = c(theta = 3, 
 ##         rho = 1 - 0.002), control = list(fnscale = -1))
 ## 
 ## Coefficients:
 ##      theta        rho 
 ## -1.0038531  0.6352782 
 ## 
 ## Log-likelihood: -18.11 

Unless you really need to do this with Newton-Raphson rather than a gradient-based "quasi-Newton" method, I would guess this is good enough. (It doesn't sound like you have a strong technical reason beyond "other people in my field do it", which is a good reason, all other things being equal, but in this case not enough to make me dig into implementing N-R when a similar approach is readily available and works well.)
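
If you do want to stay with maxNR, one standard workaround (my own sketch, not from the original answer) is to reparameterize so that the optimizer works on unconstrained values which the objective maps back into the feasible box; the start value c(0,0) then corresponds to theta = 0, rho = 0.5:

## theta = 3*tanh(u[1]) stays in (-3, 3); rho = plogis(u[2]) stays in (0, 1)
log.likelihoodSL.u <- function(u, sxdat1, item) {
  x <- c(3 * tanh(u[1]), plogis(u[2]))
  log.likelihoodSL(x, sxdat1, item)
}
max3 <- maxNR(log.likelihoodSL.u, start = c(0, 0),
              sxdat1 = sxdat1, item = item)
## map the unconstrained estimates back to the original scale
c(theta = 3 * tanh(max3$estimate[1]), rho = plogis(max3$estimate[2]))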

  • I have been writing my own likelihoods, but I always estimate with something like fit <- optim(par=c(0.5,0,0), fn=negLL08, method='BFGS', hessian=TRUE), where negLL08 is the function defining the log-likelihood. I admit I am not sure how to constrain the parameters. Very interesting post.
  • Consider using optim's "BFGS" method, a quasi-Newton method. It handles upper and lower bounds.
  • @flodel: method="BFGS" does not handle them, but method="L-BFGS-B" does. It looks like maxNR does not handle box constraints; library("sos"); findFn("Newton-Raphson-box-constraints") turns up the Newton function in the Bhat package. On the other hand, Bhat has been archived and may be painful to get running (the last release is three years old).
  • I did try method="L-BFGS-B", but found it did not help much compared with maxNR. This is a great answer! I am really looking forward to trying your suggestions, thank you so much! While trying to work out how to use them I saw a lot about gradient functions, but I must admit I do not really understand what they are, or how to implement them when I do not have a nice, compact function for them. Also, thanks for the tip about bbmle.
  • Explicitly defining a gradient function is useful if you want to speed up the optimization or make it more robust, but for simple problems you can usually get by without one (and rely on the built-in finite-difference numerical approximation). For really hard problems, have a look at AD Model Builder.
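
To make the gradient idea concrete, here is a minimal sketch (mine, not from the thread) of supplying a finite-difference gradient to maxNR via the numDeriv package, assuming the functions and data defined above; gradSL is a hypothetical helper name:

library(numDeriv)
## numerical gradient of the log-likelihood, in the form maxNR expects:
## a function of the parameter vector (extra arguments passed along)
gradSL <- function(x, sxdat1, item) {
  numDeriv::grad(log.likelihoodSL, x, sxdat1 = sxdat1, item = item)
}
max2 <- maxNR(log.likelihoodSL, grad = gradSL, start = c(0, 0),
              sxdat1 = sxdat1, item = item)

For completeness, here is the full reproducible code from the answer: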
sxdat1 <- c(2,1,3,1,3,3,2,2,3,2,2,2,2,2,3,2,3,2,2,2)
item <- matrix(c(
0.2494625,0.3785529,-0.6280155,-0.096817808,-0.7549263,0.8517441,
0.2023690,0.4582290,-0.6605980,-0.191895013,-0.8391203,1.0310153,
0.2044005,0.3019147,-0.5063152,-0.073135691,-0.6061725,0.6793082,
0.2233619,0.4371988,-0.6605607,-0.160377714,-0.8233197,0.9836974,
0.2257933,0.2851198,-0.5109131,-0.044494872,-0.5970246,0.6415195,
0.2047308,0.3438725,-0.5486033,-0.104356236,-0.6693569,0.7737131,
0.3402220,0.2724951,-0.6127172,0.050795183,-0.6639092,0.6131140,
0.2513672,0.3263046,-0.5776718,-0.056203015,-0.6779823,0.7341853,
0.2008285,0.3389165,-0.5397450,-0.103565987,-0.6589961,0.7625621,
0.2890680,0.2700661,-0.5591341,0.014251386,-0.6219001,0.6076488,
0.3127214,0.2572715,-0.5699929,0.041587479,-0.6204483,0.5788608,
0.2697048,0.2965255,-0.5662303,-0.020115553,-0.6470669,0.6671825,
0.2799978,0.3219374,-0.6019352,-0.031454750,-0.6929045,0.7243592,
0.2773233,0.2822723,-0.5595956,-0.003711768,-0.6314010,0.6351127,
0.2433519,0.2632824,-0.5066342,-0.014947878,-0.5774375,0.5923853,
0.2947281,0.3605812,-0.6553092,-0.049389825,-0.7619178,0.8113076,
0.2290081,0.3114185,-0.5404266,-0.061807853,-0.6388839,0.7006917,
0.3824588,0.2543871,-0.6368459,0.096053788,-0.6684247,0.5723709,
0.2405821,0.3903595,-0.6309416,-0.112333048,-0.7659758,0.8783089,
0.2424331,0.3028480,-0.5452811,-0.045311136,-0.6360968,0.6814080),
               byrow=TRUE,ncol=6)
library(maxLik)
x <- c(0,0)
## maxNR maximizes the supplied function, so the log-likelihood is passed directly
max1 <- maxNR(log.likelihoodSL,grad=NULL,hess=NULL,start=x,
              print.level=1,sxdat1=sxdat1,item=item)
library(bbmle)

## mle2() wants a NEGATIVE log-likelihood
NLL <- function(x,sxdat1,item) {
    -log.likelihoodSL(x,sxdat1,item)
}
## needed when objective function takes a vector of args rather than
##  separate named arguments:
parnames(NLL) <- c("theta","rho")
(m1 <- mle2(NLL,start=c(theta=0,rho=0.5),method="L-BFGS-B",
     lower=c(theta=-3,rho=2e-3),upper=c(theta=3,rho=1-2e-3),
     data=list(sxdat1=sxdat1,item=item)))