scipy optimize minimize: hess_inv depends strongly on initial guess
I am using scipy.optimize.minimize to minimize a simple log-likelihood function. The Hessian does not seem to be well behaved:
import numpy as np
import scipy.optimize as op

def lnlike(theta, n, bhat, fhat, sigb, sigf):
    S, b, f = theta
    mu = f*S + b
    scb2 = ((b-bhat)/sigb)**2
    scf2 = ((f-fhat)/sigf)**2
    return n*np.log(mu) - mu - 0.5*(scb2+scf2)

nll = lambda *args: -lnlike(*args)
myargs = (21.0, 20.0, 0.5, 6.0, 0.1)  # (n, bhat, fhat, sigb, sigf)
If the initial guess is at the minimum, the iteration does not go anywhere. That is fine as far as the parameter values are concerned, but it does not touch the Hessian either (it remains the identity), so I cannot use it for uncertainty estimation:
x0 = [2.0, 20.0, 0.5]  # initial guess is at the minimum
result = op.minimize(nll, x0, args=myargs)
print(result)
status: 0
success: True
njev: 1
nfev: 5
hess_inv: array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
fun: -42.934971192191881
x: array([ 2. , 20. , 0.5])
message: 'Optimization terminated successfully.'
jac: array([ 0.00000000e+00, 0.00000000e+00, 9.53674316e-07])
If I change the initial guess slightly, it seems to return a sensible hess_inv:
x0 = [2.01, 20.0, 0.5]
result = op.minimize(nll, x0, args=myargs)
print(result)
print(np.sqrt(result.hess_inv[0, 0]))
status: 0
success: True
njev: 15
nfev: 75
hess_inv: array([[ 2.16004477e+02, -7.60588367e+01, -2.94846112e-02],
[ -7.60588367e+01, 3.55748024e+01, 2.74064505e-03],
[ -2.94846112e-02, 2.74064505e-03, 9.98030944e-03]])
fun: -42.934971191969964
x: array([ 1.99984604, 19.9999814 , 0.5000001 ])
message: 'Optimization terminated successfully.'
jac: array([ -2.38418579e-06, -5.24520874e-06, 1.90734863e-06])
14.697090757
However, hess_inv is very sensitive to the initial guess:
x0 = [2.02, 20.0, 0.5]
result = op.minimize(nll, x0, args=myargs)
print(result)
print(np.sqrt(result.hess_inv[0, 0]))
status: 0
success: True
njev: 16
nfev: 80
hess_inv: array([[ 1.82153214e+02, -6.03482772e+01, -2.97458789e-02],
[ -6.03482772e+01, 3.30771459e+01, -2.53811809e-03],
[ -2.97458789e-02, -2.53811809e-03, 9.99052952e-03]])
fun: -42.934971192188634
x: array([ 1.9999702 , 20.00000354, 0.50000001])
message: 'Optimization terminated successfully.'
jac: array([ -9.53674316e-07, -4.76837158e-07, -4.76837158e-07])
13.4964148462
Modifying the initial guess a little more:
x0 = [2.03, 20.0, 0.5]
result = op.minimize(nll, x0, args=myargs)
print(result)
print(np.sqrt(result.hess_inv[0, 0]))
status: 0
success: True
njev: 14
nfev: 70
hess_inv: array([[ 2.30479371e+02, -7.36087027e+01, -3.79639119e-02],
[ -7.36087027e+01, 3.55785937e+01, 3.54182478e-03],
[ -3.79639119e-02, 3.54182478e-03, 9.97664441e-03]])
fun: -42.93497119204827
x: array([ 1.99975148, 20.00006366, 0.50000009])
message: 'Optimization terminated successfully.'
jac: array([ -9.53674316e-07, -9.53674316e-07, 4.29153442e-06])
15.1815470484
Am I missing something? Is this a bug or a feature?

As I understand the optimizer, the Hessian is approximated by finite differences. That does not seem like the best idea in your case. Perhaps using SymPy (in IPython) will produce a more usable result:
import sympy as sy
import numpy as np
import scipy.optimize as sopt
from IPython.display import display  # nice printing

sy.init_printing()  # LaTeX-like printing for IPython

def lnlike(theta, n, bhat, fhat, sigb, sigf):
    S, b, f = theta
    mu = f*S + b
    scb2 = ((b-bhat)/sigb)**2
    scf2 = ((f-fhat)/sigf)**2
    return n*sy.log(mu) - mu - (scb2+scf2) / 2

# declare symbols:
th_S, th_b, th_f = sy.symbols("theta_S, theta_b, theta_f", real=True)
theta = (th_S, th_b, th_f)
n, bhat, fhat = sy.symbols(r"n, \hat{b}, \hat{f}", real=True)
sigb, sigf = sy.symbols("sigma_b, sigma_f", real=True)

# symbolic optimization function:
lf = -lnlike(theta, n, bhat, fhat, sigb, sigf)

# Gradient:
dlf = sy.Matrix([lf.diff(th) for th in theta])
# Hessian:
Hlf = sy.Matrix([dlf.T.diff(th) for th in theta])
print("Symbolic Hessian:")
display(Hlf)

# Make numpy functions:
margs = {n: 21, bhat: 20, fhat: .5, sigb: 6, sigf: .1}  # parameters
lf_a, dlf_a, Hlf_a = lf.subs(margs), dlf.subs(margs), Hlf.subs(margs)
lf_lam = sy.lambdify(theta, lf_a, modules="numpy")
dlf_lam = sy.lambdify(theta, dlf_a, modules="numpy")
Hlf_lam = sy.lambdify(theta, Hlf_a, modules="numpy")
nlf = lambda xx: np.array(lf_lam(xx[0], xx[1], xx[2]))              # function
ndlf = lambda xx: np.array(dlf_lam(xx[0], xx[1], xx[2])).flatten()  # gradient
nHlf = lambda xx: np.array(Hlf_lam(xx[0], xx[1], xx[2]))            # Hessian

x0 = [2.02, 20.0, 0.5]
rs = sopt.minimize(nlf, x0, jac=ndlf, hess=nHlf, method='Newton-CG')
print(rs)
print("Hessian:")
print(nHlf(rs.x))
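Once a trustworthy Hessian of the negative log-likelihood at the fit is available (e.g. `nHlf(rs.x)` from the code above), the uncertainty estimate the question is after comes from its inverse. A minimal sketch, with a hypothetical positive-definite matrix standing in for the real output:

```python
import numpy as np

# Hypothetical stand-in for the Hessian of the negative log-likelihood
# at the minimum, e.g. H = nHlf(rs.x); any symmetric positive-definite
# matrix serves for the illustration.
H = np.array([[4.0, 1.0,   0.0],
              [1.0, 3.0,   0.0],
              [0.0, 0.0, 100.0]])

cov = np.linalg.inv(H)         # covariance estimate = inverse Hessian
sigma = np.sqrt(np.diag(cov))  # 1-sigma uncertainties on (S, b, f)
```

The diagonal square roots play the role of np.sqrt(result.hess_inv[0,0]) in the question, but computed from an exact Hessian rather than from the path-dependent BFGS approximation.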
If you are using a quasi-Newton method:

A quasi-Newton method builds up its guess at the inverse Hessian by applying a sequence of low-rank updates to a completely naive initial guess (typically a multiple of the identity). The low-rank updates used are, in some sense, the "least-change" updates that make a given secant equation hold, and the meaning of "least change" varies with the quasi-Newton method chosen. If you start at, or very close to, the minimum, the optimizer figures this out very quickly and does not accumulate much information in its approximation to the inverse Hessian.
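This behaviour is easy to reproduce on a toy problem. Below is a minimal sketch of the BFGS inverse-Hessian update applied to a quadratic f(x) = ½ xᵀAx, where the exact line search has a closed form; scipy's BFGS adds a Wolfe line search and safeguards, so this is illustrative only:

```python
import numpy as np

def bfgs_inv_update(Hinv, s, y):
    """One BFGS update of the inverse-Hessian approximation.
    s = step taken, y = change in gradient over that step."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return ((I - rho * np.outer(s, y)) @ Hinv @ (I - rho * np.outer(y, s))
            + rho * np.outer(s, s))

def bfgs_quadratic(A, x, iters=10):
    """Toy BFGS on f(x) = 0.5 * x.T @ A @ x, whose true Hessian is A.
    Uses the exact line search available in closed form for a quadratic."""
    Hinv = np.eye(len(x))              # naive initial guess: the identity
    for _ in range(iters):
        g = A @ x                      # gradient of the quadratic
        if np.linalg.norm(g) < 1e-10:  # at the minimum: nothing to learn
            break
        p = -Hinv @ g                  # quasi-Newton search direction
        alpha = -(g @ p) / (p @ (A @ p))  # exact minimizing step length
        x_new = x + alpha * p
        Hinv = bfgs_inv_update(Hinv, x_new - x, A @ x_new - g)
        x = x_new
    return x, Hinv

A = np.array([[3.0, 1.0], [1.0, 2.0]])
x_far, H_far = bfgs_quadratic(A, np.array([2.0, -1.0]))  # Hinv -> inv(A)
x_min, H_min = bfgs_quadratic(A, np.zeros(2))            # stays identity
```

Started away from the minimum, the accumulated updates recover inv(A); started exactly at the minimum, the loop never takes a step and the approximation remains the identity, matching the output in the question.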
Thanks, that's nice. However, it seems to apply only in simple cases where analytical derivatives are available. Is there a way to produce a more accurate Hessian numerically?

As long as you can express your likelihood as a formula, I am fairly sure SymPy can calculate the Hessian, assuming it exists. If you want robust numerical methods, you need knowledge about the smoothness of the function to select an appropriate differentiator. Another standard technique is called "automatic differentiation". For an interpreted language like Python, I see no advantage of automatic differentiation techniques over SymPy (when using sy.simplify).
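On the follow-up about producing a more accurate Hessian numerically: a central-difference sketch is shown below. The step size h here is a guess and must suit the scale and smoothness of the function; dedicated packages such as numdifftools automate that choice:

```python
import numpy as np

def numerical_hessian(f, x, h=1e-5):
    """Central-difference Hessian of a scalar function f at x (O(h^2)).
    A simple sketch; the step h must match the function's scale."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.zeros(n), np.zeros(n)
            e_i[i], e_j[j] = h, h
            # four-point central difference for the mixed partial d2f/dxi dxj
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return 0.5 * (H + H.T)  # symmetrize away rounding noise

# Check against a function with a known Hessian:
# f(x) = x0^2 + 3*x0*x1 + 2*x1^2 has constant Hessian [[2, 3], [3, 4]].
f = lambda x: x[0]**2 + 3*x[0]*x[1] + 2*x[1]**2
H = numerical_hessian(f, [1.0, -2.0])
```

Applied to the negative log-likelihood at result.x, this gives a Hessian that depends only on the solution point, not on the optimizer's path from the initial guess.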