How do I add a zero-sum constraint to a GLM in Python?
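I have set up a model in Python using the statsmodels glm function, but now I want to add a zero-sum constraint to the model.
The model is defined as follows: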
import statsmodels.api as sm  # needed for sm.families
import statsmodels.formula.api as smf
model = smf.glm(formula="A ~ B + C + D", data=data, family=sm.families.Poisson()).fit()
In R, to add the constraint I would simply do the following:
model <- glm(A ~ B + C + D - 1, family=poisson(), data=data, contrasts=list(C="contr.sum", D="contr.sum"))
Here is an example that illustrates fit_constrained, using the Gaussian family because I did not quickly find a Poisson example with categorical variables.
import pandas
import statsmodels.api as sm
from statsmodels.formula.api import glm
url = 'http://www.ats.ucla.edu/stat/data/hsb2.csv'
hsb2 = pandas.read_table(url, delimiter=",")
mod = glm("write ~ C(race) - 1", data=hsb2)
res = mod.fit()
print(res.summary())
A constraint that all coefficients sum to zero:
res_c = mod.fit_constrained('C(race)[1] + C(race)[2] + C(race)[3] + C(race)[4] = 0')
print(res_c.summary())
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                  write   No. Observations:                  200
Model:                            GLM   Df Residuals:                      197
Model Family:                Gaussian   Df Model:                            2
Link Function:               identity   Scale:                   1232.08314649
Method:                          IRLS   Log-Likelihood:                -993.41
Date:                Wed, 25 Mar 2015   Deviance:                   2.4149e+05
Time:                        16:42:37   Pearson chi2:                 2.41e+05
No. Iterations:                     1
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
C(race)[1]     1.0002    221.565      0.005      0.996      -433.260   435.260
C(race)[2]   -41.1814    267.253     -0.154      0.878      -564.988   482.626
C(race)[3]    -6.3498    235.771     -0.027      0.979      -468.453   455.754
C(race)[4]    46.5311    100.184      0.464      0.642      -149.827   242.889
==============================================================================
Model has been estimated subject to linear equality constraints.
Constraints are separated by commas and default to being equal to zero:
res_c2 = mod.fit_constrained('C(race)[1] + C(race)[2], C(race)[3] + C(race)[4]')
print(res_c2.summary())
Output of the last print:
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                  write   No. Observations:                  200
Model:                            GLM   Df Residuals:                      198
Model Family:                Gaussian   Df Model:                            1
Link Function:               identity   Scale:                   1438.99574167
Method:                          IRLS   Log-Likelihood:                -1008.9
Date:                Wed, 25 Mar 2015   Deviance:                   2.8204e+05
Time:                        16:42:37   Pearson chi2:                 2.82e+05
No. Iterations:                     1
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
C(race)[1]    13.6286    242.003      0.056      0.955      -460.689   487.946
C(race)[2]   -13.6286    242.003     -0.056      0.955      -487.946   460.689
C(race)[3]   -41.6606    111.458     -0.374      0.709      -260.115   176.794
C(race)[4]    41.6606    111.458      0.374      0.709      -176.794   260.115
==============================================================================
Model has been estimated subject to linear equality constraints.
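For comparison with the R call in the question, here is a minimal NumPy sketch of what sum ("contr.sum") coding does for a single factor. The data, the level names and the helper sum_coded_design are made up for illustration, and the fit is plain least squares (Gaussian, identity link), not a Poisson GLM. Under this coding the effect of the last level is not estimated directly; it is implied as minus the sum of the others, so the effects sum to zero by construction.

```python
import numpy as np

# Synthetic data (made up for illustration): one factor with three levels.
levels = np.array(["a", "b", "c"])          # must be sorted for searchsorted
group = np.repeat(levels, 4)                # 4 observations per level
y = np.array([1.0, 2.0, 3.0, 2.0,           # level a, mean 2
              4.0, 5.0, 6.0, 5.0,           # level b, mean 5
              9.0, 8.0, 7.0, 8.0])          # level c, mean 8


def sum_coded_design(group, levels):
    """Intercept plus k-1 sum-coded columns; the last level is coded -1."""
    k = len(levels)
    coding = np.vstack([np.eye(k - 1), -np.ones(k - 1)])  # shape (k, k-1)
    idx = np.searchsorted(levels, group)                  # level index per row
    return np.column_stack([np.ones(len(group)), coding[idx]])


X = sum_coded_design(group, levels)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

intercept = beta[0]
# Effect of the last level is implied by the zero-sum restriction.
effects = np.append(beta[1:], -beta[1:].sum())

print("intercept:", intercept)
print("effects:", effects, "sum:", effects.sum())
```

With a formula interface, patsy offers a Sum contrast that can be requested as C(x, Sum) inside the formula; whether that matches the R output exactly for a given model would need to be checked.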
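I'm not sure how patsy formulas can be made to work so that no level is dropped when there is more than one categorical explanatory variable.

Does this force the coefficients of C and D to sum to zero separately? And is sum contrast coding used for this?

fit_constrained is used for this, and it transforms the design matrix. I would need to check, but I think what you need is a restriction matrix with two rows, with 1 in the corresponding columns: one row for the C levels and another row for the D levels.

Hmm... what you posted looks like it should work, but I hit an error that prevents the model from even being created. If the model is "write ~ race - 1", I get AttributeError: 'GLM' object has no attribute 'fit_constrained'.

I tried using "race = 1" with fit_constrained. I get an error: array must not be empty.

John, in that case there is nothing to estimate. The models in statsmodels are not designed for this case. In general, empty-array behavior is mostly inherited from numpy, and in specific cases this might be supported.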
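The two-row restriction matrix mentioned in the comments can be sketched as follows. This is only an illustration under stated assumptions: the column names are hypothetical, the data are synthetic and noise-free, and the constrained fit is ordinary least squares solved through the KKT system rather than a Poisson GLM. It is not a description of how statsmodels implements fit_constrained.

```python
import itertools
import numpy as np

# Hypothetical column names for a design that holds an intercept plus one
# dummy column for every level of two factors C and D (no level dropped).
names = ["Intercept", "C[c1]", "C[c2]", "C[c3]", "D[d1]", "D[d2]"]

# Restriction matrix: one row per factor, with 1 in that factor's columns.
R = np.array([[1.0 if n.startswith(prefix) else 0.0 for n in names]
              for prefix in ("C[", "D[")])
q = np.zeros(2)  # both sums are restricted to zero

# Synthetic balanced data: every (C, D) combination observed twice.
cells = list(itertools.product(range(3), range(2))) * 2
X = np.zeros((len(cells), len(names)))
X[:, 0] = 1.0
for i, (c, d) in enumerate(cells):
    X[i, 1 + c] = 1.0   # dummy for the C level
    X[i, 4 + d] = 1.0   # dummy for the D level

true_beta = np.array([10.0, -2.0, 0.5, 1.5, 1.0, -1.0])  # each group sums to 0
y = X @ true_beta  # noise-free, so the constrained fit should recover true_beta

# Equality-constrained least squares via the KKT system:
# [X'X  R'] [b  ]   [X'y]
# [R    0 ] [lam] = [q  ]
k, m = X.shape[1], R.shape[0]
kkt = np.block([[X.T @ X, R.T], [R, np.zeros((m, m))]])
rhs = np.concatenate([X.T @ y, q])
b = np.linalg.solve(kkt, rhs)[:k]

print(dict(zip(names, np.round(b, 6))))
print("R @ b =", R @ b)
```

X alone is rank deficient here, because the intercept and the two full sets of dummies are collinear; the two restrictions are what make the coefficients identifiable. The statsmodels documentation describes the constraints argument of fit_constrained as a formula expression or a tuple, so passing (R, q) may work for a GLM, but that is not verified in this sketch.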