How do I add a sum-to-zero constraint to a GLM in Python?


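I have set up a model in Python using the statsmodels glm function, but now I want to add a sum-to-zero constraint to the model.

The model is defined as follows: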

import statsmodels.api as sm  # needed for sm.families
import statsmodels.formula.api as smf
model = smf.glm(formula="A ~ B + C + D", data=data, family=sm.families.Poisson()).fit()

In R, to add the constraint I would simply do the following:

model <- glm(A ~ B + C + D - 1, family=poisson(), data=data, contrasts=list(C="contr.sum", D="contr.sum"))

Here is an example that illustrates fit_constrained. It uses the Gaussian family because I did not quickly find a Poisson example with a categorical variable.

import pandas
import statsmodels.api as sm
from statsmodels.formula.api import glm

url = 'http://www.ats.ucla.edu/stat/data/hsb2.csv'
hsb2 = pandas.read_table(url, delimiter=",")

mod = glm("write ~ C(race) - 1", data=hsb2)
res = mod.fit()
print(res.summary())
The constraint that all coefficients sum to zero:

res_c = mod.fit_constrained('C(race)[1] + C(race)[2] + C(race)[3] + C(race)[4] = 0')
print(res_c.summary())

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  write   No. Observations:                  200
Model:                            GLM   Df Residuals:                      197
Model Family:                Gaussian   Df Model:                            2
Link Function:               identity   Scale:                   1232.08314649
Method:                          IRLS   Log-Likelihood:                -993.41
Date:                Wed, 25 Mar 2015   Deviance:                   2.4149e+05
Time:                        16:42:37   Pearson chi2:                 2.41e+05
No. Iterations:                     1                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
C(race)[1]     1.0002    221.565      0.005      0.996      -433.260   435.260
C(race)[2]   -41.1814    267.253     -0.154      0.878      -564.988   482.626
C(race)[3]    -6.3498    235.771     -0.027      0.979      -468.453   455.754
C(race)[4]    46.5311    100.184      0.464      0.642      -149.827   242.889
==============================================================================

Model has been estimated subject to linear equality constraints.
Multiple constraints are separated by commas, and each defaults to being equal to zero:

res_c2 = mod.fit_constrained('C(race)[1] + C(race)[2], C(race)[3] + C(race)[4]')
print(res_c2.summary())
The last print statement outputs:

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  write   No. Observations:                  200
Model:                            GLM   Df Residuals:                      198
Model Family:                Gaussian   Df Model:                            1
Link Function:               identity   Scale:                   1438.99574167
Method:                          IRLS   Log-Likelihood:                -1008.9
Date:                Wed, 25 Mar 2015   Deviance:                   2.8204e+05
Time:                        16:42:37   Pearson chi2:                 2.82e+05
No. Iterations:                     1                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
C(race)[1]    13.6286    242.003      0.056      0.955      -460.689   487.946
C(race)[2]   -13.6286    242.003     -0.056      0.955      -487.946   460.689
C(race)[3]   -41.6606    111.458     -0.374      0.709      -260.115   176.794
C(race)[4]    41.6606    111.458      0.374      0.709      -176.794   260.115
==============================================================================

Model has been estimated subject to linear equality constraints.

I am not sure how patsy formulas can be made to work so that no level is dropped when there are several categorical explanatory variables.

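Would this force the coefficients of C and of D to sum to zero separately? And would sum contrast coding work for this?

fit_constrained can be used for this, and it transforms the design matrix. I need to check, but I think what you need is a restriction matrix with two rows that have 1 in the corresponding columns, one row for the C levels and another row for the D levels.

Hmm... what you posted looks like it should work, but I am running into an error that prevents the model from even being created. If the model is "write ~ race - 1", I get AttributeError: 'GLM' object has no attribute 'fit_constrained'.

I tried using "race = 1" in fit_constrained. I get the error: array cannot be empty.

John, in that case there is nothing left to estimate. The models in statsmodels are not designed for that case. In general, empty-array behavior is mostly inherited from numpy, and it might be supported in specific cases.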
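
The two-row restriction matrix described in the comments can be sketched with plain numpy. This is only an illustration for the linear (Gaussian, identity link) case, solved by reparametrizing onto the null space of the restriction matrix; it is not the statsmodels implementation. The data, the factor sizes, and the column order (intercept, three C dummies, two D dummies) are assumptions.

```python
import numpy as np

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 300
c = rng.integers(0, 3, size=n)  # factor C with 3 levels
d = rng.integers(0, 2, size=n)  # factor D with 2 levels

# Design matrix that keeps every level: intercept, C dummies, D dummies.
X = np.column_stack([np.ones(n), np.eye(3)[c], np.eye(2)[d]])
y = (1.0 + np.array([0.5, -0.2, -0.3])[c]
     + np.array([0.4, -0.4])[d]
     + rng.normal(scale=0.1, size=n))

# Restriction matrix: one row per factor, with 1 in that factor's columns.
R = np.array([[0, 1, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1]], dtype=float)

# Columns of Z form a basis of the null space of R, so beta = Z @ gamma
# satisfies R @ beta = 0 for every gamma.
_, _, vt = np.linalg.svd(R)
Z = vt[np.linalg.matrix_rank(R):].T

# Unconstrained least squares in the reduced parameters.
gamma, *_ = np.linalg.lstsq(X @ Z, y, rcond=None)
beta = Z @ gamma
print(beta)      # intercept, C effects, D effects
print(R @ beta)  # both entries are zero up to rounding
```

Without the two restrictions this design matrix is rank deficient, which is why a formula interface would otherwise drop a level of each factor.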