Python: getting data into statsmodels and fitting a GLM


In Python, I have my data stored as lists in the variables x and y. How do I get it into a form that statsmodels can use to fit a model?

from __future__ import print_function
import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd

x = [1,1,2,3]
y=[1,0,0,0]
data  = pd.DataFrame(x,y) #to merge the two side by side

star98 = sm.datasets.star98.load_pandas().data

formula = 'x ~ y'


pd.options.mode.chained_assignment = None  # default='warn'


mod1 = smf.glm(formula=formula, data=data, family=sm.families.Binomial()).fit()

x = mod1.summary()

ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported.

You have a few small problems. First, with the way you construct the data, y is actually interpreted as the index of the DataFrame:

In [3]:
    x = [1,1,2,3]
    y=[1,0,0,0]
    data  = pd.DataFrame(x,y) #to merge the two side by side
    data
Out[3]:
    0
1   1
0   1
0   2
0   3
Instead, you have to pass them as columns and make sure they get column names; using a dictionary is probably the easiest way:

In [13]:
    x = [1,1,2,3]
    y = [1,0,0,0]
    data = pd.DataFrame({'x' : x, 'y' : y}) #to merge the two side by side
    data
Out[13]:
    x   y
0   1   1
1   1   0
2   2   0
3   3   0
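The reason the positional form goes wrong is that the second positional parameter of pd.DataFrame is index, so pd.DataFrame(x, y) is the same as pd.DataFrame(data=x, index=y). A small check of both constructions, using only the lists from the question (the name wrong is just for illustration):

```python
import pandas as pd

x = [1, 1, 2, 3]
y = [1, 0, 0, 0]

# Positional form: y becomes the row index, and the single column is named 0.
wrong = pd.DataFrame(x, y)
print(list(wrong.columns))   # [0]
print(list(wrong.index))     # [1, 0, 0, 0]

# Dictionary form: both lists become named columns with a default 0..n-1 index.
data = pd.DataFrame({'x': x, 'y': y})
print(list(data.columns))    # ['x', 'y']
print(data.shape)            # (4, 2)
```

Named columns matter here because the formula interface looks variables up by column name.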
Second, your formula is wrong (since I guess you are trying to classify y from the data in x); it should be

formula = 'y ~ x'
If you combine that with the rest of your code, you will get better results:

In [21]:
    x
Out[21]:
                 Generalized Linear Model Regression Results
Dep. Variable:   y                   No. Observations:   4
Model:           GLM                 Df Residuals:       2
Model Family:    Binomial            Df Model:           1
Link Function:   logit               Scale:              1.0
Method:          IRLS                Log-Likelihood:     -1.3863
Date:            Mon, 28 Mar 2016    Deviance:           2.7726
Time:            15:34:32            Pearson chi2:       2.00
No. Iterations:  22

             coef        std err    z        P>|z|    [95.0% Conf. Int.]
Intercept    22.1423     3.9e+04    0.001    1.000    -7.64e+04  7.64e+04
x            -22.1423    3.9e+04    -0.001   1.000    -7.64e+04  7.64e+04

Hope that helps.
