Python: loading data into statsmodels and fitting a GLM
In Python, I have my data stored as two lists, x and y. How do I load them so that I can fit a GLM in statsmodels?
from __future__ import print_function
import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
x = [1,1,2,3]
y=[1,0,0,0]
data = pd.DataFrame(x,y) #to merge the two side by side
star98 = sm.datasets.star98.load_pandas().data
formula = 'x ~ y'
pd.options.mode.chained_assignment = None # default='warn'
mod1 = smf.glm(formula=formula, data=data, family=sm.families.Binomial()).fit()
x = mod1.summary()
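ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported.
You have a few small problems. First, because of the way you build your data, y is actually interpreted as the index of the DataFrame: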
In [3]:
x = [1,1,2,3]
y=[1,0,0,0]
data = pd.DataFrame(x,y) #to merge the two side by side
data
Out[3]:
   0
1  1
0  1
0  2
0  3
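One way to confirm this is to look at the index and the columns directly. The sketch below assumes only pandas and the two lists from the question; the second positional argument of pd.DataFrame is the index, so y never becomes a column.

```python
import pandas as pd

x = [1, 1, 2, 3]
y = [1, 0, 0, 0]

# The second positional argument is `index`, not a second column.
data = pd.DataFrame(x, y)

print(data.index.tolist())    # the y values ended up as row labels
print(data.columns.tolist())  # a single column with the default label 0
print(data.shape)             # four rows, one column
```

Since there is no column named x or y, the formula has nothing in the frame to refer to by those names.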
Instead, you have to pass them as columns and make sure they get column names; using a dictionary is probably the easiest way:
In [13]:
x = [1,1,2,3]
y = [1,0,0,0]
data = pd.DataFrame({'x' : x, 'y' : y}) #to merge the two side by side
data
Out[13]:
   x  y
0  1  1
1  1  0
2  2  0
3  3  0
Second, your formula is the wrong way round (I assume you are trying to classify y from the data in x); it should be
formula = 'y ~ x'
If you combine this with the rest of your code, you get a better result:
In [21]:
x
Out[21]:
                 Generalized Linear Model Regression Results
Dep. Variable:                      y   No. Observations:                    4
Model:                            GLM   Df Residuals:                        2
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -1.3863
Date:                Mon, 28 Mar 2016   Deviance:                       2.7726
Time:                        15:34:32   Pearson chi2:                     2.00
No. Iterations:                    22
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
Intercept     22.1423    3.9e+04      0.001      1.000     -7.64e+04  7.64e+04
x            -22.1423    3.9e+04     -0.001      1.000     -7.64e+04  7.64e+04
Hope that helps.