Regression in Python
Tags: python, pandas, regression, statsmodels

I'm trying to run a logistic regression with pandas and statsmodels. I don't know why I'm getting an error or how to fix it.
import pandas as pd
import statsmodels.api as sm
x = [1, 3, 5, 6, 8]
y = [0, 1, 0, 1, 1]
d = { "x": pd.Series(x), "y": pd.Series(y)}
df = pd.DataFrame(d)
model = "y ~ x"
glm = sm.Logit(model, df=df).fit()
Error:
Traceback (most recent call last):
  File "regress.py", line 45, in <module>
    glm = sm.Logit(model, df=df).fit()
TypeError: __init__() takes exactly 3 arguments (2 given)
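You can't pass a formula to Logit. Its constructor expects the response and the design matrix as two separate arguments (endog, exog), which is why the call fails with a TypeError. Build both with patsy first: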
In [82]: import patsy
In [83]: f = 'y ~ x'
In [84]: y, X = patsy.dmatrices(f, df, return_type='dataframe')
In [85]: sm.Logit(y, X).fit().summary()
Optimization terminated successfully.
Current function value: 0.511631
Iterations 6
Out[85]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                           Logit Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                    5
Model:                          Logit   Df Residuals:                        3
Method:                           MLE   Df Model:                            1
Date:                Fri, 30 Aug 2013   Pseudo R-squ.:                  0.2398
Time:                        16:56:38   Log-Likelihood:                -2.5582
converged:                       True   LL-Null:                       -3.3651
                                        LLR p-value:                    0.2040
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -2.0544      2.452     -0.838      0.402        -6.861     2.752
x              0.5672      0.528      1.073      0.283        -0.468     1.603
==============================================================================
"""
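Or use the formula functions: import statsmodels.api as smf, and then use smf.logit(formula...)

Thanks to Phillip for giving the correct answer. I was too quick with my comment. I meant to write import statsmodels.formula.api as smf, which also gives access to the shortcut, lowercase functions of the formula interface. These are just convenience wrappers around the models' from_formula method, e.g. sm.Logit.from_formula.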
How do I define the reference category? This doesn't work: f = 'C(y, Treatment(0)) ~ x'