Regression in Python


I'm trying to run a logistic regression with pandas and statsmodels. I don't know why I'm getting an error or how to fix it.

import pandas as pd
import statsmodels.api as sm
x = [1, 3, 5, 6, 8]
y = [0, 1, 0, 1, 1]
d = { "x": pd.Series(x), "y": pd.Series(y)}
df = pd.DataFrame(d)

model = "y ~ x"
glm = sm.Logit(model, df=df).fit()

Error:

Traceback (most recent call last):
  File "regress.py", line 45, in <module>
    glm = sm.Logit(model, df=df).fit()
TypeError: __init__() takes exactly 3 arguments (2 given)


You can't pass a formula to Logit. Build the design matrices with patsy first:

In [82]: import patsy

In [83]: f = 'y ~ x'

In [84]: y, X = patsy.dmatrices(f, df, return_type='dataframe')

In [85]: sm.Logit(y, X).fit().summary()
Optimization terminated successfully.
         Current function value: 0.511631
         Iterations 6
Out[85]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                           Logit Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                    5
Model:                          Logit   Df Residuals:                        3
Method:                           MLE   Df Model:                            1
Date:                Fri, 30 Aug 2013   Pseudo R-squ.:                  0.2398
Time:                        16:56:38   Log-Likelihood:                -2.5582
converged:                       True   LL-Null:                       -3.3651
                                        LLR p-value:                    0.2040
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -2.0544      2.452     -0.838      0.402        -6.861     2.752
x              0.5672      0.528      1.073      0.283        -0.468     1.603
==============================================================================
"""


Or use the formula functions: import statsmodels.api as smf, then use smf.logit(formula...).

Thanks to Phillip for the correct answer. I was too quick with my comment. I meant to write import statsmodels.formula.api as smf, which provides the lowercase shortcut functions for the formula interface. These are just convenience wrappers around the models' from_formula method, e.g. sm.Logit.from_formula.

How do I define the reference category? This doesn't work: f = 'C(y, Treatment(0)) ~ x'