How to build a cubic regression with statsmodels.api in Python?

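For several days I have been trying to fit a cubic regression, but I keep running into the same problem: my results do not match those of the code I wrote in R. The dataset is exactly the same, so that is not the issue. My current code looks like this: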

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
import statsmodels.api as sm
import numpy as np

df = pd.read_csv("http://web.stanford.edu/~oleg2/hse/boston/boston_house_prices.csv")
df = df.dropna()
x, y = np.array(df.lstat), np.array(df.crim)
# Columns 1, x, x^2, x^3: include_bias=True (the default) supplies the intercept column
polynomial_features = PolynomialFeatures(degree=3)
xp = polynomial_features.fit_transform(x.reshape(-1, 1))
model = sm.OLS(y, xp).fit()
print(model.summary())

I also did something like this:

import pandas as pd
import statsmodels.api as sm
import numpy as np
import statsmodels.formula.api as smf

df = pd.read_csv("http://web.stanford.edu/~oleg2/hse/boston/boston_house_prices.csv")
df = df.dropna()
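# The formula interface adds the intercept automatically; I(...) makes patsy evaluate the powers literally
ft1 = smf.ols(formula="crim ~ lstat + I(np.power(lstat,2)) + I(np.power(lstat,3))", data=df).fit()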
print(ft1.summary())
Both approaches give exactly the same result:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   crim   R-squared:                       0.218
Model:                            OLS   Adj. R-squared:                  0.213
Method:                 Least Squares   F-statistic:                     46.63
Date:                Sat, 03 Oct 2020   Prob (F-statistic):           1.35e-26
Time:                        10:26:13   Log-Likelihood:                -1744.2
No. Observations:                 506   AIC:                             3496.
Df Residuals:                     502   BIC:                             3513.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
=========================================================================================
                            coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                 1.2010      2.029      0.592      0.554      -2.785       5.187
lstat                    -0.4491      0.465     -0.966      0.335      -1.362       0.464
I(np.power(lstat, 2))     0.0558      0.030      1.852      0.065      -0.003       0.115
I(np.power(lstat, 3))    -0.0009      0.001     -1.517      0.130      -0.002       0.000
==============================================================================
Omnibus:                      607.734   Durbin-Watson:                   1.239
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            53621.219
Skew:                           5.726   Prob(JB):                         0.00
Kurtosis:                      52.114   Cond. No.                     5.20e+04
==============================================================================
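Both versions fit the same raw-power design matrix with columns 1, x, x², x³ (for a single feature, `PolynomialFeatures(degree=3)` with its default `include_bias=True` produces exactly these columns, and the formula interface adds the intercept itself), which is why the two summaries agree. The minimal sketch below builds that matrix with NumPy and solves the least-squares problem directly. The synthetic `x`/`y` data and the helper name `raw_cubic_design` are hypothetical stand-ins for `lstat` and `crim`, and `np.linalg.lstsq` is used in place of `sm.OLS` only so the example runs without statsmodels.

```python
import numpy as np


def raw_cubic_design(x):
    """Hypothetical helper: columns 1, x, x**2, x**3 (raw, non-orthogonal powers)."""
    x = np.asarray(x, dtype=float)
    # increasing=True orders the columns as x**0, x**1, x**2, x**3
    return np.vander(x, 4, increasing=True)


# Synthetic stand-in data (NOT the Boston data set)
rng = np.random.default_rng(0)
x = rng.uniform(2.0, 38.0, size=200)
true_beta = np.array([1.0, -0.5, 0.05, -0.001])
X = raw_cubic_design(x)
y = X @ true_beta  # noiseless, so least squares should recover true_beta

# Ordinary least squares on the raw design matrix
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 6))
```

With statsmodels the equivalent call would be `sm.OLS(y, X).fit()`; because `X` already contains the column of ones, no `sm.add_constant` is needed.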
Here is the R code:

fit.lstat2 <- lm(crim ~ poly(lstat, 3))
summary(fit.lstat2)
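                Estimate Std. Error t value Pr(>|t|)
(Intercept)       3.6135     0.3392  10.654

I'm not an R expert, but I think you are not using orthogonal polynomials in Python, so in R you have to set raw=TRUE (by default, poly() builds an orthogonal polynomial basis).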
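If you would rather change the Python side to reproduce R's default `raw = FALSE`, one option is to build a comparable orthonormal basis yourself. The sketch below follows the construction R's `poly()` appears to use (centre x, QR-decompose the matrix of powers, keep the orthonormal columns after the constant). The helper names `ortho_poly` and `ols`, the sign correction, and the synthetic data are assumptions, and the output has not been checked against R here. Because the basis columns are orthogonal to the constant, the fitted intercept equals the mean of the response, which would explain why R reports 3.6135 rather than 1.2010; the fitted values (and hence R²) do not change with the basis.

```python
import numpy as np


def ortho_poly(x, degree):
    """Hypothetical helper approximating R's poly(x, degree) with raw = FALSE.

    Returns an (n, degree) matrix whose columns are orthonormal and
    orthogonal to a constant column.
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()                                # centre x first
    V = np.vander(xc, degree + 1, increasing=True)   # 1, xc, ..., xc**degree
    Q, R = np.linalg.qr(V)                           # reduced QR: Q is (n, degree + 1)
    # Flip column signs so each column points along the part of xc**j that is
    # orthogonal to the lower powers (assumed to mirror R's sign convention).
    Q = Q * np.sign(np.diag(R))
    return Q[:, 1:]                                  # drop the constant column


def ols(X, y):
    """Least-squares coefficients and fitted values (stand-in for sm.OLS)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta


# Synthetic stand-in data (NOT the Boston data set)
rng = np.random.default_rng(1)
x = rng.uniform(2.0, 38.0, size=300)
y = 1.2 - 0.45 * x + 0.056 * x**2 - 0.0009 * x**3 + rng.normal(0.0, 2.0, size=300)

ones = np.ones((x.size, 1))
P = ortho_poly(x, 3)
beta_orth, fit_orth = ols(np.hstack([ones, P]), y)
beta_raw, fit_raw = ols(np.vander(x, 4, increasing=True), y)

print("orthogonal:", beta_orth)
print("raw:       ", beta_raw)
# Same column space, so the fitted values agree; only the coefficients differ.
print("max |fitted difference|:", np.max(np.abs(fit_orth - fit_raw)))
```

In statsmodels the basis could be passed as `sm.OLS(y, sm.add_constant(P)).fit()`. The basis depends on the data it was computed from (its mean and QR factors), so predicting at new x values would need the same centring and projection, which R keeps in the `coefs` attribute of `poly()`; this sketch does not handle that.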

When I use raw=TRUE in the R code, I get the same results as with statsmodels.api in Python:

fit.lstat2 <- lm(crim ~ poly(lstat, 3, raw=TRUE))
summary(fit.lstat2)
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                  1.2009656  2.0286452   0.592   0.5541
poly(lstat, 3, raw = TRUE)1 -0.4490656  0.4648911  -0.966   0.3345
poly(lstat, 3, raw = TRUE)2  0.0557794  0.0301156   1.852   0.0646 .
poly(lstat, 3, raw = TRUE)3 -0.0008574  0.0005652  -1.517   0.1299
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.629 on 502 degrees of freedom
Multiple R-squared:  0.2179,  Adjusted R-squared:  0.2133
F-statistic: 46.63 on 3 and 502 DF,  p-value: < 2.2e-16

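How did you load the data in R?
@Frenchy library(MASS), then attach(Boston)
I don't know whether the Python result is wrong, but you are not fitting the same model: your R code uses orthogonal polynomials, so see my answer and switch to raw=TRUE.
That's a good observation, but I'd rather modify the Python code to get the same result as R, because I'm trying to run an ANOVA test between the linear and non-linear fits (obtained by the same method). They barely differ, yet it shows some extremely small p-values suggesting they are astronomically different. (In case anyone wonders, the ANOVA test wasn't working because I forgot to add the intercept.) So my Python code is correct.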
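The linear-versus-cubic comparison mentioned above is an F-test on the residual sums of squares of two nested models, and it is usually only meaningful when both models include an intercept. With `sm.OLS` the constant column has to be added explicitly (for example with `sm.add_constant`), whereas the formula interface adds it automatically. The NumPy sketch below computes that F statistic; the helper names `rss` and `nested_f_test` and the synthetic data are hypothetical, and in statsmodels `sm.stats.anova_lm(linear_fit, cubic_fit)` should perform the same comparison on two fitted models.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def nested_f_test(X_small, X_big, y):
    """F statistic for H0: the extra columns of X_big add nothing to X_small.

    Assumes the column space of X_small lies inside that of X_big (nested models).
    """
    n = y.size
    rss_small, rss_big = rss(X_small, y), rss(X_big, y)
    df_num = X_big.shape[1] - X_small.shape[1]
    df_den = n - X_big.shape[1]
    f_stat = ((rss_small - rss_big) / df_num) / (rss_big / df_den)
    return f_stat, df_num, df_den


# Synthetic stand-in data (NOT the Boston data set): truly linear with an intercept
rng = np.random.default_rng(2)
x = rng.uniform(2.0, 38.0, size=200)
y = 20.0 + 0.5 * x + rng.normal(0.0, 5.0, size=200)

ones = np.ones_like(x)
X_linear = np.column_stack([ones, x])              # intercept + x
X_cubic = np.column_stack([ones, x, x**2, x**3])   # intercept + x + x^2 + x^3
X_linear_no_const = x.reshape(-1, 1)               # what sm.OLS(y, x) would fit

print(nested_f_test(X_linear, X_cubic, y))
# A no-intercept baseline is a much worse model on this data, so the
# F statistic (and the apparent evidence for the cubic terms) is far larger.
print(nested_f_test(X_linear_no_const, X_cubic, y))
```

A p-value could then be taken from the F distribution, for example with `scipy.stats.f.sf(f_stat, df_num, df_den)`.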