Python: How do I build a cubic regression with statsmodels.api?
Tags: python, python-3.x, statsmodels

For several days I have been trying to fit a cubic regression, but I keep running into the same problem: my results do not match those from the code I wrote in R. The dataset is exactly the same, so that is not the cause. This is my current code:
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
import statsmodels.api as sm
import numpy as np
df = pd.read_csv("http://web.stanford.edu/~oleg2/hse/boston/boston_house_prices.csv")
df = df.dropna()
x, y = np.array(df.lstat), np.array(df.crim)
polynomial_features= PolynomialFeatures(degree=3)
xp = polynomial_features.fit_transform(x.reshape(-1,1))
model = sm.OLS(y, xp).fit()
print(model.summary())
I also tried this:
import pandas as pd
import statsmodels.api as sm
import numpy as np
import statsmodels.formula.api as smf
df = pd.read_csv("http://web.stanford.edu/~oleg2/hse/boston/boston_house_prices.csv")
df = df.dropna()
ft1 = smf.ols(formula=f"crim ~ lstat + I(np.power(lstat,2)) + I(np.power(lstat,3))", data=df).fit()
print(ft1.summary())
Both approaches give exactly the same result:
                            OLS Regression Results
==============================================================================
Dep. Variable:                   crim   R-squared:                       0.218
Model:                            OLS   Adj. R-squared:                  0.213
Method:                 Least Squares   F-statistic:                     46.63
Date:                Sat, 03 Oct 2020   Prob (F-statistic):           1.35e-26
Time:                        10:26:13   Log-Likelihood:                -1744.2
No. Observations:                 506   AIC:                             3496.
Df Residuals:                     502   BIC:                             3513.
Df Model:                           3
Covariance Type:            nonrobust
=========================================================================================
                            coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                 1.2010      2.029      0.592      0.554      -2.785       5.187
lstat                    -0.4491      0.465     -0.966      0.335      -1.362       0.464
I(np.power(lstat, 2))     0.0558      0.030      1.852      0.065      -0.003       0.115
I(np.power(lstat, 3))    -0.0009      0.001     -1.517      0.130      -0.002       0.000
==============================================================================
Omnibus:                      607.734   Durbin-Watson:                   1.239
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            53621.219
Skew:                           5.726   Prob(JB):                         0.00
Kurtosis:                      52.114   Cond. No.                     5.20e+04
==============================================================================
This is the R program:
fit.lstat2 <- lm(crim ~ poly(lstat, 3))
summary(fit.lstat2)
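##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   3.6135     0.3392  10.654

I am not an R expert, but I think you do not want orthogonal polynomials here. poly() returns orthogonal polynomials by default, so you have to set raw=TRUE.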
When I use that in the R code, I get the same results as Python's statsmodels.api:
fit.lstat2 <- lm(crim ~ poly(lstat, 3, raw=TRUE))
summary(fit.lstat2)
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                  1.2009656  2.0286452   0.592   0.5541
poly(lstat, 3, raw = TRUE)1 -0.4490656  0.4648911  -0.966   0.3345
poly(lstat, 3, raw = TRUE)2  0.0557794  0.0301156   1.852   0.0646 .
poly(lstat, 3, raw = TRUE)3 -0.0008574  0.0005652  -1.517   0.1299
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.629 on 502 degrees of freedom
Multiple R-squared:  0.2179,    Adjusted R-squared:  0.2133
F-statistic: 46.63 on 3 and 502 DF,  p-value: < 2.2e-16
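
Why the raw and orthogonal fits disagree on coefficients yet describe the same model can be sketched in Python. This is only an illustration under stated assumptions: the data are synthetic (the x and y below are stand-ins, not the Boston columns), and the orthogonal basis is built with a QR decomposition of the centered powers, which is similar in spirit to what R's poly() does but is not guaranteed to reproduce its columns or signs exactly.

```python
import numpy as np

# Synthetic stand-in data (an assumption for illustration, not the Boston dataset).
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 40.0, size=200)
y = 1.0 - 0.4 * x + 0.05 * x**2 - 0.001 * x**3 + rng.normal(0.0, 5.0, size=200)

# Raw cubic basis: columns 1, x, x^2, x^3.
X_raw = np.vander(x, N=4, increasing=True)

# Orthogonal cubic basis: QR-decompose the centered powers, keep the
# three non-constant columns of Q, and add an explicit intercept column.
Q, _ = np.linalg.qr(np.vander(x - x.mean(), N=4, increasing=True))
X_orth = np.column_stack([np.ones_like(x), Q[:, 1:]])

# Ordinary least squares on each basis.
beta_raw, *_ = np.linalg.lstsq(X_raw, y, rcond=None)
beta_orth, *_ = np.linalg.lstsq(X_orth, y, rcond=None)

fitted_raw = X_raw @ beta_raw
fitted_orth = X_orth @ beta_orth

print("raw coefficients:       ", beta_raw)
print("orthogonal coefficients:", beta_orth)
print("same fitted values:     ", np.allclose(fitted_raw, fitted_orth, atol=1e-6))
print("orthogonal intercept equals mean(y):", np.isclose(beta_orth[0], y.mean()))
```

Both bases span the same column space, so the fitted values, residuals, R-squared and overall F-statistic agree, and only the individual coefficients and their t-tests change. In the orthogonal basis the intercept is the mean of the response, which is consistent with the different intercept reported by the default poly(lstat, 3) fit.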
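Comments:

How did you load the data in R?

@Frenchy library(MASS) attach(Boston)

I do not know whether the Python result is wrong, but you are not asking R for the same thing. Your R code uses orthogonal polynomials, so see my answer and switch to raw.

That is a good observation, but I would rather modify the Python code to get the same result as R. I am trying to run an ANOVA test between the linear and the nonlinear fits (obtained with the same method). They barely differ, yet the test shows some very small p-values, suggesting they are astronomically different.

In case anyone is wondering, the ANOVA test did not work because I forgot to add the intercept. So my Python code is correct.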
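
The ANOVA comparison mentioned in the comments amounts to an F-test between two nested models, and it is only meaningful when both design matrices include the intercept column. Below is a minimal sketch of that comparison, assuming synthetic data (not the Boston columns) and a hypothetical helper named rss; it computes the F-statistic by hand with NumPy rather than calling any statsmodels routine.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X (hypothetical helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Synthetic stand-in data (an assumption for illustration).
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(1.0, 40.0, size=n)
y = 1.0 - 0.4 * x + 0.05 * x**2 - 0.001 * x**3 + rng.normal(0.0, 5.0, size=n)

# Both design matrices contain the intercept column of ones.
X_lin = np.vander(x, N=2, increasing=True)   # 1, x
X_cub = np.vander(x, N=4, increasing=True)   # 1, x, x^2, x^3

rss_lin = rss(X_lin, y)
rss_cub = rss(X_cub, y)

df_num = X_cub.shape[1] - X_lin.shape[1]     # extra parameters in the cubic model
df_den = n - X_cub.shape[1]                  # residual degrees of freedom of the cubic model

# F-statistic for the nested comparison: linear (restricted) vs cubic (full).
f_stat = ((rss_lin - rss_cub) / df_num) / (rss_cub / df_den)
print(f"RSS linear = {rss_lin:.2f}, RSS cubic = {rss_cub:.2f}, F = {f_stat:.3f}")
```

If SciPy is available, the corresponding p-value can be obtained with scipy.stats.f.sf(f_stat, df_num, df_den). Dropping the column of ones from either matrix changes what is being compared, which is one way a comparison like this can report implausibly small p-values.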