Python scikit-learn: my linear regression is not a straight line, it's a mess

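I'm trying to plot a simple regression line, but I get a jumbled line instead. Is it because I fit the model with two features, so the only proper visualization is a 3D plane?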

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

# prepare data
# (load_boston was removed in scikit-learn 1.2; this script needs an older release)
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)[['AGE','RM']]
y = boston.target

# split dataset into training and test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=33)

# apply linear regression on dataset
lm = LinearRegression()
lm.fit(X_train, y_train)
pred_train = lm.predict(X_train)
pred_test = lm.predict(X_test)

#plot relationship between RM and price
plt.scatter(X_train['RM'],
            y_train,
            c='g',
            s=40,
            alpha=0.5)
plt.plot(X_train['RM'], pred_train, color='r')
plt.title('Relationship between RM and Price')
plt.ylabel('Price')
plt.xlabel('RM')

The problem is that the arguments must be sorted when plotting:

plt.plot(np.sort(X_train['RM']), np.sort(pred_train), color='r')

The result is:
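
A caveat on the sorted call above: plt.plot joins points in the order it receives them, which is why unsorted rows produce a tangle. Sorting X_train['RM'] and pred_train independently does give a monotone line, but it can pair an RM value with a prediction that belongs to a different row. Below is a minimal sketch of an order-preserving alternative using np.argsort. The synthetic data and its coefficients are made up for illustration and only stand in for the AGE and RM columns; they are not the Boston data.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the AGE and RM columns (made-up data, not Boston).
rng = np.random.default_rng(0)
age = rng.uniform(0, 100, size=200)
rm = rng.uniform(4, 9, size=200)
price = -0.07 * age + 8.0 * rm - 25.0 + rng.normal(0, 2.0, size=200)

X = np.column_stack([age, rm])  # column order: AGE, RM
lm = LinearRegression().fit(X, price)
pred = lm.predict(X)

# Apply the SAME permutation to both arrays so each RM keeps its own prediction.
order = np.argsort(rm)
rm_sorted = rm[order]
pred_paired = pred[order]

# Sorting the predictions on their own breaks that row-wise pairing.
pred_independent = np.sort(pred)

if __name__ == "__main__":
    plt.scatter(rm, price, c='g', s=40, alpha=0.5)
    plt.plot(rm_sorted, pred_paired, color='r', label='paired (argsort)')
    plt.plot(rm_sorted, pred_independent, color='b', label='sorted independently')
    plt.xlabel('RM')
    plt.ylabel('Price')
    plt.legend()
    plt.show()
```

With two features the paired line is still jagged, because rows with similar RM can have very different AGE values. It is simply an honest picture of the predictions.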


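If you make a 3D plot, you will probably see the relationship between the covariates RM and AGE quite easily.

You are right. You are training on multiple features, namely AGE and RM, but you are drawing a 2D plot with only one feature, RM. Try a 3D plot instead. In general, a linear regression with two features produces a plane. It is still a linear regression, which is why we use the term "hyperplane": it reduces to a line for a single feature, a plane for two features, and so on.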
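
The hyperplane point can be checked numerically. Here is a minimal sketch on made-up synthetic data (an assumption, not the Boston set). The fitted model is intercept_ + coef_[0]*AGE + coef_[1]*RM, and holding AGE fixed (at its mean here, an arbitrary choice) leaves a straight line in RM that can be drawn on the 2D scatter.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the AGE and RM columns (made-up data, not Boston).
rng = np.random.default_rng(1)
age = rng.uniform(0, 100, size=200)
rm = rng.uniform(4, 9, size=200)
price = -0.07 * age + 8.0 * rm - 25.0 + rng.normal(0, 2.0, size=200)

lm = LinearRegression().fit(np.column_stack([age, rm]), price)
b_age, b_rm = lm.coef_   # one coefficient per feature
b0 = lm.intercept_

# Slice the fitted plane at AGE = mean(AGE): what remains is a line in RM.
rm_grid = np.linspace(rm.min(), rm.max(), 50)
age_fixed = np.full_like(rm_grid, age.mean())
line = lm.predict(np.column_stack([age_fixed, rm_grid]))

if __name__ == "__main__":
    plt.scatter(rm, price, c='g', s=40, alpha=0.5)
    plt.plot(rm_grid, line, color='r')
    plt.title('Fitted plane sliced at mean AGE')
    plt.xlabel('RM')
    plt.ylabel('Price')
    plt.show()
```

The slice is only one cross-section of the plane. A different fixed AGE shifts the line up or down without changing its slope.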

Here is the output in 3D:

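# add_subplot(projection='3d') replaces gca(projection='3d'), which newer matplotlib no longer accepts
plt3d = plt.figure().add_subplot(projection='3d')
plt3d.view_init(azim=135)
plt3d.plot_trisurf(X_train['RM'].values, X_train['AGE'].values, pred_train, alpha=0.7, antialiased=True)
plt.show()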
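
Here is a self-contained variant of the 3D view, sketched on synthetic data (an assumption, as is every variable name below). It evaluates the fitted plane on a regular grid built with np.meshgrid and draws it with plot_surface, with the observations overlaid as a 3D scatter.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the AGE and RM columns (made-up data, not Boston).
rng = np.random.default_rng(2)
age = rng.uniform(0, 100, size=200)
rm = rng.uniform(4, 9, size=200)
price = -0.07 * age + 8.0 * rm - 25.0 + rng.normal(0, 2.0, size=200)

lm = LinearRegression().fit(np.column_stack([age, rm]), price)

# Regular grid over the observed feature ranges.
rm_g, age_g = np.meshgrid(np.linspace(rm.min(), rm.max(), 20),
                          np.linspace(age.min(), age.max(), 20))

# Predict on the grid, keeping the training column order (AGE, RM).
grid = np.column_stack([age_g.ravel(), rm_g.ravel()])
plane = lm.predict(grid).reshape(rm_g.shape)

if __name__ == "__main__":
    ax = plt.figure().add_subplot(projection='3d')
    ax.scatter(rm, age, price, c='g', alpha=0.5)
    ax.plot_surface(rm_g, age_g, plane, color='r', alpha=0.4)
    ax.view_init(azim=135)
    ax.set_xlabel('RM')
    ax.set_ylabel('AGE')
    ax.set_zlabel('Price')
    plt.show()
```

A grid makes the flat surface explicit, whereas plot_trisurf triangulates the scattered training points directly. Either approach should show the same plane.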

Here is a Stack Overflow question where my answer gives Python code for surface fitting with a 3D scatter plot, a 3D surface plot, and a contour plot:

Complete script with the sorted plot call from the first answer applied:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

# prepare data
# (load_boston was removed in scikit-learn 1.2; this script needs an older release)
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)[['AGE','RM']]
y = boston.target

# split dataset into training and test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=33)

# apply linear regression on dataset
lm = LinearRegression()
lm.fit(X_train, y_train)
pred_train = lm.predict(X_train)
pred_test = lm.predict(X_test)

#plot relationship between RM and price
plt.scatter(X_train['RM'],
            y_train,
            c='g',
            s=40,
            alpha=0.5)
plt.plot(np.sort(X_train['RM']), np.sort(pred_train), color='r')
plt.title('Relationship between RM and Price')
plt.ylabel('Price')
plt.xlabel('RM')
plt.show()