Python regression line plot

python matplotlib machine-learning scikit-learn


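I'm trying to draw a regression line on a scatter plot, based on my predicted data.

The problem is that I should get a single line, but my plot has many lines connecting all the points (see image).

After predicting CO2 emissions from the other data, I plotted the test engine sizes against the actual test data (CO2 emissions). I then tried to plot engine size against the predicted test data, but I can't get a line.

The code is as follows: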

import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# import the dataset
df = pd.read_csv('FuelConsumptionCo2.csv')
cols = ['ENGINESIZE','CYLINDERS','FUELTYPE','FUELCONSUMPTION_CITY','FUELCONSUMPTION_HWY','FUELCONSUMPTION_COMB','CO2EMISSIONS']

# create a new dataset with the columns needed
cdf = df[cols]
# one-hot dummies for the categorical column FUELTYPE
cdf = pd.get_dummies(cdf, columns=['FUELTYPE'])

# the features, without the target column (by name, not by position)
selFeatures = [c for c in cdf.columns if c != 'CO2EMISSIONS']

# split the dataset for fitting
X_train, X_test, Y_train, Y_test = train_test_split(cdf[selFeatures], cdf['CO2EMISSIONS'], test_size=0.5)

# regression model
clfregr = linear_model.LinearRegression()

# train the model
clfregr.fit(X_train, Y_train)

# predict the values
train_pred = clfregr.predict(X_train)
test_pred = clfregr.predict(X_test)

# regression line for the predicted values in test
plt.scatter(X_test.ENGINESIZE, Y_test, color='gray')
plt.plot(X_test.ENGINESIZE, test_pred, color='red', linewidth=1)
plt.show()

Try extracting the slope (m) and intercept (b) of the regression line from the LinearRegression() model, then use

plt.plot(X_test.ENGINESIZE, m*X_test.ENGINESIZE+b, 'r', linewidth=1)

or use seaborn's lmplot or regplot function.
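A minimal sketch of the slope/intercept approach, assuming a model fitted on ENGINESIZE alone. The engine sizes and emissions below are synthetic, generated only so the snippet runs without FuelConsumptionCo2.csv:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (NOT from FuelConsumptionCo2.csv):
# engine size in litres and CO2 emissions in g/km with some noise.
rng = np.random.default_rng(0)
engine_size = rng.uniform(1.0, 6.0, size=100)
co2 = 120 + 40 * engine_size + rng.normal(0, 10, size=100)

# Fit on engine size only; X must be 2-D, y may be 1-D.
model = LinearRegression()
model.fit(engine_size.reshape(-1, 1), co2)

# With a 1-D y, coef_ has shape (n_features,) and intercept_ is a scalar.
m = model.coef_[0]
b = model.intercept_

plt.scatter(engine_size, co2, color='gray')
# Sorting x draws the line left to right; for a straight line this is cosmetic.
xs = np.sort(engine_size)
plt.plot(xs, m * xs + b, 'r', linewidth=1)
plt.show()
```

With the question's 9-feature model, the ENGINESIZE coefficient plus intercept_ alone would leave out the other eight terms, so that line is generally shifted away from the data; fitting on ENGINESIZE alone, as here, avoids that.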

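You can use this code to plot the regression model:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

# df is the DataFrame read from FuelConsumptionCo2.csv, as in the question
model = linear_model.LinearRegression()
x_train = np.asanyarray(df[['ENGINESIZE']])
y_train = np.asanyarray(df[['CO2EMISSIONS']])
model.fit(x_train, y_train)

plt.scatter(df['ENGINESIZE'], df['CO2EMISSIONS'], color='blue')
# y_train is 2-D, so coef_ has shape (1, 1) and intercept_ has shape (1,)
plt.plot(x_train, model.coef_[0][0]*x_train + model.intercept_[0], color='red')
plt.show()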

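The problem is that you are doing multiple linear regression. If engine size were the only factor affecting CO2 emissions, you would expect a straight line. But there are other factors. With two independent variables you get a plane in three dimensions; with n independent variables the fitted model is a flat hyperplane in (n+1)-dimensional space.
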
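To make the geometry concrete, here is a small sketch with two features. The second feature and all the numbers are hypothetical, chosen only for illustration: the fitted model is the plane b + w1*x1 + w2*x2, and two rows with the same engine size get different predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (made up for this sketch): engine size and a
# correlated "combined fuel consumption" feature drive CO2 emissions.
rng = np.random.default_rng(1)
n = 200
engine_size = rng.uniform(1.0, 6.0, size=n)
fuel_comb = 4 + 2 * engine_size + rng.normal(0, 2, size=n)
co2 = 60 + 10 * engine_size + 15 * fuel_comb + rng.normal(0, 5, size=n)

X = np.column_stack([engine_size, fuel_comb])
model = LinearRegression().fit(X, co2)

# The fitted model is the plane co2_hat = b + w1*engine_size + w2*fuel_comb.
b = model.intercept_
w = model.coef_            # one coefficient per feature, shape (2,)
plane = b + X @ w
pred = model.predict(X)    # identical to the plane evaluated at each row

# Same engine size, different fuel consumption: two different points
# on the plane, hence two different predictions above the same x value.
same_size = np.array([[3.0, 8.0], [3.0, 12.0]])
p_same = model.predict(same_size)
print(p_same)
```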
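There are 9 independent variables in the data. So if you plot against only one of them, you end up with duplicates of each ENGINESIZE value, and the result is not the graph of a function. When you try to draw a line, it zigzags between several points stacked vertically.

Note that when we scatter-plot the predictions, we get multiple values on the same vertical line, corresponding to different values of the other eight independent variables rather than of the one plotted on the x axis:

plt.scatter(X_test.ENGINESIZE, test_pred, color='yellow')  # , linewidth=1)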
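A sketch that reproduces the zigzag on synthetic data and then shows one workaround that does not come from the answers here: evaluate the multi-feature model on a sorted grid of ENGINESIZE values while holding the other feature at its mean. FUELCONSUMPTION_COMB matches the dataset's column name, but all values are made up.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data: engine sizes repeat (as in the real dataset),
# while a second feature varies independently of them.
rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({
    'ENGINESIZE': rng.choice([1.6, 2.0, 2.5, 3.5, 5.0], size=n),
    'FUELCONSUMPTION_COMB': rng.uniform(6.0, 16.0, size=n),
})
df['CO2EMISSIONS'] = (50 + 8 * df['ENGINESIZE']
                      + 14 * df['FUELCONSUMPTION_COMB']
                      + rng.normal(0, 5, size=n))

features = ['ENGINESIZE', 'FUELCONSUMPTION_COMB']
model = LinearRegression().fit(df[features], df['CO2EMISSIONS'])
df['pred'] = model.predict(df[features])

# Many distinct predictions per ENGINESIZE value: plt.plot zigzags between them.
n_preds_per_size = df.groupby('ENGINESIZE')['pred'].nunique()
print(n_preds_per_size)

# Workaround (an assumption, not from the answers): vary ENGINESIZE on a
# sorted grid and hold the other feature fixed at its mean.
grid = pd.DataFrame({
    'ENGINESIZE': np.linspace(df['ENGINESIZE'].min(), df['ENGINESIZE'].max(), 50),
    'FUELCONSUMPTION_COMB': df['FUELCONSUMPTION_COMB'].mean(),
})
line = model.predict(grid[features])

plt.scatter(df['ENGINESIZE'], df['CO2EMISSIONS'], color='gray')
plt.plot(grid['ENGINESIZE'], line, color='red', linewidth=1)
plt.show()
```

The resulting red line is the model's prediction at average fuel consumption, so it shows the partial effect of engine size rather than a fit of engine size alone.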


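I'd say the sklearn LinearRegression class is awkward to use. I switched to statsmodels instead:

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

plt.scatter(X_test.ENGINESIZE, Y_test, color='gray')
# simple regression of CO2 emissions on engine size only, fitted on the training split
df_xy = pd.DataFrame({'x': X_train.ENGINESIZE, 'y': Y_train})
smod = smf.ols(formula='y ~ x', data=df_xy)
result = smod.fit()
plt.plot(df_xy['x'], result.predict(df_xy), color='red', linewidth=1)
plt.show()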


Then, for extra credit:
print(result.summary())

Do you have sample data for FuelConsumptionCo2.csv?

Ah, I got it from here.

Yes, sorry I didn't post a link to the dataset. But I think it's used in a lot of exercises, so you should find it quickly!

Oh, model.coef_ is a 2D array! Why is that?

Yes, I had the same question: why is it necessary to call np.asanyarray()? Is it just me, or is it really hard to figure out the required input and output structures?

@javadba Just df[['ENGINESIZE']] worked for me. Leaving out the double [[ ]] when slicing the DataFrame was the only problem, because the model needs a DataFrame or a 2D NumPy array. At first I tried df['ENGINESIZE'], but that creates a Series, not a DataFrame, so it is 1D.

coef_: array of shape (n_features,) or (n_targets, n_features). Estimated coefficients for the linear regression problem. If multiple targets are passed during fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.

I thought I could plot against engine size using the same model, but apparently not. I can always make another model that predicts only from engine size and use it to plot what I need.
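A short sketch of the shape rules discussed in these comments, on a tiny made-up table: single brackets give a 1-D Series, double brackets a 2-D DataFrame (so np.asanyarray is not needed), and the shape of coef_ follows the shape of y.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Tiny made-up table, only for inspecting shapes.
df = pd.DataFrame({'ENGINESIZE': [1.6, 2.0, 3.0, 4.4, 5.0],
                   'CO2EMISSIONS': [180, 200, 250, 300, 330]})

print(type(df['ENGINESIZE']).__name__, df['ENGINESIZE'].ndim)      # Series, 1-D
print(type(df[['ENGINESIZE']]).__name__, df[['ENGINESIZE']].ndim)  # DataFrame, 2-D

X = df[['ENGINESIZE']]  # already 2-D

# 1-D target: coef_ has shape (n_features,), intercept_ is a scalar.
m1 = LinearRegression().fit(X, df['CO2EMISSIONS'])
# 2-D target with one column: coef_ has shape (n_targets, n_features).
m2 = LinearRegression().fit(X, df[['CO2EMISSIONS']])
print(m1.coef_.shape, m2.coef_.shape)

# A 1-D X (a Series) is rejected by fit().
rejected_1d = False
try:
    LinearRegression().fit(df['ENGINESIZE'], df['CO2EMISSIONS'])
except ValueError:
    rejected_1d = True
print('1-D X rejected:', rejected_1d)
```

So coef_[0][0] in the plotting answer above is needed only because y was passed as a one-column 2-D array; with a 1-D y, coef_[0] is the slope.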