Matplotlib: How do I calculate the confidence interval of a regression prediction, and how do I plot it in Python?

matplotlib machine-learning statistics


Figure 7.1, An Introduction to Statistical Learning

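I am currently working through the book An Introduction to Statistical Learning with Applications in R and converting the solutions to Python.
I cannot work out how to obtain the confidence intervals and plot them as shown in the figure above (the dashed lines). I have already plotted the fitted line. This is my code (I am using polynomial regression with "age" as the predictor, "wage" as the response, and degree 4):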

The data is the Wage data set available in R. This is the plot I get:

The following code gives a 95% confidence interval:

import numpy as np
from scipy import stats

confidence = 0.95
# y_pred and y_test stand for your own arrays of predicted and true test values
squared_errors = (y_pred - y_test) ** 2
np.sqrt(stats.t.interval(confidence, len(squared_errors) - 1,
                         loc=squared_errors.mean(),
                         scale=stats.sem(squared_errors)))
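
To make the return value of the snippet above concrete, here is a minimal sketch on synthetic data. The arrays `y_test` and `y_pred` are made-up stand-ins, not the Wage data. Under that assumption, the call yields exactly two numbers: a lower and an upper bound for the root mean squared error of the whole test set, rather than one interval per predicted point.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): true targets and noisy predictions
y_test = rng.normal(100, 20, size=200)
y_pred = y_test + rng.normal(0, 5, size=200)

squared_errors = (y_pred - y_test) ** 2

# t-interval for the mean squared error, mapped to the RMSE scale by the square root
rmse_interval = np.sqrt(stats.t.interval(0.95, len(squared_errors) - 1,
                                         loc=squared_errors.mean(),
                                         scale=stats.sem(squared_errors)))

rmse = np.sqrt(squared_errors.mean())
print("RMSE:", rmse)
print("95% interval for the RMSE:", rmse_interval)  # two numbers: lower, upper
```

Because the result describes the overall error of the model, it cannot be drawn as a band around the fitted curve.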


I used bootstrapping to compute the confidence intervals, with a custom module:

import numpy as np
import pandas as pd
from tqdm import tqdm

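class Bootstrap_ci:
    """Percentile bootstrap bands for a model's predictions."""

    def boot(self, X_data, y_data, R, test_data, model):
        """Refit on R resamples; return the 2.5th and 97.5th percentiles per test point."""
        predictions = []
        for _ in tqdm(range(R)):
            # Each resample has 200 rows; a standard bootstrap would draw len(X_data) rows
            index = self.get_indices(X_data, 200)
            predictions.append(self.alpha(X_data, y_data, index, test_data, model))

        return (np.percentile(predictions, 2.5, axis=0),
                np.percentile(predictions, 97.5, axis=0))

    def alpha(self, X_data, y_data, index, test_data, model):
        """Fit the model on the rows selected by `index` and predict on test_data."""
        X = X_data.loc[index]
        y = y_data.loc[index]

        model.fit(pd.DataFrame(X), y)
        return model.predict(pd.DataFrame(test_data))

    def get_indices(self, data, num_samples):
        """Draw num_samples index labels from data, with replacement."""
        return np.random.choice(data.index, num_samples, replace=True)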

The module above can be used as follows:

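import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

from bootstrap import Bootstrap_ci  # the class above, saved as bootstrap.py

# `data` is the Wage data set loaded as a pandas DataFrame
poly = PolynomialFeatures(4)
X = poly.fit_transform(data['age'].to_frame())
y = data['wage']

# Evenly spaced ages at which to evaluate the fit and its interval
X_test = np.linspace(min(data['age']), max(data['age']), 100)
X_test_poly = poly.transform(X_test.reshape(-1, 1))

bootstrap = Bootstrap_ci()
li, ui = bootstrap.boot(pd.DataFrame(X), y, 1000, X_test_poly, LinearRegression())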

This gives the lower and upper bounds of the confidence interval. To plot the graph:

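import matplotlib.pyplot as plt

# `pred` is not defined in the post; assumed here to be the full-data fit on the test grid
pred = LinearRegression().fit(X, y).predict(X_test_poly)
plt.scatter(data['age'], data['wage'], facecolors='none', edgecolors='darkgray')
plt.plot(X_test, pred, label='Fitted Line')
plt.plot(X_test, ui, linestyle='dashed', color='r', label='Confidence Intervals')
plt.plot(X_test, li, linestyle='dashed', color='r')
plt.legend()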

The resulting plot is:
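
For readers without the Wage data or the custom module at hand, the same percentile-bootstrap idea can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code from the answer: the data is synthetic, the helper `scale` and all variable names are hypothetical, and each resample draws as many rows as the original sample, which is the usual bootstrap convention.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the Wage data (assumption: not the real data set)
n = 500
age = rng.uniform(18, 80, size=n)
wage = 60 + 2.0 * age - 0.02 * age**2 + rng.normal(0, 10, size=n)

def scale(a):
    """Rescale age so the degree-4 fit stays numerically well conditioned."""
    return (np.asarray(a) - 50.0) / 30.0

grid = np.linspace(age.min(), age.max(), 100)  # ages at which to evaluate the band

n_boot = 500
boot_preds = np.empty((n_boot, grid.size))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)  # resample n rows with replacement
    coefs = np.polyfit(scale(age[idx]), wage[idx], deg=4)
    boot_preds[b] = np.polyval(coefs, scale(grid))

# Pointwise 2.5th and 97.5th percentiles across the bootstrap fits
boot_lower, boot_upper = np.percentile(boot_preds, [2.5, 97.5], axis=0)
```

`boot_lower` and `boot_upper` each hold one value per grid point, so they can be passed straight to `plt.plot(grid, ...)` as dashed lines.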

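Could you tell me what the return value refers to here? It returns a list of two items. What I am looking for is a confidence interval for each predicted point. How does this help me get the graph mentioned in the first line of the question? Thanks.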
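
For an interval at every predicted point (the dashed curves in the figure), one option besides resampling is the standard error of the fitted mean from ordinary least squares. The sketch below is not from the thread. It assumes synthetic data in place of the Wage data, a hypothetical helper `design`, and the usual OLS assumptions, and it uses plus or minus two standard errors as an approximate 95% band.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the Wage data (assumption: not the real data set)
age = rng.uniform(18, 80, size=500)
wage = 60 + 2.0 * age - 0.02 * age**2 + rng.normal(0, 10, size=500)

def design(a, degree=4):
    """Polynomial design matrix (columns 1, z, ..., z^degree) on a rescaled age axis."""
    z = (np.asarray(a) - 50.0) / 30.0  # rescaling keeps the matrix well conditioned
    return np.vander(z, degree + 1, increasing=True)

X = design(age)
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Residual variance and covariance matrix of the coefficient estimates
resid = wage - X @ beta
dof = X.shape[0] - X.shape[1]
sigma2 = resid @ resid / dof
cov_beta = sigma2 * np.linalg.inv(X.T @ X)

# Standard error of the fitted mean at each grid point: sqrt(g' Cov(beta) g)
grid = np.linspace(age.min(), age.max(), 100)
G = design(grid)
fit = G @ beta
se_fit = np.sqrt(np.einsum("ij,jk,ik->i", G, cov_beta, G))

lower = fit - 2 * se_fit  # approximate 95% pointwise band
upper = fit + 2 * se_fit
```

Plotting `fit` as a solid line and `lower` and `upper` as dashed lines against `grid` gives one interval per predicted point.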
