Why does my Python sklearn model return a mean absolute error of 0?
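Playing around with sklearn, I want to use the Open, High and Low prices and the Volume to predict the TSLA closing price for a few days. I used a very basic model to predict the closing price, and the predictions come out 100% accurate; I don't know why. A 0% error makes me feel like I haven't set the model up correctly.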
Code:
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

tsla_data_path = "/Users/simon/Documents/PythonVS/ML/tsla.csv"
tsla_data = pd.read_csv(tsla_data_path)
tsla_features = ["Open", "High", "Low", "Volume"]
y = tsla_data.Close
X = tsla_data[tsla_features]

# define model
tesla_model = DecisionTreeRegressor(random_state=1)
# fit model
tesla_model.fit(X, y)
# print results
print('making predictions for the following five dates')
print(X.head())
print('_' * 48)
print('the predictions are')
print(tesla_model.predict(X.head()))
print('_' * 48)
print('the error is')
print(mean_absolute_error(y.head(), tesla_model.predict(X.head())))
Output:
making predictions for the following five dates
Open High Low Volume
0 67.054001 67.099998 65.419998 39737000
1 66.223999 66.786003 65.713997 27778000
2 66.222000 66.251999 65.500000 12328000
3 65.879997 67.276001 65.737999 30372500
4 66.524002 67.582001 66.438004 32868500
________________________________________________
the predictions are
[65.783997 66.258003 65.987999 66.973999 67.239998]
________________________________________________
the error is
0.0
Measuring your model's performance on the same dataset you used to train it is a mistake.
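To see why the error comes out as exactly 0, here is a minimal sketch that does not need the CSV. It assumes random synthetic numbers standing in for the Open/High/Low/Volume and Close columns (not real TSLA data). With its default settings (`max_depth=None`, `min_samples_leaf=1`), `DecisionTreeRegressor` keeps splitting until every leaf is pure, so when the feature rows are distinct each training row ends up in its own leaf and predicting those rows returns their exact targets.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the TSLA columns (assumption: random data, not real prices)
rng = np.random.default_rng(0)
n = 200
open_ = rng.uniform(60, 70, n)
high = open_ + rng.uniform(0, 2, n)
low = open_ - rng.uniform(0, 2, n)
volume = rng.integers(10_000_000, 40_000_000, n)
X = np.column_stack([open_, high, low, volume])
close = low + (high - low) * rng.uniform(0, 1, n)  # close lies between low and high

# Default max_depth=None: the tree grows until its leaves are pure
model = DecisionTreeRegressor(random_state=1)
model.fit(X, close)

# Scoring on the very rows the tree was fitted on
train_mae = mean_absolute_error(close, model.predict(X))
print("MAE on the training rows:", train_mae)
print("number of leaves:", model.get_n_leaves(), "for", n, "training rows")
```

A leaf count close to the number of training rows is the giveaway: the tree has memorized the data rather than learned a general pattern, which is exactly what happens with `tesla_model.predict(X.head())` on rows that were part of `fit(X, y)`.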
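If you want a meaningful evaluation metric, split your dataset in two: one part to train the model and the other to measure its performance. You can use sklearn.model_selection.train_test_split() to split the dataset, as follows: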
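from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

tesla_model = DecisionTreeRegressor(random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
tesla_model.fit(X_train, y_train)  # fit on the training features and training targets only
mae = mean_absolute_error(y_test, tesla_model.predict(X_test))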
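For a runnable end-to-end version without the CSV, here is a sketch on synthetic, time-ordered data (a random walk standing in for the real TSLA prices; that data is an assumption). It reports the MAE on both the training and the test split so the contrast is visible. One deliberate change from the snippet above: it passes `shuffle=False` to `train_test_split`, so the test set is the last 20% of rows. For daily price data this avoids training on days that come after the ones being scored; treat it as a suggestion, not something the answer specified.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, time-ordered stand-in for the TSLA data (assumption: a random walk, not real prices)
rng = np.random.default_rng(42)
n = 500
open_ = 60 + np.cumsum(rng.normal(0, 1, n))
high = open_ + rng.uniform(0, 2, n)
low = open_ - rng.uniform(0, 2, n)
volume = rng.integers(10_000_000, 40_000_000, n)
X = np.column_stack([open_, high, low, volume])
y = low + (high - low) * rng.uniform(0, 1, n)  # "Close" somewhere between low and high

# shuffle=False keeps the last 20% of rows (the most recent days) as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = DecisionTreeRegressor(random_state=1)
model.fit(X_train, y_train)  # fit on features and targets of the training split

train_mae = mean_absolute_error(y_train, model.predict(X_train))  # memorized rows
test_mae = mean_absolute_error(y_test, model.predict(X_test))     # unseen rows
print(f"train MAE: {train_mae:.3f}")
print(f"test MAE:  {test_mae:.3f}")
```

Only the test MAE says anything about how well the model predicts closing prices it has not seen; the training MAE will stay at (or very near) 0 for an unrestricted tree.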
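Have a look at this Wikipedia article, which explains the different datasets used in ML. You are making predictions on the same dataset that you passed to fit.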