Python: how to apply a trained model to a new dataset
I am new to machine learning in Python. I am trying to predict a quantity, say the price of a house, and I build the model using higher-order polynomial features. I have two datasets, and I fitted my model on one of them. How do I apply this model to a completely new dataset? My code is below:
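import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

data1 = pd.read_csv(r"C:\Users\DELL\Desktop\experimental data/xyz1.csv", engine = 'c', dtype=float, delimiter = ",")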
data2 = pd.read_csv(r"C:\Users\DELL\Desktop\experimental data/xyz2.csv", engine = 'c', dtype=float, delimiter = ",")
# I have to do this step; otherwise I get a NaN or infinite value error every time
data1.fillna(0.000, inplace=True)
data2.fillna(0.000, inplace=True)
X_train = data1.drop('result', axis = 1)
y_train = data1.result
X_test = data2.drop('result', axis = 1)
y_test = data2.result
x2_ = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_train)
x3_ = PolynomialFeatures(degree=3, include_bias=False).fit_transform(X_train)
model2 = LinearRegression().fit(x2_, y_train)
model3 = LinearRegression().fit(x3_, y_train)
r_sq2 = model2.score(x2_, y_train)
r_sq3 = model3.score(x3_, y_train)
y_pred2 = model2.predict(x2_)
y_pred3 = model3.predict(x3_)
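So basically I am stuck at this point. How do I apply the same model to the test data to predict the y_test values and compute the score?

To reproduce the effect of PolynomialFeatures on new data, you need to store the object itself (once for degree=2 and once for degree=3). Otherwise you have no way to apply the fitted transform to the test dataset.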
X_train = data1.drop('result', axis = 1)
y_train = data1.result
X_test = data2.drop('result', axis = 1)
y_test = data2.result
# store these data transform objects
pf2 = PolynomialFeatures(degree=2, include_bias=False)
pf3 = PolynomialFeatures(degree=3, include_bias=False)
# then apply the transform to the training set
x2_ = pf2.fit_transform(X_train)
x3_ = pf3.fit_transform(X_train)
model2 = LinearRegression().fit(x2_, y_train)
model3 = LinearRegression().fit(x3_, y_train)
r_sq2 = model2.score(x2_, y_train)
r_sq3 = model3.score(x3_, y_train)
y_pred2 = model2.predict(x2_)
y_pred3 = model3.predict(x3_)
# now apply the fitted transform to the test set
x2_test = pf2.transform(X_test)
x3_test = pf3.transform(X_test)
# apply trained model to transformed test data
y2_test_pred = model2.predict(x2_test)
y3_test_pred = model3.predict(x3_test)
# compute the R^2 score of each model on the test data
r_sq2_test = model2.score(x2_test, y_test)
r_sq3_test = model3.score(x3_test, y_test)
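The same idea, keeping the fitted transform together with the model, can also be written with a scikit-learn pipeline. The following is only a minimal sketch under stated assumptions: the data is synthetic (the CSV files from the question are not available), and `target` and `pipe2` are hypothetical names introduced for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (hypothetical): y is an exact quadratic in two features.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(200, 2))
X_test = rng.uniform(-1, 1, size=(50, 2))


def target(X):
    # Hypothetical ground truth, used only to generate example targets.
    return 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 1]


y_train, y_test = target(X_train), target(X_test)

# fit() fits PolynomialFeatures on the training data, then fits the regression
# on the transformed training data.
pipe2 = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
pipe2.fit(X_train, y_train)

# predict() and score() re-apply the already fitted transform to the new data,
# so no separate transformer object has to be kept around.
y_test_pred = pipe2.predict(X_test)
r_sq2_test = pipe2.score(X_test, y_test)
print(r_sq2_test)
```

With a pipeline there is one object per degree to store, and it is harder to call `fit_transform` on the test set by mistake.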
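Then I get an error after the step x3_test = pf3.transform(X_test): ValueError: X shape does not match training shape

I don't know the size or shape of your datasets. You have to make sure they have compatible dimensions, e.g. print(X_train.shape, X_test.shape).

Oh yes, my bad! I'm sorry! Edited that. Thank you very much for the prompt help. One last question: how do I compute the score, i.e. how accurate is my model at predicting the test data output?

@Sukhmani I added two lines of code at the end that mirror how you computed the score for the training set. It is worth noting that if you already have the vector of predictions, running score will recompute all of those predictions behind the scenes. You may want to use something like this instead.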
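The function the last comment points to is not shown here. One candidate that matches the description is `sklearn.metrics.r2_score`, which computes R^2 directly from predictions you already have; treating it as the intended function is an assumption. The sketch below uses synthetic data and also includes the column-count check suggested in the comments.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (hypothetical), three features per row.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 3))
X_test = rng.normal(size=(40, 3))
y_train = X_train[:, 0] ** 2 + X_train[:, 1] + rng.normal(scale=0.1, size=100)
y_test = X_test[:, 0] ** 2 + X_test[:, 1] + rng.normal(scale=0.1, size=40)

# The test set must have the same number of feature columns as the training
# set, otherwise transform() raises a shape-mismatch ValueError.
assert X_train.shape[1] == X_test.shape[1], (X_train.shape, X_test.shape)

pf2 = PolynomialFeatures(degree=2, include_bias=False)
model2 = LinearRegression().fit(pf2.fit_transform(X_train), y_train)

x2_test = pf2.transform(X_test)
y2_test_pred = model2.predict(x2_test)

# R^2 computed from the predictions already in hand (no second predict call).
r_sq2_from_pred = r2_score(y_test, y2_test_pred)

# model.score() predicts again internally and should return the same value.
r_sq2_from_score = model2.score(x2_test, y_test)
print(r_sq2_from_pred, r_sq2_from_score)
```

For a model this small the repeated prediction costs almost nothing. It matters mainly when prediction is slow or the test set is large.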