How to implement a model on a new dataset in Python
Tags: python, python-3.x, machine-learning, linear-regression


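I'm new to machine learning with Python. I'm trying to predict a quantity, say the price of a house, and I use higher-degree polynomial features to build the model. I have two datasets, and I fitted my model on one of them. How do I apply this model to a completely new dataset? My code is attached below: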

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

data1 = pd.read_csv(r"C:\Users\DELL\Desktop\experimental data/xyz1.csv", engine = 'c', dtype=float, delimiter = ",")
data2 = pd.read_csv(r"C:\Users\DELL\Desktop\experimental data/xyz2.csv", engine = 'c', dtype=float, delimiter = ",")

# I have to do this step, otherwise I get a NaN or infinite value error every time
data1.fillna(0.000, inplace=True)
data2.fillna(0.000, inplace=True)

X_train = data1.drop('result', axis = 1)
y_train = data1.result
X_test = data2.drop('result', axis = 1)
y_test = data2.result

x2_ = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_train)
x3_ = PolynomialFeatures(degree=3, include_bias=False).fit_transform(X_train)

model2 = LinearRegression().fit(x2_, y_train)
model3 = LinearRegression().fit(x3_, y_train)

r_sq2 = model2.score(x2_, y_train)
r_sq3 = model3.score(x3_, y_train)

y_pred2 = model2.predict(x2_)
y_pred3 = model3.predict(x3_)
So basically I'm stuck at this point.
How do I apply the same model to the test data to predict the y_test values and compute the score?

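To reproduce the effect of PolynomialFeatures, you need to store the object itself (once for degree=2 and once for degree=3). Otherwise you have no way to apply the fitted transform to the test dataset.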

X_train = data1.drop('result', axis = 1)
y_train = data1.result
X_test = data2.drop('result', axis = 1)
y_test = data2.result

# store these data transform objects
pf2 = PolynomialFeatures(degree=2, include_bias=False)
pf3 = PolynomialFeatures(degree=3, include_bias=False)

# then apply the transform to the training set
x2_ = pf2.fit_transform(X_train)
x3_ = pf3.fit_transform(X_train)

model2 = LinearRegression().fit(x2_, y_train)
model3 = LinearRegression().fit(x3_, y_train)

r_sq2 = model2.score(x2_, y_train)
r_sq3 = model3.score(x3_, y_train)

y_pred2 = model2.predict(x2_)
y_pred3 = model3.predict(x3_)

# now apply the fitted transform to the test set
x2_test = pf2.transform(X_test)
x3_test = pf3.transform(X_test)

# apply trained model to transformed test data
y2_test_pred = model2.predict(x2_test)
y3_test_pred = model3.predict(x3_test)

# compute the model accuracy for the test data
r_sq2_test = model2.score(x2_test, y_test)
r_sq3_test = model3.score(x3_test, y_test)
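
As a side note, the bookkeeping above (keeping pf2 next to model2) can also be delegated to a scikit-learn Pipeline. The following is only a minimal sketch under stated assumptions: the data is synthetic stand-in data, not the xyz1.csv / xyz2.csv files from the question, and the helper target and the name pipe2 are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (an assumption; the real CSV files are not available here).
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(200, 2))
X_test = rng.uniform(-1, 1, size=(50, 2))


def target(X):
    # Hypothetical ground truth: an exact degree-2 polynomial of the two features, no noise.
    return 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.5 * X[:, 0] * X[:, 1] + X[:, 0] ** 2


y_train = target(X_train)
y_test = target(X_test)

# The pipeline keeps the fitted transformer together with the fitted regressor.
pipe2 = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
pipe2.fit(X_train, y_train)

# predict() and score() apply the stored transform to the raw test features.
y2_test_pred = pipe2.predict(X_test)
r_sq2_test = pipe2.score(X_test, y_test)
```

With this layout the test set is passed in untransformed, so there is no separate x2_test to keep in sync with the model.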

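Comments:

Then I get an error after the step x3_test = pf3.transform(X_test): ValueError: X shape does not match training shape

I don't know the size or shape of your datasets. You have to make sure they have compatible dimensions, e.g. print(X_train.shape, X_test.shape).

Oh yes, my mistake! I'm sorry! Edited that. Thank you very much for the prompt help.

One last question: how do I compute the score, i.e. how accurate my model is at predicting the test data output?

@Sukhmani I added two lines of code at the end that mirror how you compute the score for the training set. Note that if you already have the vector of predictions, running score will recompute all of those predictions behind the scenes. You may want to use something that scores the existing predictions directly instead.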
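
The last comment can be illustrated with a short sketch. It assumes that the function meant there is sklearn.metrics.r2_score (the original link did not survive, so this is a guess), and it uses synthetic data in place of the question's CSV files. It also includes the column-count check suggested for the shape error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (an assumption; not the data from the question).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 3))
X_test = rng.normal(size=(40, 3))
y_train = X_train[:, 0] ** 2 + X_train[:, 1] + rng.normal(scale=0.1, size=100)
y_test = X_test[:, 0] ** 2 + X_test[:, 1] + rng.normal(scale=0.1, size=40)

# The fitted transform can only be reused if both sets have the same number of columns.
assert X_train.shape[1] == X_test.shape[1]

pf2 = PolynomialFeatures(degree=2, include_bias=False)
model2 = LinearRegression().fit(pf2.fit_transform(X_train), y_train)

x2_test = pf2.transform(X_test)
y2_test_pred = model2.predict(x2_test)

# Score computed from the predictions we already have; predict() is not called again.
r_sq2_from_pred = r2_score(y_test, y2_test_pred)

# Same value via model2.score, which runs predict() again internally.
r_sq2_from_score = model2.score(x2_test, y_test)
```

Both numbers are the coefficient of determination R²; the difference is only that r2_score reuses y2_test_pred instead of predicting a second time.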