Low score in Pandas linear regression with discrete attributes


pandas jupyter-notebook


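I am trying to run a linear regression on my DataFrame. The DataFrame contains data about Apple apps, and I want to predict each app's rating (the nota column). The ratings take values like these: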

1.0
1.5
2.0
2.5
...
5.0

My code is:

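import numpy as np
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

atributos = ['size_bytes','price','rating_count_tot','cont_rating','sup_devices_num','num_screenshots','num_lang','vpp_lic']
atrib_prev = ['nota']

# Features: every column except the target; target: the app rating 'nota'
X = np.array(data_regress.drop(columns=['nota']))
y = np.array(data_regress['nota'])

# Standardize each feature to zero mean and unit variance
X = preprocessing.scale(X)

# sklearn.cross_validation has been removed; train_test_split lives in sklearn.model_selection
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = LinearRegression()
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)

print(accuracy)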

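But my accuracy is only 0.046295306696438665. I think this is because the linear model predicts arbitrary real values, while my ratings, although real numbers, only occur at coarse 0.5 intervals. I don't know how to round the predicted values before calling clf.score.

First, for a regression model, clf.score() computes R² (the coefficient of determination), not accuracy. So you need to decide whether to treat this as a classification problem (with a fixed set of target labels) or as a regression problem (with a real-valued target).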
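To make the R² point concrete, here is a minimal sketch on made-up synthetic data (a stand-in for the app DataFrame, not the asker's data). It checks that LinearRegression.score() gives the same number as sklearn.metrics.r2_score on the model's predictions, and as the formula R² = 1 − SS_res / SS_tot computed by hand. Read this way, a score of about 0.046 means the model explains only about 4.6% of the test-set variance compared with always predicting the mean rating.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for the app DataFrame
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 + 0.4 * X[:, 0] + rng.normal(scale=0.8, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
clf = LinearRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# score() on a regressor returns R^2 on the data it is given
score_from_method = clf.score(X_test, y_test)
score_from_metric = r2_score(y_test, y_pred)

# The same quantity by hand: R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
manual_r2 = 1.0 - ss_res / ss_tot

print(score_from_method, score_from_metric, manual_r2)
```

All three printed values should agree up to floating-point noise.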

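Second, if you stick with a regression model rather than classification, you can call clf.predict() to get the predicted values first, round them as needed, and then call r2_score() on the actual and predicted labels. Something like: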

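import numpy as np
from sklearn.metrics import r2_score

# Get actual predictions
y_pred = clf.predict(X_test)

# Round to the nearest 0.5 step and keep predictions inside the 1.0-5.0 rating range
y_pred_rounded = np.clip(np.round(y_pred * 2) / 2, 1.0, 5.0)

# Call the appropriate scorer
score = r2_score(y_test, y_pred_rounded)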
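If you take the classification route instead, one possible sketch (an assumption, not part of the original answer) encodes each 0.5-step rating as an integer class label (1.0 → 2, ..., 5.0 → 10) and fits sklearn's LogisticRegression. For classifiers, score() returns mean accuracy. The data below is synthetic and hypothetical. Note that this treats ratings as unordered classes, so it ignores the fact that 4.5 is closer to 5.0 than to 1.0.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical features and ratings on the 1.0, 1.5, ..., 5.0 grid
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
raw = 3.0 + 0.8 * X[:, 0] + rng.normal(scale=0.5, size=300)
ratings = np.clip(np.round(raw * 2) / 2, 1.0, 5.0)

# Encode each half-star step as an integer class label (1.0 -> 2, ..., 5.0 -> 10)
labels = np.rint(ratings * 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=0
)

# Scaling inside the pipeline is fit on the training split only
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# For classifiers, score() is mean accuracy (fraction of exactly correct labels)
acc = model.score(X_test, y_test)

# Map predicted class labels back to the 0.5-step rating scale
pred_ratings = model.predict(X_test) / 2
print(acc)
```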

You can look up the other metrics available in sklearn.
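As an illustration of other options, the sketch below applies sklearn.metrics.mean_absolute_error, mean_squared_error and r2_score to a handful of made-up ratings and rounded predictions. The "exact" and "within half a star" rates are hypothetical hand-rolled checks, not sklearn metrics. They are computed with NumPy because accuracy_score should reject non-integer float labels like these as continuous and raise a ValueError.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up ratings and rounded predictions on the 0.5-step scale
y_true = np.array([4.5, 4.0, 3.5, 5.0, 2.0])
y_pred = np.array([4.0, 4.0, 4.0, 4.5, 3.0])

mae = mean_absolute_error(y_true, y_pred)  # average miss, in rating units
mse = mean_squared_error(y_true, y_pred)   # squares errors, so large misses weigh more
r2 = r2_score(y_true, y_pred)

# Hand-rolled checks (not sklearn metrics): exact hits, and hits within half a star
exact = np.mean(y_true == y_pred)
within_half = np.mean(np.abs(y_true - y_pred) <= 0.5)

print(mae, mse, r2, exact, within_half)
```

MAE is often the easiest of these to explain for ratings, since it reads directly as "off by this many stars on average".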