Scikit learn: Predicting values from continuous data using scikit

scikit-learn

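I am new to scikit and need to learn how to predict a value based on multiple continuous data columns. I have several data columns that hold continuous data. (The column names are given only as an example for reference.)

What I need to do is predict the value of ColA from a model built on that data. I have only seen examples that classify the predicted value. If any or all of the ColB, ColC, ColD, ColE values are given, how do I get the actual value?

Can someone help me understand how to do this with scikit?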

First, I converted the data into a csv file so that pandas can be used.

Example:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv('data.csv', header=None)

# Fill the missing data with 0, and replace the '?' entries with 0 as well
df = df.fillna(0)
df = df.replace('?', 0)

X = df.iloc[:, 1:7]

# I assume that y is the first column of the dataset, as you said
y = df.iloc[:, 0]

# Split the data X, y into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Convert the pandas objects into numpy arrays
# (optional: scikit-learn estimators also accept DataFrames and Series)
X_train = X_train.values
X_test = X_test.values
y_train = y_train.values
y_test = y_test.values

# Create the model
model = LinearRegression()

# Fit the model using the training data
model.fit(X_train, y_train)

# Predict unseen data
y_predicted = model.predict(X_test)

# R^2 score on the test data
scores = model.score(X_test, y_test)

print(y_predicted)
print(scores)

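The first print shows the predicted values for the unseen (X_test) features. The predicted values correspond to the first column of the dataset.
The second print returns the coefficient of determination R^2 of the prediction.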
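
To make the second number less of a black box: for a regressor, `score` returns R^2 = 1 - SS_res / SS_tot. Below is a minimal sketch that checks this by hand. The data is synthetic and the coefficients are made up purely for illustration; they are not taken from your file.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data, made up for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)

# Fit on the first 150 rows, evaluate on the remaining 50
model = LinearRegression().fit(X[:150], y[:150])
X_test, y_test = X[150:], y[150:]
y_pred = model.predict(X_test)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The same quantity from scikit-learn, computed two ways
r2_score_value = r2_score(y_test, y_pred)
r2_model = model.score(X_test, y_test)

print(r2_manual, r2_score_value, r2_model)
```

A value close to 1 means the predictions explain most of the variance in the target; a value near 0 means the model does no better than always predicting the mean.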
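P.S.: The problem you are trying to solve is quite general.
First, you could use the StandardScaler class from sklearn to scale the features (the X array). This is usually a good idea and can improve performance, but it is up to you.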
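
One possible way to do the scaling is to chain the scaler and the regressor in a pipeline, so the scaler is fitted on the training data only. This is a sketch on synthetic data (the feature scales and coefficients are assumptions chosen for illustration), not your actual dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales, made up for illustration
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=[1.0, 10.0, 100.0, 1000.0], size=(300, 4))
y = X @ np.array([2.0, 0.3, 0.05, 0.001]) + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# The pipeline fits the scaler on the training data only,
# then applies the same transformation before every predict/score call
scaled_model = make_pipeline(StandardScaler(), LinearRegression())
scaled_model.fit(X_train, y_train)

# After scaling, each training feature has mean 0 and standard deviation 1
scaled_train = scaled_model.named_steps["standardscaler"].transform(X_train)
print(scaled_train.mean(axis=0), scaled_train.std(axis=0))

print(scaled_model.score(X_test, y_test))
```

For plain LinearRegression the predictions are not expected to change with scaling; it tends to matter more for regularized or distance-based models.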
Next, you could use other methods to split the data instead of train_test_split.
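
As one example of an alternative to a single split, k-fold cross-validation fits and scores the model several times on different partitions. This is a minimal sketch on synthetic data; the number of folds and the data itself are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data, made up for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)

# 5 folds: each fold is used once as the test set, the other 4 for training
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

print(scores)         # one R^2 value per fold
print(scores.mean())  # average R^2 across the folds
```

The spread of the per-fold scores gives a rough idea of how much the result depends on which rows end up in the test set.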
Finally, you could use other methods instead of LinearRegression.
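
Since all scikit-learn regressors share the same fit/predict/score interface, swapping the model is usually a one-line change. The sketch below compares a few candidates on synthetic data. The choice of Ridge and RandomForestRegressor and their parameters is only an illustration, not a recommendation for your data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic data, made up for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Every regressor exposes the same fit / predict / score methods
candidates = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, estimator in candidates.items():
    estimator.fit(X_train, y_train)
    results[name] = estimator.score(X_test, y_test)  # R^2 on the test data
    print(name, results[name])
```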

Hope this helps.
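You can use many models, such as linear regression. After fitting the model with your data, you use the predict() method to get predictions. Do you want me to give an example? If yes, can you upload the data?

Thanks Sera. It would help me a lot if you could put some code snippets here. I know how to load the data into a table structure. I am confused about the arguments passed to the fit method and the right way to code it. Suppose I load all the data into a structure called sample_data. Should I pass ColB, ColC, ColD, ColE as X and ColA as y to the model? And how do I pass the values to predict?

Yes, that is the correct approach. I will post some code in a few minutes. X and y should be arrays. Also, the pandas module is very useful for loading/splitting the data. I will create a simple example using the data you posted.

Thanks again. The code will help me since I am learning from the start. If you need the full url of the data file, here is the link. This raw file contains some missing values, so extra cleaning of the missing values is needed. So for me, just the approach for making predictions is enough. I will modify the code to handle the missing data.

Wait, I just saw your comment, and I did click upvote. Hopefully the system will update later.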
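
To tie the discussion above together, here is a rough sketch of passing ColB..ColE as X and ColA as y, and then calling predict on new rows. The CSV content is entirely made up, and using SimpleImputer with the column mean is just one possible way to handle the '?' entries (an assumption on my side, swapped in for the fill-with-0 approach in the answer).

```python
import io

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up CSV; '?' marks a missing value
csv_text = """ColA,ColB,ColC,ColD,ColE
10.2,1.0,2.1,0.5,4.0
13.9,2.0,?,1.5,3.5
18.1,3.0,4.2,1.0,6.1
21.8,4.0,5.0,?,7.2
26.3,5.0,6.3,2.5,7.9
29.7,6.0,6.8,3.0,9.4
"""

# na_values='?' turns the '?' entries into NaN so the columns stay numeric
sample_data = pd.read_csv(io.StringIO(csv_text), na_values="?")

feature_cols = ["ColB", "ColC", "ColD", "ColE"]
X = sample_data[feature_cols]  # 2-D: one row per sample, one column per feature
y = sample_data["ColA"]        # 1-D: the value to predict

# Replace each NaN with the mean of its column, then fit the regression
model = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
model.fit(X, y)

# New rows to predict: same columns, same order; NaN where a value is unknown
new_rows = pd.DataFrame(
    {
        "ColB": [7.0, 2.5],
        "ColC": [8.0, np.nan],
        "ColD": [3.5, 1.2],
        "ColE": [10.0, 5.5],
    }
)
predicted_colA = model.predict(new_rows)
print(predicted_colA)
```

Note that predict always expects all four feature columns; a value that is not available has to be filled in somehow (here by the imputer) before the model can use the row.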