Python: converting categorical data to numeric and running RandomForestClassifier

python python-3.x machine-learning

I'm testing this code:

df1 = df[['Group', 'Sector', 'Cat2', 'Cat3', 'Cat4', 'Cat5', 'Cat6', 'Industry', 'Market', 'Price']].copy()
df1 = df1[:100000]
df1.shape

df1 = df1.fillna(0)


df1 = pd.get_dummies(df1)


X = df1.drop(['Price'], axis=1)
y = df1['Price']

from sklearn.model_selection import train_test_split
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3) # 70% training and 30% test


#Import Random Forest Model
from sklearn.ensemble import RandomForestClassifier

# Create the model with 100 trees
model = RandomForestClassifier(n_estimators=100, 
                               bootstrap = True,
                               max_features = 'sqrt')
# Fit on training data
model.fit(X_train, y_train)

I get an error on this line:

model.fit(X_train, y_train)

This is the error:

ValueError: Unknown label type: 'continuous'

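My setup is this: I have many fields in 'df', and I'm copying some of them into 'df1'. These are all categorical:

'Group', 'Sector', 'Cat2', 'Cat3', 'Cat4', 'Cat5', 'Cat6', 'Industry', 'Market'

This one is numeric:

'Price'

I'm using one-hot encoding to convert the categorical fields to numeric ones, and the numeric field (Price) stays as it is. Is something wrong with this setup, or is it fine? I'm just looking for some guidance and a solution here. Thanks.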

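You are using a classifier to predict a continuous price. When sklearn refers to labels, it means the target, so the problem is not with your X but with your y. What you need is sklearn.ensemble.RandomForestRegressor. With it you will be able to predict continuous values such as Price. Use this instead: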

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(n_estimators=100,
                              bootstrap=True,
                              max_features='sqrt')
# model.fit(X, y...
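
To make the regressor suggestion concrete, here is a minimal sketch on made-up data. The `toy` frame, its values, and the `random_state` settings are assumptions for illustration only, not the asker's data. It also uses `sklearn.utils.multiclass.type_of_target` to show why a float `Price` column is reported as 'continuous', which is the target type classifiers reject.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for df1: two categorical columns and a numeric Price.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Sector": rng.choice(["Tech", "Energy", "Retail"], size=n),
    "Market": rng.choice(["US", "EU"], size=n),
    "Price": rng.uniform(10.0, 100.0, size=n),
})

# Separate the target first, then one-hot encode only the features.
y = toy["Price"]
X = pd.get_dummies(toy.drop(columns=["Price"]))

# A float-valued target is classified as "continuous" by scikit-learn,
# which is what the ValueError in the question is complaining about.
target_kind = type_of_target(y)
print(target_kind)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Same hyperparameters as in the question, but with a regressor.
model = RandomForestRegressor(
    n_estimators=100, bootstrap=True, max_features="sqrt", random_state=0
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(predictions[:5])
```

Note that a regressor's `score` method reports R² rather than accuracy, so evaluation metrics need to change along with the estimator.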

OK, I see what's happening here. I removed the continuous variable. Say I want to predict 'Market' from all the other features; then I definitely want classification, not regression. But as soon as I set 'Market' as the y variable and apply one-hot encoding, 'Market' gets split into a separate dummy column for every market. How do I make this work?

Use a classifier and encode the categorical target variable with sklearn.preprocessing.OrdinalEncoder.
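
One way to keep 'Market' from being split into dummy columns is to pull it out of the frame before calling `pd.get_dummies`, so only the features are one-hot encoded. The sketch below assumes a made-up `toy` frame with hypothetical values. It also skips explicit target encoding, because scikit-learn classifiers accept string class labels directly. If an integer-coded target is wanted, `sklearn.preprocessing.LabelEncoder` is the encoder documented for targets, while `OrdinalEncoder` is designed for 2D feature arrays.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df1 with only categorical columns.
rng = np.random.default_rng(0)
n = 300
toy = pd.DataFrame({
    "Sector": rng.choice(["Tech", "Energy", "Retail"], size=n),
    "Industry": rng.choice(["Software", "Oil", "Grocery"], size=n),
    "Market": rng.choice(["US", "EU", "Asia"], size=n),
})

# Pull the target out BEFORE one-hot encoding so it stays a single column.
y = toy["Market"]
X = pd.get_dummies(toy.drop(columns=["Market"]))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# String labels are fine here; the fitted classes are exposed as clf.classes_.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
predicted = clf.predict(X_test)
print(predicted[:5])
```

Because the toy features are random, the predictions carry no signal; the point is only the order of operations: split off y, then encode X.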