Inverse label encoder function in Python

python machine-learning regression random-forest

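Consider the example table below, on which I am trying to make predictions.

As you can see, I have a mix of numeric features (Num1 and Num2) and categorical features (Cat1 and Cat2) for predicting a value, and I am using random forest regression to do so.

After reading in the file, I use LabelEncoder to convert the categorical features to numeric ones, as follows: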

from sklearn import preprocessing

category_col = ['Cat1', 'Cat2']
labelEncoder = preprocessing.LabelEncoder()

# Build a map from each categorical label to its numeric code, per column.
mapping_dict = {}
for col in category_col:
    # fit_transform refits the shared encoder, so afterwards it only remembers the last column.
    df[col] = labelEncoder.fit_transform(df[col])
    le_name_mapping = dict(zip(labelEncoder.classes_, labelEncoder.transform(labelEncoder.classes_)))
    mapping_dict[col] = le_name_mapping

Once converted, I split the dataframe into training and test sets and make predictions, like this:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

train_features, test_features, train_labels, test_labels = train_test_split(df, labels, test_size=0.30)

rf = RandomForestRegressor(n_estimators=1000)
rf.fit(train_features, train_labels)
predictions = rf.predict(test_features)

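My question is: how do I change the numbers in Cat1 and Cat2 back to the original categories, so that the predictions can be exported like this?

I know I need to use labelEncoder.inverse_transform, but I can't seem to get the syntax right to return the category text so that it matches up with the results.

Thanks for your help.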

A quick solution, based on the code you already have:

# Invert the mapping dictionary you created: {column: {code: label}}
inv_mapping_dict = {cat: {v: k for k, v in map_dict.items()} for cat, map_dict in mapping_dict.items()}

# Assuming `predictions` is your resulting dataframe.
# Replace the codes using the inverted mapping dictionary.
# replace() returns a new dataframe, so assign the result.
predictions = predictions.replace(inv_mapping_dict)

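A slightly better approach is to change how you create the initial mapping dictionary. Also consider the answer here:

Instead of using a for loop over the category columns to build a mapping dictionary, you can create a dictionary of LabelEncoders keyed by column, then apply the fit at the start and the inverse transform at the end for all of those columns at once.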
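A minimal sketch of the dictionary-of-encoders idea described above. The toy dataframe and the names `encoders`, `encoded`, and `decoded` are made up for illustration and are not from the question; only the columns listed in `category_col` are encoded, and every other column is left untouched.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the question's table.
df = pd.DataFrame({
    "Num1": [1.0, 2.5, 3.1, 4.7],
    "Cat1": ["red", "blue", "red", "green"],
    "Cat2": ["small", "large", "large", "small"],
})

category_col = ["Cat1", "Cat2"]

# One fitted encoder per categorical column, so each column's classes are kept.
encoders = {col: LabelEncoder().fit(df[col]) for col in category_col}

# Encode only the selected columns.
encoded = df.copy()
for col, le in encoders.items():
    encoded[col] = le.transform(encoded[col])

# ... train/test split, model fitting, and prediction would happen here ...

# Decode the same columns back to their original labels.
decoded = encoded.copy()
for col, le in encoders.items():
    decoded[col] = le.inverse_transform(decoded[col])
```

In the question's setup, the decoding loop would presumably be run on a copy of `test_features`, with the array returned by `rf.predict` attached as an extra column before exporting.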

Thanks. I do have a question about the link you posted: the method shown there encodes all the variables in my dataframe. How can I pick out just the two columns I need and encode them with that method?