Ordinal categorical data in regression with Python

python machine-learning scikit-learn

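I have a dataset containing categorical data whose levels carry different weights; for example, Phd ranks higher than Msc, and Msc ranks higher than Bsc.

I know I should use a label encoder, but I don't want Python to assign codes to these values arbitrarily. I want higher levels to get higher codes: Phd=4, Msc=3, Bsc=2, O Levels=1, No education=0.

Is there anything I can do? Can anyone help?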

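LabelEncoder encodes the categories in alphabetical (sorted) order and stores them in its classes_ attribute. By default, the result looks like this: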

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(['Phd', 'Msc', 'Bsc', 'O Levels', 'No education'])
le.classes_  # the integer code of each label is its index in this sorted array
# Output (Python 3): array(['Bsc', 'Msc', 'No education', 'O Levels', 'Phd'], dtype='<U12')
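If the sorted order is the only obstacle, one option is scikit-learn's OrdinalEncoder, which accepts an explicit category order through its `categories` parameter. The following is a minimal sketch, assuming scikit-learn 0.20 or later and that the education level is a feature column (OrdinalEncoder expects 2-D input); the names `levels`, `X` and `codes` are made up for illustration.

```python
from sklearn.preprocessing import OrdinalEncoder

# Desired order, from lowest to highest; the position in this list becomes the code.
levels = ['No education', 'O Levels', 'Bsc', 'Msc', 'Phd']

# One inner list per feature column; dtype=int gives integer codes instead of floats.
encoder = OrdinalEncoder(categories=[levels], dtype=int)

# Hypothetical sample data: a single column with one value per row.
X = [['Phd'], ['Bsc'], ['No education'], ['Msc'], ['O Levels']]

codes = encoder.fit_transform(X)
print(codes.ravel().tolist())
```

With the default settings, a value that is not in `levels` raises an error at fit or transform time rather than being given a silent code.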

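How many classes are there, and is your data in a DataFrame? If there are only a few classes, you can do the conversion yourself with a dict, something like: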

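import numpy as np

# Explicit mapping from each level to the code you want
my_dict = {'Phd': 4, 'Msc': 3, 'Bsc': 2, 'O Levels': 1, 'No education': 0}

y = ['No education', 'O Levels', 'Bsc', 'Msc', 'Phd']
# Look up every element of y in the dict
np.vectorize(my_dict.get)(y)

# Output: array([0, 1, 2, 3, 4])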
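If the data is in a pandas DataFrame, the same dict can be applied with `Series.map`, or the column can be turned into an ordered categorical. This is only a sketch under assumptions: the DataFrame `df` and the column names `education`, `education_code` and `education_cat_code` are hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame with one education column
df = pd.DataFrame({'education': ['Bsc', 'Phd', 'No education', 'Msc', 'O Levels']})

my_dict = {'Phd': 4, 'Msc': 3, 'Bsc': 2, 'O Levels': 1, 'No education': 0}

# Option 1: map each value through the dict
df['education_code'] = df['education'].map(my_dict)

# Option 2: ordered categorical; the code is the position in `levels`
levels = ['No education', 'O Levels', 'Bsc', 'Msc', 'Phd']
cat = pd.Categorical(df['education'], categories=levels, ordered=True)
df['education_cat_code'] = cat.codes

print(df)
```

The two options differ for values missing from the mapping: `map` yields NaN, while the categorical codes yield -1, so it is worth checking for either before fitting a regression.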