How to prevent LabelEncoder from sorting label values?

python scikit-learn

scikit-learn's LabelEncoder is showing some puzzling behavior in my Jupyter notebook. For example:

from sklearn.preprocessing import LabelEncoder
le2 = LabelEncoder()
le2.fit(['zero', 'one'])
print (le2.inverse_transform([0, 0, 0, 1, 1, 1]))

prints

['one' 'one' 'one' 'zero' 'zero' 'zero']

That is strange. Shouldn't it print

['zero' 'zero' 'zero' 'one' 'one' 'one']

instead? Then I tried

le3 = LabelEncoder()
le3.fit(['one', 'zero'])
print (le3.inverse_transform([0, 0, 0, 1, 1, 1]))

which also prints

['one' 'one' 'one' 'zero' 'zero' 'zero']

Maybe something alphabetical is going on? Next, I tried

le4 = LabelEncoder()
le4.fit(['nil', 'one'])
print (le4.inverse_transform([0, 0, 0, 1, 1, 1]))

which prints

['nil' 'nil' 'nil' 'one' 'one' 'one']

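I have spent several hours on this. FWIW, the examples work as expected, so I suspect there is a flaw in how I expect inverse_transform to work. Part of my research included [link] and [link].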

In case it is relevant, I am using IPython 7.7.0, numpy 1.17.3, and scikit-learn 0.21.3.

The problem is that LabelEncoder.fit() always stores the classes in sorted order. That is because it uses np.unique, as the source code shows.
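As a small illustration of the sorting behavior, here is a sketch that assumes only NumPy is installed. It compares np.unique with an order-preserving alternative; the variable names are made up for this example.

```python
import numpy as np

labels = ['zero', 'one']

# np.unique returns the distinct values in sorted order,
# so the order in which the labels were passed in is lost.
sorted_classes = np.unique(labels)
print(sorted_classes)

# dict.fromkeys drops duplicates but keeps first-seen order
# (dicts preserve insertion order in Python 3.7+).
first_seen_classes = list(dict.fromkeys(labels))
print(first_seen_classes)
```

Since 'one' sorts before 'zero', the first result starts with 'one' regardless of the order passed in, which matches what the question observed.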

I think the only way to do what you want is to write your own fit method that overrides the original one in LabelEncoder.

You can largely reuse the existing code from the linked source. Here is an example:

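import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import column_or_1d

class MyLabelEncoder(LabelEncoder):

    def fit(self, y):
        # Validate that y is 1-D (or a single column), as the original fit does
        y = column_or_1d(y, warn=True)
        # pd.Series.unique() keeps order of first appearance; np.unique would sort
        self.classes_ = pd.Series(y).unique()
        return self

# 'zero' is seen first, so it is encoded as 0 and 'one' as 1
le2 = MyLabelEncoder()
le2.fit(['zero', 'one'])
print (le2.inverse_transform([0, 0, 0, 1, 1, 1]))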

which gives you:

['zero' 'zero' 'zero' 'one' 'one' 'one']
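If subclassing is more than you need, the same first-seen ordering can be sketched with plain dictionaries. The class OrderedLabelMap below is hypothetical and is not part of scikit-learn. It only mimics the fit / transform / inverse_transform names for illustration and returns plain lists rather than NumPy arrays.

```python
class OrderedLabelMap:
    """Hypothetical helper: encodes labels in first-seen order (not a scikit-learn class)."""

    def fit(self, labels):
        # dict.fromkeys drops duplicates while keeping first-seen order
        self.classes_ = list(dict.fromkeys(labels))
        # Lookup table from label to its integer code
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        # Raises KeyError for labels that were not seen during fit
        return [self._index[label] for label in labels]

    def inverse_transform(self, codes):
        return [self.classes_[code] for code in codes]


enc = OrderedLabelMap().fit(['zero', 'one'])
decoded = enc.inverse_transform([0, 0, 0, 1, 1, 1])
encoded = enc.transform(['one', 'zero'])
print(decoded)
print(encoded)
```

Here 'zero' is seen first, so it maps to 0 and 'one' maps to 1.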

Ha, look at what I found on this. Great find! If you want to revise your answer accordingly, I will accept it.