Python: Encoding with OneHotEncoder


I'm trying to preprocess data using scikit-learn's OneHotEncoder. Obviously, I'm doing something wrong. Here is my sample program:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer


cat = ['ok', 'ko', 'maybe', 'maybe']


label_encoder = LabelEncoder()
label_encoder.fit(cat)


cat = label_encoder.transform(cat)

# returns [2 0 1 1], which seems good.
print(cat)

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')

res = ct.fit_transform([cat])

print(res)

Actual result:

[[1.0 0 0 1]]

Expected result, something like:

[
 [ 1 0 0 ]
 [ 0 0 1 ]
 [ 0 1 0 ]
 [ 0 1 0 ]
]

Can someone point out what I'm missing?

You could consider using NumPy and MultiLabelBinarizer.

import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

cat = np.array([['ok', 'ko', 'maybe', 'maybe']])

m = MultiLabelBinarizer()
print(m.fit_transform(cat.T))

If you still want to stick with your own solution, you only need to update it as follows:

# [cat] is still a single row, not a column
# res = ct.fit_transform([cat])  => remove this

# transposing gives one column with four samples, so this should work
res = ct.fit_transform(np.array([cat]).T)

Out[2]:
array([[0., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.]])
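
The fix above comes down to handing the encoder a 2D array of shape (n_samples, n_features) instead of a single row. Below is a minimal sketch of the same idea, assuming the LabelEncoder step is not needed for anything else, since OneHotEncoder accepts string categories directly. The names `column`, `encoder` and `encoded` are illustrative and not part of the original question.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

cat = ['ok', 'ko', 'maybe', 'maybe']

# Reshape into a single column: 4 samples, 1 feature
column = np.array(cat).reshape(-1, 1)

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense
encoded = encoder.fit_transform(column).toarray()

# Categories are sorted, so the columns correspond to 'ko', 'maybe', 'ok'
print(encoder.categories_)
print(encoded)
```

Each row of `encoded` corresponds to one input sample, which matches the layout the question was asking for.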


OneHotEncoder(dtype=int)

to get the correct return data type.
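
As a rough illustration of the `dtype` suggestion, the sketch below encodes the same four labels with an integer output type. The variable names (`column`, `int_encoder`, `int_encoded`) are assumptions made for this example, and the exact integer width depends on the platform.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Single column of string labels: 4 samples, 1 feature
column = np.array(['ok', 'ko', 'maybe', 'maybe']).reshape(-1, 1)

# dtype=int requests integer output instead of the default float
int_encoder = OneHotEncoder(dtype=int)
int_encoded = int_encoder.fit_transform(column).toarray()

print(int_encoded.dtype)
print(int_encoded)
```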
