Python 3.x one-hot encoding: ValueError: could not convert string to float: 'Yes'

I am trying to apply OneHotEncoder to categorical values.

However, it fails with the error below. What could be going wrong? Please help; any input is welcome.

Here is the code snippet:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
print(X.shape)
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
print(X)
print(X.shape)
print(y)
#X = X.reshape(len(X[:, 0]), 7)
print(X.shape)
onehotencoder = OneHotEncoder(categorical_features = [0])
X = onehotencoder.fit_transform(X).toarray()
print(X.shape)
print(X)
The output of the code is shown below. The problem seems to be with the array format.
(17, 7)
[[2 0 0 'Offline' 'Low' 'Cold' 'No']
 [0 0 0 'Offline' 'High' 'Cold' 'No']
 [3 0 1 'Online' 'High' 'Cold' 'Yes']
 [2 0 1 'Offline' 'Low' 'Hot' 'Yes']
 [2 0 1 'Offline' 'High' 'Hot' 'Yes']
 [2 0 0 'Online' 'High' 'Cold' 'Yes']
 [2 1 1 'Offline' 'Low' 'Hot' 'No']
 [2 1 0 'Offline' 'Low' 'Cold' 'No']
 [0 1 0 'Online' 'Low' 'Cold' 'Yes']
 [3 1 1 'Online' 'Low' 'Hot' 'Yes']
 [1 1 0 'Offline' 'Low' 'Hot' 'No']
 [2 1 1 'Offline' 'Low' 'Hot' 'Yes']
 [3 1 1 'Online' 'High' 'Hot' 'Yes']
 [2 1 0 'Online' 'High' 'Hot' 'No']
 [2 2 2 'Offline' 'Low' 'Hot' 'Yes']
 [2 2 1 'Offline' 'Low' 'Cold' 'No']
 [1 2 0 'Offline' 'High' 'Cold' 'Yes']]
(17, 7)
['Low' 'Low' 'High' 'High' 'High' 'Low' 'Low' 'Low' 'Low' 'High' 'Low'
 'High' 'High' 'High' 'High' 'Low' 'Low']
(17, 7)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-84bec98371d4> in <module>()
     28 print(X.shape)
     29 onehotencoder = OneHotEncoder(categorical_features = [0])
---> 30 X = onehotencoder.fit_transform(X).toarray()
     31 print(X.shape)
     32 print(X)

C:\Users\patilsi\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\sklearn\preprocessing\data.py in fit_transform(self, X, y)
   2017         """
   2018         return _transform_selected(X, self._fit_transform,
-> 2019                                    self.categorical_features, copy=True)
   2020 
   2021     def _transform(self, X):

C:\Users\patilsi\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\sklearn\preprocessing\data.py in _transform_selected(X, transform, selected, copy)
   1807     X : array or sparse matrix, shape=(n_samples, n_features_new)
   1808     """
-> 1809     X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
   1810 
   1811     if isinstance(selected, six.string_types) and selected == "all":

C:\Users\patilsi\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    431                                       force_all_finite)
    432     else:
--> 433         array = np.array(array, dtype=dtype, order=order, copy=copy)
    434 
    435         if ensure_2d:

ValueError: could not convert string to float: 'Yes'

You should apply OneHotEncoder only to the columns you need, for example:

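from sklearn.preprocessing import OneHotEncoder

onehotencoder = OneHotEncoder()
# OneHotEncoder expects 2-D input, so select each column with a list index.
X_0 = onehotencoder.fit_transform(X[:, [0]]).toarray()
X_1 = onehotencoder.fit_transform(X[:, [1]]).toarray()

This encodes X[:, 0] and X[:, 1] separately, giving one one-hot matrix per column.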

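After that you are free to merge the matrices or do whatever else you need. If you want to know which output column corresponds to which category, you can inspect onehotencoder.feature_indices_ (categories_ in scikit-learn 0.20 and later). Note that when you reuse the same encoder for both columns, refitting it discards the information it held for X[:, 0].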
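One way to keep the per-column information is to fit a separate encoder for each column. The following is only a minimal sketch, assuming scikit-learn 0.20 or later (where the fitted `categories_` attribute exists); `col0`, `col1`, `enc0` and `enc1` are hypothetical names standing in for the label-encoded columns of X.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical label-encoded columns, standing in for X[:, [0]] and X[:, [1]].
col0 = np.array([[2], [0], [3], [2]])
col1 = np.array([[0], [0], [1], [2]])

# One encoder per column, so each one keeps its own fitted categories.
enc0 = OneHotEncoder()
enc1 = OneHotEncoder()
X_0 = enc0.fit_transform(col0).toarray()
X_1 = enc1.fit_transform(col1).toarray()

# Merge the two one-hot matrices side by side.
X_merged = np.hstack((X_0, X_1))

print(enc0.categories_)  # categories learned for the first column
print(X_merged.shape)
```

Because each encoder is fitted only once, `enc0.categories_` and `enc1.categories_` both remain available after the merge.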


I hope this helps.

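Even if you specify categorical_features=[0], OneHotEncoder still checks that every column is compatible with scikit-learn, so it throws an error when any other column contains string data.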
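The traceback in the question ends in a `np.array(..., dtype=...)` call on the whole input, and that step can be reproduced with NumPy alone. The snippet below is a rough illustration rather than the library's actual code path; the small array is made up to resemble the rows printed in the question.

```python
import numpy as np

# A small object array mixing label-encoded integers with raw strings,
# shaped like the rows printed in the question (values are illustrative).
X = np.array([[2, 0, "Offline", "Yes"],
              [0, 1, "Online", "No"]], dtype=object)

# Converting the whole array to float fails as soon as a string is reached.
try:
    np.array(X, dtype=np.float64)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Slicing out only the numeric column first avoids the failed conversion.
numeric_only = np.array(X[:, [0]], dtype=np.float64)
print(numeric_only.shape)
```

This is why passing only the selected column to the encoder works while passing the full mixed array does not.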

import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

labelencoder_X = LabelEncoder()
print(X.shape)
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
print(X)
print(X.shape)
print(y)
# X = X.reshape(len(X[:, 0]), 7)
print(X.shape)

onehotencoder = OneHotEncoder()
categorical_features = [0]
# Send only the first column to onehotencoder
X_oneHotEncoded = onehotencoder.fit_transform(X[:, categorical_features]).toarray()
# Combine the two arrays back together
X_final = np.hstack((X_oneHotEncoded, X[:, 1:]))
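As an alternative that neither answer uses, newer scikit-learn releases (0.20 and later) provide `ColumnTransformer`, which can one-hot encode selected columns and pass the rest through unchanged. This is a sketch under that version assumption; the sample array and the step name "onehot" are invented for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Illustrative mixed-type data: one categorical column to encode,
# one string column and one numeric column to leave untouched.
X = np.array([["A", "Offline", 1],
              ["B", "Online", 0],
              ["A", "Online", 1]], dtype=object)

ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), [0])],  # encode only column 0
    remainder="passthrough",             # keep the other columns as they are
    sparse_threshold=0,                  # always return a dense array
)
X_final = ct.fit_transform(X)

print(X_final.shape)
print(X_final)
```

The encoded columns come first in the result, followed by the passed-through columns in their original order, so no manual `np.hstack` is needed.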