Python: cannot encode a categorical variable, and the warning mentions categories='auto'
Python version 3.7, Spyder 3.3.6. I always get the error below, and I have tried different Python versions:
import pandas as pa
import numpy as np
X=0
y=0
dataset = 0
# import the dataset and separate the features from the target
dataset = pa.read_csv("50_Startups.csv")
X = dataset.iloc[:,:-1].values
y = dataset.iloc[:,4].values
#categorical variable
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer(
[('one_hot_encoder',OneHotEncoder(),[0])],
remainder = 'passthrough'
)
X = np.array(ct.fit_transform(X), dtype=np.float64)
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
The error is:
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py:415: FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values.
If you want the future behaviour and silence this warning, you can specify "categories='auto'".
In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.
warnings.warn(msg, FutureWarning)
Traceback (most recent call last):
File "<ipython-input-5-139c661c06f7>", line 25, in <module>
X = np.array(ct.fit_transform(X), dtype=np.float64)
File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py", line 490, in fit_transform
return self._hstack(list(Xs))
File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py", line 541, in _hstack
raise ValueError("For a sparse output, all columns should"
ValueError: For a sparse output, all columns should be a numeric or convertible to a numeric.
Take the feature matrix as X and the dependent variable as Y, converting the DataFrame to NumPy arrays:
`X = dataset.iloc[:,:-1].values`
`Y = dataset.iloc[:,-1].values`
Encode the categorical variable:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
en = LabelEncoder()
X[:,3] = en.fit_transform(X[:,3])
oh = OneHotEncoder(categorical_features=[3])
X = oh.fit_transform(X)
#converting from matrix to array
X = X.toarray()
#Dummy variable trap ---- Removing one dummy variable
X = X[:,1:]
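Here you keep all the columns that hold numeric data, fit the encoder only on the categorical column, and then transform it. Finally, drop one dummy variable.

The `categorical_features` keyword is deprecated, and it triggers the warning above. As I understand it, we have to use ColumnTransformer, because otherwise OneHotEncoder needs the column converted to numbers beforehand. Isn't that right?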
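For comparison, here is a minimal sketch of the ColumnTransformer route raised in the comment. It rests on assumptions not confirmed by the thread: that the categorical column (State) sits at index 3, as in the answer's code, and that the error in the question arises because column 0 was encoded while the State strings stayed in the passthrough remainder. The rows are made-up stand-ins for 50_Startups.csv, so the block runs without the file.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Made-up rows; the column layout is an assumption about 50_Startups.csv.
dataset = pd.DataFrame({
    "R&D Spend": [1000.0, 2000.0, 3000.0, 4000.0],
    "Administration": [500.0, 600.0, 700.0, 800.0],
    "Marketing Spend": [100.0, 200.0, 300.0, 400.0],
    "State": ["New York", "California", "Florida", "New York"],
    "Profit": [10.0, 20.0, 30.0, 40.0],
})

X = dataset.iloc[:, :-1].values   # object array: three numeric columns + State
y = dataset.iloc[:, -1].values    # numeric target

ct = ColumnTransformer(
    # Encode the string column (index 3), not column 0.
    [("one_hot_encoder", OneHotEncoder(categories="auto"), [3])],
    remainder="passthrough",      # keep the numeric columns unchanged
    sparse_threshold=0,           # always return a dense array
)

# One-hot columns come first, followed by the passthrough columns.
X_encoded = np.array(ct.fit_transform(X), dtype=np.float64)

# Avoid the dummy variable trap by dropping the first dummy column.
X_final = X_encoded[:, 1:]
```

With this layout no LabelEncoder step is needed on X, since OneHotEncoder accepts string categories directly. If Profit is a continuous regression target, label-encoding y is probably not wanted either.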