Converting categorical variables to numbers using scikit-learn's LabelEncoder

python pandas machine-learning scikit-learn

I am trying to convert categorical variables to numeric ones using scikit-learn's LabelEncoder:

from sklearn.preprocessing import LabelEncoder
var_mod = ['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'Property_Area', 'Loan_Status']
le = LabelEncoder()
for i in var_mod:
    data_train[i] = le.fit_transform(data_train[i])
data_train.dtypes

I am running it in Jupyter with Python 3.6.4. It gives me the following Python error:

TypeError: ...

One of the columns of dtype object contains mixed object types (they are not all strings). Find out which one it is (perhaps just put a print(i) at the start of the loop so you know where it breaks), then run data_train[i].map(type).value_counts() to see the distribution, then work out how you want to handle that case (possibly forcing the column to be all strings first, etc.).

For Gender and the like, the dtype is already int64, so they probably don't need encoding.

By the way, you can do label encoding and dummification in pandas: for me that works much better than sklearn.

@JonClements You are absolutely right! It broke at "Dependents", so I used data_train["Dependents"].replace() to convert them all to integers. Thank you very much!
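The diagnosis suggested in the comments can be sketched roughly as follows. This is only an illustration: the small data_train frame is a made-up stand-in for the real data, and the `mixed` dictionary is a hypothetical helper name. It checks each column with map(type).value_counts(), then casts any mixed-type column to strings before encoding, which is one of the options mentioned above (the asker instead used replace() to make the values integers).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for data_train: 'Dependents' mixes ints and strings.
data_train = pd.DataFrame({
    "Gender": [1, 0, 1, 1],
    "Dependents": [0, 1, "3+", 2],
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate"],
})
var_mod = ["Gender", "Dependents", "Education"]

# Count the Python types present in each column; more than one type
# means the column is mixed and LabelEncoder cannot sort its values.
mixed = {}
for col in var_mod:
    type_counts = data_train[col].map(type).value_counts()
    if len(type_counts) > 1:
        mixed[col] = type_counts

print(list(mixed))  # names of the mixed-type columns

le = LabelEncoder()
for col in var_mod:
    if col in mixed:
        # Force a single type first. Note that astype(str) would also
        # turn missing values into the literal string "nan".
        data_train[col] = data_train[col].astype(str)
    data_train[col] = le.fit_transform(data_train[col])

print(data_train.dtypes)
```

Casting to strings keeps every original category distinct, whereas replace() lets you choose the integer each value maps to; which is preferable depends on how the column is used afterwards.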

Output of data_train.dtypes:

Loan_ID               object
Gender                 int64
Married                int64
Dependents            object
Education             object
Self_Employed         object
ApplicantIncome        int64
CoapplicantIncome    float64
LoanAmount           float64
Loan_Amount_Term     float64
Credit_History       float64
Property_Area         object
Loan_Status           object
LoanAmount_log       float64
TotalIncome          float64
TotalIncome_log      float64
dtype: object
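The pandas-only alternative mentioned in the comments might look something like the sketch below. The comment does not say which functions were meant, so using pd.factorize for label encoding and pd.get_dummies for dummification is an assumption here, and the small df frame is invented sample data borrowing two column names from the question.

```python
import pandas as pd

# Hypothetical sample using two of the column names from the question.
df = pd.DataFrame({
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban"],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

# Label encoding: sort=True assigns codes in sorted order of the unique
# values, similar to what LabelEncoder does.
codes, uniques = pd.factorize(df["Loan_Status"], sort=True)
df["Loan_Status"] = codes

# Dummification (one-hot encoding): one indicator column per category.
dummies = pd.get_dummies(df, columns=["Property_Area"])

print(list(uniques))
print(dummies.columns.tolist())
```

Keeping `uniques` around makes it possible to map the integer codes back to the original labels later.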