Pandas 将分类列转换为单个虚拟变量列
假设我有以下数据帧:Pandas 将分类列转换为单个虚拟变量列,pandas,dataframe,machine-learning,categorical-data,one-hot-encoding,Pandas,Dataframe,Machine Learning,Categorical Data,One Hot Encoding,假设我有以下数据帧: Survived Pclass Sex Age Fare 0 0 3 male 22.0 7.2500 1 1 1 female 38.0 71.2833 2 1 3 female 26.0 7.9250 3 1 1 female 35.0 53.1000 4 0 3 m
Survived Pclass Sex Age Fare
0 0 3 male 22.0 7.2500
1 1 1 female 38.0 71.2833
2 1 3 female 26.0 7.9250
3 1 1 female 35.0 53.1000
4 0 3 male 35.0 8.0500
Category Message
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup final...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
我使用get_dummies()函数来创建虚拟变量。代码和输出如下所示:
one_hot = pd.get_dummies(dataset, columns = ['Category'])
这将返回:
Survived Pclass Age Fare Sex_female Sex_male
0 0 3 22 7.2500 0 1
1 1 1 38 71.2833 1 0
2 1 3 26 7.9250 1 0
3 1 1 35 53.1000 1 0
4 0 3 35 8.0500 0 1
我想要的是一个单独的Sex列,其值为0或1,而不是2列
有趣的是,当我在不同的数据帧上使用get_dummies()时,它的工作方式与我所希望的一样。对于以下数据帧:
Survived Pclass Sex Age Fare
0 0 3 male 22.0 7.2500
1 1 1 female 38.0 71.2833
2 1 3 female 26.0 7.9250
3 1 1 female 35.0 53.1000
4 0 3 male 35.0 8.0500
Category Message
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup final...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
使用代码:
one_hot = pd.get_dummies(dataset, columns = ['Category'])
它返回:
Message ... Category_spam
0 Go until jurong point, crazy.. Available only ... ... 0
1 Ok lar... Joking wif u oni... ... 0
2 Free entry in 2 a wkly comp to win FA Cup fina... ... 1
3 U dun say so early hor... U c already then say... ... 0
4 Nah I don't think he goes to usf, he lives aro... ... 0
以下是您可以使用的多种方法:
from sklearn.preprocessing import LabelEncoder
lbl=LabelEncoder()
df['Sex_encoded'] = lbl.fit_transform(df['Sex'])
# using only pandas
df['Sex_encoded'] = df['Sex'].map({'male': 0, 'female': 1})
Survived Pclass Sex Age Fare Sex_encoded
0 0 3 male 22.0 7.2500 0
1 1 1 female 38.0 71.2833 1
2 1 3 female 26.0 7.9250 1
3 1 1 female 35.0 53.1000 1
4 0 3 male 35.0 8.0500 0
为什么get_假人在问题中提到的2个不同数据帧上的工作方式不同?@user41855可能检查数据帧中变量的类型,可能不同。