Python pandas: get dummies
I have the following DataFrame:
amount catcode cid cycle date di feccandid type
0 1000 E1600 N00029285 2014 2014-05-15 D H8TX22107 24K
1 5000 G4600 N00026722 2014 2013-10-22 D H4TX28046 24K
2 4 C2100 N00030676 2014 2014-03-26 D H0MO07113 24Z
I'd like to create dummy variables for the values in the 'type' column. There are about 15 of them. I tried this:
pd.get_dummies(df['type'])
And it returns this:
24A 24C 24E 24F 24K 24N 24P 24R 24Z
date
2014-05-15 0 0 0 0 1 0 0 0 0
2013-10-22 0 0 0 0 1 0 0 0 0
2014-03-26 0 0 0 0 0 0 0 0 1
What I'd like is a dummy variable column for each unique value in 'type'.

You can try:
df = pd.get_dummies(df, columns=['type'])
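A minimal sketch of what columns=['type'] does, assuming a cut-down copy of the question's three rows (only amount, catcode and type are rebuilt here). The original 'type' column is replaced by one indicator column per unique value, prefixed with the column name. Since pandas 2.0 the indicator columns default to bool, so dtype=int is passed to get 0/1 values like those in the question.

```python
import pandas as pd

# Cut-down copy of the question's frame (assumption: other columns omitted)
df = pd.DataFrame({
    "amount": [1000, 5000, 4],
    "catcode": ["E1600", "G4600", "C2100"],
    "type": ["24K", "24K", "24Z"],
})

# 'type' is dropped and replaced by prefixed indicator columns appended at the end;
# dtype=int yields 0/1 instead of the bool default of newer pandas
encoded = pd.get_dummies(df, columns=["type"], dtype=int)
print(encoded)
```

The remaining columns are kept in their original order, so the result can be used directly in place of df.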
Suppose I have the following DataFrame:
Survived Pclass Sex Age Fare
0 0 3 male 22.0 7.2500
1 1 1 female 38.0 71.2833
2 1 3 female 26.0 7.9250
3 1 1 female 35.0 53.1000
4 0 3 male 35.0 8.0500
There are two ways to apply get_dummies:
Method 1:
one_hot = pd.get_dummies(dataset, columns = ['Sex'])
This will return:
Survived Pclass Age Fare Sex_female Sex_male
0 0 3 22 7.2500 0 1
1 1 1 38 71.2833 1 0
2 1 3 26 7.9250 1 0
3 1 1 35 53.1000 1 0
4 0 3 35 8.0500 0 1
Method 2:
one_hot = pd.get_dummies(dataset['Sex'])
This will return:
female male
0 0 1
1 1 0
2 1 0
3 1 0
4 0 1
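To compare the two methods side by side, here is a minimal sketch that rebuilds the five sample rows above. Method 1 returns the whole frame with 'Sex' swapped for prefixed indicator columns, while Method 2 returns only the indicator columns. Adding a prefix, dropping the original column and concatenating turns the Method 2 result into the Method 1 result. dtype=int is an assumption added so the values are 0/1 on pandas 2.0+, where the default is bool.

```python
import pandas as pd

# The five sample rows shown above
dataset = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Method 1: full frame in, 'Sex' replaced by Sex_female / Sex_male
method1 = pd.get_dummies(dataset, columns=["Sex"], dtype=int)

# Method 2: only the Series is encoded, so only 'female' / 'male' come back
method2 = pd.get_dummies(dataset["Sex"], dtype=int)

# Reproduce Method 1 from Method 2: prefix the names, drop 'Sex', concatenate columns
rebuilt = pd.concat(
    [dataset.drop(columns="Sex"), method2.add_prefix("Sex_")], axis=1
)
```

Method 1 is the shorter route when the dummies should end up in the same frame; Method 2 is handy when the indicator columns are needed on their own.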
Try:
type_dummies=pd.get_dummies(df['type'],drop_first=True)
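df=pd.concat([df,type_dummies],axis=1)

Did you mean pd.get_dummies(df['type'])?
Yes! Thanks a lot. Is there now a way to add these columns to my df, or should I do a join?
What do you want the final df to actually look like?
The new df should include the dummy columns.
Then you can join:
df.join(pd.get_dummies(df['type']))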
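A minimal sketch of the two ways of attaching the dummies, assuming a hypothetical two-column version of the question's frame. drop_first=True leaves out the first category ('24K' here), since with k categories the k-th indicator is implied by the other k-1. Both pd.concat(..., axis=1) and DataFrame.join align on the index, so they give the same result here. join works the same way with the full dummy set used in the comment above.

```python
import pandas as pd

# Hypothetical minimal version of the question's frame
df = pd.DataFrame({
    "amount": [1000, 5000, 4],
    "type": ["24K", "24K", "24Z"],
})

# drop_first=True keeps k-1 indicator columns (here only '24Z')
type_dummies = pd.get_dummies(df["type"], drop_first=True, dtype=int)

# Both calls place the dummies next to the existing columns, aligned on the index
via_concat = pd.concat([df, type_dummies], axis=1)
via_join = df.join(type_dummies)
```

If a dummy name could collide with an existing column, passing prefix='type' to get_dummies avoids the clash that join would otherwise raise.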