Python pandas: get_dummies

I have the following DataFrame:

   amount  catcode    cid      cycle      date     di  feccandid    type
0   1000    E1600   N00029285   2014    2014-05-15  D   H8TX22107   24K
1   5000    G4600   N00026722   2014    2013-10-22  D   H4TX28046   24K
2      4    C2100   N00030676   2014    2014-03-26  D   H0MO07113   24Z
I want to create dummy variables for the values in the type column. There are about 15 of them. I tried this:

pd.get_dummies(df['type'])

It returns this:

           24A  24C  24E  24F  24K  24N  24P  24R  24Z
date                                    
2014-05-15  0    0    0    0    1    0    0    0    0
2013-10-22  0    0    0    0    1    0    0    0    0
2014-03-26  0    0    0    0    0    0    0    0    1
I would like one dummy variable column for each unique value in type.

You can try:

df = pd.get_dummies(df, columns=['type'])
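
A minimal sketch of this call, assuming a small DataFrame rebuilt by hand from the three rows shown in the question (only a few columns are kept for brevity). The dtype=int argument is an addition made here so the dummies are stored as 0/1; recent pandas versions return booleans by default.

```python
import pandas as pd

# Hand-built subset of the question's data (assumption: three rows, three columns).
df = pd.DataFrame({
    "amount": [1000, 5000, 4],
    "catcode": ["E1600", "G4600", "C2100"],
    "type": ["24K", "24K", "24Z"],
})

# Encode only the 'type' column; the other columns are passed through unchanged.
# The original 'type' column is replaced by one 'type_<value>' column per unique value.
encoded = pd.get_dummies(df, columns=["type"], dtype=int)

print(encoded.columns.tolist())
```

Passing columns= keeps the rest of the frame intact, so no separate join is needed afterwards.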

Suppose I have the following DataFrame:

   Survived  Pclass     Sex   Age     Fare
0         0       3    male  22.0   7.2500
1         1       1  female  38.0  71.2833
2         1       3  female  26.0   7.9250
3         1       1  female  35.0  53.1000
4         0       3    male  35.0   8.0500
There are two ways to apply get_dummies:

Method 1:

one_hot = pd.get_dummies(dataset, columns=['Sex'])

This returns:

   Survived  Pclass  Age     Fare  Sex_female  Sex_male
0         0       3   22   7.2500           0         1
1         1       1   38  71.2833           1         0
2         1       3   26   7.9250           1         0
3         1       1   35  53.1000           1         0
4         0       3   35   8.0500           0         1

Method 2:

one_hot = pd.get_dummies(dataset['Sex'])

This returns:

   female  male
0       0     1
1       1     0
2       1     0
3       1     0
4       0     1
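
The difference between the two calls can be checked with a short sketch, assuming the five rows shown above are typed in by hand as dataset. The dtype=int argument is an addition here so the result holds 0/1 rather than booleans.

```python
import pandas as pd

# The five sample rows from the answer, entered manually.
dataset = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Passing the whole frame: other columns are kept, dummies get a 'Sex_' prefix.
whole = pd.get_dummies(dataset, columns=["Sex"], dtype=int)

# Passing a single Series: only the dummy columns come back, with no prefix.
only_sex = pd.get_dummies(dataset["Sex"], dtype=int)

print(whole.columns.tolist())
print(only_sex.columns.tolist())
```

The first form is convenient when the encoded frame is the end goal; the second is handy when the dummies will be attached to another frame later.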
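Please try:

# drop_first=True omits the first category, so k categories give k-1 columns
type_dummies = pd.get_dummies(df['type'], drop_first=True)

# append the dummy columns to the original DataFrame
df = pd.concat([df, type_dummies], axis=1)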
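
As a rough illustration of what drop_first changes, the sketch below compares the two results on a tiny hand-made frame (an assumption; it contains only two of the question's type values). dtype=int is added so the columns hold 0/1.

```python
import pandas as pd

# Tiny stand-in for the question's data (assumption: two distinct type values).
df = pd.DataFrame({"amount": [1000, 5000, 4], "type": ["24K", "24K", "24Z"]})

# One column per unique value.
all_levels = pd.get_dummies(df["type"], dtype=int)

# drop_first=True removes the first category, leaving k-1 columns.
reduced = pd.get_dummies(df["type"], drop_first=True, dtype=int)

# Attach the reduced dummies to the original frame column-wise.
combined = pd.concat([df, reduced], axis=1)

print(all_levels.columns.tolist())
print(combined.columns.tolist())
```

If a column for every unique value is wanted, as in the question, leaving drop_first at its default of False is the safer choice; dropping one level is mostly useful for regression models where the full set would be collinear.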

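Do you mean pd.get_dummies(df['type'])?

Yes! Thank you so much. Now, is there a way to add it to my df, or should I do a join?

What do you want the final df to actually look like?

The new df should include the dummy columns.

Then you can join:

df.join(pd.get_dummies(df['type']))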
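
A hedged sketch of the join suggested in the comments, again on a small hand-made frame. The prefix argument and the final drop of the original column are optional additions, not part of the comment itself; they make the new column names self-describing and avoid keeping the text column alongside its encoding.

```python
import pandas as pd

# Small stand-in for the question's DataFrame (assumption).
df = pd.DataFrame({"amount": [1000, 5000, 4], "type": ["24K", "24K", "24Z"]})

# Build the dummy columns; prefix='type' names them 'type_24K', 'type_24Z'.
dummies = pd.get_dummies(df["type"], prefix="type", dtype=int)

# join aligns on the index, so each row gets its own dummy values.
joined = df.join(dummies)

# Optionally drop the original column once it has been encoded.
result = joined.drop(columns="type")

print(result.columns.tolist())
```

Because join aligns on the index, this relies on the dummies being built from the same frame, which holds here.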