Python: usage of get_dummies in pandas


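I am reading a book that introduces machine learning with Python. The author describes the following:

For example, for the workclass feature we might have the values "Government Employee", "Private Employee", "Self Employed", and "Self Employed Incorporated".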


My question is: what is the new workclass_? column?

It is created from the string values of the workclass column:

import pandas as pd

data = pd.DataFrame({'age':[1,1,1,2,1,1],
                   'workclass':['Government Employee','Private Employee','Self Employed','Self Employed Incorporated','Self Employed Incorporated','?']})

print (data)
   age                   workclass
0    1         Government Employee
1    1            Private Employee
2    1               Self Employed
3    2  Self Employed Incorporated
4    1  Self Employed Incorporated
5    1                           ?
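
The encoding step for this data can be sketched as follows. This is a minimal example, assuming the same values as above; dtype=int is an added assumption so the dummies come out as 0/1, since recent pandas versions return True/False by default. It shows that workclass_? is simply the indicator column for rows whose workclass value is the literal string '?'.

```python
import pandas as pd

data = pd.DataFrame({
    'age': [1, 1, 1, 2, 1, 1],
    'workclass': ['Government Employee', 'Private Employee', 'Self Employed',
                  'Self Employed Incorporated', 'Self Employed Incorporated', '?'],
})

# One indicator column per distinct string in 'workclass';
# numeric columns such as 'age' are passed through unchanged.
# dtype=int is an assumption made here to get 0/1 instead of booleans.
data_dummies = pd.get_dummies(data, dtype=int)

# Each new column is named <original column>_<value>.
print(list(data_dummies.columns))

# Indicator for rows whose original value was the string '?'.
print(data_dummies['workclass_?'].tolist())
```

Only the last row has '?' as its workclass, so only that row is flagged in workclass_?.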

The column-name prefix is useful when several columns contain the same values:

data = pd.DataFrame({'age':[1,1,3],
                   'workclass':['Government Employee','Private Employee','?'],
                   'workclass1':['Government Employee','Private Employee','Self Employed']})

print (data)
   age            workclass           workclass1
0    1  Government Employee  Government Employee
1    1     Private Employee     Private Employee
2    3                    ?        Self Employed

data_dummies = pd.get_dummies(data)
print (data_dummies)
   age  workclass_?  workclass_Government Employee  \
0    1            0                              1   
1    1            0                              0   
2    3            1                              0   

   workclass_Private Employee  workclass1_Government Employee  \
0                           0                               1   
1                           1                               0   
2                           0                               0   

   workclass1_Private Employee  workclass1_Self Employed  
0                            0                         0  
1                            1                         0  
2                            0                         1  
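
If the default prefixes are too long, get_dummies also accepts a dict that maps each encoded column to its own prefix. A minimal sketch, assuming the same three-row data as above; the short prefixes 'wc' and 'wc1' are hypothetical names chosen only for illustration, and dtype=int is again an assumption to get 0/1 output.

```python
import pandas as pd

data = pd.DataFrame({
    'age': [1, 1, 3],
    'workclass': ['Government Employee', 'Private Employee', '?'],
    'workclass1': ['Government Employee', 'Private Employee', 'Self Employed'],
})

# Map each string column to a custom prefix (hypothetical names).
# Columns stay distinguishable even though both hold the same values.
data_dummies = pd.get_dummies(
    data,
    prefix={'workclass': 'wc', 'workclass1': 'wc1'},
    dtype=int,
)

print(list(data_dummies.columns))
```

The resulting names still show which source column each indicator came from, which is the point of keeping a prefix.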
If the prefix is not needed, pass prefix='' and prefix_sep='' to replace it with empty strings:

data_dummies = pd.get_dummies(data, prefix='', prefix_sep='')
print (data_dummies)
   age  ?  Government Employee  Private Employee  Government Employee  \
0    1  0                    1                 0                    1   
1    1  0                    0                 1                    0   
2    3  1                    0                 0                    0   

   Private Employee  Self Employed  
0                 0              0  
1                 1              0  
2                 0              1  
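The column names are now duplicated, so you can group by column name and aggregate with max to get one dummy column per unique value: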

print (data_dummies.groupby(level=0, axis=1).max())
   ?  Government Employee  Private Employee  Self Employed  age
0  0                    1                 0              0    1
1  0                    0                 1              0    1
2  1                    0                 0              1    3
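
In recent pandas releases, groupby with axis=1 is deprecated. One possible equivalent, sketched here under the assumption that all columns are numeric (hence dtype=int), is to transpose, group by the index labels, and transpose back:

```python
import pandas as pd

data = pd.DataFrame({
    'age': [1, 1, 3],
    'workclass': ['Government Employee', 'Private Employee', '?'],
    'workclass1': ['Government Employee', 'Private Employee', 'Self Employed'],
})

# Empty prefix and separator produce duplicate column names on purpose.
# dtype=int is an assumption so every column is an integer column.
data_dummies = pd.get_dummies(data, prefix='', prefix_sep='', dtype=int)

# Transpose so the duplicate column names become duplicate index labels,
# take the max within each label, then transpose back.
merged = data_dummies.T.groupby(level=0).max().T

print(merged)
```

Taking the max means a row gets 1 for a value if it appears in either source column, the same result as the axis=1 version.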

Actually, we do not observe a "?" value for workclass here, yet the author mentions it. What is this column?

Thanks, got it.