Python: using get_dummies in pandas
I am reading a book that introduces machine learning with Python. The author writes:

Let's say that for the workclass feature we have the possible values "Government Employee", "Private Employee", "Self Employed" and "Self Employed Incorporated".

My question: what is the new workclass_? column?

It is created from the string values of the workclass column:
import pandas as pd

data = pd.DataFrame({'age':[1,1,1,2,1,1],
                     'workclass':['Government Employee','Private Employee','Self Employed','Self Employed Incorporated','Self Employed Incorporated','?']})
print (data)
   age                   workclass
0    1         Government Employee
1    1            Private Employee
2    1               Self Employed
3    2  Self Employed Incorporated
4    1  Self Employed Incorporated
5    1                           ?
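As a minimal sketch of where that column comes from (assuming pandas is installed; dtype=int is an addition here, used only so the result holds 0/1 rather than booleans on pandas 2.x), calling get_dummies on this frame yields one indicator column per distinct string, including one for '?':

```python
import pandas as pd

data = pd.DataFrame({
    'age': [1, 1, 1, 2, 1, 1],
    'workclass': ['Government Employee', 'Private Employee', 'Self Employed',
                  'Self Employed Incorporated', 'Self Employed Incorporated', '?'],
})

# One indicator column per distinct string in workclass;
# the numeric age column is passed through unchanged.
dummies = pd.get_dummies(data, dtype=int)
print(list(dummies.columns))

# The string '?' becomes the column 'workclass_?', set only in the last row.
print(dummies['workclass_?'].tolist())
```

Each new column name is the original column name, the separator "_", and the string value, so '?' in workclass ends up as workclass_?.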
The column-name prefix (here workclass_) is useful when several columns contain the same values:
data = pd.DataFrame({'age':[1,1,3],
'workclass':['Government Employee','Private Employee','?'],
'workclass1':['Government Employee','Private Employee','Self Employed']})
print (data)
   age            workclass           workclass1
0    1  Government Employee  Government Employee
1    1     Private Employee     Private Employee
2    3                    ?        Self Employed
# pandas 2.0+ returns booleans by default; dtype=int keeps the 0/1 output shown here
data_dummies = pd.get_dummies(data, dtype=int)
print (data_dummies)
   age  workclass_?  workclass_Government Employee  \
0    1            0                              1
1    1            0                              0
2    3            1                              0

   workclass_Private Employee  workclass1_Government Employee  \
0                           0                               1
1                           1                               0
2                           0                               0

   workclass1_Private Employee  workclass1_Self Employed
0                            0                         0
1                            1                         0
2                            0                         1
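The prefix does not have to be the column name. As a small sketch (the short prefixes 'wc' and 'wc1' are made up for illustration, and dtype=int is again only there to get 0/1 values), get_dummies also accepts a dict that maps each column to its own prefix:

```python
import pandas as pd

data = pd.DataFrame({
    'age': [1, 1, 3],
    'workclass': ['Government Employee', 'Private Employee', '?'],
    'workclass1': ['Government Employee', 'Private Employee', 'Self Employed'],
})

# Map each string column to a custom prefix (illustrative names only).
short = pd.get_dummies(data,
                       prefix={'workclass': 'wc', 'workclass1': 'wc1'},
                       dtype=int)
print(list(short.columns))
```

The two source columns stay distinguishable because every generated name still carries its own prefix.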
If you do not want the prefix, you can pass parameters that override it with empty strings:
# dtype=int keeps the 0/1 output on pandas 2.0+, where the default is boolean
data_dummies = pd.get_dummies(data, prefix='', prefix_sep='', dtype=int)
print (data_dummies)
   age  ?  Government Employee  Private Employee  Government Employee  \
0    1  0                    1                 0                    1
1    1  0                    0                 1                    0
2    3  1                    0                 0                    0

   Private Employee  Self Employed
0                 0              0
1                 1              0
2                 0              1
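Note that dropping the prefix leaves the frame with repeated column names. A short sketch, assuming the same three-row example data, that lists which names are repeated:

```python
import pandas as pd

data = pd.DataFrame({
    'age': [1, 1, 3],
    'workclass': ['Government Employee', 'Private Employee', '?'],
    'workclass1': ['Government Employee', 'Private Employee', 'Self Employed'],
})
data_dummies = pd.get_dummies(data, prefix='', prefix_sep='', dtype=int)

# Index.duplicated() flags every repeat of a name after its first occurrence.
repeated = data_dummies.columns[data_dummies.columns.duplicated()].tolist()
print(repeated)

# Selecting a repeated name returns a DataFrame with both columns, not a Series.
print(data_dummies['Government Employee'].shape)
```

Because a repeated name selects several columns at once, it usually makes sense to merge them before further processing.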
You can then group by column name and aggregate with max, which leaves one dummy column per unique name:
# groupby(..., axis=1) is deprecated since pandas 2.1; grouping the transpose is equivalent
print (data_dummies.T.groupby(level=0).max().T)
   ?  Government Employee  Private Employee  Self Employed  age
0  0                    1                 0              0    1
1  0                    0                 1              0    1
2  1                    0                 0              1    3
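An alternative that never creates repeated column names, offered only as a sketch and not as the method from the book or the answer above, is to stack the string columns into one Series first, encode that, and then take the per-row maximum:

```python
import pandas as pd

data = pd.DataFrame({
    'age': [1, 1, 3],
    'workclass': ['Government Employee', 'Private Employee', '?'],
    'workclass1': ['Government Employee', 'Private Employee', 'Self Employed'],
})

# Stack both string columns into one Series indexed by (row, source column).
stacked = data[['workclass', 'workclass1']].stack()

# Encode the stacked values, then collapse back to one row per original row.
combined = pd.get_dummies(stacked, dtype=int).groupby(level=0).max()

# Re-attach the numeric column that was left out of the encoding.
result = data[['age']].join(combined)
print(result)
```

This gives the same 0/1 values as the grouped result, with age as the first column instead of the last.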
Actually, we do not see workclass_? here, but the author mentions it. What is this column?
Thanks, got it.