How to use categorical (factor) variables in Python
I am working with this dataset and want to treat variables such as age and income as factor variables, as in R. How can I do this in Python?

You can use astype with the 'category' dtype. Given this DataFrame:
            age  income  student  credit_rating  Class_buys_computer
0         youth    high       no           fair                   no
1         youth    high       no      excellent                   no
2   middle_aged    high       no           fair                  yes
3        senior  medium       no           fair                  yes
4        senior     low      yes           fair                  yes
5        senior     low      yes      excellent                   no
6   middle_aged     low      yes      excellent                  yes
7         youth  medium       no           fair                   no
8         youth     low      yes           fair                  yes
9        senior  medium      yes           fair                  yes
10        youth  medium      yes      excellent                  yes
11  middle_aged  medium       no      excellent                  yes
12  middle_aged    high      yes           fair                  yes
13       senior  medium       no      excellent                   no
To convert only selected columns to the category dtype:
cols = ['age', 'income', 'student']
# Convert each listed column to the category dtype
for col in cols:
    df[col] = df[col].astype('category')
print(df.dtypes)
age                    category
income                 category
student                category
credit_rating            object
Class_buys_computer      object
dtype: object
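As an alternative to the explicit loop, astype also accepts a mapping from column name to dtype. The following is a minimal sketch, assuming a small frame rebuilt by hand from the first four rows of the dataset above; it is not part of the original answer.

```python
import pandas as pd

# First four rows of the dataset, rebuilt by hand for illustration.
df = pd.DataFrame({
    "age": ["youth", "youth", "middle_aged", "senior"],
    "income": ["high", "high", "high", "medium"],
    "student": ["no", "no", "no", "no"],
    "credit_rating": ["fair", "excellent", "fair", "fair"],
    "Class_buys_computer": ["no", "no", "yes", "yes"],
})

cols = ["age", "income", "student"]
# Map each selected column to the 'category' dtype; other columns are untouched.
df = df.astype({col: "category" for col in cols})

print(df.dtypes)
# Categories are inferred from the data and are unordered by default.
print(df["age"].cat.categories)
```

The inferred categories are unordered, so this is the closest match to a plain R factor; an explicit order has to be declared separately.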
If you need to convert all columns:
# Convert every column of the DataFrame to the category dtype
for col in df.columns:
    df[col] = df[col].astype('category')
print(df.dtypes)
age                    category
income                 category
student                category
credit_rating          category
Class_buys_computer    category
dtype: object
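A column-by-column loop is needed because, in older pandas versions, converting the whole DataFrame in one call:
df = df.astype('category')
raised:
NotImplementedError: > 1 ndim Categorical are not supported at this time
Newer pandas releases accept df.astype('category') on a whole DataFrame.

Comments:
I need a solution in Python (pandas).
R has built-in support for factors. Although pandas has a categorical dtype, many libraries require you to use dummies. You may need to use pandas' get_dummies or scikit-learn's OneHotEncoder.
Is it possible to give the categories an order, starting with youth?
Yes, of course, give me a moment.

Edit, following the comment:
If you need an ordered categorical, use another solution, pd.Categorical: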
import pandas as pd
# Declare the category order explicitly: youth < middle_aged < senior
df['age'] = pd.Categorical(df['age'], categories=["youth", "middle_aged", "senior"], ordered=True)
print(df.age)
0           youth
1           youth
2     middle_aged
3          senior
4          senior
5          senior
6     middle_aged
7           youth
8           youth
9          senior
10          youth
11    middle_aged
12    middle_aged
13         senior
Name: age, dtype: category
Categories (3, object): [youth < middle_aged < senior]
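Once the order is declared, comparisons and integer codes follow it. The sketch below illustrates this on a short hand-built Series; the name `age` and its four values are assumptions for illustration, not the full column from the dataset.

```python
import pandas as pd

# Hypothetical short series using the same category order as above.
age = pd.Series(
    pd.Categorical(
        ["youth", "senior", "middle_aged", "youth"],
        categories=["youth", "middle_aged", "senior"],
        ordered=True,
    )
)

# Integer position of each value within the declared category order.
codes = age.cat.codes.tolist()
# Element-wise comparison against a category uses the declared order.
older_than_youth = (age > "youth").tolist()
# min/max are defined for ordered categoricals.
oldest = age.max()

print(codes)
print(older_than_youth)
print(oldest)
```

With an unordered categorical the `>` comparison would raise a TypeError, which is why `ordered=True` matters here.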
# Sorting follows the declared category order, not alphabetical order
df = df.sort_values('age')
print(df)
            age  income  student  credit_rating  Class_buys_computer
0         youth    high       no           fair                   no
1         youth    high       no      excellent                   no
7         youth  medium       no           fair                   no
8         youth     low      yes           fair                  yes
10        youth  medium      yes      excellent                  yes
2   middle_aged    high       no           fair                  yes
6   middle_aged     low      yes      excellent                  yes
11  middle_aged  medium       no      excellent                  yes
12  middle_aged    high      yes           fair                  yes
3        senior  medium       no           fair                  yes
4        senior     low      yes           fair                  yes
5        senior     low      yes      excellent                   no
9        senior  medium      yes           fair                  yes
13       senior  medium       no      excellent                   no
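One of the comments notes that many libraries expect dummy (one-hot) columns rather than a categorical dtype. A minimal sketch with pandas' get_dummies follows, assuming a small hand-built frame with two of the dataset's columns; the values are illustrative only.

```python
import pandas as pd

# Hypothetical four-row frame with two categorical columns.
df = pd.DataFrame({
    "age": ["youth", "middle_aged", "senior", "youth"],
    "student": ["no", "no", "yes", "yes"],
}).astype({"age": "category", "student": "category"})

# One indicator column per category, named <column>_<category>.
dummies = pd.get_dummies(df, columns=["age", "student"])

print(dummies.columns.tolist())
print(dummies)
```

Each original column is replaced by one indicator column per category, which is the form most estimators accept directly.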