Pivot table and one-hot encoding in PySpark


I have a PySpark DataFrame that looks like this:

id      age      cost     gender
1        38       230      M
2        40       832      M
3        53       987      F
1        38       764      M
4        63       872      F
5        21       763      F

I want my DataFrame to look like this:

id      age      cost     gender    M       F
1        38       230      M        1       0
2        40       832      M        1       0
3        53       987      F        0       1
1        38       764      M        1       0
4        63       872      F        0       1
5        21       763      F        0       1
4        63      1872      F        0       1

In Python with pandas, I can do this as follows:

final_df = pd.concat([df.drop(['gender'], axis=1), pd.get_dummies(df['gender'])], axis=1)
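
For reference, here is a minimal, self-contained sketch of the pandas approach on the sample data from the question. Two details are assumptions rather than part of the original one-liner: `dtype=int` is passed so the indicator columns hold 0/1 (recent pandas versions default to booleans), and the `gender` column is kept instead of dropped, since the desired output still shows it.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 1, 4, 5],
    "age": [38, 40, 53, 38, 63, 21],
    "cost": [230, 832, 987, 764, 872, 763],
    "gender": ["M", "M", "F", "M", "F", "F"],
})

# One indicator column per distinct gender value, stored as 0/1 integers
dummies = pd.get_dummies(df["gender"], dtype=int)

# Keep the original columns and append M and F in the order shown in the question
final_df = pd.concat([df, dummies[["M", "F"]]], axis=1)
print(final_df)
```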

How can I do this in PySpark?

Just add two columns:

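from pyspark.sql import functions as F

final_df = df.select(
    "id",
    "age",
    "cost",
    "gender",
    # 1 where gender is "M", otherwise 0
    F.when(F.col("gender") == F.lit("M"), 1).otherwise(0).alias("M"),
    # 1 where gender is "F", otherwise 0
    F.when(F.col("gender") == F.lit("F"), 1).otherwise(0).alias("F"),
)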