How to get values into the pivoted columns of a Spark Scala DataFrame


I have a DataFrame with the following values:

+---+-----------------------+----------+---------+------------+---------+
|id |database_name          |users     |groups   |type        |isAllowed|
+---+-----------------------+----------+---------+------------+---------+
|73 |[ww_hr_dl_highsecure]  |[hive]    |[hrhs]   |select      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |select      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |update      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |create      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |drop        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |alter       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |index       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |lock        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |all         |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |read        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |write       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |repladmin   |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |serviceadmin|true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |select      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |update      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |create      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |drop        |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |alter       |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |all         |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |read        |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |write       |true     |
|84 |[ww_core_dim_dl_tables]|[]        |[walmart]|select      |true     |
|84 |[ww_core_dim_dl_tables]|[]        |[walmart]|read        |true     |
+---+-----------------------+----------+---------+------------+---------+
I want to pivot my DataFrame on the type column, so the resulting DataFrame I need is:

id db_name                 users    group select update create  drop  alter
73 ww_hr_dl_highsecure     hive     hrhs   true    null    null  null  null
73 ww_hr_dl_highsecure     svchrdat null   true    true    true  true  true
84 ww_core_dim_dl_tables   svc02001 true   true    true    true  true  true
I cannot figure out how to make the values of the new pivoted columns come from the isAllowed column of the original DataFrame.

What I have tried so far is:

val dfs3 = dfs2.groupBy("database_name","users").pivot("type").expr("isAllowed")

Would adding groups to the groupBy give you an acceptable result?

df.groupBy("database_name","users","groups").pivot("type").agg(first("isAllowed")).show(false)
Output:

+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
|database_name          |users     |groups   |all |alter|create|drop|index|lock|read|repladmin|select|serviceadmin|update|write|
+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
|[ww_core_dim_dl_tables]|[svc02001]|[]       |true|true |true  |true|null |null|true|null     |true  |null        |true  |true |
|[ww_hr_dl_highsecure]  |[svchrdat]|[]       |true|true |true  |true|true |true|true|true     |true  |true        |true  |true |
|[ww_hr_dl_highsecure]  |[hive]    |[hrhs]   |null|null |null  |null|null |null|null|null     |true  |null        |null  |null |
|[ww_core_dim_dl_tables]|[]        |[walmart]|null|null |null  |null|null |null|true|null     |true  |null        |null  |null |
+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
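The output above still shows the array brackets and a column for every distinct type. If the flattened table from the question is the goal, a rough sketch (assuming dfs2 is the original DataFrame and that each array holds at most one element) could unwrap the arrays with getItem(0) and limit the pivot to the five types shown in the desired output:

import org.apache.spark.sql.functions._

// Unwrap the single-element array columns; getItem(0) yields null for the empty groups arrays.
val wantedTypes = Seq("select", "update", "create", "drop", "alter")

val pivoted = dfs2
  .withColumn("database_name", col("database_name").getItem(0))
  .withColumn("users", col("users").getItem(0))
  .withColumn("groups", col("groups").getItem(0))
  .groupBy("id", "database_name", "users", "groups")
  .pivot("type", wantedTypes)      // only generate columns for these types
  .agg(first("isAllowed"))

pivoted.show(false)

Passing the values explicitly to pivot also spares Spark the extra pass it otherwise needs to discover the distinct values of the type column.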

Maybe also try replacing expr("isAllowed") with agg(first($"isAllowed")) in your own attempt.
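Applying that replacement to the attempt from the question, a minimal sketch (assuming isAllowed is a boolean column) could look like the following; the na.fill(false) step is optional and only useful if you prefer false over null for type/principal combinations that never appear:

import org.apache.spark.sql.functions.first

// pivot() returns a RelationalGroupedDataset, which has no expr method,
// so an aggregation such as first() is needed to materialise the values.
val dfs3 = dfs2
  .groupBy("database_name", "users", "groups")
  .pivot("type")
  .agg(first("isAllowed"))
  .na.fill(false)   // optional, Spark 2.3+: report false instead of null

dfs3.show(false)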