Apache Spark: transposing multiple rows into a single row with more columns
I have this situation in Spark:
+-----+-----+-----+----------+-----------+-----------+
|month|years|id | category|sum(amount)|avg(amount)|
+-----+-----+-----+----------+-----------+-----------+
| 1 | 2015| id_1| A | 10000 | 2000 |
| 1 | 2015| id_1| B | 1000 | 100 |
| 1 | 2015| id_1| C | 2000 | 1000 |
+-----+-----+-----+----------+-----------+-----------+
I want to get this:
+-----------------+-----------------------+-----------------------------------------------+
| | category_A | category_B | category_C |
+-----+-----+-----+-----------+-----------+-----------+-----------+-----------+-----------+
|month|years|id |sum(amount)|avg(amount)|sum(amount)|avg(amount)|sum(amount)|avg(amount)|
+-----+-----+-----+-----------+-----------+-----------+-----------+-----------+-----------+
| 1 | 2015| id_1| 10000 | 2000 | 1000 | 100 | 2000 | 1000 |
+-----+-----+-----+-----------+-----------+-----------+-----------+-----------+-----------+
Is that possible? I found this solution using the DataFrame API and pivot:
import org.apache.spark.sql.functions.{avg, sum}

// Pivot on category, computing both aggregates for each pivoted value
df.groupBy($"month", $"years", $"id")
  .pivot("category")
  .agg(sum($"amount"), avg($"amount"))
Is it also possible to do this with an RDD-based solution?
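For reference, here is a minimal sketch of what an RDD-based equivalent could look like. It assumes the input is the already-aggregated DataFrame df from above, that the set of categories is known up front (with RDDs there is no pivot helper, so you must supply or collect the distinct categories yourself), and that both aggregate columns are Double; missing categories fall back to 0.0, which is an assumption, not something from the original question.

import org.apache.spark.sql.Row

// Hypothetical fixed category list; in practice you might collect it
// from the data with df.select("category").distinct.
val categories = Seq("A", "B", "C")

val transposed = df.rdd
  .map { case Row(month: Int, years: Int, id: String,
                  category: String, sum: Double, avg: Double) =>
    // Key by (month, years, id); value is category -> (sum, avg)
    ((month, years, id), (category, (sum, avg)))
  }
  .groupByKey()
  .mapValues { vals =>
    val byCategory = vals.toMap
    // One (sum, avg) pair per category, flattened in a fixed column order
    categories.flatMap { c =>
      val (s, a) = byCategory.getOrElse(c, (0.0, 0.0))
      Seq(s, a)
    }
  }

Each result record is ((month, years, id), Seq(sumA, avgA, sumB, avgB, sumC, avgC)), i.e. one flat row per key, matching the wide table above; you could then map it back to Rows and a schema if a DataFrame is needed again.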