Apache Spark: transposing multiple rows into a single row with more columns


I have this situation in Spark:

+-----+-----+-----+----------+-----------+-----------+
|month|years|id   |  category|sum(amount)|avg(amount)|
+-----+-----+-----+----------+-----------+-----------+
|  1  | 2015| id_1|     A    |   10000   |    2000   |
|  1  | 2015| id_1|     B    |   1000    |    100    |
|  1  | 2015| id_1|     C    |   2000    |    1000   |
+-----+-----+-----+----------+-----------+-----------+
I would like to get this:

+-----------------+-----------------------+-----------------------+-----------------------+
|                 |      category_A       |      category_B       |      category_C       |
+-----+-----+-----+-----------+-----------+-----------+-----------+-----------+-----------+
|month|years|id   |sum(amount)|avg(amount)|sum(amount)|avg(amount)|sum(amount)|avg(amount)|
+-----+-----+-----+-----------+-----------+-----------+-----------+-----------+-----------+
|  1  | 2015| id_1|  10000    |    2000   |   1000    |    100    |   2000    |    1000   |
+-----+-----+-----+-----------+-----------+-----------+-----------+-----------+-----------+

Is this possible?

I found this solution using a DataFrame and pivot:

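import org.apache.spark.sql.functions.{avg, sum}
// The $"..." column syntax needs the SparkSession implicits in scope.

df
  .groupBy($"month", $"years", $"id")
  .pivot("category")                      // one group of columns per category value
  .agg(sum($"amount"), avg($"amount"))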
Is it possible to do the same with an RDD-based solution?
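
One possible way to think about the RDD variant is sketched below with plain Scala collections, so it runs without a Spark dependency. This is an illustration under stated assumptions, not a confirmed answer from the original thread: the `Record` case class, the `PivotSketch` object, and the sample amounts are all hypothetical. On an actual RDD the same shape would likely be a `map` to `((month, years, id), (category, amount))` pairs followed by `groupByKey` or `aggregateByKey`.

```scala
// Hypothetical record type; field names mirror the columns in the question.
final case class Record(month: Int, years: Int, id: String, category: String, amount: Double)

object PivotSketch {
  type Key   = (Int, Int, String) // (month, years, id)
  type Stats = (Double, Double)   // (sum, avg)

  // Group by (month, years, id), then by category, computing sum and average
  // of amount, so that each key ends up as a single "row" of per-category stats.
  def pivot(rows: Seq[Record]): Map[Key, Map[String, Stats]] =
    rows
      .groupBy(r => (r.month, r.years, r.id))
      .map { case (key, group) =>
        val perCategory = group
          .groupBy(_.category)
          .map { case (category, rs) =>
            val total = rs.map(_.amount).sum
            (category, (total, total / rs.size))
          }
        (key, perCategory)
      }

  def main(args: Array[String]): Unit = {
    // Made-up sample amounts, for illustration only.
    val rows = Seq(
      Record(1, 2015, "id_1", "A", 1500.0),
      Record(1, 2015, "id_1", "A", 2500.0),
      Record(1, 2015, "id_1", "B", 100.0),
      Record(1, 2015, "id_1", "C", 1000.0)
    )
    pivot(rows).foreach { case ((month, years, id), cats) =>
      val cells = cats.toSeq.sortBy(_._1).map { case (c, (s, a)) => s"$c: sum=$s avg=$a" }
      println(s"$month $years $id | ${cells.mkString(" | ")}")
    }
  }
}
```

Unlike `pivot` on a DataFrame, this keeps the categories as a nested map rather than as named columns, so the set of categories does not need to be known up front; flattening to fixed columns would require collecting the distinct category values first.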