
Scala Spark: Creating JSON groups by ID


I have a DataFrame, unionDataDF, with the following sample data:

+---+------------------+----+
| id|              data| key|
+---+------------------+----+
|  1|[{"data":"data1"}]|key1|
|  2|[{"data":"data2"}]|key1|
|  1|[{"data":"data1"}]|key2|
|  2|[{"data":"data2"}]|key2|
+---+------------------+----+
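Here id is IntType, data is JsonType, and key is StringType.

I want to send the data for each id over the network. For example, the output data for id "1" would look like this:

How can I do this?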

Sample code to create unionDataDF:

Versions:
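The asker's original construction code is not preserved here. The following is only a minimal sketch of one way the sample table could be reproduced. It assumes the data column is held as a plain JSON string (Spark SQL has no dedicated JSON column type); the object name `UnionDataExample` and the helper `buildUnionDataDF` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionDataExample {
  // Builds the four sample rows from the table above.
  // Assumption: "data" is stored as a JSON-formatted string column.
  def buildUnionDataDF(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (1, """[{"data":"data1"}]""", "key1"),
      (2, """[{"data":"data2"}]""", "key1"),
      (1, """[{"data":"data1"}]""", "key2"),
      (2, """[{"data":"data2"}]""", "key2")
    ).toDF("id", "data", "key")
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-data-example")
      .getOrCreate()

    val unionDataDF = buildUnionDataDF(spark)
    unionDataDF.show(false)

    spark.stop()
  }
}
```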

Spark: 2.2
Scala: 2.11
Something like this:

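import org.apache.spark.sql.functions.{collect_list, struct}

// Collect every (key, data) pair that belongs to an id into one array column.
unionDataDF
  .groupBy("id")
  .agg(collect_list(struct("key", "data")).alias("grouped"))
  .show(10, false)
Output: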

+---+--------------------------------------------------------+
|id |grouped                                                 |
+---+--------------------------------------------------------+
|1  |[[key1, [{"data":"data1"}]], [key2, [{"data":"data1"}]]]|
|2  |[[key1, [{"data":"data2"}]], [key2, [{"data":"data2"}]]]|
+---+--------------------------------------------------------+
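To get from the grouped rows to something that can be sent, one possible next step is to serialize each row to a JSON string with `Dataset.toJSON`. This is a sketch, not part of the original answer; the names `GroupedToJson` and `groupedJson` are hypothetical, and the sample rows are rebuilt inline under the assumption that data is a string column.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{collect_list, struct}

object GroupedToJson {
  // Produces one JSON document per id, with fields "id" and "grouped".
  // The order of elements inside "grouped" is not guaranteed.
  def groupedJson(unionDataDF: DataFrame): Dataset[String] =
    unionDataDF
      .groupBy("id")
      .agg(collect_list(struct("key", "data")).alias("grouped"))
      .toJSON

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("grouped-to-json")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample input: "data" is a JSON-formatted string.
    val unionDataDF = Seq(
      (1, """[{"data":"data1"}]""", "key1"),
      (2, """[{"data":"data2"}]""", "key1"),
      (1, """[{"data":"data1"}]""", "key2"),
      (2, """[{"data":"data2"}]""", "key2")
    ).toDF("id", "data", "key")

    // collect() is fine for a small demo; avoid it for large results.
    groupedJson(unionDataDF).collect().foreach(println)

    spark.stop()
  }
}
```

Because data is a string column in this sketch, its content appears in the output as an escaped JSON string rather than as a nested array. If nested JSON is required, the column would first have to be parsed into a struct or array type.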


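Thanks for the reply. What should I do after this step?

When you iterate with "for(row)", what exactly do you want to do? Send the data over the network as JSON strings? Can you send data only from the executors, or from the driver? In batches, etc.?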
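The executor-versus-driver question above can be illustrated with a small sketch. It assumes the per-id JSON documents are already available as a `Dataset[String]`. `sendOverNetwork` is a hypothetical stand-in that only prints, since the thread never says which network client is used; `SendPerId`, `sendFromExecutors` and `sendFromDriver` are likewise illustrative names.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object SendPerId {
  // Hypothetical placeholder for a real network client; it only prints the payload.
  def sendOverNetwork(json: String): Unit = println(s"sending: $json")

  // Runs on the executors: each partition sends its own documents,
  // so nothing has to be collected on the driver.
  def sendFromExecutors(jsonPerId: Dataset[String]): Unit =
    jsonPerId.rdd.foreachPartition { docs =>
      // A real client would typically be opened once per partition here
      // and closed after the loop.
      docs.foreach(sendOverNetwork)
    }

  // Runs on the driver: only reasonable when the grouped result is small.
  def sendFromDriver(jsonPerId: Dataset[String]): Unit =
    jsonPerId.collect().foreach(sendOverNetwork)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("send-per-id")
      .getOrCreate()
    import spark.implicits._

    // Assumed, simplified payloads standing in for the grouped JSON per id.
    val jsonPerId = Seq("""{"id":1}""", """{"id":2}""").toDS()

    sendFromExecutors(jsonPerId)
    sendFromDriver(jsonPerId)

    spark.stop()
  }
}
```

Sending from the executors scales with the data but means the network endpoint must be reachable from every worker node. Sending from the driver is simpler but pulls every document into a single JVM.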