Apache Spark: converting a DataFrame to a JSON array

I have created a Spark DataFrame as follows:

+----+-------+
| age| number|
+----+-------+
|  16|     12|
|  16|     13|
|  16|     14|
|  17|     15|
|  17|     16|
|  17|     17|
+----+-------+
I want to convert it to the following JSON format:

[{ 
 'age' : 16,  
 'name' : [12,13,14] 
 },{ 
 'age' : 17,  
 'name' : [15,16,17] 
 }]

How can I achieve this?

You can try the to_json function. Something like this:

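import org.apache.spark.sql.functions._
import spark.implicits._

val list = List((16,12), (16,13), (16,14), (17,15), (17,16), (17,17))
// toDF comes from spark.implicits._ (SparkSession has no parallelize method)
val df = list.toDF("age", "number")

// Collect the numbers per age, serialize each group to a JSON string,
// then collect all of the JSON strings into a single array column
val jsondf = df.groupBy($"age").agg(collect_list($"number").as("name"))
    .withColumn("json", to_json(struct($"age", $"name")))
    .drop("age", "name")
    .agg(collect_list($"json").as("json"))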
The result is shown below. I hope it helps.

+------------------------------------------------------------+
|json                                                        |
+------------------------------------------------------------+
|[{"age":16,"name":[12,13,14]}, {"age":17,"name":[15,16,17]}]|
+------------------------------------------------------------+
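
Note that the result above is an array column whose elements are JSON strings, not one JSON string. If a single string holding the whole array is needed, one possible variation is to pass the collected structs to to_json directly. The sketch below is not part of the original answer. It assumes a Spark version whose to_json accepts an array of structs, and the object name DataFrameToJsonArray and the method toJsonArray are made up for illustration. The sort_array calls are an addition, used only because collect_list does not guarantee element order.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, sort_array, struct, to_json}

// Hypothetical helper object, named here for illustration only.
object DataFrameToJsonArray {

  // Returns the whole grouped result as one JSON array string.
  def toJsonArray(spark: SparkSession): String = {
    import spark.implicits._

    // Same sample data as in the question
    val df = List((16, 12), (16, 13), (16, 14), (17, 15), (17, 16), (17, 17))
      .toDF("age", "number")

    df.groupBy($"age")
      // Numbers per age; sort_array pins the otherwise unspecified order
      .agg(sort_array(collect_list($"number")).as("name"))
      // Gather every (age, name) struct into one array and serialize it once
      .agg(to_json(sort_array(collect_list(struct($"age", $"name")))).as("json"))
      .as[String]
      .head()
  }

  def main(args: Array[String]): Unit = {
    // Local session for trying the sketch out; adjust master for a real cluster
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("df-to-json-array")
      .getOrCreate()
    try println(toJsonArray(spark))
    finally spark.stop()
  }
}
```

Because the final aggregation has no grouping key, all groups end up in a single row, so this is only suitable when the grouped result is small enough to fit comfortably in one record.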

For Python or Scala?

@hamza For both, but PySpark is preferred.