Python: Creating a JSON structure from a PySpark DataFrame

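I have a DataFrame that is the result of a left join. Now I want to create a JSON structure from it.

I tried several options but could not produce it. Here is my DataFrame: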

col1    col2    col3    col4
1111    name    aaa     bbb
1111    name    ccc     ddd
1111    name    iii     kkk
1112    name1   abcd    def
1112    name1   DEFG    ABXC

The desired JSON structure is:

{col1: 1111, col2: name, details: [{col3: aaa, col4: bbb}, {col3: ccc, col4: ddd}, {col3: iii, col4: kkk}]},
{col1: 1112, col2: name1, details: [{col3: abcd, col4: def}, {col3: DEFG, col4: ABXC}]}

You can do it like this:

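import pyspark.sql.functions as f

# Group by the key columns and collect each (col3, col4) pair as a struct,
# so that "details" is written as an array of nested objects.
# (Wrapping the struct in f.to_json would store each element as an escaped JSON string.)
df = df.groupBy("col1", "col2").agg(
    f.collect_list(f.struct("col3", "col4")).alias("details")
)

# Spark creates a directory of part files at this path, one JSON object per line.
df.write.format('json').save('/path/file_name.json')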
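
To check the expected shape of the result without starting a Spark session, the same group-then-nest logic can be sketched in plain Python. This is only an illustration on the sample rows from the question: `rows`, `nest_details` and `records` are hypothetical names and not part of PySpark, and the integer type of `col1` is an assumption.

```python
import json

# Sample rows from the question: (col1, col2, col3, col4).
# Treating col1 as an integer is an assumption; the question does not give a schema.
rows = [
    (1111, "name", "aaa", "bbb"),
    (1111, "name", "ccc", "ddd"),
    (1111, "name", "iii", "kkk"),
    (1112, "name1", "abcd", "def"),
    (1112, "name1", "DEFG", "ABXC"),
]


def nest_details(rows):
    """Group rows by (col1, col2) and collect the (col3, col4) pairs into a list."""
    groups = {}
    for col1, col2, col3, col4 in rows:
        # One list of detail objects per distinct (col1, col2) key.
        groups.setdefault((col1, col2), []).append({"col3": col3, "col4": col4})
    return [
        {"col1": col1, "col2": col2, "details": details}
        for (col1, col2), details in groups.items()
    ]


records = nest_details(rows)

# Print one JSON object per line, the same layout Spark's JSON writer uses.
for record in records:
    print(json.dumps(record))
```

This sketch keeps the rows in input order. Spark makes no such promise for `collect_list`, so the order of the elements inside `details` may differ between runs there.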