Unpivot a PySpark DataFrame column whose values are lists of dictionaries

Tags: json, pandas, apache-spark, pyspark, apache-spark-sql

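I created a pandas DataFrame from a list of dictionaries and used json_normalize to unpivot one of its columns. Now I have to convert the code to use PySpark instead of pandas.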

import pandas as pd
df = pd.json_normalize(list_json, record_path='Messages', meta=['ID'])

ID, Active, Description, Priority
21122, true, Test description1, 2
21233, true, Test description1, 2
21233, true, test2, 3
I can't find a similar function in PySpark.

I have created a DataFrame with the code below, but I don't know how to unpivot it the way I did above.

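import json
# Serialize each dictionary to a JSON string, then let Spark read the strings as JSON records
rdd = spark.sparkContext.parallelize(list_json_messages_tea).map(lambda x: json.dumps(x))
df = spark.read.json(rdd)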

ID, Messages
21122, [{"Active": "true", "Description": "Test description1", "Priority": "2"}]
21233, [{"Active": "true", "Description": "Test description1", "Priority": "2"}, {"Active": "true", "Description": "test2",  "Priority": "3"}]
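
To pin down the transformation being asked for: every dictionary in a row's Messages list should become its own output row, with the parent ID repeated. The following is only a plain-Python sketch of that reshaping, not Spark code. The helper name `unpivot_messages` and the sample rows are assumptions made for illustration, and the helper accepts Messages either as a parsed list or as a JSON string, since the display above does not show which one it is.

```python
import json


def unpivot_messages(rows):
    """Yield one flat dict per message, repeating the parent ID (hypothetical helper)."""
    for row in rows:
        messages = row["Messages"]
        # Accept either a JSON string or an already-parsed list of dicts.
        if isinstance(messages, str):
            messages = json.loads(messages)
        for message in messages:
            yield {"ID": row["ID"], **message}


# Sample rows modelled on the data shown in the question.
rows = [
    {"ID": 21122, "Messages": [
        {"Active": "true", "Description": "Test description1", "Priority": "2"},
    ]},
    {"ID": 21233, "Messages": json.dumps([
        {"Active": "true", "Description": "Test description1", "Priority": "2"},
        {"Active": "true", "Description": "test2", "Priority": "3"},
    ])},
]

flat = list(unpivot_messages(rows))
for record in flat:
    print(record)
```

Two input rows become three output rows, which is the shape the pandas version produces.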

I think the equivalent approach is to use inline(from_json()):

df2 = df.selectExpr('ID', "inline(from_json(Messages, 'array<struct<Active:string,Description:string,Priority:string>>'))")

df2.show()
+-----+------+-----------------+--------+
|   ID|Active|      Description|Priority|
+-----+------+-----------------+--------+
|21122|  true|Test description1|       2|
|21233|  true|Test description1|       2|
|21233|  true|            test2|       3|
+-----+------+-----------------+--------+