Python: How to append values to exploded values from a dataframe in PySpark


The data is:

data = [{"_id":"Inst001","Type":"AAAA", "Model001":[{"_id":"Mod001", "Name": "FFFF"},
                                                    {"_id":"Mod0011", "Name": "FFFF4"}]},
        {"_id":"Inst002", "Type":"BBBB", "Model001":[{"_id":"Mod002", "Name": "DDD"}]}]
The dataframe needs to be built as follows:

pid      _id      Name
Inst001  Mod001   FFFF
Inst001  Mod0011  FFFF4
Inst002  Mod002   DDD
Create the dataframe with an appropriate schema, then apply the inline function to the Model001 column:

df = spark.createDataFrame(
    data, 
    '_id string, Type string, Model001 array<struct<_id:string, Name:string>>'
).selectExpr('_id as pid', 'inline(Model001)')

df.show(truncate=False)
+-------+-------+-----+
|pid    |_id    |Name |
+-------+-------+-----+
|Inst001|Mod001 |FFFF |
|Inst001|Mod0011|FFFF4|
|Inst002|Mod002 |DDD  |
+-------+-------+-----+
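For intuition, here is a plain-Python sketch of what inline does to each row: the parent _id is carried along (renamed pid) while each struct in the Model001 array becomes its own output row. The helper name flatten is hypothetical and for illustration only; no Spark is required to run it.

```python
# Sample data from the question.
data = [{"_id": "Inst001", "Type": "AAAA",
         "Model001": [{"_id": "Mod001", "Name": "FFFF"},
                      {"_id": "Mod0011", "Name": "FFFF4"}]},
        {"_id": "Inst002", "Type": "BBBB",
         "Model001": [{"_id": "Mod002", "Name": "DDD"}]}]

def flatten(records):
    """Mimic selectExpr('_id as pid', 'inline(Model001)'):
    one output row per element of Model001, keeping the parent _id as pid."""
    rows = []
    for rec in records:
        for model in rec["Model001"]:
            rows.append({"pid": rec["_id"],
                         "_id": model["_id"],
                         "Name": model["Name"]})
    return rows

for row in flatten(data):
    print(row)
```

This mirrors the three-row result shown above: the first record expands into two rows (Mod001 and Mod0011), the second into one.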

What have you tried so far? @nerdyGuy I have exploded "Model001", but I am having trouble appending the parent id to the exploded dataframe.