
Python PySpark - exploding JSON with a nested struct and array of structs


I am trying to parse some sample nested JSON. Below is the printed schema:

 |-- batters: struct (nullable = true)
 |    |-- batter: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- id: string (nullable = true)
 |    |    |    |-- type: string (nullable = true)
 |-- id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- ppu: double (nullable = true)
 |-- topping: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |-- type: string (nullable = true)
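For reference, a single record matching the printed schema would look something like this. The field names come from the schema above; the values here are made up for illustration:

```python
# Hypothetical sample record shaped like the printed schema.
# batters is a struct holding an array of structs; topping is a
# top-level array of structs with the same {id, type} shape.
sample = {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters": {
        "batter": [
            {"id": "1001", "type": "Regular"},
            {"id": "1002", "type": "Chocolate"},
        ]
    },
    "topping": [
        {"id": "5001", "type": "None"},
        {"id": "5002", "type": "Glazed"},
    ],
}
```

A file of such records can be loaded with `spark.read.json(...)` to reproduce the schema shown above.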
I tried exploding batter and topping separately and then merging the results:

df_batter = df_json.select("batters.*")
df_explode1 = df_batter.withColumn("batter", explode("batter")).select("batter.*")

df_explode2 = df_json.withColumn("topping", explode("topping")).select(
    "id", "type", "name", "ppu", "topping.*")
I am unable to merge these two dataframes.

I also tried doing it in a single query:

exploded1 = df_json.withColumn("batter", df_batter.withColumn("batter",
    explode("batter"))).withColumn("topping", explode("topping")).select(
    "id", "type", "name", "ppu", "topping.*", "batter.*")

But it throws an error. Please help me resolve it. Thanks.

You basically have to explode the two arrays together, using
arrays_zip
which returns a merged array of structs. Try this. I haven't tested it, but it should work:

from pyspark.sql import functions as F    
df_json.select("id","type","name","ppu","topping","batters.*")\
       .withColumn("zipped", F.explode(F.arrays_zip("batter","topping")))\
       .select("id","type","name","ppu","zipped.*").show()
You could also do it one after the other:

from pyspark.sql import functions as F
df1 = df_json.select("id", "type", "name", "ppu", "topping", "batters.*")\
             .withColumn("batter", F.explode("batter"))\
             .select("id", "type", "name", "ppu", "topping", "batter")
df1.withColumn("topping", F.explode("topping")).select(
    "id", "type", "name", "ppu", "topping.*", "batter.*")
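Be aware that the two approaches give different row counts: exploding one array after the other produces one row per batter/topping *combination* (a cross product), while `arrays_zip` pairs elements by index. A plain-Python sketch of the difference, using made-up 2- and 3-element arrays:

```python
from itertools import product, zip_longest

batter = ["Regular", "Chocolate"]
topping = ["None", "Glazed", "Sugar"]

# Sequential explodes: every combination -> len(batter) * len(topping) rows
cross_rows = list(product(batter, topping))

# arrays_zip then explode: index-wise pairs -> max(len(batter), len(topping)) rows,
# with None filling in for the shorter array
zip_rows = list(zip_longest(batter, topping))
```

Pick the approach that matches the output you actually want: all pairings, or positional pairing.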

You can't explode two arrays like that. You need to zip them with arrays_zip and then explode them together.