Python: PySpark dataframe column contains an array of dictionaries; how to put each dictionary key into its own column

I currently have a dataframe like this:

+-----+-------------------------+
| Id  | value_list_of_dicts     |
+-----+-------------------------+
| 1   | [{"val1":0, "val2":0},  |
|     |  {"val1":2, "val2":5}]  |
+-----+-------------------------+
| 2   | [{"val1":9, "val2":10}, |
|     |  {"val1":1, "val2":2}]  |
+-----+-------------------------+
Each list contains exactly 30 dictionaries; the values may differ, but the key names are always the same. I want my dataframe to look like this:

+-------+-------+-------+
| Id    |val1   |val2   |
+-------+-------+-------+
| 1     | 0     | 0     |
+-------+-------+-------+
| 1     | 2     | 5     |
+-------+-------+-------+
| 2     | 9     | 10    |
+-------+-------+-------+
| 2     | 1     | 2     |
+-------+-------+-------+
What is the best way to do this?

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import explode

spark = SparkSession.builder.getOrCreate()

# Spark infers each dictionary as map<string,bigint>, so "List" becomes
# an array-of-maps column.
df = spark.createDataFrame(
    [
        (1, [{"val1": 0, "val2": 0}, {"val1": 2, "val2": 5}]),
        (2, [{"val1": 9, "val2": 10}, {"val1": 1, "val2": 2}]),
    ],
    ("ID", "List"),
)

# explode() produces one row per map in the array
df2 = df.select(df.ID, explode(df.List).alias("Column1"))

# getItem() pulls each key of the map out into its own column
df2.withColumn("Val1", F.col("Column1").getItem("val1")) \
   .withColumn("Val2", F.col("Column1").getItem("val2")) \
   .show(truncate=False)
Output:

+---+-----------------------+----+----+
|ID |Column1                |Val1|Val2|
+---+-----------------------+----+----+
|1  |[val2 -> 0, val1 -> 0] |0   |0   |
|1  |[val2 -> 5, val1 -> 2] |2   |5   |
|2  |[val2 -> 10, val1 -> 9]|9   |10  |
|2  |[val2 -> 2, val1 -> 1] |1   |2   |
+---+-----------------------+----+----+
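Since the key names are identical in every dictionary, the same approach generalises: build all key columns in a single select driven by a list of key names, and drop the intermediate map column in the process. A minimal sketch, reusing df, F, and explode from the snippet above; the keys list here only covers the two keys in the example, so extend it to your real key names:

keys = ["val1", "val2"]  # example key names; extend to the full set your dictionaries use
df_flat = (
    df.select(df.ID, explode(df.List).alias("m"))
      .select("ID", *[F.col("m").getItem(k).alias(k) for k in keys])
)
df_flat.show(truncate=False)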