Python PySpark: grouping by specific properties of an array via alias


My DataFrame has this structure:

root: array (nullable = true)
 |-- element: struct (containsNull = true)
    |-- id: long (nullable = true)
    |-- time: struct (nullable = true)
        |-- start: string (nullable = true)
        |-- end: string (nullable = true)
    |-- properties: array (nullable = true)
        |-- element: struct (containsNull = true)
             |-- key: string (nullable = true)
             |-- value: string (nullable = true)

I need to transform it into:

 root
     |-- start: string (nullable = true)
     |-- end: string (nullable = true)
     |-- id: long (nullable = true)
     |-- key: string (nullable = true)
     |-- value: string (nullable = true)

that is, with my key-value array expanded into columns.

Using pivot and groupby, I can transform my DataFrame:

df2 = df.groupby("start","end","id").pivot("prop.key").agg(last("prop.value", True))

But I also need to group by the values of one (or more) of the properties (keys), and this does not work:

df2 = df.groupby("start","end","id","car_type","car_loc").pivot("prop.key").agg(last("prop.value", True))

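where "car_type" and "car_loc" are properties (values of prop.key). I need to refer to these properties by alias (without using getItem()). Is that possible? Can anyone help me?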

Thanks, everyone.

Edit

Here is an example. I have this situation:

+---+----------+----------+--------------------+
| id|  start   |    end   |                prop|
+---+----------+----------+--------------------+
|  1|2019-05-12|2020-05-12|[car_type, fiat]    |
|  1|2019-05-12|2020-05-12|[car_loc, home]     |
|  1|2019-05-12|2020-05-12|[car_num, xd7890]   |
|  2|2019-05-13|2020-05-13|[car_type, fiat]    |
|  2|2019-05-13|2020-05-13|[car_loc, home]     |
|  2|2019-05-13|2020-05-13|[car_num, ae1234]   |
|  1|2019-05-12|2020-05-12|[car_type, ford]    |
|  1|2019-05-12|2020-05-12|[car_loc, office]   |
|  1|2019-05-12|2020-05-12|[car_num, gh7890]   |
+---+----------+----------+--------------------+

I need to transform the DataFrame to get the following:

+----------+----------+---+--------+-------+-------+
|  start   |    end   | id|car_type|car_loc|car_num|
+----------+----------+---+--------+-------+-------+
|2019-05-12|2020-05-12|  1|fiat    |home   |xd7890 |
|2019-05-13|2020-05-13|  2|fiat    |home   |ae1234 |
|2019-05-12|2020-05-12|  1|ford    |office |gh7890 |
+----------+----------+---+--------+-------+-------+

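First explode the properties, then just select the columns.

I don't know in advance which values the array may contain. After exploding I have no column names, and the number of entries in the array can vary:

id | start | end | prop
1 | 2019-05-12T00:00:00 | 2019-05-12T00:15:00 | [car_type, fiat]

A sample is needed to work with; please add one so that an answer can be given.

I have added an example.

From the original DataFrame, use map_from_entries() to convert the properties column into a map, then retrieve the three keys "car_type", "car_loc" and "car_num".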