PySpark: grouping by specific array properties through an alias (tags: python, json, pyspark)


My DataFrame has the following structure:

root: array (nullable = true)
 |-- element: struct (containsNull = true)
 |    |-- id: long (nullable = true)
 |    |-- time: struct (nullable = true)
 |    |    |-- start: string (nullable = true)
 |    |    |-- end: string (nullable = true)
 |    |-- properties: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- key: string (nullable = true)
 |    |    |    |-- value: string (nullable = true)
I need to transform it into:

 root
     |-- start: string (nullable = true)
     |-- end: string (nullable = true)
     |-- id: long (nullable = true)
     |-- key: string (nullable = true)
     |-- value: string (nullable = true)
and then expand my key/value array into columns, one column per key.

Using pivot and groupby, I can transform my DataFrame:

from pyspark.sql.functions import last
df2 = df.groupby("start","end","id").pivot("prop.key").agg(last("prop.value", True))
But I also need to group by the values of one (or more) of the properties (keys), and this does not work:

df2 = df.groupby("start","end","id","car_type","car_loc").pivot("prop.key").agg(last("prop.value", True))
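Here "car_type" and "car_loc" are properties (values of prop.key). I need to refer to these properties by alias (without using getItem()). Is that possible? Can anyone help me?

Thanks, everyone.

Edit

Here is an example. I have this situation: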

+---+----------+----------+--------------------+
| id|  start   |    end   |                prop|
+---+----------+----------+--------------------+
|  1|2019-05-12|2020-05-12|[car_type, fiat]    |
|  1|2019-05-12|2020-05-12|[car_loc, home]     |
|  1|2019-05-12|2020-05-12|[car_num, xd7890]   |
|  2|2019-05-13|2020-05-13|[car_type, fiat]    |
|  2|2019-05-13|2020-05-13|[car_loc, home]     |
|  2|2019-05-13|2020-05-13|[car_num, ae1234]   |
|  1|2019-05-12|2020-05-12|[car_type, ford]    |
|  1|2019-05-12|2020-05-12|[car_loc, office]   |
|  1|2019-05-12|2020-05-12|[car_num, gh7890]   |
+---+----------+----------+--------------------+
I need to transform the DataFrame to get the following result:

+----------+----------+---+--------+-------+-------+
|  start   |    end   | id|car_type|car_loc|car_num|
+----------+----------+---+--------+-------+-------+
|2019-05-12|2020-05-12|  1|fiat    |home   |xd7890 |
|2019-05-13|2020-05-13|  2|fiat    |home   |ae1234 |
|2019-05-12|2020-05-12|  1|ford    |office |gh7890 |
+----------+----------+---+--------+-------+-------+
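
One reason grouping the already-exploded rows by start, end and id cannot reproduce this table is that the two id 1 records in the sample share the same start and end. The following is a minimal plain-Python sketch, not PySpark: the `rows` list just copies the sample rows above, and a dict that keeps the most recent value per key loosely stands in for `last` (which in Spark is not even guaranteed to be deterministic after a shuffle). Under those assumptions it shows the fiat and ford records collapsing into a single group:

```python
from collections import defaultdict

# The exploded sample rows from the question: (id, start, end, key, value).
rows = [
    (1, "2019-05-12", "2020-05-12", "car_type", "fiat"),
    (1, "2019-05-12", "2020-05-12", "car_loc", "home"),
    (1, "2019-05-12", "2020-05-12", "car_num", "xd7890"),
    (2, "2019-05-13", "2020-05-13", "car_type", "fiat"),
    (2, "2019-05-13", "2020-05-13", "car_loc", "home"),
    (2, "2019-05-13", "2020-05-13", "car_num", "ae1234"),
    (1, "2019-05-12", "2020-05-12", "car_type", "ford"),
    (1, "2019-05-12", "2020-05-12", "car_loc", "office"),
    (1, "2019-05-12", "2020-05-12", "car_num", "gh7890"),
]

# Rough stand-in for groupby(start, end, id) + pivot(key) + last(value):
# inside one group, a later value for the same key overwrites the earlier one.
groups = defaultdict(dict)
for id_, start, end, key, value in rows:
    groups[(start, end, id_)][key] = value

for group_key, props in groups.items():
    print(group_key, props)
```

Only two groups come out, although three records are wanted, so the information about which properties belong to the same record has to be kept from before the explode.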

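First explode the properties and then simply select the columns.

I don't know in advance which values the array may contain. After exploding I don't have the column names, and the number of array entries can vary.

id | start | end | prop
1 | 2019-05-12T00:00:00 | 2019-05-12T00:15:00 | [car_type, fiat]

A sample is needed to work on this; hopefully you can provide one.

I added an example.

From the original DataFrame, use map_from_entries() to convert the properties column into a map, then retrieve the 3 keys "car_type", "car_loc" and "car_num".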