
Python: grouping the previous set of values onto the current row with PySpark


So I have a PySpark dataframe that is organized like this:

ID   timestamp   value1   value2
1    1           a        x
2    1           a        y
1    2           b        x
2    2           b        y
1    3           c        y
2    3           d        y
1    4           l        y
2    4           s        y
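
For anyone who wants to reproduce the example, here is a minimal sketch of how this dataframe could be built (assuming an active SparkSession named spark; the rows and column names are taken from the table above and from the answer below):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Example data matching the table above
df = spark.createDataFrame(
    [
        (1, 1, 'a', 'x'), (2, 1, 'a', 'y'),
        (1, 2, 'b', 'x'), (2, 2, 'b', 'y'),
        (1, 3, 'c', 'y'), (2, 3, 'd', 'y'),
        (1, 4, 'l', 'y'), (2, 4, 's', 'y'),
    ],
    ['ID', 'timestamp', 'value1', 'value2'],
)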
You can do a collect_list over a window spanning the two preceding rows and the current row, and use concat_ws to join the list into a comma-separated string:

from pyspark.sql import functions as F, Window

df2 = df.withColumn(
    'list_value_1',
    # join the collected values into a comma-separated string
    F.concat_ws(',',
        # collect value1 over the two preceding rows and the current row, per ID
        F.collect_list('value1').over(
            Window.partitionBy('ID').orderBy('timestamp').rowsBetween(-2, 0)
        )
    )
)

df2.show()
+---+---------+------+------+------------+
| ID|timestamp|value1|value2|list_value_1|
+---+---------+------+------+------------+
|  1|        1|     a|     x|           a|
|  1|        2|     b|     x|         a,b|
|  1|        3|     c|     y|       a,b,c|
|  1|        4|     l|     y|       b,c,l|
|  2|        1|     a|     y|           a|
|  2|        2|     b|     y|         a,b|
|  2|        3|     d|     y|       a,b,d|
|  2|        4|     s|     y|       b,d,s|
+---+---------+------+------+------------+

Thanks a lot! The conversion to a comma-separated string isn't needed, since I only need the values as a list. I had to drop the usual [el1, el2, …] notation because, for some reason, Stack Overflow had trouble with it in the table.

You can then just remove the concat_ws part of the code. If you do, I'd still keep it here for other eventual readers :)
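
For readers who, like the asker, want the previous values as an actual array column rather than a comma-separated string, a minimal variant (assuming the same df as above) is to drop the concat_ws wrapper:

from pyspark.sql import functions as F, Window

# Same sliding window: the two preceding rows plus the current row, per ID
w = Window.partitionBy('ID').orderBy('timestamp').rowsBetween(-2, 0)

# Keep list_value_1 as an array<string> column instead of joining it into a string
df_list = df.withColumn('list_value_1', F.collect_list('value1').over(w))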