Scala Spark: group by change in value

Tags: scala, apache-spark, apache-spark-sql, scala-collections

I have the following dataset:

ID    Sensor    State    DateTime
1      S1         0      2018-09-10 10:10:05
1      S1         0      2018-09-10 10:10:10
1      S1         0      2018-09-10 10:10:20
1      S1         1      2018-09-10 10:10:30
1      S1         1      2018-09-10 10:10:40
1      S1         1      2018-09-10 10:10:50
1      S1         1      2018-09-10 10:10:60
1      S2         0      2018-09-10 10:10:10
1      S2         0      2018-09-10 10:10:20
1      S2         0      2018-09-10 10:10:30
1      S2         1      2018-09-10 10:10:40
1      S2         1      2018-09-10 10:10:50
2      S1         0      2018-09-10 10:10:30
2      S1         1      2018-09-10 10:10:40
2      S1         1      2018-09-10 10:10:50

Desired output:

ID  Sensor  State   MinDT                  MaxDT
1   S1       0     2018-09-10 10:10:05    2018-09-10 10:10:20
1   S1       1     2018-09-10 10:10:30    2018-09-10 10:10:60
1   S2       0     2018-09-10 10:10:10    2018-09-10 10:10:30
1   S2       1     2018-09-10 10:10:40    2018-09-10 10:10:50
2   S1       0     2018-09-10 10:10:30    2018-09-10 10:10:30
2   S1       1     2018-09-10 10:10:40    2018-09-10 10:10:50

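I want to group rows by changes in the sensor's state: each time the state changes, I need the date-time range covered by the run that just ended. I tried a naive approach, which keeps the current state in a variable, iterates over every row to check whether the state has changed, and stores the results in an array, but that approach is not distributed across the cluster. Any suggestions would be appreciated.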
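
For reference, the single-pass idea described above can be sketched in plain Scala as follows. This is only an illustration, not the asker's actual code: `Reading`, `Run`, and `SequentialRuns` are hypothetical names, and the sketch assumes `DateTime` strings keep the fixed `yyyy-MM-dd HH:mm:ss` layout, so that sorting them as strings matches chronological order. Because it folds over an in-memory collection, all of the work happens on one machine, which is the limitation the question points out.

```scala
// One input row; field names mirror the dataset's columns (hypothetical types).
final case class Reading(id: Int, sensor: String, state: Int, dateTime: String)
// One contiguous run of a single state for one (id, sensor) pair.
final case class Run(id: Int, sensor: String, state: Int, minDT: String, maxDT: String)

object SequentialRuns {
  // Visits the rows once in (id, sensor, dateTime) order and opens a new run
  // whenever the id, sensor, or state differs from the run currently being built.
  def runs(rows: Seq[Reading]): List[Run] = {
    val sorted = rows.sortBy(r => (r.id, r.sensor, r.dateTime))
    val newestFirst = sorted.foldLeft(List.empty[Run]) {
      case (cur :: done, r)
          if cur.id == r.id && cur.sensor == r.sensor && cur.state == r.state =>
        cur.copy(maxDT = r.dateTime) :: done // same run: extend its end
      case (acc, r) =>
        Run(r.id, r.sensor, r.state, r.dateTime, r.dateTime) :: acc // change: start a new run
    }
    newestFirst.reverse // runs were prepended, so restore chronological order
  }

  def main(args: Array[String]): Unit = {
    // The three rows for ID 2 / S1 from the dataset above.
    val sample = Seq(
      Reading(2, "S1", 0, "2018-09-10 10:10:30"),
      Reading(2, "S1", 1, "2018-09-10 10:10:40"),
      Reading(2, "S1", 1, "2018-09-10 10:10:50")
    )
    runs(sample).foreach(println)
  }
}
```

For the three sample rows this yields one run for state 0 (10:10:30 to 10:10:30) and one for state 1 (10:10:40 to 10:10:50), matching the last two rows of the desired output.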

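You can group by ID, Sensor, and State and take the minimum and maximum timestamp of each group. This gives the desired result for the sample data, where each state occurs in a single contiguous run per ID and sensor: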

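import org.apache.spark.sql.functions._
import spark.implicits._ // enables the $"col" syntax; assumes a SparkSession named spark

df.groupBy("ID", "Sensor", "State")
  .agg(
    // parse the string to a timestamp, aggregate, then format it back to a string
    date_format(max(to_timestamp($"DateTime", "yyyy-MM-dd HH:mm:ss")), "yyyy-MM-dd HH:mm:ss").alias("MaxDT"),
    date_format(min(to_timestamp($"DateTime", "yyyy-MM-dd HH:mm:ss")), "yyyy-MM-dd HH:mm:ss").alias("MinDT"))
  .show()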
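
Output (MaxDT for ID 1, S1, State 1 is 10:10:50 rather than 10:10:60, presumably because 60 is not a valid seconds value, so to_timestamp returns null for that row and max skips it):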

+---+------+-----+-------------------+-------------------+
| ID|Sensor|State|              MaxDT|              MinDT|
+---+------+-----+-------------------+-------------------+
|  2|    S1|    0|2018-09-10 10:10:30|2018-09-10 10:10:30|
|  1|    S2|    1|2018-09-10 10:10:50|2018-09-10 10:10:40|
|  2|    S1|    1|2018-09-10 10:10:50|2018-09-10 10:10:40|
|  1|    S1|    0|2018-09-10 10:10:20|2018-09-10 10:10:05|
|  1|    S2|    0|2018-09-10 10:10:30|2018-09-10 10:10:10|
|  1|    S1|    1|2018-09-10 10:10:50|2018-09-10 10:10:30|
+---+------+-----+-------------------+-------------------+
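
The groupBy above merges every row that shares an ID, Sensor, and State, so if a state recurs after a change (0, then 1, then 0 again) the two separate runs of 0 collapse into a single range. One common way to keep such runs apart while staying distributed is to flag each change with `lag` over a window and turn a running sum of those flags into a run id. The following is a minimal sketch under some assumptions rather than a definitive implementation: `StateRuns`, `stateRuns`, and the helper columns `prev`, `changed`, and `run` are hypothetical names, the sample rows in `main` are made up to show a recurring state, `DateTime` is compared as a fixed-width string, and timestamps are assumed unique within each ID and sensor.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object StateRuns {
  // Returns one row per contiguous run of State within each (ID, Sensor).
  def stateRuns(df: DataFrame): DataFrame = {
    val ordered = Window.partitionBy("ID", "Sensor").orderBy("DateTime")
    val running = ordered.rowsBetween(Window.unboundedPreceding, Window.currentRow)
    df
      // state of the previous row in the same (ID, Sensor); null on the first row
      .withColumn("prev", lag(col("State"), 1).over(ordered))
      // 1 where a new run starts (first row or state differs from previous), else 0
      .withColumn("changed",
        when(col("prev").isNull || col("prev") =!= col("State"), 1).otherwise(0))
      // running total of the flags: every row of one run gets the same number
      .withColumn("run", sum(col("changed")).over(running))
      .groupBy("ID", "Sensor", "State", "run")
      .agg(min(col("DateTime")).alias("MinDT"), max(col("DateTime")).alias("MaxDT"))
      .orderBy("ID", "Sensor", "MinDT")
      .drop("run")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; on a cluster the master comes from spark-submit.
    val spark = SparkSession.builder().master("local[*]").appName("state-runs").getOrCreate()
    import spark.implicits._

    // Made-up rows in which state 0 recurs for the same ID and sensor.
    val df = Seq(
      (1, "S1", 0, "2018-09-10 10:10:05"),
      (1, "S1", 0, "2018-09-10 10:10:10"),
      (1, "S1", 1, "2018-09-10 10:10:20"),
      (1, "S1", 0, "2018-09-10 10:10:30"),
      (1, "S1", 0, "2018-09-10 10:10:40")
    ).toDF("ID", "Sensor", "State", "DateTime")

    stateRuns(df).show(false)
    spark.stop()
  }
}
```

On the made-up rows this produces three ranges (0, then 1, then 0 again), whereas grouping by ID, Sensor, and State alone would report a single range for state 0 spanning 10:10:05 to 10:10:40. Comparing `DateTime` as a string also sidesteps the parsing of `10:10:60`, at the cost of relying on the fixed-width format.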