Spark DataFrame operations on selected rows

I have a DataFrame like the one below:

  +------+------------------------+
  | state|     time stamp         |
  +------+------------------------+
  |  0   |  Sun Aug 13 10:58:44   |
  |  1   |  Sun Aug 13 11:59:44   |
  |  1   |  Sun Aug 13 12:50:43   |
  |  1   |  Sun Aug 13 13:00:44   |
  |  0   |  Sun Aug 13 13:58:42   |
  |  0   |  Sun Aug 13 14:00:41   |
  |  0   |  Sun Aug 13 14:30:45   |
  |  0   |  Sun Aug 13 14:58:46   |
  |  1   |  Sun Aug 13 15:00:47   |
  |  0   |  Sun Aug 13 16:00:49   |
  +------+------------------------+
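I need to select the timestamps only where the state changes (from 0 to 1 or from 1 to 0).

That is, I need to separate out these rows: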

  Sun Aug 13 11:59:44 

  Sun Aug 13 13:58:42

  Sun Aug 13 15:00:47

  Sun Aug 13 16:00:49
and then sum up the time differences.

Can someone suggest what kind of query I should write for this?

I need a result like the following:

(13:58:42 - 11:59:44) + (16:00:49 - 15:00:47) 

A Window function should cover your first need, and filter will cover your second. Your third need can be met by extracting the time from the datetime values.

Given a DataFrame such as

+-----+-------------------+
|state|timestamp          |
+-----+-------------------+
|0    |Sun Aug 13 10:58:44|
|1    |Sun Aug 13 11:59:44|
|1    |Sun Aug 13 12:50:43|
|1    |Sun Aug 13 13:00:44|
|0    |Sun Aug 13 13:58:42|
|0    |Sun Aug 13 14:00:41|
|0    |Sun Aug 13 14:30:45|
|0    |Sun Aug 13 14:58:46|
|1    |Sun Aug 13 15:00:47|
|0    |Sun Aug 13 16:00:49|
+-----+-------------------+
doing what I explained above should help. The following should take care of your first and second needs:

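import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// attach the previous row's state to each row, using 0 for the first row
df.withColumn("temp", lag("state", 1).over(Window.orderBy("timestamp")))
    .withColumn("temp", when(col("temp").isNull, lit(0)).otherwise(col("temp")))
    // keep only the rows where the state differs from the previous row
    .filter(col("state") =!= col("temp"))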
You should get

+-----+-------------------+----+
|state|timestamp          |temp|
+-----+-------------------+----+
|1    |Sun Aug 13 11:59:44|0   |
|0    |Sun Aug 13 13:58:42|1   |
|1    |Sun Aug 13 15:00:47|0   |
|0    |Sun Aug 13 16:00:49|1   |
+-----+-------------------+----+
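
To make the lag-and-filter logic easier to follow, here is a minimal sketch of the same idea in plain Scala collections, without Spark. The names rows, previous and transitions are only illustrative, and the sketch assumes the rows are already in time order.

```scala
// Sample rows from the question as (state, timestamp) pairs, already in time order.
val rows = Seq(
  (0, "Sun Aug 13 10:58:44"),
  (1, "Sun Aug 13 11:59:44"),
  (1, "Sun Aug 13 12:50:43"),
  (1, "Sun Aug 13 13:00:44"),
  (0, "Sun Aug 13 13:58:42"),
  (0, "Sun Aug 13 14:00:41"),
  (0, "Sun Aug 13 14:30:45"),
  (0, "Sun Aug 13 14:58:46"),
  (1, "Sun Aug 13 15:00:47"),
  (0, "Sun Aug 13 16:00:49")
)

// Previous state for each row. The first row has no predecessor, so it defaults to 0,
// mirroring lag("state", 1) followed by the null-to-0 replacement.
val previous = 0 +: rows.map(_._1).dropRight(1)

// Keep only the timestamps of rows whose state differs from the previous row's state.
val transitions = rows.zip(previous).collect {
  case ((state, ts), prev) if state != prev => ts
}

transitions.foreach(println)
```

On the sample data this keeps the same four timestamps as the filtered table above.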
Now, for your third need, you have to find a way to extract the time from the timestamp column and compute something like the following:

(13:58:42 - 11:59:44) + (16:00:49 - 15:00:47) 
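
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._  // needed for the $"..." syntax; assumes a SparkSession named spark

df.withColumn("temp", lag("state", 1).over(Window.orderBy("timestamp")))
    .withColumn("temp", when(col("temp").isNull, lit(0)).otherwise(col("temp")))
    .filter(col("state") =!= col("temp"))
    // gather the transition timestamps into a single array column
    .select(collect_list(col("timestamp")).as("time"))
    // build the expression as a string; this does not compute the differences
    .withColumn("time", concat_ws(" + ", concat_ws(" - ", $"time"(1), $"time"(0)), concat_ws(" - ", $"time"(3), $"time"(2))))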
You should get

+-------------------------------------------------------------------------------------+
|time                                                                                 |
+-------------------------------------------------------------------------------------+
|Sun Aug 13 13:58:42 - Sun Aug 13 11:59:44 + Sun Aug 13 16:00:49 - Sun Aug 13 15:00:47|
+-------------------------------------------------------------------------------------+

I hope the answer is helpful, apart from the part about extracting the time from the timestamp column.
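
For the part left open, here is a rough sketch of the arithmetic in plain Scala with java.time, not a Spark solution. It assumes the four transition timestamps have already been collected in order, that the time of day is the last whitespace-separated token, and that all rows fall on the same day, as in the sample data. The names transitionTimestamps, timeOf and total are hypothetical.

```scala
import java.time.{Duration, LocalTime}
import java.time.format.DateTimeFormatter

// Transition timestamps in order, copied from the filtered output above.
val transitionTimestamps = Seq(
  "Sun Aug 13 11:59:44",
  "Sun Aug 13 13:58:42",
  "Sun Aug 13 15:00:47",
  "Sun Aug 13 16:00:49"
)

val timeFormat = DateTimeFormatter.ofPattern("HH:mm:ss")

// Assumption: the time of day is the last whitespace-separated token of the string.
def timeOf(ts: String): LocalTime =
  LocalTime.parse(ts.trim.split("\\s+").last, timeFormat)

// Pair the times as (start, end) and sum the end - start durations.
// An unpaired trailing timestamp, if any, is ignored.
val total = transitionTimestamps
  .map(timeOf)
  .grouped(2)
  .collect { case Seq(start, end) => Duration.between(start, end) }
  .foldLeft(Duration.ZERO)(_ plus _)

println(total.getSeconds)  // 10740 seconds, i.e. 2 h 59 min
```

This corresponds to (13:58:42 - 11:59:44) + (16:00:49 - 15:00:47) = 7138 s + 3602 s. Data spanning several days would need the full date parsed as well, which this sketch does not attempt.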

3 x "I need" and 0 x "I have tried".

@indra: what have you tried so far, and what failed?