How to get the difference between two timestamps in Scala


I have a dataframe:

+----------------+----------------+
|            time|       time_dest|
+----------------+----------------+
|17/02/2020 00:06|17/02/2020 00:16|
|17/02/2020 00:16|17/02/2020 00:26|
|17/02/2020 00:26|17/02/2020 00:36|
|17/02/2020 00:36|17/02/2020 00:46|
|17/02/2020 00:46|17/02/2020 00:56|
+----------------+----------------+

I want to add a "Duration" column holding the duration in seconds between time and time_dest, keeping in mind that both time and time_dest are of type string.

I tried this, but it doesn't work:

DF_F.withColumn("Duration", col("time_dest") - col("time"))

How can I do this?


Thank you for your help.

Try using to_timestamp to parse the strings, cast to LongType to get epoch seconds, then subtract the time_dest and time columns to get the difference:

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

// Parse both strings as timestamps, cast to epoch seconds, then subtract
df.withColumn("Duration",
  to_timestamp(col("time_dest"), "dd/MM/yyyy HH:mm").cast(LongType) -
  to_timestamp(col("time"), "dd/MM/yyyy HH:mm").cast(LongType)
).show()

//or by using the unix_timestamp function

df.withColumn("Duration",
  unix_timestamp(col("time_dest"), "dd/MM/yyyy HH:mm").cast(LongType) -
  unix_timestamp(col("time"), "dd/MM/yyyy HH:mm").cast(LongType)
).show()

//+----------------+----------------+--------+
//|            time|       time_dest|Duration|
//+----------------+----------------+--------+
//|17/02/2020 00:06|17/02/2020 00:16|     600|
//|17/02/2020 00:16|17/02/2020 00:26|     600|
//+----------------+----------------+--------+
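
Since unix_timestamp already returns epoch seconds as a long, the cast can also be dropped; a minimal sketch of the same computation without it (same column names as above):

// unix_timestamp parses straight to epoch seconds, so no cast is needed
df.withColumn("Duration",
  unix_timestamp(col("time_dest"), "dd/MM/yyyy HH:mm") -
  unix_timestamp(col("time"), "dd/MM/yyyy HH:mm")
).show()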

If you need the duration in minutes or hours, then:

df.withColumn("Duration",to_timestamp(col("time_dest"),"dd/MM/yyyy HH:mm").cast(LongType)-to_timestamp(col("time"),"dd/MM/yyyy HH:mm").cast(LongType)).
withColumn("Duration_mins",round(col("Duration")/60)).
withColumn("Duration_hours",round(col("Duration")/3600)).
show()

//+----------------+----------------+--------+-------------+--------------+
//|            time|       time_dest|Duration|Duration_mins|Duration_hours|
//+----------------+----------------+--------+-------------+--------------+
//|17/02/2020 00:06|17/02/2020 00:16|     600|         10.0|           0.0|
//|17/02/2020 00:16|17/02/2020 00:26|     600|         10.0|           0.0|
//+----------------+----------------+--------+-------------+--------------+

Thanks for your answer, but it doesn't accept the to_timestamp function; any idea?

@HajarBOUALAMIA, which version of Spark are you using?

I am using version 3.0.0-preview2.

@HajarBOUALAMIA, try importing the functions with import org.apache.spark.sql.functions._ ; I have also added the unix_timestamp approach. Again, try both methods at once!
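
For reference, a self-contained sketch putting the pieces together (this assumes an active SparkSession in scope as spark; the sample rows are taken from the question):

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// assumes an existing SparkSession named `spark`
import spark.implicits._

val df = Seq(
  ("17/02/2020 00:06", "17/02/2020 00:16"),
  ("17/02/2020 00:16", "17/02/2020 00:26")
).toDF("time", "time_dest")

df.withColumn("Duration",        // via to_timestamp + cast to epoch seconds
    to_timestamp(col("time_dest"), "dd/MM/yyyy HH:mm").cast(LongType) -
    to_timestamp(col("time"), "dd/MM/yyyy HH:mm").cast(LongType))
  .withColumn("Duration_unix",   // via unix_timestamp (already a long)
    unix_timestamp(col("time_dest"), "dd/MM/yyyy HH:mm") -
    unix_timestamp(col("time"), "dd/MM/yyyy HH:mm"))
  .show()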