How to get the difference between two timestamps in Scala
I have a dataframe:
+----------------+----------------+
|            time|       time_dest|
+----------------+----------------+
|17/02/2020 00:06|17/02/2020 00:16|
|17/02/2020 00:16|17/02/2020 00:26|
|17/02/2020 00:26|17/02/2020 00:36|
|17/02/2020 00:36|17/02/2020 00:46|
|17/02/2020 00:46|17/02/2020 00:56|
+----------------+----------------+
I want to add a column "duration in seconds" that holds the duration between time and time_dest, keeping in mind that time and time_dest are of type string.

I tried this, but it doesn't work:
DF_F.withColumn(col("Duration", (col("time_dest")-col("time"))))
How can I do this?
Thanks for your help.

Answer: Try using to_timestamp, cast to LongType, and then subtract the time_dest and time columns to get the difference in seconds:
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
df.withColumn("Duration",to_timestamp(col("time_dest"),"dd/MM/yyyy HH:mm").cast(LongType)-
to_timestamp(col("time"),"dd/MM/yyyy HH:mm").cast(LongType)).show()
//or by using unix_timestamp function
df.withColumn("Duration",unix_timestamp(col("time_dest"),"dd/MM/yyyy HH:mm").cast(LongType)-unix_timestamp(col("time"),"dd/MM/yyyy HH:mm").cast(LongType)).show()
//+----------------+----------------+--------+
//| time| time_dest|Duration|
//+----------------+----------------+--------+
//|17/02/2020 00:06|17/02/2020 00:16| 600|
//|17/02/2020 00:16|17/02/2020 00:26| 600|
//+----------------+----------------+--------+
If you need the duration in minutes or hours, then:
df.withColumn("Duration",to_timestamp(col("time_dest"),"dd/MM/yyyy HH:mm").cast(LongType)-to_timestamp(col("time"),"dd/MM/yyyy HH:mm").cast(LongType)).
withColumn("Duration_mins",round(col("Duration")/60)).
withColumn("Duration_hours",round(col("Duration")/3600)).
show()
//+----------------+----------------+--------+-------------+--------------+
//| time| time_dest|Duration|Duration_mins|Duration_hours|
//+----------------+----------------+--------+-------------+--------------+
//|17/02/2020 00:06|17/02/2020 00:16| 600| 10.0| 0.0|
//|17/02/2020 00:16|17/02/2020 00:26| 600| 10.0| 0.0|
//+----------------+----------------+--------+-------------+--------------+
Comments:
– Thanks for your answer, but it doesn't accept the function "to_timestamp"; any ideas?
– @HajarBOUALAMIA, which version of Spark are you using?
– I'm using version 3.0.0-preview2.
– @HajarBOUALAMIA, try importing the functions with import org.apache.spark.sql.functions._, and I also added the unix_timestamp variant. Try both approaches once more!
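As a quick sanity check of the expected numbers (independent of Spark, so it's easy to try locally), the same "dd/MM/yyyy HH:mm" pattern can be parsed with plain java.time and diffed in seconds. This is a minimal sketch, not the answer's Spark code; the variable names are illustrative, and the values come from the first row of the example dataframe:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

// Same pattern the Spark answer passes to to_timestamp / unix_timestamp
val fmt = DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm")

// First row of the example dataframe
val start = LocalDateTime.parse("17/02/2020 00:06", fmt)
val dest  = LocalDateTime.parse("17/02/2020 00:16", fmt)

// Whole-second difference, matching the Duration column above
val durationSeconds = ChronoUnit.SECONDS.between(start, dest)
println(durationSeconds) // 600
```

This confirms the 600-second (10-minute) gap shown in the Duration column, and it is also a handy way to debug the date pattern itself when Spark silently returns nulls for an unparseable format.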