Scala: adding a time interval to a column in a Spark DataFrame
Below is my DataFrame:
import spark.implicits._

val lastRunDtDF = sc.parallelize(Seq(
  (1, 2, "2019-07-18 13:34:24")
)).toDF("id", "cnt", "run_date")

lastRunDtDF.show
+---+---+-------------------+
| id|cnt| run_date|
+---+---+-------------------+
| 1| 2|2019-07-18 13:34:24|
+---+---+-------------------+
I want to create a new DataFrame that adds 2 minutes to the existing run_date column, with the new column named new_run_date. Sample output is shown below:
+---+---+-------------------+-------------------+
| id|cnt| run_date| new_run_date|
+---+---+-------------------+-------------------+
| 1| 2|2019-07-18 13:34:24|2019-07-18 13:36:24|
+---+---+-------------------+-------------------+
I am trying the following:
lastRunDtDF.withColumn("new_run_date",lastRunDtDF("run_date")+"INTERVAL 2 MINUTE")
It doesn't seem to be the right approach. Thanks in advance for any help.

Try wrapping the INTERVAL 2 MINUTE in the expr function (the bare string in your attempt is treated as an ordinary literal, not an interval, so no date arithmetic happens):
import org.apache.spark.sql.functions.expr

lastRunDtDF.withColumn("new_run_date", lastRunDtDF("run_date") + expr("INTERVAL 2 MINUTE"))
  .show()
Result:

+---+---+-------------------+-------------------+
| id|cnt|           run_date|       new_run_date|
+---+---+-------------------+-------------------+
|  1|  2|2019-07-18 13:34:24|2019-07-18 13:36:24|
+---+---+-------------------+-------------------+
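A small variation on the same idea (my sketch, not part of the original answer; minutesToAdd is a made-up variable for illustration): if the offset has to be dynamic, the interval literal can be built with string interpolation before passing it to expr.

import org.apache.spark.sql.functions.expr

// minutesToAdd is a hypothetical parameter; INTERVAL literals cannot
// reference columns, so the value is baked into the SQL string here.
val minutesToAdd = 2
lastRunDtDF
  .withColumn("new_run_date", lastRunDtDF("run_date") + expr(s"INTERVAL $minutesToAdd MINUTE"))
  .show()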
(or)

Using the from_unixtime and unix_timestamp functions:
import org.apache.spark.sql.functions._

lastRunDtDF.selectExpr("*",
    "from_unixtime(unix_timestamp(run_date) + 2*60, 'yyyy-MM-dd HH:mm:ss') as new_run_date")
  .show()
Result:

+---+---+-------------------+-------------------+
| id|cnt| run_date| new_run_date|
+---+---+-------------------+-------------------+
| 1| 2|2019-07-18 13:34:24|2019-07-18 13:36:24|
+---+---+-------------------+-------------------+
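If you prefer the typed column API over selectExpr, the same arithmetic can be written with the from_unixtime and unix_timestamp functions directly (a sketch under the same assumptions as the answer above, not part of the original):

import org.apache.spark.sql.functions.{col, from_unixtime, unix_timestamp}

// unix_timestamp parses the run_date string to epoch seconds, adding
// 2*60 shifts it by two minutes, and from_unixtime formats it back
// to a string in the original pattern.
lastRunDtDF.withColumn(
  "new_run_date",
  from_unixtime(unix_timestamp(col("run_date")) + 2 * 60, "yyyy-MM-dd HH:mm:ss")
).show()

One difference worth noting: the expr approach yields new_run_date as a timestamp column, while this one keeps it a string, since from_unixtime formats its result.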