Scala: Add a column inside expr in Spark


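How can I use a column's value inside expr when subtracting that many hours from a timestamp?

df.withColumn("out", expr("timestamp - interval hour_part hours"))

Input: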

id,hour_part,timestamp
1,1,2019-01-01 13:00:00
1,2,2019-01-01 14:00:00
1,2,2019-01-01 15:00:00
2,3,2019-01-01 17:00:00
2,4,2019-01-01 18:00:00
Expected output:

id,hour_part,timestamp,out
1,1,2019-01-01 13:00:00,2019-01-01 12:00:00
1,2,2019-01-01 14:00:00,2019-01-01 12:00:00
1,2,2019-01-01 15:00:00,2019-01-01 13:00:00
2,3,2019-01-01 17:00:00,2019-01-01 14:00:00
2,4,2019-01-01 18:00:00,2019-01-01 14:00:00
Error: org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'hours' (line 1, pos 28)

Alternatively, you can use the following approach:

import org.apache.spark.sql.functions._
import spark.implicits._ // needed for toDF and $"..."; already in scope in spark-shell

val df=Seq(("1","1","2019-01-01 13:00:00"),
("1","2","2019-01-01 14:00:00"),
("1","2","2019-01-01 15:00:00"),
("2","3","2019-01-01 17:00:00"),
("2","4","2019-01-01 18:00:00")).toDF("id","hour_part","timestamp")

df.withColumn("out", from_unixtime((unix_timestamp($"timestamp") - $"hour_part" * 60 * 60))).show()

/*
+---+---------+-------------------+-------------------+
| id|hour_part|          timestamp|                out|
+---+---------+-------------------+-------------------+
|  1|        1|2019-01-01 13:00:00|2019-01-01 12:00:00|
|  1|        2|2019-01-01 14:00:00|2019-01-01 12:00:00|
|  1|        2|2019-01-01 15:00:00|2019-01-01 13:00:00|
|  2|        3|2019-01-01 17:00:00|2019-01-01 14:00:00|
|  2|        4|2019-01-01 18:00:00|2019-01-01 14:00:00|
+---+---------+-------------------+-------------------+
*/
// using expr()
df.withColumn("out", expr(""" from_unixtime((unix_timestamp(timestamp) - hour_part * 60 * 60))""")).show()
/*
+---+---------+-------------------+-------------------+
| id|hour_part|          timestamp|                out|
+---+---------+-------------------+-------------------+
|  1|        1|2019-01-01 13:00:00|2019-01-01 12:00:00|
|  1|        2|2019-01-01 14:00:00|2019-01-01 12:00:00|
|  1|        2|2019-01-01 15:00:00|2019-01-01 13:00:00|
|  2|        3|2019-01-01 17:00:00|2019-01-01 14:00:00|
|  2|        4|2019-01-01 18:00:00|2019-01-01 14:00:00|
+---+---------+-------------------+-------------------+
*/
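
Note that from_unixtime returns a string, so "out" above is no longer a timestamp column. If the timestamp type should be kept, one possible variant (a sketch, not part of the original answer) is to build the interval from the column with the SQL function make_interval, which is available from Spark 3.0 onward. The local SparkSession, the app name, and the names df2 and result below are assumptions made only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, to_timestamp}

// Assumption: a local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("interval-sketch")
  .getOrCreate()
import spark.implicits._

// Same sample data as above, with hour_part as an integer and timestamp parsed to TimestampType.
val df2 = Seq(
  (1, 1, "2019-01-01 13:00:00"),
  (1, 2, "2019-01-01 14:00:00"),
  (1, 2, "2019-01-01 15:00:00"),
  (2, 3, "2019-01-01 17:00:00"),
  (2, 4, "2019-01-01 18:00:00")
).toDF("id", "hour_part", "timestamp")
  .withColumn("timestamp", to_timestamp(col("timestamp")))

// make_interval(years, months, weeks, days, hours) accepts column expressions,
// unlike the literal form `interval <n> hours`, which needs a constant.
val result = df2.withColumn(
  "out",
  expr("`timestamp` - make_interval(0, 0, 0, 0, hour_part)")
)

result.show(false)
```

With this variant "out" stays a timestamp, so later date functions can be applied to it without casting back from a string.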

Shouldn't hour_part in the output be 1,2,3,3,4?
Sorry, my mistake. I have edited the input.
Updated the alternative answer, please check!
Any suggestions on this?