Scala: using a column value inside expr in Spark
I'm trying to reference a column inside expr to subtract that column's value (in hours) from a timestamp:
df.withColumn("out", expr("timestamp - interval hour_part hours"))
Input:
id,hour_part,timestamp
1,1,2019-01-01 13:00:00
1,2,2019-01-01 14:00:00
1,2,2019-01-01 15:00:00
2,3,2019-01-01 17:00:00
2,4,2019-01-01 18:00:00
Expected output:
id,hour_part,timestamp,out
1,1,2019-01-01 13:00:00,2019-01-01 12:00:00
1,2,2019-01-01 14:00:00,2019-01-01 12:00:00
1,2,2019-01-01 15:00:00,2019-01-01 13:00:00
2,3,2019-01-01 17:00:00,2019-01-01 14:00:00
2,4,2019-01-01 18:00:00,2019-01-01 14:00:00
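The rule behind the expected output is simply out = timestamp - hour_part hours. As a quick sanity check outside Spark, here is a minimal plain-Scala sketch using java.time that recomputes the out column for the sample rows. The object name ExpectedOut is hypothetical and nothing in it is Spark API.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Plain-Scala sketch (no Spark): recomputes the expected `out` column by
// subtracting `hour_part` hours from `timestamp` for each sample row.
object ExpectedOut {
  // Same "yyyy-MM-dd HH:mm:ss" layout as the sample data.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  /** Returns `timestamp` shifted back by `hourPart` hours, formatted like the input. */
  def out(timestamp: String, hourPart: Int): String =
    LocalDateTime.parse(timestamp, fmt).minusHours(hourPart.toLong).format(fmt)

  // (id, hour_part, timestamp) rows copied from the question's input.
  val input: Seq[(String, Int, String)] = Seq(
    ("1", 1, "2019-01-01 13:00:00"),
    ("1", 2, "2019-01-01 14:00:00"),
    ("1", 2, "2019-01-01 15:00:00"),
    ("2", 3, "2019-01-01 17:00:00"),
    ("2", 4, "2019-01-01 18:00:00")
  )

  // Prints rows in the same id,hour_part,timestamp,out layout as the expected output.
  def main(args: Array[String]): Unit =
    input.foreach { case (id, h, ts) => println(s"$id,$h,$ts,${out(ts, h)}") }
}
```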
Error: org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'hours' expecting (line 1, pos 28)

Alternatively, you can use the following approach:
import org.apache.spark.sql.functions._
import spark.implicits._ // toDF and $"col"; assumes a SparkSession named spark, as in spark-shell
val df=Seq(("1","1","2019-01-01 13:00:00"),
("1","2","2019-01-01 14:00:00"),
("1","2","2019-01-01 15:00:00"),
("2","3","2019-01-01 17:00:00"),
("2","4","2019-01-01 18:00:00")).toDF("id","hour_part","timestamp")
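// unix_timestamp -> epoch seconds; subtract hour_part * 3600 s; from_unixtime formats the result back to a string
df.withColumn("out", from_unixtime(unix_timestamp($"timestamp") - $"hour_part" * 60 * 60)).show()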
/*
+---+---------+-------------------+-------------------+
| id|hour_part|          timestamp|                out|
+---+---------+-------------------+-------------------+
|  1|        1|2019-01-01 13:00:00|2019-01-01 12:00:00|
|  1|        2|2019-01-01 14:00:00|2019-01-01 12:00:00|
|  1|        2|2019-01-01 15:00:00|2019-01-01 13:00:00|
|  2|        3|2019-01-01 17:00:00|2019-01-01 14:00:00|
|  2|        4|2019-01-01 18:00:00|2019-01-01 14:00:00|
+---+---------+-------------------+-------------------+
*/
// Same computation inside expr(): column names work here as ordinary function arguments
df.withColumn("out", expr("from_unixtime(unix_timestamp(timestamp) - hour_part * 60 * 60)")).show()
/*
+---+---------+-------------------+-------------------+
| id|hour_part|          timestamp|                out|
+---+---------+-------------------+-------------------+
|  1|        1|2019-01-01 13:00:00|2019-01-01 12:00:00|
|  1|        2|2019-01-01 14:00:00|2019-01-01 12:00:00|
|  1|        2|2019-01-01 15:00:00|2019-01-01 13:00:00|
|  2|        3|2019-01-01 17:00:00|2019-01-01 14:00:00|
|  2|        4|2019-01-01 18:00:00|2019-01-01 14:00:00|
+---+---------+-------------------+-------------------+
*/
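Both Spark variants above work in epoch seconds: unix_timestamp turns the string into seconds since the epoch, hour_part * 60 * 60 converts hours to seconds, and from_unixtime formats the result back. Below is a minimal plain-Scala sketch of that round trip using java.time. The object EpochShift is hypothetical, and the fixed UTC offset is an assumption; Spark evaluates these functions in the session time zone (spark.sql.session.timeZone).

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Plain-Scala sketch of the epoch-seconds round trip used in the answer.
// Assumption: a fixed UTC offset stands in for Spark's session time zone.
object EpochShift {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  private val zone = ZoneOffset.UTC

  // Analogue of unix_timestamp(col): string -> seconds since the epoch.
  def toEpochSeconds(ts: String): Long =
    LocalDateTime.parse(ts, fmt).toEpochSecond(zone)

  // Analogue of from_unixtime(col): seconds since the epoch -> string.
  def fromEpochSeconds(secs: Long): String =
    LocalDateTime.ofEpochSecond(secs, 0, zone).format(fmt)

  // unix_timestamp(timestamp) - hour_part * 60 * 60, formatted back to a string.
  def out(ts: String, hourPart: Int): String =
    fromEpochSeconds(toEpochSeconds(ts) - hourPart.toLong * 60 * 60)
}
```

With a fixed offset such as UTC this matches subtracting wall-clock hours; in a zone with daylight-saving transitions, subtracting seconds measures elapsed time, which can differ from shifting the clock hour across a transition.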
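Should hour_part in the output be 1,2,3,3,4?
Sorry, my mistake; I have edited the input.
Updated the answer with an alternative, please check!
Any suggestions on this?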