Spark Scala: splitting a timestamp column into a date column and a time column
I'm having trouble splitting a timestamp column into a date column and a time column. First, the time doesn't respect the 24-hour format... Second, the date is wrong, and I don't understand why. Here is my output:
+----------+----------+-------------------+---------+
| Date| Timestamp| Time|EventTime|
+----------+----------+-------------------+---------+
|2018-00-30|1540857600|2018-10-30 00:00:00| 12:00:00|
|2018-00-30|1540857610|2018-10-30 00:00:10| 12:00:10|
|2018-00-30|1540857620|2018-10-30 00:00:20| 12:00:20|
|2018-00-30|1540857630|2018-10-30 00:00:30| 12:00:30|
|2018-00-30|1540857640|2018-10-30 00:00:40| 12:00:40|
|2018-00-30|1540857650|2018-10-30 00:00:50| 12:00:50|
|2018-01-30|1540857660|2018-10-30 00:01:00| 12:01:00|
|2018-01-30|1540857670|2018-10-30 00:01:10| 12:01:10|
|2018-01-30|1540857680|2018-10-30 00:01:20| 12:01:20|
|2018-01-30|1540857690|2018-10-30 00:01:30| 12:01:30|
|2018-01-30|1540857700|2018-10-30 00:01:40| 12:01:40|
My code is:
val df = data_input
.withColumn("Time", to_timestamp(from_unixtime(col("Timestamp"))))
.withColumn("Date", date_format(col("Time"), "yyyy-mm-dd"))
.withColumn("EventTime", date_format(col("Time"), "hh:mm:ss"))
First I convert the unix Timestamp column into a Time column, and then I want to split that Time column. Thanks in advance!

You are using the wrong format codes. Specifically, "mm" in a date pattern means minutes, and "hh" is the 12-hour-clock hour. You want "MM" (month) and "HH" (24-hour clock) instead. Like this:
val df = data_input
.withColumn("Time", to_timestamp(from_unixtime(col("Timestamp"))))
.withColumn("Date", date_format(col("Time"), "yyyy-MM-dd"))
.withColumn("EventTime", date_format(col("Time"), "HH:mm:ss"))
For reference, here is the list of date format codes you can use.
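The mm-vs-MM and hh-vs-HH distinction can be checked outside Spark, since `date_format` uses the same pattern letters as Java's `SimpleDateFormat` (Spark before 3.0) and `DateTimeFormatter` (Spark 3.0+). A minimal sketch reproducing the question's bug with the question's own epoch value, fixed to UTC so the result is deterministic:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// 1540857600 s = 2018-10-30 00:00:00 UTC, the first row of the output above.
val ts = new Date(1540857600L * 1000)

def fmt(pattern: String): String = {
  val f = new SimpleDateFormat(pattern)
  f.setTimeZone(TimeZone.getTimeZone("UTC"))
  f.format(ts)
}

println(fmt("yyyy-mm-dd"))  // "2018-00-30" -- mm = minutes, hence the "fake" date
println(fmt("yyyy-MM-dd"))  // "2018-10-30" -- MM = month
println(fmt("hh:mm:ss"))    // "12:00:00"   -- hh = 12-hour clock, midnight prints as 12
println(fmt("HH:mm:ss"))    // "00:00:00"   -- HH = 24-hour clock
```

The four printed lines are exactly the wrong and corrected Date/EventTime values from the first output row.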
You can avoid the format-code confusion entirely with simple casts:
import org.apache.spark.sql.functions._
val df = data_input
.withColumn("Time", $"Timestamp".cast("timestamp"))
.withColumn("Date", $"Time".cast("date"))
.withColumn("EventTime", date_format($"Time", "H:m:s"))
+----------+-------------------+----------+---------+
|Timestamp | Time| Date|EventTime|
+----------+-------------------+----------+---------+
|1540857600|2018-10-30 00:00:00|2018-10-30| 0:0:0|
|1540857610|2018-10-30 00:00:10|2018-10-30| 0:0:10|
|1540857620|2018-10-30 00:00:20|2018-10-30| 0:0:20|
|1540857630|2018-10-30 00:00:30|2018-10-30| 0:0:30|
|1540857640|2018-10-30 00:00:40|2018-10-30| 0:0:40|
|1540857650|2018-10-30 00:00:50|2018-10-30| 0:0:50|
|1540857660|2018-10-30 00:01:00|2018-10-30| 0:1:0|
|1540857670|2018-10-30 00:01:10|2018-10-30| 0:1:10|
|1540857680|2018-10-30 00:01:20|2018-10-30| 0:1:20|
|1540857690|2018-10-30 00:01:30|2018-10-30| 0:1:30|
|1540857700|2018-10-30 00:01:40|2018-10-30| 0:1:40|
+----------+-------------------+----------+---------+
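What the two casts plus `date_format` compute can be sketched in plain Scala with `java.time`, assuming a UTC session time zone (Spark's casts use the session time zone, so results may shift in other zones):

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Epoch seconds -> timestamp (cast to timestamp), then split into
// the date part (cast to date) and the time-of-day part (date_format).
val epoch = 1540857600L
val dt = LocalDateTime.ofInstant(Instant.ofEpochSecond(epoch), ZoneOffset.UTC)

val date = dt.format(DateTimeFormatter.ISO_LOCAL_DATE)        // "2018-10-30"
val time = dt.format(DateTimeFormatter.ofPattern("HH:mm:ss")) // "00:00:00"
```

Note that the single-letter pattern "H:m:s" used in the answer drops zero-padding, which is why the EventTime column above reads "0:0:0"; use "HH:mm:ss" if you want "00:00:00".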