Apache Spark: converting the date format MMM dd, yyyy hh:mm:ss AM to a timestamp in a DataFrame


I need to convert a descriptive date format from a log file, MMM dd, yyyy hh:mm:ss AM/PM, to the Spark timestamp data type. I tried something like the following, but it returns null.

val df = Seq(("Nov 05, 2018 02:46:47 AM"),("Nov 5, 2018 02:46:47 PM")).toDF("times")
df.withColumn("time2",date_format('times,"MMM dd, yyyy HH:mm:ss AM")).show(false)

+------------------------+-----+
|times                   |time2|
+------------------------+-----+
|Nov 05, 2018 02:46:47 AM|null |
|Nov 5, 2018 02:46:47 PM |null |
+------------------------+-----+
Expected output

+------------------------+--------------------------+
|times                   |time2                     |
+------------------------+--------------------------+
|Nov 05, 2018 02:46:47 AM|2018-11-05 02:46:47.000000|
|Nov 5, 2018 02:46:47 PM |2018-11-05 14:46:47.000000|
+------------------------+--------------------------+
What is the correct format to convert this? Note that the day (dd) may or may not have a leading zero.

Here is your answer

val df = Seq(("Nov 05, 2018 02:46:47 AM"),("Nov 5, 2018 02:46:47 PM")).toDF("times")

scala> df.withColumn("times2", from_unixtime(unix_timestamp(col("times"), "MMM d, yyyy hh:mm:ss a"),"yyyy-MM-dd HH:mm:ss.SSSSSS")).show(false)
    +------------------------+--------------------------+
    |times                   |times2                    |
    +------------------------+--------------------------+
    |Nov 05, 2018 02:46:47 AM|2018-11-05 02:46:47.000000|
    |Nov 5, 2018 02:46:47 PM |2018-11-05 14:46:47.000000|
    +------------------------+--------------------------+
To parse a 12-hour format, use hh for the hour instead of HH. When parsing, AM/PM is denoted by the pattern letter a.
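
To see what these pattern letters do outside of Spark, here is a minimal plain-JVM sketch using java.time.format.DateTimeFormatter. This is only a stand-in: Spark's own parser may behave differently depending on the version, and the object name PatternLetters and its helper methods are made up for illustration.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

// Hypothetical helper object, not part of Spark.
object PatternLetters {
  // hh = hour on a 12-hour clock, a = AM/PM marker,
  // d  = day of month with or without a leading zero.
  private val input =
    DateTimeFormatter.ofPattern("MMM d, yyyy hh:mm:ss a", Locale.ENGLISH)

  // HH = hour on a 24-hour clock, used here only for the output string.
  private val output = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Parse a log-style string into a LocalDateTime.
  def parse(s: String): LocalDateTime = LocalDateTime.parse(s, input)

  // Parse and re-render in 24-hour form.
  def normalize(s: String): String = parse(s).format(output)

  // In java.time, "dd" demands exactly two digits, so "Nov 5" is rejected.
  def parsesWithTwoDigitDay(s: String): Boolean = {
    val strict =
      DateTimeFormatter.ofPattern("MMM dd, yyyy hh:mm:ss a", Locale.ENGLISH)
    Try(LocalDateTime.parse(s, strict)).isSuccess
  }

  def main(args: Array[String]): Unit = {
    println(normalize("Nov 05, 2018 02:46:47 AM")) // 2018-11-05 02:46:47
    println(normalize("Nov 5, 2018 02:46:47 PM"))  // 2018-11-05 14:46:47
    println(parsesWithTwoDigitDay("Nov 5, 2018 02:46:47 PM")) // false
  }
}
```

The single d is what lets both "Nov 05" and "Nov 5" go through the same pattern in this sketch.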


Hope this helps.

Use the to_timestamp and date_format functions

scala> df.withColumn("times2",to_timestamp('times,"MMM d, yyyy hh:mm:ss a")).show(false)
+------------------------+-------------------+
|times                   |times2             |
+------------------------+-------------------+
|Nov 05, 2018 02:46:47 AM|2018-11-05 02:46:47|
|Nov 5, 2018 02:46:47 PM |2018-11-05 14:46:47|
+------------------------+-------------------+


scala> df.withColumn("times2",date_format(to_timestamp('times,"MMM d, yyyy hh:mm:ss a"),"yyyy-MM-dd HH:mm:ss.SSSSSS")).show(false)
+------------------------+--------------------------+
|times                   |times2                    |
+------------------------+--------------------------+
|Nov 05, 2018 02:46:47 AM|2018-11-05 02:46:47.000000|
|Nov 5, 2018 02:46:47 PM |2018-11-05 14:46:47.000000|
+------------------------+--------------------------+


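Using SQL syntax (this example assumes a column stored in MM/dd/yyyy form, so adjust the input pattern to match your data):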

select date_format(to_timestamp(ColumnTimestamp, "MM/dd/yyyy hh:mm:ss aa"), "yyyy-MM-dd") as ColumnDate 
from database_name.table_name