How to convert a custom datetime format to a timestamp in Scala?


Any idea why I am getting the result below?

scala> val b = to_timestamp($"DATETIME", "ddMMMYYYY:HH:mm:ss")
b: org.apache.spark.sql.Column = to_timestamp(`DATETIME`, 'ddMMMYYYY:HH:mm:ss')

scala> sourceRawData.withColumn("ts", b).show(6,false)
+------------------+-------------------+-----------+--------+----------------+---------+-------------------+
|DATETIME          |LOAD_DATETIME      |SOURCE_BANK|EMP_NAME|HEADER_ROW_COUNT|EMP_HOURS|ts                 |
+------------------+-------------------+-----------+--------+----------------+---------+-------------------+
|01JAN2017:01:02:03|01JAN2017:01:02:03 | RBS       | Naveen |100             |15.23    |2017-01-01 01:02:03|
|15MAR2017:01:02:03|15MAR2017:01:02:03 | RBS       | Naveen |100             |115.78   |2017-01-01 01:02:03|
|02APR2015:23:24:25|02APR2015:23:24:25 | RBS       |Arun    |200             |2.09     |2014-12-28 23:24:25|
|28MAY2010:12:13:14| 28MAY2010:12:13:14|RBS        |Arun    |100             |30.98    |2009-12-27 12:13:14|
|04JUN2018:10:11:12|04JUN2018:10:11:12 |XZX        | Arun   |400             |12.0     |2017-12-31 10:11:12|
+------------------+-------------------+-----------+--------+----------------+---------+-------------------+
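
I am trying to convert DATETIME (in the ddMMMYYYY:HH:mm:ss pattern used above) to a Timestamp (the last column shown above), but it does not seem to convert to the correct value. I referred to the post below, but it did not help: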

Can anyone help?

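Use lowercase y (year) instead of uppercase Y (week-year) in the pattern: ddMMMyyyy:HH:mm:ss rather than ddMMMYYYY:HH:mm:ss.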
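
The effect of the two pattern letters can be reproduced outside Spark. The sketch below assumes the Spark version in the question hands the pattern to java.text.SimpleDateFormat (an assumption; newer releases may parse differently). The names WeekYearDemo and reformat are illustrative only, and Locale.US and UTC are fixed so the result does not depend on the machine it runs on.

```scala
import java.text.SimpleDateFormat
import java.util.{Locale, TimeZone}

// Hypothetical helper, not part of Spark: parses `input` with `pattern`
// and prints the result back in a fixed yyyy-MM-dd HH:mm:ss layout.
object WeekYearDemo {
  private val utc = TimeZone.getTimeZone("UTC")

  def reformat(input: String, pattern: String): String = {
    val parser = new SimpleDateFormat(pattern, Locale.US)
    parser.setTimeZone(utc)
    val printer = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US)
    printer.setTimeZone(utc)
    printer.format(parser.parse(input))
  }

  // Lowercase y is the calendar year, so day and month are kept.
  val withYear: String = reformat("15MAR2017:01:02:03", "ddMMMyyyy:HH:mm:ss")

  // Uppercase Y is the week-based year; the parsed day and month
  // no longer determine the resulting date.
  val withWeekYear: String = reformat("15MAR2017:01:02:03", "ddMMMYYYY:HH:mm:ss")
}
```

Comparing WeekYearDemo.withYear with WeekYearDemo.withWeekYear shows the same kind of shift as the ts column in the question: the time of day survives, while the date part comes from the week-based year rather than from the day and month in the input.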

Another example:

scala> sql("select to_timestamp('12/08/2020 1:24:21 AM', 'MM/dd/yyyy H:mm:ss a')").show
+-------------------------------------------------------------+
|to_timestamp('12/08/2020 1:24:21 AM', 'MM/dd/yyyy H:mm:ss a')|
+-------------------------------------------------------------+
|                                          2020-12-08 01:24:21|
+-------------------------------------------------------------+

Try this UDF:

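import java.text.SimpleDateFormat
import org.apache.spark.sql.functions.{lit, udf}

// Re-formats a date string from pattern cFormat into pattern rFormat.
// The result is a string, not a timestamp.
val changeDtFmt = udf{(cFormat: String,
                         rFormat: String,
                         date: String) => {
  val formatterOld = new SimpleDateFormat(cFormat)
  val formatterNew = new SimpleDateFormat(rFormat)
  formatterNew.format(formatterOld.parse(date))
}}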

sourceRawData.
  withColumn("ts", 
    changeDtFmt(lit("ddMMMyyyy:HH:mm:ss"), lit("yyyy-MM-dd HH:mm:ss"), $"DATETIME")).
  show(6,false)

Try the code below.

I created a sample DataFrame, df, for the table:

+---+-------------------+
| id|               date|
+---+-------------------+
|  1| 01JAN2017:01:02:03|
|  2| 15MAR2017:01:02:03|
|  3|02APR2015:23:24:25 |
+---+-------------------+
val t_s= unix_timestamp($"date","ddMMMyyyy:HH:mm:ss").cast("timestamp")

df.withColumn("ts",t_s).show()

+---+-------------------+--------------------+
| id|               date|                  ts|
+---+-------------------+--------------------+
|  1| 01JAN2017:01:02:03|2017-01-01 01:02:...|
|  2| 15MAR2017:01:02:03|2017-03-15 01:02:...|
|  3|02APR2015:23:24:25 |2015-04-02 23:24:...|
+---+-------------------+--------------------+
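
As a rough illustration of what the expression above does, the plain-JVM sketch below performs the same two steps: it parses the string into seconds since the epoch, then turns those seconds back into a date-time. It is not Spark code. The names EpochSecondsDemo, toEpochSeconds and fromEpochSeconds are hypothetical, and UTC is assumed where Spark would apply its own time zone setting.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.util.Locale

// Hypothetical helper that mimics "parse to epoch seconds, then cast back".
object EpochSecondsDemo {
  // Case-insensitive so that upper-case month names such as JAN are accepted.
  private val inputFormat: DateTimeFormatter =
    new DateTimeFormatterBuilder()
      .parseCaseInsensitive()
      .appendPattern("ddMMMyyyy:HH:mm:ss")
      .toFormatter(Locale.ENGLISH)

  private val outputFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Step 1: string -> seconds since 1970-01-01 00:00:00 (UTC assumed).
  // The input is trimmed because some sample rows carry stray spaces.
  def toEpochSeconds(raw: String): Long =
    LocalDateTime.parse(raw.trim, inputFormat).toEpochSecond(ZoneOffset.UTC)

  // Step 2: seconds since the epoch -> formatted date-time (UTC assumed).
  def fromEpochSeconds(seconds: Long): String =
    LocalDateTime.ofEpochSecond(seconds, 0, ZoneOffset.UTC).format(outputFormat)
}
```

For example, EpochSecondsDemo.fromEpochSeconds(EpochSecondsDemo.toEpochSeconds("02APR2015:23:24:25 ")) gives back 2015-04-02 23:24:25, matching the third row of the output above.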

Thanks.

What is the lit() function in the answer above?

There is a standard function for date formats, so a custom UDF is not needed.